Tag: algorithms

  • The Limits of Analysis Pt. 2

    The Limits of Analysis Pt. 2

    The Conditions of Analyzability

    In Part 1, I argued that not everything can be turned into data by means of analysis. Some experiences resist detection, some collapse under measurement, and others vanish once we try to replicate them. But when analysis is possible, it depends on certain hidden preconditions, which I’ll call the five conditions of analyzability. We will look at three of them here and the remaining two in the next article. Some, if not all, of these points might come off as truisms at first, but the point of this part of the series is to take a deeper look at the hidden assumptions that lie at the heart of analysis. These conditions serve as thresholds: they are not guarantees of insight, but rather the minimum requirements for analysis to even begin. Without them, statistics and machine learning alike produce empty formalisms, outputs untethered from reality.

    The first condition we will discuss is detectability. The signal in question must not merely exist; it must exist in a context that gives off a detectable trace for us to observe. If there is no signal to detect, there is no starting point for analysis. A clear example from statistics is hypothesis testing, which depends on being able to distinguish a real effect from random noise. If the effect is completely buried in noise, inference collapses. Applied to machine learning, the same condition says that a system can only learn about a signal if that signal actually appears in the distribution of the data being analyzed. Returning to the fraud example from Part 1 of the series, fraud can exist and yet, by its nature, escape detection, because it deliberately avoids leaving a detectable signal.
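
    To make this concrete, here is a minimal Python sketch of the signal-versus-noise problem in hypothesis testing. The effect size, noise levels, and sample size are made-up numbers chosen purely for illustration, not figures from any real study.

```python
# Illustrative sketch: detectability as signal vs. noise.
# Two groups whose true means differ slightly; the same difference is tested
# under modest noise and under overwhelming noise.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect = 0.2          # a real but small difference in means
n = 200                    # sample size per group

for noise_sd in (0.5, 5.0):
    control = rng.normal(loc=0.0, scale=noise_sd, size=n)
    treated = rng.normal(loc=true_effect, scale=noise_sd, size=n)
    t_stat, p_value = stats.ttest_ind(treated, control)
    print(f"noise sd={noise_sd:>4}: p-value={p_value:.3f}")

# With low noise the effect is detectable (small p-value); with heavy noise the
# same true effect is usually indistinguishable from chance, leaving no trace
# for the analysis to start from.
```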

    Beyond detecting the signal, we need to be able to quantify it. We can think of this condition as measurability, something analogous to the magnitude or strength of the signal. What we detect must be structured into a form that can be compared, ordered, or aggregated. In statistics, this shows up in the choice of “measurement scale”. Not all numbers mean the same thing; the way something is measured determines which comparisons are valid. Going back to the customer satisfaction example from Part 1 of this series, a question like “on a scale of 1 to 5, how satisfied were you with your visit?” converts a complex feeling into a narrow range of integers. When analysis is performed on that satisfaction data, you’ll often see results like an average of 3.5 out of 5. Applying quantitative analysis to qualitative measurements risks producing results that look precise but aren’t truly quantitative: treating qualitative judgments as if they sit on an interval scale creates an illusion of structure where there may be none. In machine learning, this condition is mirrored by embeddings, normalizations, and encodings of the data, all of which are attempts to organize messy real-world data into measurable, consistent signals we can learn from. But every such translation is also a reduction: nuance becomes structure, texture becomes number. Whether in statistics or machine learning, measurability always involves a trade-off between richness and regularity, between what is real and what is processable.
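
    As a rough sketch of the interval-scale illusion, here is what the same hypothetical 1-to-5 satisfaction ratings look like under two different summaries; the ratings below are invented for the example.

```python
# Illustrative sketch: the same ordinal 1-5 ratings summarized two ways.
# The mean treats the scale as interval; the frequency table keeps only the
# comparisons the scale actually supports.
from collections import Counter
from statistics import mean, median

ratings = [5, 4, 2, 5, 1, 3, 4, 5, 2, 4]  # hypothetical survey responses

print("mean   :", round(mean(ratings), 2))   # 3.5 here; looks precise
print("median :", median(ratings))           # an order-based summary
print("counts :", dict(sorted(Counter(ratings).items())))

# The mean implies that the gap between 1 and 2 equals the gap between 4 and 5,
# an assumption the survey never justified. The counts are less tidy, but they
# are closer to what was actually measured.
```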

    Next, let’s talk about the sufficiency of data as a condition of analysis: even with measurable data, scarcity limits inference. Our two main frameworks of application, statistics and machine learning, both offer clear examples of this condition. In statistics, small samples produce unstable estimates, wide confidence intervals, and high variance. The law of large numbers only works when “large” actually applies. That law is a pivotal result in statistics which says, more or less, that the more data you have, the closer your observed results will be to the “actual” ones. A concrete example is flipping a coin: flip it twice and you might get heads both times, but that doesn’t mean the coin lands heads 100% of the time. You need to flip it many more times to be confident about the odds of a specific outcome. In machine learning, sparse data makes models overfit, which is when a model finds relationships that work for the data at hand but don’t generalize to a wider variety of data. Sparse data can trick the model into learning “quirks” of the specific data set rather than general rules.
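
    A minimal coin-flip simulation makes the point; the sample sizes below are arbitrary, chosen only to show the estimate settling down as the data grows.

```python
# Illustrative sketch: the law of large numbers with a fair coin.
# Small samples give unstable estimates of P(heads); the estimate only settles
# near 0.5 as the sample grows.
import random

random.seed(42)

for n_flips in (2, 10, 100, 10_000):
    heads = sum(random.random() < 0.5 for _ in range(n_flips))
    print(f"{n_flips:>6} flips -> estimated P(heads) = {heads / n_flips:.3f}")

# Two flips can easily come up heads both times (estimate 1.0), which says very
# little about the coin; inference needs enough data for "large" to apply.
```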

    These first three conditions, detectability, measurability, and sufficiency, set the stage for everything that follows. They define the threshold where analysis becomes possible at all. Without something to detect, a way to measure it, and enough variation to generalize from, analysis risks turning into form without substance. In the next part, I’ll look at the final two conditions of analyzability: learnability and replicability, and explore what happens when even well-structured data begins to fail those tests.

  • The Limits of Analysis Pt 1

    The Limits of Analysis Pt 1

    The Preconditions of Knowing

    This is Part 1 of a 3-part series on what makes a quantity analyzable. In this opening piece, I set the stage by exploring the limits of analysis itself, why some things can’t be turned into data at all. Part 2 will unpack the five conditions that make analysis possible, and Part 3 will reflect on what happens when those conditions fail.

    When it comes to machine learning and statistics, the impulse to model, predict, or explain is often very compelling. But before any of that is possible, there is a more fundamental question we need to ask: is the thing we want to analyze even analyzable? Not every phenomenon can be subsumed into the realm of data. Some things resist measurement, others produce too little evidence to generalize from, and still others dissolve entirely when scrutinized for reproducibility. My main claim is that analysis isn’t automatic; it only works under the right conditions.

    I think it would help to give some concrete examples of real-world data we might be familiar with that demonstrate this claim. Customer satisfaction is a good one: it is easy to have a conversation about customer satisfaction, but very difficult to measure it directly. A single review does not capture the whole situation, and bias creeps in because people are more likely to leave a review after a negative experience than a positive one. Fraud detection is another example: fraud does exist, but it leaves only subtle traces behind, and if those traces are not detectable, the problem cannot be solved from the data alone. Happiness is a third example that empiricists and positivists have struggled to account for. It is a deep human experience, but how do we adequately detect, quantify, and replicate it across individuals?

    As we look across both statistics and machine learning, we see a meta-pattern about analysis. For a quantity to be analyzable, it must meet certain conditions. In Part 2 of this series I will look into these conditions in much more detail, but here is a preview for now.

    • The data must first be detectable, meaning it has to have some signal, however faint, for us to find.

    • The data must be measurable, meaning it can be put into a structured form. At the surface this seems nearly identical to the first condition, but the requirement of structure means we need to be able to do basic things like determine whether one signal exceeds another, or whether there is a hierarchy among the signals we detect, something analogous to the strength of the signal.

    • Next, we need to have sufficient data, enough variation to meaningfully capture a representation of the pattern we wish to analyze. This is one reason why sample sizes in statistics are of such importance when making claims with a specific confidence level.

    • The data must also represent something learnable in the first place, given the structure of the data and the amount you have collected. That is, the data must be structured so that algorithms or models can generalize from it.

    • Finally, the data has to be replicable. This is another subtle assumption that will take time to analyze in deeper detail. But in general, we expect the data to be stable across samples and systems.

    In the next article, I’ll dive into each of these conditions, showing how they shape both the limits and the possibilities of data science. If Part 1 is about asking whether something can be analyzed, Part 2 is about learning how to test those conditions in practice.

  • The Algorithm Made It So Pt. 2: Why ‘Optimized’ Doesn’t Mean ‘Best’

    The Algorithm Made It So Pt. 2: Why ‘Optimized’ Doesn’t Mean ‘Best’

    In Part 1, we saw how algorithms produce the realities they claim to measure. Here in Part 2, I want to dig into what we mean when we say an algorithm is ‘optimized’, and why optimization never means a neutral or universal ‘best’. There is a branch of philosophy known as post-structuralism. It comes out of a long philosophical history and, as the name suggests, it responds to the structuralists. So let’s look at some of the central claims of structuralism first, and then the post-structuralist viewpoint will be easier to understand. Structuralists claim that there is a structure inside everything: there is a structure to the way languages operate, and, as Lévi-Strauss, the central figure of structuralism, wrote in his books, there is even a structure to how things like meals are constructed in specific cultures. He analyzed the underlying structures, such as oppositions in different contexts, for example cooked/raw and nature/culture; the idea of opposites is itself a structural device. In natural language processing, for example, machine learning models don’t understand words in the traditional sense. Instead, they analyze the statistical relations between words across a large amount of training data. To a structuralist, this is evidence of a “latent structure” of language. These relationships are reinforced in sentiment analysis, which is built on a structural logic: words like ‘good’ and ‘bad’ are understood by their position in opposition to each other. It is this structure, the opposites for example, that the structuralist scrutinizes. Optimization, in this context, is then “discovering” the most efficient or probable solution that already exists inside the structure.
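
    As a toy illustration of words being defined by their statistical relations rather than their meanings, here is a sketch that counts co-occurrences in a two-sentence corpus. The corpus and window size are invented for the example, and real NLP systems work at vastly larger scale.

```python
# Illustrative sketch: the "latent structure" a model sees. It never
# understands words; it only counts which words co-occur within a small window.
from collections import defaultdict

corpus = [
    "the food was good and the service was good",
    "the food was bad and the wait was bad",
]
window = 2
cooccur = defaultdict(int)

for sentence in corpus:
    tokens = sentence.split()
    for i, word in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                cooccur[(word, tokens[j])] += 1

# 'good' and 'bad' end up defined by the company they keep: positions in a web
# of statistical relations, not meanings grasped in any human sense.
print(sorted(pair for pair in cooccur if pair[0] in ("good", "bad")))
```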


    In contrast to this, the post-structuralists deny that optimization is neutral. They claim that every act of “optimization” constructs the very field it claims to reveal. It isn’t finding the optimal path in a stable system; it’s deciding what counts as optimal in the first place. Let’s apply these two viewpoints to a modern social media site like TikTok. The structuralist reading would say that the algorithm discovers which videos are inherently engaging, while the post-structuralist reading would say that the algorithm produces the conditions for engagement by privileging specific kinds of content and formatting. What counts as engaging, things like comment count or view duration, is a construction of the algorithm, and creators shape their content to fit that mold so it appears engaging to the algorithm. Categories like engagement, best, or optimized aren’t natural; they’re definitions chosen by developers or institutions. Optimized for whom? Optimized for what?
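
    A small sketch can make the constructedness of “engagement” concrete: the same three hypothetical videos rank differently depending on which definition of engagement the developer happens to choose. The videos, numbers, and weights below are all invented for illustration.

```python
# Illustrative sketch: "engagement" is whatever the metric says it is.
videos = {
    "quick_meme": {"watch_seconds": 8,   "comments": 40, "shares": 5},
    "long_essay": {"watch_seconds": 540, "comments": 6,  "shares": 2},
    "howto_clip": {"watch_seconds": 90,  "comments": 12, "shares": 30},
}

def engagement_v1(v):
    # one defensible definition: privilege raw watch time
    return v["watch_seconds"]

def engagement_v2(v):
    # another defensible definition: privilege interaction counts
    return 3 * v["comments"] + 5 * v["shares"]

for score in (engagement_v1, engagement_v2):
    ranked = sorted(videos, key=lambda name: score(videos[name]), reverse=True)
    print(score.__name__, "->", ranked)

# Neither ranking is the "real" one; each produces the engagement it measures.
```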


    From a post-structuralist lens, this means optimization isn’t universal; it only ever exists in a context, and it is goal- and data-dependent. All of these models are also constructed by specific groups of people, with their specific purposes. Following that logic, it’s fair to say that the “best” or “optimized” result is the one that meets their chosen metric within the limits of their data. This is not always nefarious; sometimes it’s the result of technical trade-offs made for the performance of the model. But those trade-offs are still decisions, and the perspectives behind them are embedded in why they were made. We must ask ourselves again: optimized for whom, optimized for what?


    Finally, let’s take a closer look at exactly how these algorithms’ results go from prediction to prescription and create feedback loops. Going back to our first article covering predictive policing, the authors describe systems that use what is known as “batch analysis”: systems that train on data, then, once newer data has been collected, update the model by training on the new data. Systems like this are susceptible to feedback loops, because the newest data you train the model on will have been influenced by the redistribution of resources that followed the model’s first results. In other words, your model is being trained on data that has already been reinforced by the model’s own outputs. A more readily available example might be social media ranking content that optimizes for clicks, even if it has adverse effects on the diversity of ideas. Posts or videos that get lots of views get pushed to get even more views, and the feedback loop is already kicked off. These effects push reality towards what the model can measure and reward; the feedback loops actively produce the reality they are supposed to reflect. They don’t just change outcomes; they change what we think is possible. In Part 3, I’ll explore how this narrowing of possibility connects to ideas from Baudrillard, Debord, and Foucault: optimization as simulation, spectacle, and even governance.
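
    To see the batch feedback loop in miniature, here is a toy simulation loosely inspired by the predictive-policing example: two areas with identical underlying rates, where records are only generated wherever resources are sent, and each new batch of predictions is fit to those records. All of the numbers are made up, and this is a sketch of the dynamic, not of the actual systems discussed in the article.

```python
# Illustrative sketch: a batch-retrained model feeding on data it helped create.
import random

random.seed(3)
true_rate = [0.2, 0.2]        # identical underlying rates in areas A and B
predicted = [0.25, 0.20]      # the initial model slightly overrates area A
total_patrols = 1000

for batch in range(5):
    # resources follow the predictions...
    share_a = predicted[0] / sum(predicted)
    patrols_a = int(total_patrols * share_a)
    patrols = [patrols_a, total_patrols - patrols_a]
    # ...and records are only generated where patrols actually go
    recorded = [sum(random.random() < true_rate[i] for _ in range(patrols[i]))
                for i in range(2)]
    # "batch analysis": refit the prediction to the records the policy produced
    predicted = [recorded[i] / total_patrols for i in range(2)]
    print(f"batch {batch}: patrol share A = {share_a:.2f}, recorded = {recorded}")

# Area A's initial edge earns it more patrols, which produce more recorded
# incidents, which appear to confirm the prediction. The skew never corrects
# itself even though the underlying rates are identical, because the model only
# ever learns from data its own allocation produced.
```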