# How Facebook’s Algorithm Suppresses Content Diversity (Modestly) and How the Newsfeed Rules Your Clicks (The Message)

Zeynep Tufekci on May 7, 2015

Today, three researchers at Facebook published an article in Science on how Facebook’s newsfeed algorithm suppresses the amount of “cross-cutting” (i.e. likely to cause disagreement) news articles a person sees. I read a lot of academic research, and usually, the researchers are at pains to highlight their findings. This one buries them as deep as it can, using a mix of convoluted language and irrelevant comparisons. So, the first order of business is spelling out what they found. Also, for another important evaluation, with some overlap with this one, go read this post by University of Michigan professor Christian Sandvig.

The most important finding, if you ask me, is buried in an appendix. Here’s the chart showing that the higher an item is in the newsfeed, the more likely it is clicked on.

Notice how steep the curve is. The higher the link, the more (a lot more) likely it will be clicked on. You live and die by placement, determined by the newsfeed algorithm. (The effect, as Sean J. Taylor correctly notes, is a combination of placement and the fact that the algorithm is guessing what you would like.) This was already known, mostly, but it’s great to have it confirmed by Facebook researchers (the study was solely authored by Facebook employees).

The most important caveat that is buried is that this study is not about all Facebook users, despite language at the end that’s quite misleading. The researchers end their paper with: “Finally, we conclusively establish that on average in the context of Facebook…” No. The research was conducted on a small, skewed subset of Facebook users who chose to self-identify their political affiliation on Facebook and regularly log on to Facebook, about 4% of the population available for the study. This is super important because this sampling confounds the dependent variable.

The gold standard of sampling is random, where every unit has an equal chance of selection, which allows us to do amazing things like predict elections with tiny samples of thousands. Sometimes, researchers use convenience samples (whomever they can find easily), and those can be okay, or not, depending on how typical the sample ends up being compared to the universe. Sometimes, in cases like this, the sampling affects behavior: people who self-identify their politics are almost certainly going to behave quite differently, on average, than people who do not, when it comes to the behavior in question, which is sharing and clicking through ideologically challenging content. So, everything in this study applies only to that small subsample of unusual people. (Here’s a post by the always excellent Eszter Hargittai unpacking the sampling issue further.) The study is still interesting, and important, but it is not a study that can generalize to Facebook users. Hopefully that can be a future study.
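To make the sampling point concrete, here is a minimal simulation sketch. Only the roughly 4% share of self-identifiers comes from the study; the click rates, the population size, and the function name `simulate` are hypothetical values chosen for illustration, and the direction of the behavioral difference is arbitrary. The point is only that a modest random sample tracks the whole population, while the self-selected subgroup describes itself and nothing more.

```python
import random


def simulate(seed=0, population_size=100_000, sample_size=2_000):
    """Compare a random sample with a self-selected subgroup.

    HYPOTHETICAL numbers throughout, except the ~4% share of
    self-identifiers, which is the figure cited for the study.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        self_identifies = rng.random() < 0.04
        # ASSUMPTION: self-identifiers click on cross-cutting content at a
        # different rate than everyone else (10% vs. 30%, made-up values).
        click_rate = 0.10 if self_identifies else 0.30
        clicked = rng.random() < click_rate
        population.append((self_identifies, clicked))

    # The quantity we would like to know: behavior of the whole population.
    true_rate = sum(clicked for _, clicked in population) / population_size

    # A simple random sample: every person has an equal chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(clicked for _, clicked in random_sample) / sample_size

    # The self-selected subgroup: only people who state their politics.
    subgroup = [person for person in population if person[0]]
    subgroup_estimate = sum(clicked for _, clicked in subgroup) / len(subgroup)

    return true_rate, random_estimate, subgroup_estimate


true_rate, random_estimate, subgroup_estimate = simulate()
print(f"whole population:      {true_rate:.3f}")
print(f"random sample (2,000): {random_estimate:.3f}")
print(f"self-selected only:    {subgroup_estimate:.3f}")
```

Under these made-up rates, the random sample lands close to the population value even though it is small, while the self-selected group is far off no matter how many people it contains. More data from the wrong group does not fix the problem.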

What does the study actually say?

• Here’s the key finding: Facebook researchers conclusively show that Facebook’s newsfeed algorithm decreases ideologically diverse, cross-cutting content people see from their social networks on Facebook by a measurable amount. The researchers report that exposure to diverse content is suppressed by Facebook’s algorithm by 8% for self-identified liberals and by 5% for self-identified conservatives. Or, as Christian Sandvig puts it, “the algorithm filters out 1 in 20 cross-cutting hard news stories that a self-identified conservative sees (or 5%) and 1 in 13 cross-cutting hard news stories that a self-identified liberal sees (8%).” You are seeing fewer news items that you’d disagree with, which are shared by your friends, because the algorithm is not showing them to you.
• Now, here’s the part which will likely confuse everyone, but it should not. The researchers also report a separate finding that individual choice to limit exposure through clicking behavior results in exposure to 6% less diverse content for liberals and 17% less diverse content for conservatives.
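For readers who want to check the “1 in 20” and “1 in 13” phrasing against the percentages, the conversion is a one-liner. This is a small sketch; the helper name `one_in_n` is mine, not anything from the paper.

```python
# Reductions in cross-cutting content attributed to the algorithm,
# in percent, as reported in the study and quoted above.
reported_percent = {"conservatives": 5, "liberals": 8}


def one_in_n(percent):
    """Convert 'p percent filtered out' into 'about 1 in N filtered out'."""
    return 100 / percent


for group, pct in reported_percent.items():
    print(f"{group}: {pct}% is about 1 in {one_in_n(pct):g}")
```

The 8% figure works out to 1 in 12.5, which Sandvig rounds to 1 in 13.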

Are you with me? One novel finding is that the newsfeed algorithm (modestly) suppresses diverse content, and another crucial and also novel finding is that placement in the feed (strongly) influences click-through rates.

Researchers then replicate and confirm a well-known, uncontested and long-established finding, which is that people have a tendency to avoid content that challenges their beliefs. Then, confusingly, the researchers ask whether the effect of algorithmic suppression is stronger than the effect of people choosing what to click, and have a lot of language that leads Christian Sandvig to call this the “it’s not our fault” study. I cannot remember a worse apples-to-oranges comparison I’ve seen recently, especially since these two dynamics, algorithmic suppression and individual choice, have cumulative effects.

Comparing individual choice to algorithmic suppression is like asking about the amount of trans-fatty acids in french fries, a newly added item on the menu, and being told that hamburgers, which have long been on the menu, also have trans-fatty acids, an undisputed, scientifically uncontested and non-controversial fact. Individual self-selection in news sources long predates the Internet, and is a well-known, long-identified and well-studied phenomenon. Its scientific standing has never been in question. However, the role of Facebook’s algorithm in this process is a new, and important, issue. Just as the medical profession would be concerned about the amount of trans-fatty acids in the new item, french fries, as well as in the existing hamburgers, researchers should obviously be interested in algorithmic effects in suppressing diversity, in addition to long-standing research on individual choice, since the effects are cumulative. An addition, not a comparison, is warranted.
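Here is a rough sketch of what “addition, not comparison” looks like with the reported numbers. The four percentages are the ones cited in this post; how they combine is my assumption, not something the paper states. I assume the two filters act in sequence, with the algorithm first removing its share of cross-cutting content and individual choice then removing its share of what remains. The function name `combined_reduction` is hypothetical.

```python
# Reported reductions in exposure to cross-cutting content (from the study).
REPORTED = {
    "liberals": {"algorithm": 0.08, "choice": 0.06},
    "conservatives": {"algorithm": 0.05, "choice": 0.17},
}


def combined_reduction(algorithm, choice):
    """Total reduction when two filters are applied one after the other.

    ASSUMPTION: the filters act sequentially and independently, so the
    share of content that survives is (1 - algorithm) * (1 - choice).
    """
    return 1 - (1 - algorithm) * (1 - choice)


for group, effects in REPORTED.items():
    total = combined_reduction(effects["algorithm"], effects["choice"])
    # Which single effect is larger differs by group.
    larger = max(effects, key=effects.get)
    print(f"{group}: combined reduction {total:.2%}, larger single effect: {larger}")
```

Under that assumption the combined reduction is roughly 14% for self-identified liberals and roughly 21% for self-identified conservatives, more than either effect alone in both groups. The algorithm’s share does not disappear just because individual choice also exists.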

Imagine this (imperfect) analogy where many people were complaining that, say, a washing machine has a faulty mechanism that sometimes destroys clothes. Now imagine a washing machine company research paper which finds this claim is correct for a small subsample of these washing machines, and quantifies that effect, but also looks into how many people throw out their clothes before they are totally worn out, a well-established, undisputed fact in the scientific literature. The correct headline would not be “people throwing out used clothes damages more dresses than the faulty washing machine mechanism.” And if this subsample was drawn from one small factory, located somewhere else than all the other factories that manufacture the same brand, which produced only 4% of the devices, the headline would not refer to all washing machines, and the paper would not (should not) conclude with a claim about the average washing machine.

Also, in passing, the paper’s conclusion appears misstated. Even though the comparison between personal choice and algorithmic effects is not very relevant, the result is mixed, rather than “conclusively establish[ing] that on average in the context of Facebook individual choices more than algorithms limit exposure to attitude-challenging content”. For self-identified liberals, the algorithm was a stronger suppressor of diversity (8% vs. 6%) while for self-identified conservatives, it was a weaker one (5% vs. 17%).