The idea of artificial intelligence overthrowing humanity has been discussed for many decades, and scientists have just delivered their verdict on whether we would be able to control a high-level computer superintelligence. The answer? Almost definitely not.
The catch is that controlling a superintelligence far beyond human comprehension would require a simulation of that superintelligence which we can analyze. But if we are unable to comprehend it, it is impossible to create such a simulation.
Rules such as ‘cause no harm to humans’ cannot be set if we do not understand the kinds of scenarios an AI will come up with, the researchers suggest. Once a computer system is working at a level above the scope of our programmers, we can no longer set limits.
“A superintelligence poses a fundamentally different problem than those typically studied under the banner of ‘robot ethics’,” the researchers write.
“This is because a superintelligence is multi-faceted, and therefore potentially capable of mobilizing a diversity of resources in order to achieve objectives that are potentially incomprehensible to humans, let alone controllable.”
Part of the team’s reasoning comes from the halting problem put forward by Alan Turing in 1936. The problem centers on knowing whether a computer program will reach a conclusion and answer (so that it halts), or simply loop forever trying to find one.
As Turing proved through some clever mathematics, while we can know this for some specific programs, it is logically impossible to find a method that lets us know it for every potential program that could ever be written. That brings us back to AI, which, in a superintelligent state, could hold every possible computer program in its memory at once.
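The core of Turing's argument can be sketched in a few lines of Python. This is only an illustration of the diagonal construction, not the researchers' proof: the names `make_paradox` and `check_decider` are hypothetical, and the sketch assumes the candidate decider is an ordinary deterministic Python function that always returns.

```python
from typing import Callable, Tuple

# A "decider" is any function that claims to predict whether a
# zero-argument program halts (True) or loops forever (False).
Decider = Callable[[Callable[[], None]], bool]


def make_paradox(halts: Decider) -> Callable[[], None]:
    """Build a program that does the opposite of what `halts` predicts for it."""

    def paradox() -> None:
        if halts(paradox):
            # The decider says "halts", so loop forever instead.
            while True:
                pass
        # The decider says "loops forever", so return immediately.

    return paradox


def check_decider(halts: Decider) -> Tuple[bool, bool]:
    """Return (predicted_halts, actually_halts) for the paradox program."""
    paradox = make_paradox(halts)
    predicted_halts = halts(paradox)
    # By construction, the paradox program behaves opposite to the prediction.
    actually_halts = not predicted_halts
    if actually_halts:
        paradox()  # safe to run: in this branch it returns immediately
    return predicted_halts, actually_halts


# Two deliberately naive candidate deciders, both refuted by their own paradox.
print(check_decider(lambda program: True))
print(check_decider(lambda program: False))
```

Whatever decider is passed in, the two values in the returned pair disagree, which is the sense in which no single program can decide halting for every possible program.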
Any program written to stop the AI from harming humans and destroying the world, for example, may reach a conclusion (and halt) or not; it is mathematically impossible for us to be absolutely sure either way, which means the AI cannot be contained.
“In effect, this makes the containment algorithm unusable,” says computer scientist Iyad Rahwan of the Max Planck Institute for Human Development in Germany.
The alternative to teaching the AI some ethics and telling it not to destroy the world (something no algorithm can be absolutely certain of doing, the researchers say) is to limit the capabilities of the superintelligence. It could be cut off from parts of the Internet or from certain networks, for example.
The new study rejects this idea too, suggesting that it would limit the reach of the artificial intelligence; the argument is that if we are not going to use it to solve problems beyond the scope of humans, then why create it at all?
If we are going to push ahead with artificial intelligence, we might not even know when a superintelligence beyond our control arrives, such is its incomprehensibility. That means we need to start asking some serious questions about the directions we are taking.
“A superintelligent machine that controls the world sounds like science fiction,” says computer scientist Manuel Cebrian of the Max Planck Institute for Human Development. “But there are already machines that perform certain important tasks independently, without programmers fully understanding how they learned them.”
“The question therefore arises whether this could at some point become uncontrollable and dangerous for humanity.”
Sensors everywhere. Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn’t just more. More is different.
Does big data have the answers? Maybe some, but not all, says Mark Graham
In 2008, Chris Anderson, then editor of Wired, wrote a provocative piece titled The End of Theory. Anderson was referring to the ways that computers, algorithms, and big data can potentially generate more insightful, useful, accurate, or true results than specialists or domain experts who traditionally craft carefully targeted hypotheses and research strategies.
This revolutionary notion has now entered not just the popular imagination, but also the research practices of corporations, states, journalists and academics. The idea is that the data shadows and information trails of people, machines, commodities and even nature can reveal secrets to us that we now have the power and prowess to uncover.
In other words, we no longer need to speculate and hypothesise; we simply need to let machines lead us to the patterns and trends in social, economic, political, and environmental relationships.
It is quite likely that you yourself have been the unwitting subject of a big data experiment carried out by Google, Facebook and many other large Web platforms. Google, for instance, has been able to collect extraordinary insights into what specific colours, layouts, rankings, and designs make people more efficient searchers. They do this by slightly tweaking their results and website for a few million searches at a time and then examining the often subtle ways in which people react.
Most large retailers similarly analyse enormous quantities of data from their databases of sales (which are linked to you by credit card numbers and loyalty cards) in order to make uncanny predictions about your future behaviours. In a now famous case, the American retailer, Target, upset a Minneapolis man by knowing more about his teenage daughter’s sex life than he did. Target was able to predict his daughter’s pregnancy by monitoring her shopping patterns and comparing that information to an enormous database detailing billions of dollars of sales. This ultimately allows the company to make uncanny predictions about its shoppers.
More significantly, national intelligence agencies are mining vast quantities of non-public Internet data to look for weak signals that might indicate planned threats or attacks.
There can be no denying the significant power and potential of big data. And the huge resources being invested in both the public and private sectors to study it are a testament to this.
However, crucially important caveats are needed when using such datasets: caveats that, worryingly, seem to be frequently overlooked.
The raw informational material for big data projects is often derived from large user-generated or social media platforms (e.g. Twitter or Wikipedia). Yet, in all such cases we are necessarily only relying on information generated by an incredibly biased or skewed user-base.
Gender, geography, race, income, and a range of other social and economic factors all play a role in how information is produced and reproduced. People from different places and different backgrounds tend to produce different sorts of information. And so we risk ignoring a lot of important nuance if relying on big data as a social/economic/political mirror.
We can of course account for such bias by segmenting our data. Take the case of using Twitter to gain insights into last summer’s London riots. About a third of all UK Internet users have a Twitter profile; a subset of that group are the active tweeters who produce the bulk of content; and then a tiny subset of that group (about 1%) geocode their tweets (essential information if you want to know where your information is coming from).
Despite the fact that we have a database of tens of millions of data points, we are necessarily working with subsets of subsets of subsets. Big data no longer seems so big. Such data thus serves to amplify the information produced by a small minority (a point repeatedly made by UCL’s Muki Haklay), and skew, or even render invisible, ideas, trends, people, and patterns that aren’t mirrored or represented in the datasets that we work with.
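The "subsets of subsets of subsets" point can be made concrete with a small calculation. The sketch below is illustrative only: the one-third and 1% figures come from the text, but the share of profile holders who are active tweeters (`ACTIVE_SHARE`) is a hypothetical placeholder, since no number is given for it.

```python
# Shares taken from the text above.
TWITTER_PROFILE_SHARE = 1 / 3   # UK Internet users with a Twitter profile
GEOCODING_SHARE = 0.01          # active tweeters who geocode their tweets

# Hypothetical assumption: the text gives no figure for active tweeters.
ACTIVE_SHARE = 0.2


def geocoding_user_share(profile_share: float,
                         active_share: float,
                         geocode_share: float) -> float:
    """Share of all Internet users who end up in the geocoded dataset."""
    return profile_share * active_share * geocode_share


share = geocoding_user_share(TWITTER_PROFILE_SHARE, ACTIVE_SHARE, GEOCODING_SHARE)
print(f"Geocoding tweeters per 1,000,000 Internet users: {share * 1_000_000:.0f}")
```

Under these assumed shares, well under one Internet user in a thousand contributes the location data that any geographic analysis would rest on.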
Big data is undoubtedly useful for addressing and overcoming many important issues faced by society. But we need to ensure that we aren’t seduced by the promises of big data to render theory unnecessary.
We may one day get to the point where sufficient quantities of big data can be harvested to answer all of the social questions that most concern us. I doubt it though. There will always be digital divides; always be uneven data shadows; and always be biases in how information and technology are used and produced.
And so we shouldn’t forget the important role of specialists to contextualise and offer insights into what our data do, and maybe more importantly, don’t tell us.
Illustration: Marian Bantjes
“All models are wrong, but some are useful.”
So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.
Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.
The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.
Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required. That’s why Google can translate languages without actually “knowing” them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.
Speaking at the O’Reilly Emerging Technology Conference this past March, Peter Norvig, Google’s research director, offered an update to George Box’s maxim: “All models are wrong, and increasingly you can succeed without them.”
This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.
The big target here isn’t advertising, though. It’s science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.
Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.
Now biology is heading in the same direction. The models we were taught in school about “dominant” and “recessive” genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton’s laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.
In short, the more we learn about biology, the further we find ourselves from a model that can explain it.
There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.
If the words “discover a new species” call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn’t know what they look like, how they live, or much of anything else about their morphology. He doesn’t even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It’s just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.
This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including IBM’s Tivoli and open source versions of Google File System and MapReduce. Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.
Learning to use a “computer” of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.
There’s no reason to cling to our old ways. It’s time to ask: What can science learn from Google?
In New York, a dystopia is now being built: there will be no social contact; the majority will survive on manual, menial labor; corporations and the state will control those who are included. The alternative: bring the new technologies into the Commons
By Slavoj Žižek, published 21/05/2020 at 21:49, updated 21/05/2020 at 22:06
By Slavoj Žižek | Portuguese translation by Simone Paz
The basic functions of New York State could very soon be “reimagined” thanks to Governor Andrew Cuomo’s alliance with Big Tech personified. Is this the testing ground for a dystopian no-touch future?
It seems that the basic choice left to us for dealing with the pandemic comes down to two options: one is Trump-style (a return to economic activity under conditions of market freedom and profitability, even if that brings thousands more deaths); the other is what our media call the “Chinese way” (total, digitalized state control of individuals).
However, in the US there is still a third option, which is being promoted by New York Governor Andrew Cuomo and former Google CEO Eric Schmidt, together with Michael Bloomberg and Bill Gates and his wife Melinda in the background. Naomi Klein and The Intercept call this alternative the Screen New Deal [a playful allusion to the Green New Deal, which presupposes a socio-environmental turn; the Screen New Deal would be something like a turn into the screens]. It comes with the promise of keeping the individual safe from infection while maintaining all the personal freedoms that liberals care about. But does it have any chance of working?
In one of his reflections on death, the stand-up comedian Anthony Jeselnik talks about his grandmother: “We thought she had died happily, in her sleep. But the autopsy revealed a horrible truth: she died during the autopsy.” That is the problem with Eric Schmidt’s autopsy of our predicament: the autopsy and its implications make our predicament far more catastrophic than it needs to be.
Cuomo and Schmidt announced a project to “reimagine New York State’s post-Covid reality, with an emphasis on permanently integrating technology into every aspect of civic life.” In Klein’s view, this will lead to a “permanent, highly profitable no-touch future” in which cash will not exist, nor any need to leave home to spend it. Every possible service and good could be ordered online, delivered by drone, and “screen-shared on a mediated platform.” And to make this future work, it would be necessary to massively exploit “anonymous workers crowded into warehouses, data centers, content moderation mills, electronics manufacturing sheds, lithium mines, industrial farms, meat-processing plants, and prisons.” Two crucial aspects of this description immediately stand out.
The first is the paradox that the privileged who will be able to enjoy life in no-touch environments will also be the most controlled: their whole lives will be laid bare to the true seat of power, the combination of government and Big Tech. Is it right that the networks that are the lifeblood of our existence should be in the hands of private companies such as Google, Amazon and Apple? Companies which, merged with state security agencies, will have the capacity to censor and manipulate the data available to us, or even disconnect us from public space? Remember that Schmidt and Cuomo receive immense public investments in these companies, so should the public not also have access to them and be able to control them? In short, as Klein proposes, should they not be transformed into non-profit public utilities? Without a similar move, democracy in any meaningful sense will effectively be abolished, since the basic component of our commons, the shared space of our communication and interaction, will be under private control.
The second aspect is that the Screen New Deal intervenes in class struggle at a very specific and precise point. The virus crisis made us fully aware of the crucial role of what David Harvey called the “new working class”: caretakers of all kinds, from nurses to those who deliver food and other packages, or those who empty our trash bins, and so on. For those of us who were able to self-isolate, these workers became our main contact with another person in bodily form, a source of help but also of possible contagion. The Screen New Deal is nothing but a plan to minimize the visible role of this caretaker class, who must remain non-isolated and practically unprotected, exposing themselves to viral danger so that we, the privileged, can survive in safety. Some even dream that robots will take care of the elderly and keep them company. But these invisible caretakers may rebel, demanding greater protection: in the US meatpacking industry, thousands of workers have contracted Covid and dozens have died, and similar things are happening in Germany. New forms of class struggle will now emerge.
If we take this project to its hyperbolic conclusion, at the end of the Screen New Deal lies the idea of a wired brain, of our brains directly sharing experiences in a Singularity, a kind of divine collective self-awareness. Elon Musk, another tech genius of our times, recently declared that he believes human language will be obsolete within 10 years and that, if anyone still uses it, it will be “for sentimental reasons.” As the head of Neuralink, he says he plans to connect a device to the human brain within 12 months.
Does this scenario, when combined with Naomi Klein’s extrapolation of a stay-at-home future from the ambitions of Cuomo’s Big Tech symbionts, not recall the situation of the humans in the film The Matrix? Protected, physically isolated and speechless in our isolation bubbles, we will be more united than ever, spiritually, while the high-tech overlords profit and a multitude of millions of invisible humans do the heavy work: a nightmare vision if ever there was one.
In Chile, during the protests that erupted in October 2019, graffiti on a wall read: “Another end of the world is possible.” This should be our answer to the Screen New Deal: yes, our world has come to an end, but a no-touch future is not the only alternative; another end of the world is possible.
Summary of the article: Strong coronavirus measures today should only last a few weeks, there shouldn’t be a big peak of infections afterwards, and it can all be done for a reasonable cost to society, saving millions of lives along the way. If we don’t take these measures, tens of millions will be infected, many will die, along with anybody else that requires intensive care, because the healthcare system will have collapsed.
Within a week, countries around the world have gone from “This coronavirus thing is not a big deal” to declaring a state of emergency. Yet many countries are still not doing much. Why?
Every country is asking the same question: How should we respond? The answer is not obvious to them.
Some countries, like France, Spain or the Philippines, have since ordered heavy lockdowns. Others, like the US, UK, or Switzerland, have dragged their feet, hesitantly venturing into social distancing measures.
Here’s what we’re going to cover today, again with lots of charts, data and models with plenty of sources:
What’s the current situation?
What options do we have?
What’s the one thing that matters now: Time
What does a good coronavirus strategy look like?
How should we think about the economic and social impacts?
When you’re done reading the article, this is what you’ll take away:
Our healthcare system is already collapsing. Countries have two options: either they fight it hard now, or they will suffer a massive epidemic. If they choose the epidemic, hundreds of thousands will die. In some countries, millions. And that might not even eliminate further waves of infections. If we fight hard now, we will curb the deaths. We will relieve our healthcare system. We will prepare better. We will learn. The world has never learned as fast about anything, ever. And we need it, because we know so little about this virus. All of this will achieve something critical: Buy Us Time.
If we choose to fight hard, the fight will be sudden, then gradual. We will be locked in for weeks, not months. Then, we will get more and more freedoms back. It might not be back to normal immediately. But it will be close, and eventually back to normal. And we can do all that while considering the rest of the economy too.
Ok, let’s do this.
1. What’s the situation?
Last week, I showed this curve:
It showed coronavirus cases across the world outside of China. We could only discern Italy, Iran and South Korea. So I had to zoom in on the bottom right corner to see the emerging countries. My entire point was that they would soon be joining these 3 cases.
Let’s see what has happened since.
As predicted, the number of cases has exploded in dozens of countries. Here, I was forced to show only countries with over 1,000 cases. A few things to note:
Spain, Germany, France and the US all have more cases than Italy when it ordered the lockdown
An additional 16 countries have more cases today than Hubei when it went under lockdown: Japan, Malaysia, Canada, Portugal, Australia, Czechia, Brazil and Qatar have more than Hubei but below 1,000 cases. Switzerland, Sweden, Norway, Austria, Belgium, Netherlands and Denmark all have above 1,000 cases.
Do you notice something weird about this list of countries? Outside of China and Iran, which have suffered massive, undeniable outbreaks, and Brazil and Malaysia, every single country in this list is among the wealthiest in the world.
Do you think this virus targets rich countries? Or is it more likely that rich countries are better able to identify the virus?
It’s unlikely that poorer countries aren’t touched. Warm and humid weather probably helps, but doesn’t prevent an outbreak by itself; otherwise Singapore, Malaysia or Brazil wouldn’t be suffering outbreaks.
The most likely interpretations are that the coronavirus either took longer to reach these countries because they’re less connected, or it’s already there but these countries haven’t been able to invest enough on testing to know.
Either way, if this is true, it means that most countries won’t escape the coronavirus. It’s a matter of time before they see outbreaks and need to take measures.
What measures can different countries take?
2. What Are Our Options?
Since the article last week, the conversation has changed and many countries have taken measures. Here are some of the most illustrative examples:
Measures in Spain and France
At one extreme, we have Spain and France. This is the timeline of measures for Spain:
On Thursday, 3/12, the President dismissed suggestions that the Spanish authorities had been underestimating the health threat. On Friday, they declared the State of Emergency. On Saturday, measures were taken:
People can’t leave home except for key reasons: groceries, work, pharmacy, hospital, bank or insurance company (extreme justification)
Specific ban on taking kids out for a walk or seeing friends or family (except to take care of people who need help, but with hygiene and physical distance measures)
All bars and restaurants closed. Only takeout is allowed.
All entertainment closed: sports, movies, museums, municipal celebrations…
Weddings can’t have guests. Funerals can’t have more than a handful of people.
Mass transit remains open
On Monday, land borders were shut.
Some people see this as a great list of measures. Others throw their hands up in the air and cry in despair. This difference is what this article will try to reconcile.
France’s timeline of measures is similar, except they took more time to apply them, and they are more aggressive now. For example, rent, taxes and utilities are suspended for small businesses.
Measures in the US and UK
The US and UK, like countries such as Switzerland, have dragged their feet in implementing measures. Here’s the timeline for the US:
Wednesday 3/11: travel ban.
Friday: National Emergency declared. No social distancing measures
Monday: the government urges the public to avoid restaurants and bars and not to attend events with more than 10 people. No social distancing measure is actually enforceable. It’s just a suggestion.
Lots of states and cities are taking the initiative and mandating much stricter measures.
The UK has seen a similar set of measures: lots of recommendations, but very few mandates.
These two groups of countries illustrate the two extreme approaches to fight the coronavirus: mitigation and suppression. Let’s understand what they mean.
Option 1: Do Nothing
Before we do that, let’s see what doing nothing would entail for a country like the US:
If we do nothing: Everybody gets infected, the healthcare system gets overwhelmed, the mortality explodes, and ~10 million people die (blue bars). For the back-of-the-envelope numbers: if ~75% of Americans get infected and 4% die, that’s 10 million deaths, or around 25 times the number of US deaths in World War II.
You might wonder: “That sounds like a lot. I’ve heard much less than that!”
So what’s the catch? With all these numbers, it’s easy to get confused. But there are only two numbers that matter: what share of people will catch the virus and fall sick, and what share of them will die. If only 25% are sick (because the others have the virus but don’t have symptoms so aren’t counted as cases), and the fatality rate is 0.6% instead of 4%, you end up with 500k deaths in the US.
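The two back-of-the-envelope scenarios can be reproduced with one multiplication each. The sketch below assumes a US population of roughly 330 million, a figure the article does not state; the infection shares and fatality rates are the ones quoted above, and the function name `estimated_deaths` is only illustrative.

```python
# Assumption (not stated in the article): approximate US population.
US_POPULATION = 330_000_000


def estimated_deaths(population: int,
                     share_sick: float,
                     fatality_rate: float) -> float:
    """Deaths = population x share who fall sick x share of the sick who die."""
    return population * share_sick * fatality_rate


# Pessimistic scenario from the text: ~75% infected, 4% fatality rate.
high = estimated_deaths(US_POPULATION, 0.75, 0.04)

# Optimistic scenario from the text: 25% sick, 0.6% fatality rate.
low = estimated_deaths(US_POPULATION, 0.25, 0.006)

print(f"High scenario: about {high / 1e6:.1f} million deaths")
print(f"Low scenario:  about {low / 1e3:.0f} thousand deaths")
print(f"Ratio between scenarios: about {high / low:.0f}x")
```

Under that population assumption the two scenarios land near the article's rounded figures of 10 million and 500k. Of the 20-fold gap between them, the fatality rate accounts for a factor of about 6.7 and the share who fall sick for a factor of 3.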
If we don’t do anything, the number of deaths from the coronavirus will probably land between these two numbers. The chasm between these extremes is mostly driven by the fatality rate, so understanding it better is crucial. What really causes the coronavirus deaths?
How Should We Think about the Fatality Rate?
This is the same graph as before, but now looking at hospitalized people instead of infected and dead:
The light blue area is the number of people who would need to go to the hospital, and the darker blue represents those who need to go to the intensive care unit (ICU). You can see that number would peak at above 3 million.
Now compare that to the number of ICU beds we have in the US (50k today; we could double that by repurposing other space). That’s the red dotted line.
No, that’s not an error.
That red dotted line is the capacity we have of ICU beds. Everyone above that line would be in critical condition but wouldn’t be able to access the care they need, and would likely die.
This is why people died in droves in Hubei and are now dying in droves in Italy and Iran. The Hubei fatality rate ended up better than it could have been because they built 2 hospitals nearly overnight. Italy and Iran can’t do the same; few, if any, other countries can. We’ll see what ends up happening there.
So why is the fatality rate close to 4%?
If 5% of your cases require intensive care and you can’t provide it, most of those people die. As simple as that.
These numbers only show people dying from coronavirus. But what happens if your entire healthcare system collapses under the load of coronavirus patients? Others also die from other ailments.
What happens if you have a heart attack but the ambulance takes 50 minutes to come instead of 8 (too many coronavirus cases) and once you arrive, there’s no ICU and no doctor available? You die.
There are 4 million admissions to the ICU in the US every year, and 500k (~13%) of them die. Without ICU beds, that share would likely go much closer to 80%. Even if only 50% died, in a year-long epidemic you go from 500k deaths a year to 2M, so you’re adding 1.5M deaths, just with collateral damage.
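The collateral-damage arithmetic above can be sketched as follows. The admission and death counts are the ones quoted in the text; the function name is hypothetical, and the calculation assumes a year-long epidemic in which no ICU care is available at all, which is a simplification.

```python
# Figures quoted in the text.
ICU_ADMISSIONS_PER_YEAR = 4_000_000
BASELINE_ICU_DEATHS = 500_000


def collateral_deaths(death_share_without_icu):
    """Extra yearly deaths among would-be ICU patients if ICU care is unavailable.

    ASSUMPTION: the same number of people need intensive care, but a larger
    share of them die because no bed is available.
    """
    deaths = ICU_ADMISSIONS_PER_YEAR * death_share_without_icu
    return deaths - BASELINE_ICU_DEATHS


baseline_share = BASELINE_ICU_DEATHS / ICU_ADMISSIONS_PER_YEAR  # 12.5%, the "~13%" in the text
extra_at_50 = collateral_deaths(0.50)  # the conservative case from the text
extra_at_80 = collateral_deaths(0.80)  # the "much closer to 80%" case

print(f"Baseline death share: {baseline_share:.1%}")
print(f"Extra deaths at 50%: {extra_at_50:,.0f}")
print(f"Extra deaths at 80%: {extra_at_80:,.0f}")
```

The 50% case gives the 1.5M extra deaths mentioned above; the 80% case, under the same assumptions, would be considerably worse.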
If the coronavirus is left to spread, the US healthcare system will collapse, and the deaths will be in the millions, maybe more than 10 million.
The same thinking is true for most countries. The numbers of ICU beds, ventilators, and healthcare workers are usually similar to those in the US, or lower. Unbridled coronavirus means healthcare system collapse, and that means mass death.
By now, I hope it’s pretty clear we should act. The two options that we have are mitigation and suppression. Both of them propose to “flatten the curve”, but they go about it very differently.
Option 2: Mitigation Strategy
Mitigation goes like this: “It’s impossible to prevent the coronavirus now, so let’s just have it run its course, while trying to reduce the peak of infections. Let’s just flatten the curve a little bit to make it more manageable for the healthcare system.”
This chart appears in a very important paper published over the weekend by Imperial College London. Apparently, it pushed the UK and US governments to change course.
It’s a very similar graph to the previous one. Not the same, but conceptually equivalent. Here, the “Do Nothing” situation is the black curve. Each one of the other curves is what would happen if we implemented tougher and tougher social distancing measures. The blue one shows the toughest social distancing measures: isolating infected people, quarantining people who might be infected, and secluding old people. This blue line is broadly the current UK coronavirus strategy, although for now they’re just suggesting it, not mandating it.
Here, again, the red line is the capacity for ICUs, this time in the UK. Again, that line is very close to the bottom. All that area of the curve on top of that red line represents coronavirus patients who would mostly die because of the lack of ICU resources.
Not only that, but by flattening the curve, the ICUs will collapse for months, increasing collateral damage.
You should be shocked. When you hear: “We’re going to do some mitigation” what they’re really saying is: “We will knowingly overwhelm the healthcare system, driving the fatality rate up by a factor of 10x at least.”
You would imagine this is bad enough. But we’re not done yet. Because one of the key assumptions of this strategy is what’s called “Herd Immunity”.
Herd Immunity and Virus Mutation
The idea is that all the people who are infected and then recover are now immune to the virus. This is at the core of this strategy: “Look, I know it’s going to be hard for some time, but once we’re done and a few million people die, the rest of us will be immune to it, so this virus will stop spreading and we’ll say goodbye to the coronavirus. Better do it at once and be done with it, because our alternative is to do social distancing for up to a year and risk having this peak happen later anyways.”
Except this assumes one thing: the virus doesn’t change too much. If it doesn’t change much, then lots of people do get immunity, and at some point the epidemic dies down.
How likely is this virus to mutate? It seems it already has.
This graph represents the different mutations of the virus. You can see that the initial strains started in purple in China and then spread. Each time you see a branching on the left graph, that is a mutation leading to a slightly different variant of the virus.
This should not be surprising: RNA-based viruses like the coronavirus or the flu tend to mutate around 100 times faster than DNA-based ones—although the coronavirus mutates more slowly than influenza viruses.
Not only that, but the best way for this virus to mutate is to have millions of opportunities to do so, which is exactly what a mitigation strategy would provide: hundreds of millions of people infected.
That’s why you have to get a flu shot every year. Because there are so many flu strains, with new ones always evolving, the flu shot can never protect against all strains.
Put another way: the mitigation strategy not only assumes millions of deaths for a country like the US or the UK. It also gambles on the virus not mutating too much, which we know it does. And it will give it the opportunity to mutate. So once we’re done with a few million deaths, we could be ready for a few million more, every year. This coronavirus could become a recurring fact of life, like the flu, but many times deadlier.
So if neither doing nothing nor mitigation will work, what’s the alternative? It’s called suppression.
Option 3: Suppression Strategy
The Mitigation Strategy doesn’t try to contain the epidemic, just flatten the curve a bit. Meanwhile, the Suppression Strategy tries to apply heavy measures to quickly get the epidemic under control. Specifically:
Go hard right now. Order heavy social distancing. Get this thing under control.
Then, release the measures, so that people can gradually get back their freedoms and something approaching normal social and economic life can resume.
What does that look like?
Under a suppression strategy, after the first wave is done, the death toll is in the thousands, and not in the millions.
Why? Because not only do we cut the exponential growth of cases. We also cut the fatality rate since the healthcare system is not completely overwhelmed. Here, I used a fatality rate of 0.9%, around what we’re seeing in South Korea today, which has been most effective at following the Suppression Strategy.
Said like this, it sounds like a no-brainer. Everybody should follow the Suppression Strategy.
So why do some governments hesitate?
They fear three things:
This first lockdown will last for months, which seems unacceptable for many people.
A months-long lockdown would destroy the economy.
It wouldn’t even solve the problem, because we would be just postponing the epidemic: later on, once we release the social distancing measures, people will still get infected in the millions and die.
Here is how the Imperial College team modeled suppressions. The green and yellow lines are different scenarios of Suppression. You can see that doesn’t look good: We still get huge peaks, so why bother?
We’ll get to these questions in a moment, but there’s something more important to address first.
This is completely missing the point.
Presented like this, side by side, the two options of Mitigation and Suppression don’t look very appealing. Either a lot of people die soon and we don’t hurt the economy today, or we hurt the economy today, just to postpone the deaths.
This ignores the value of time.
3. The Value of Time
In our previous post, we explained the value of time in saving lives. Every day, every hour we waited to take measures, this exponential threat continued spreading. We saw how a single day could reduce the total cases by 40% and the death toll by even more.
But time is even more valuable than that.
We’re about to face the biggest wave of pressure on the healthcare system ever seen in history. We are completely unprepared, facing an enemy we don’t know. That is not a good position for war.
What if you were about to face your worst enemy, of which you knew very little, and you had two options: Either you run towards it, or you escape to buy yourself a bit of time to prepare. Which one would you choose?
This is what we need to do today. The world has awakened. Every single day we delay the coronavirus, we can get better prepared. The next sections detail what that time would buy us:
Lower the Number of Cases
With effective suppression, the number of true cases would plummet overnight, as we saw in Hubei last week.
As of today, there are 0 daily new cases of coronavirus in the entire Hubei region of 60 million people.
The diagnostics would keep going up for a couple of weeks, but then they would start going down. With fewer cases, the fatality rate starts dropping too. And the collateral damage is also reduced: fewer people would die from non-coronavirus-related causes simply because the healthcare system is overwhelmed.
Suppression would get us:
Fewer total cases of Coronavirus
Immediate relief for the healthcare system and the humans who run it
Reduction in fatality rate
Reduction in collateral damage
Ability for infected, isolated and quarantined healthcare workers to get better and back to work. In Italy, healthcare workers represent 8% of all contagions.
Understand the True Problem: Testing and Tracing
Right now, the UK and the US have no idea about their true cases. We don’t know how many there are. We just know the official number is not right, and the true one is in the tens of thousands of cases. This has happened because we’re not testing, and we’re not tracing.
With a few more weeks, we could get our testing situation in order, and start testing everybody. With that information, we would finally know the true extent of the problem, where we need to be more aggressive, and what communities are safe to be released from a lockdown.
We could also set up a tracing operation like the ones they have in China or other East Asian countries, where they can identify all the people that every sick person met, and can put them in quarantine. This would give us a ton of intelligence for releasing our social distancing measures later on: if we know where the virus is, we can target these places only. This is not rocket science: it’s the basics of how East Asian countries have been able to control this outbreak without the kind of draconian social distancing that is increasingly essential in other countries.
The measures from this section (testing and tracing) single-handedly curbed the growth of the coronavirus in South Korea and got the epidemic under control, without a strong imposition of social distancing measures.
Build Up Capacity
The US (and presumably the UK) are about to go to war without armor.
We have masks for just two weeks, little personal protective equipment (“PPE”), not enough ventilators, not enough ICU beds, not enough ECMOs (blood oxygenation machines)… This is why the fatality rate would be so high in a mitigation strategy.
But if we buy ourselves some time, we can turn this around:
We have more time to buy equipment we will need for a future wave
We can quickly build up our production of masks, PPE, ventilators, ECMOs, and any other critical device to reduce the fatality rate.
Put in another way: we don’t need years to get our armor, we need weeks. Let’s do everything we can to get our production humming now. Countries are mobilized. People are being inventive, such as using 3D printing for ventilator parts. We can do it. We just need more time. Would you wait a few weeks to get yourself some armor before facing a mortal enemy?
This is not the only capacity we need. We will need health workers as soon as possible. Where will we get them? We need to train people to assist nurses, and we need to get medical workers out of retirement. Many countries have already started, but this takes time. We can do this in a few weeks, but not if everything collapses.
Lower Public Contagiousness
The public is scared. The coronavirus is new. There’s so much we don’t know how to do yet! People haven’t learned to stop hand-shaking. They still hug. They don’t open doors with their elbow. They don’t wash their hands after touching a door knob. They don’t disinfect tables before sitting.
Once we have enough masks, we can use them outside of the healthcare system too. Right now, it’s better to keep them for healthcare workers. But if they weren’t scarce, people should wear them in their daily lives, making it less likely that they infect other people when sick, and with proper training also reducing the likelihood that the wearers get infected. (In the meantime, wearing something is better than nothing.)
All of these are pretty cheap ways to reduce the transmission rate. The less this virus propagates, the fewer measures we’ll need in the future to contain it. But we need time to educate people on all these measures and equip them.
Understand the Virus
We know very little about the virus. But every week, hundreds of new papers are coming out.
The world is finally united against a common enemy. Researchers around the globe are mobilizing to understand this virus better.
How does the virus spread? How can contagion be slowed down? What is the share of asymptomatic carriers? Are they contagious? How much? What are good treatments? How long does it survive? On what surfaces? How do different social distancing measures impact the transmission rate? What’s their cost? What are tracing best practices? How reliable are our tests?
Clear answers to these questions will help make our response as targeted as possible while minimizing collateral economic and social damage. And they will come in weeks, not years.
Not only that, but what if we found a treatment in the next few weeks? Any day we buy gets us closer to that. Right now, there are already several candidates, such as Favipiravir, Chloroquine, or Chloroquine combined with Azithromycin. What if it turned out that in two months we discovered a treatment for the coronavirus? How stupid would we look if we already had millions of deaths following a mitigation strategy?
Understand the Cost-Benefits
All of the factors above can help us save millions of lives. That should be enough. Unfortunately, politicians can’t only think about the lives of the infected. They must think about all the population, and heavy social distancing measures have an impact on others.
Right now we have no idea how different social distancing measures reduce transmission. We also have no clue what their economic and social costs are.
Isn’t it a bit difficult to decide what measures we need for the long term if we don’t know their cost or benefit?
A few weeks would give us enough time to start studying them, understand them, prioritize them, and decide which ones to follow.
Fewer cases, more understanding of the problem, building up assets, understanding the virus, understanding the cost-benefit of different measures, educating the public… These are some core tools to fight the virus, and we just need a few weeks to develop many of them. Wouldn’t it be dumb to commit to a strategy that throws us instead, unprepared, into the jaws of our enemy?
4. The Hammer and the Dance
Now we know that the Mitigation Strategy is probably a terrible choice, and that the Suppression Strategy has a massive short-term advantage.
But people have rightful concerns about this strategy:
How long will it actually last?
How expensive will it be?
Will there be a second peak as big as if we didn’t do anything?
Here, we’re going to look at what a true Suppression Strategy would look like. We can call it the Hammer and the Dance.
First, you act quickly and aggressively. For all the reasons we mentioned above, given the value of time, we want to quench this thing as soon as possible.
One of the most important questions is: How long will this last?
The fear that everybody has is that we will be locked inside our homes for months at a time, with the ensuing economic disaster and mental breakdowns. This idea was unfortunately entertained in the famous Imperial College paper:
Do you remember this chart? The light blue area that goes from end of March to end of August is the period that the paper recommends as the Hammer, the initial suppression that includes heavy social distancing.
If you’re a politician and you see that one option is to let hundreds of thousands or millions of people die with a mitigation strategy and the other is to stop the economy for five months before going through the same peak of cases and deaths, these don’t sound like compelling options.
But this doesn’t need to be so. This paper, driving policy today, has been brutally criticized for core flaws: They ignore contact tracing (at the core of policies in South Korea, China or Singapore among others) or travel restrictions (critical in China), ignore the impact of big crowds…
The time needed for the Hammer is weeks, not months.
This graph shows the new cases in the entire Hubei region (60 million people) every day since 1/23. Within 2 weeks, the country was starting to get back to work. Within ~5 weeks it was completely under control. And within 7 weeks new diagnoses were down to a trickle. Let’s remember this was the worst region in China.
Remember again that these are the orange bars. The grey bars, the true cases, had plummeted much earlier (see Chart 9).
The measures they took were pretty similar to the ones taken in Italy, Spain or France: isolations, quarantines, people had to stay at home unless there was an emergency or had to buy food, contact tracing, testing, more hospital beds, travel bans…
Details matter, however.
China’s measures were stronger. For example, people were limited to one person per household allowed to leave home every three days to buy food. Also, their enforcement was severe. It is likely that this severity stopped the epidemic faster.
In Italy, France and Spain, measures were not as drastic, and their implementation is not as tough. People still walk on the streets, many without masks. This is likely to result in a slower Hammer: more time to fully control the epidemic.
Some people interpret this as “Democracies will never be able to replicate this reduction in cases”. That’s wrong.
For several weeks, South Korea had the worst epidemic outside of China. Now, it’s largely under control. And they did it without asking people to stay home. They achieved it mostly with very aggressive testing, contact tracing, and enforced quarantines and isolations.
The following table gives a good sense of what measures different countries have followed, and how that has impacted them (this is a work-in-progress. Feedback welcome.)
This shows how countries who were prepared, with stronger epidemiological authority, education on hygiene and social distancing, and early detection and isolation, didn’t have to pay with heavier measures afterwards.
Conversely, countries like Italy, Spain or France weren’t doing these well, and had to then apply the Hammer with the hard measures at the bottom to catch up.
The lack of measures in the US and UK is in stark contrast, especially in the US. These countries are still not doing what allowed Singapore, South Korea or Taiwan to control the virus, despite their outbreaks growing exponentially. But it’s a matter of time. Either they have a massive epidemic, or they realize late their mistake, and have to overcompensate with a heavier Hammer. There is no escape from this.
But it’s doable. If an outbreak like South Korea’s can be controlled in weeks and without mandated social distancing, Western countries, which are already applying a heavy Hammer with strict social distancing measures, can definitely control the outbreak within weeks. It’s a matter of discipline, execution, and how much the population abides by the rules.
Once the Hammer is in place and the outbreak is controlled, the second phase begins: the Dance.
If you hammer the coronavirus, within a few weeks you’ve controlled it and you’re in much better shape to address it. Now comes the longer-term effort to keep this virus contained until there’s a vaccine.
This is probably the single biggest, most important mistake people make when thinking about this stage: they think it will keep them home for months. This is not the case at all. In fact, it is likely that our lives will go back to close to normal.
In this video, the South Korea Foreign Minister explains how her country did it. It was pretty simple: efficient testing, efficient tracing, travel bans, efficient isolating and efficient quarantining.
Want to guess their measures? The same ones as in South Korea. In their case, they complemented with economic help to those in quarantine and travel bans and delays.
Is it too late for these countries and others? No. By applying the Hammer, they’re getting a new chance, a new shot at doing this right. The longer they wait, the heavier and longer the Hammer, but it can control the epidemic.
But what if all these measures aren’t enough?
The Dance of R
I call the months-long period between the Hammer and a vaccine or effective treatment the Dance because it won’t be a period during which measures are always the same harsh ones. Some regions will see outbreaks again, others won’t for long periods of time. Depending on how cases evolve, we will need to tighten up social distancing measures or we will be able to release them. That is the dance of R: a dance of measures between getting our lives back on track and spreading the disease, one of economy vs. healthcare.
How does this dance work?
It all turns around the R. If you remember, it’s the transmission rate. Early on in a standard, unprepared country, it’s somewhere between 2 and 3: During the few weeks that somebody is infected, they infect between 2 and 3 other people on average.
If R is above 1, infections grow exponentially into an epidemic. If it’s below 1, they die down.
During the Hammer, the goal is to get R as close to zero, as fast as possible, to quench the epidemic. In Wuhan, it is calculated that R was initially 3.9, and after the lockdown and centralized quarantine, it went down to 0.32.
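To see why the threshold of R=1 matters so much, here is a minimal sketch of how case counts evolve generation by generation. It assumes a constant R, an unlimited supply of susceptible people, and a hypothetical starting point of 100 cases; real epidemics are messier than this. The two R values are the Wuhan figures quoted above.

```python
def cases_per_generation(initial_cases, r, generations):
    """Return new cases in each generation of infection.

    ASSUMPTION: every case infects `r` others on average, R stays constant,
    and the pool of susceptible people never runs out.
    """
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * r)
    return cases


# 100 starting cases is a hypothetical number chosen for illustration.
growing = cases_per_generation(100, 3.9, 5)     # R before the Wuhan lockdown
shrinking = cases_per_generation(100, 0.32, 5)  # R after lockdown and quarantine

print("R = 3.9: ", [round(c) for c in growing])
print("R = 0.32:", [round(c, 1) for c in shrinking])
```

With R at 3.9, the 100 cases turn into tens of thousands within five generations. With R at 0.32, the same 100 cases shrink to less than one. Anything below 1 shrinks; it just shrinks more slowly the closer R gets to 1.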
But once you move into the Dance, you don’t need to do that anymore. You just need your R to stay below 1: a lot of the social distancing measures have true, hard costs on people. They might lose their job, their business, their healthy habits…
You can remain below R=1 with a few simple measures.
This is an approximation of how different types of patients respond to the virus, as well as their contagiousness. Nobody knows the true shape of this curve, but we’ve gathered data from different papers to approximate what it looks like.
Every day after they contract the virus, people have some contagion potential. Together, all these days of contagion add up to 2.5 contagions on average.
It is believed that there are some contagions already happening during the “no symptoms” phase. After that, as symptoms grow, usually people go to the doctor, get diagnosed, and their contagiousness diminishes.
For example, early on you have the virus but no symptoms, so you behave as normal. When you speak with people, you spread the virus. When you touch your nose and then touch a door knob, the next people to open the door and touch their nose get infected.
The more the virus is growing inside you, the more infectious you are. Then, once you start having symptoms, you might slowly stop going to work, stay in bed, wear a mask, or start going to the doctor. The bigger the symptoms, the more you distance yourself socially, reducing the spread of the virus.
Once you’re hospitalized, even if you are very contagious you don’t tend to spread the virus as much since you’re isolated.
This is where you can see the massive impact of policies like those of Singapore or South Korea:
If people are massively tested, they can be identified even before they have symptoms. Quarantined, they can’t spread anything.
If people are trained to identify their symptoms earlier, they reduce the number of days in blue, and hence their overall contagiousness.
If people are isolated as soon as they have symptoms, the contagions from the orange phase disappear.
If people are educated about personal distance, mask-wearing, washing hands or disinfecting spaces, they spread less virus throughout the entire period.
Only when all these fail do we need heavier social distancing measures.
The ROI of Social Distancing
If with all these measures we’re still way above R=1, we need to reduce the average number of people that each person meets.
There are some very cheap ways to do that, like banning events with more than a certain number of people (eg, 50, 500), or asking people to work from home when they can.
Others are much, much more expensive economically, socially and ethically, such as closing schools and universities, asking everybody to stay home, or closing businesses.
This chart is made up because it doesn’t exist today. Nobody has done enough research about this or put together all these measures in a way that can compare them.
It’s unfortunate, because it’s the single most important chart that politicians would need to make decisions. It illustrates what is really going through their minds.
During the Hammer period, politicians want to lower R as much as possible, through measures that remain tolerable for the population. In Hubei, they went all the way to 0.32. We might not need that: maybe just to 0.5 or 0.6.
But during the Dance of the R period, they want to hover as close to 1 as possible, while staying below it over the long term. That prevents a new outbreak, while eliminating the most drastic measures.
What this means is that, whether leaders realize it or not, what they’re doing is:
List all the measures they can take to reduce R
Get a sense of the benefit of applying them: the reduction in R
Get a sense of their cost: the economic, social, and ethical cost.
Stack-rank the initiatives based on their cost-benefit
Pick the ones that give the biggest R reduction up till 1, for the lowest cost.
Initially, their confidence in these numbers will be low. But that’s still how they are thinking, and how they should be thinking about it.
What they need to do is formalize the process: Understand that this is a numbers game in which we need to learn as fast as possible where we are on R, the impact of every measure on reducing R, and their social and economic costs.
Only then will they be able to make a rational decision on what measures they should take.
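The stack-ranking process described above can be sketched as a simple greedy selection. Everything numeric here is made up: as noted earlier, nobody has measured these costs or R reductions yet, so the measures, their effects, and their costs are hypothetical placeholders. The sketch also assumes that R reductions simply add up, which is a simplification, and a greedy pick is a heuristic rather than a guaranteed optimum.

```python
from dataclasses import dataclass


@dataclass
class Measure:
    name: str
    r_reduction: float  # HYPOTHETICAL absolute reduction in R
    cost: float         # HYPOTHETICAL cost, in arbitrary units


def pick_measures(current_r, measures, target_r=1.0):
    """Greedily pick the cheapest measures per unit of R reduction until R < target_r.

    ASSUMPTION: the effects of measures are additive and independent.
    """
    ranked = sorted(measures, key=lambda m: m.cost / m.r_reduction)
    chosen = []
    r = current_r
    for measure in ranked:
        if r < target_r:
            break
        chosen.append(measure)
        r -= measure.r_reduction
    return chosen, r


# All values below are invented for illustration only.
measures = [
    Measure("ban large events", 0.4, 1.0),
    Measure("work from home where possible", 0.6, 2.0),
    Measure("testing and contact tracing", 0.6, 1.0),
    Measure("close schools and universities", 0.3, 6.0),
    Measure("close businesses", 0.5, 20.0),
]

chosen, final_r = pick_measures(2.5, measures)
for measure in chosen:
    print(f"apply: {measure.name}")
print(f"resulting R: {final_r:.2f}")
```

With these invented numbers, the three cheapest measures are enough to bring R below 1, and the expensive ones (closing schools and businesses) are never needed. The point is not the specific output but the shape of the decision: once costs and benefits are estimated, the choice of measures becomes a ranking problem.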
Conclusion: Buy Us Time
The coronavirus is still spreading nearly everywhere. 152 countries have cases. We are against the clock. But we don’t need to be: there’s a clear way we can be thinking about this.
Some countries, especially those that haven’t been hit heavily yet by the coronavirus, might be wondering: Is this going to happen to me? The answer is: It probably already has. You just haven’t noticed. When it really hits, your healthcare system will be in even worse shape than in wealthy countries where the healthcare systems are strong. Better safe than sorry: you should consider taking action now.
For the countries where the coronavirus is already here, the options are clear.
On one side, countries can go the mitigation route: create a massive epidemic, overwhelm the healthcare system, drive the death of millions of people, and release new mutations of this virus in the wild.
On the other, countries can fight. They can lock down for a few weeks to buy us time, create an educated action plan, and control this virus until we have a vaccine.
Governments around the world today, including those of the US, the UK and Switzerland, have so far chosen the mitigation path.
That means they’re giving up without a fight. They see other countries having successfully fought this, but they say: “We can’t do that!”
What if Churchill had said the same thing? “Nazis are already everywhere in Europe. We can’t fight them. Let’s just give up.” This is what many governments around the world are doing today. They’re not giving you a chance to fight this. You have to demand it.
Share the Word
Unfortunately, millions of lives are still at stake. Share this article—or any similar one—if you think it can change people’s opinion. Leaders need to understand this to avert a catastrophe. The moment to act is now.
This article has been the result of a herculean effort by a group of normal citizens working around the clock to find all the relevant research available to structure it into one piece, in case it can help others process all the information that is out there about the coronavirus.
Special thanks to Dr. Carl Juneau (epidemiologist and translator of the French version), Dr. Brandon Fainstad, Pierre Djian, Jorge Peñalva, John Hsu, Genevieve Gee, Elena Baillie, Chris Martinez, Yasemin Denari, Christine Gibson, Matt Bell, Dan Walsh, Jessica Thompson, Karim Ravji, Annie Hazlehurst, and Aishwarya Khanduja. This has been a team effort.
Thank you also to Berin Szoka, Shishir Mehrotra, QVentus, Illumina, Josephine Gavignet, Mike Kidd, and Nils Barth for your advice. Thank you to my company, Course Hero, for giving me the time and freedom to focus on this.
When a helicopter rushed a 13-year-old girl showing symptoms suggestive of kidney failure to Stanford’s Packard Children’s Hospital, Jennifer Frankovich was the rheumatologist on call. She and a team of other doctors quickly diagnosed lupus, an autoimmune disease. But as they hurried to treat the girl, Frankovich thought that something about the patient’s particular combination of lupus symptoms — kidney problems, inflamed pancreas and blood vessels — rang a bell. In the past, she’d seen lupus patients with these symptoms develop life-threatening blood clots. Her colleagues in other specialties didn’t think there was cause to give the girl anti-clotting drugs, so Frankovich deferred to them. But she retained her suspicions. “I could not forget these cases,” she says.
Back in her office, she found that the scientific literature had no studies on patients like this to guide her. So she did something unusual: She searched a database of all the lupus patients the hospital had seen over the previous five years, singling out those whose symptoms matched her patient’s, and ran an analysis to see whether they had developed blood clots. “I did some very simple statistics and brought the data to everybody that I had met with that morning,” she says. The change in attitude was striking. “It was very clear, based on the database, that she could be at an increased risk for a clot.”
The girl was given the drug, and she did not develop a clot. “At the end of the day, we don’t know whether it was the right decision,” says Chris Longhurst, a pediatrician and the chief medical information officer at Stanford Children’s Health, who is a colleague of Frankovich’s. But they felt that it was the best they could do with the limited information they had.
A large, costly and time-consuming clinical trial with proper controls might someday prove Frankovich’s hypothesis correct. But large, costly and time-consuming clinical trials are rarely carried out for uncommon complications of this sort. In the absence of such focused research, doctors and scientists are increasingly dipping into enormous troves of data that already exist, namely the aggregated medical records of thousands or even millions of patients, to uncover patterns that might help steer care.
The Tatonetti Laboratory at Columbia University is a nexus in this search for signal in the noise. There, Nicholas Tatonetti, an assistant professor of biomedical informatics — an interdisciplinary field that combines computer science and medicine — develops algorithms to trawl medical databases and turn up correlations. For his doctoral thesis, he mined the F.D.A.’s records of adverse drug reactions to identify pairs of medications that seemed to cause problems when taken together. He found an interaction between two very commonly prescribed drugs: The antidepressant paroxetine (marketed as Paxil) and the cholesterol-lowering medication pravastatin were connected to higher blood-sugar levels. Taken individually, the drugs didn’t affect glucose levels. But taken together, the side-effect was impossible to ignore. “Nobody had ever thought to look for it,” Tatonetti says, “and so nobody had ever found it.”
The potential for this practice extends far beyond drug interactions. In the past, researchers noticed that being born in certain months or seasons appears to be linked to a higher risk of some diseases. In the Northern Hemisphere, people with multiple sclerosis tend to be born in the spring, while in the Southern Hemisphere they tend to be born in November; people with schizophrenia tend to have been born during the winter. There are numerous correlations like this, and the reasons for them are still foggy — a problem Tatonetti and a graduate assistant, Mary Boland, hope to solve by parsing the data on a vast array of outside factors. Tatonetti describes it as a quest to figure out “how these diseases could be dependent on birth month in a way that’s not just astrology.” Other researchers think data-mining might also be particularly beneficial for cancer patients, because so few types of cancer are represented in clinical trials.
As with so much network-enabled data-tinkering, this research is freighted with serious privacy concerns. If these analyses are considered part of treatment, hospitals may allow them on the grounds of doing what is best for a patient. But if they are considered medical research, then everyone whose records are being used must give permission. In practice, the distinction can be fuzzy and often depends on the culture of the institution. After Frankovich wrote about her experience in The New England Journal of Medicine in 2011, her hospital warned her not to conduct such analyses again until a proper framework for using patient information was in place.
In the lab, ensuring that the data-mining conclusions hold water can also be tricky. By definition, a medical-records database contains information only on sick people who sought help, so it is inherently incomplete. Such databases also lack the controls of a clinical study and are full of other confounding factors that might trip up unwary researchers. Daniel Rubin, a professor of bioinformatics at Stanford, also warns that there have been no studies of data-driven medicine to determine whether it leads to positive outcomes more often than not. Because historical evidence is of “inferior quality,” he says, it has the potential to lead care astray.
Yet despite the pitfalls, developing a “learning health system” — one that can incorporate lessons from its own activities in real time — remains tantalizing to researchers. Stefan Thurner, a professor of complexity studies at the Medical University of Vienna, and his researcher, Peter Klimek, are working with a database of millions of people’s health-insurance claims, building networks of relationships among diseases. As they fill in the network with known connections and new ones mined from the data, Thurner and Klimek hope to be able to predict the health of individuals or of a population over time. On the clinical side, Longhurst has been advocating for a button in electronic medical-record software that would allow doctors to run automated searches for patients like theirs when no other sources of information are available.
With time, and with some crucial refinements, this kind of medicine may eventually become mainstream. Frankovich recalls a conversation with an older colleague. “She told me, ‘Research this decade benefits the next decade,’ ” Frankovich says. “That was how it was. But I feel like it doesn’t have to be that way anymore.”
On 24 August 1965 Gloria Placente, a 34-year-old resident of Queens, New York, was driving to Orchard Beach in the Bronx. Clad in shorts and sunglasses, the housewife was looking forward to quiet time at the beach. But the moment she crossed the Willis Avenue bridge in her Chevrolet Corvair, Placente was surrounded by a dozen patrolmen. There were also 125 reporters, eager to witness the launch of New York police department’s Operation Corral – an acronym for Computer Oriented Retrieval of Auto Larcenists.
Fifteen months earlier, Placente had driven through a red light and neglected to answer the summons, an offence that Corral was going to punish with a heavy dose of techno-Kafkaesque. It worked as follows: a police car stationed at one end of the bridge radioed the licence plates of oncoming cars to a teletypist miles away, who fed them to a Univac 490 computer, an expensive $500,000 toy ($3.5m in today’s dollars) on loan from the Sperry Rand Corporation. The computer checked the numbers against a database of 110,000 cars that were either stolen or belonged to known offenders. In case of a match the teletypist would alert a second patrol car at the bridge’s other exit. It took, on average, just seven seconds.
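The matching step at the heart of Corral is a membership check of each radioed plate against one combined list. A minimal Python sketch of that step follows. The function names, the normalisation rule and the plate numbers are invented for illustration, and nothing here models the radio link, the teletypist or the Univac itself.

```python
def normalise(plate):
    """Strip spaces and hyphens and upper-case, so spoken variants still match."""
    return plate.replace(" ", "").replace("-", "").upper()


def build_hotlist(stolen, offenders):
    """Merge both lists into one set for constant-time membership checks."""
    return frozenset(normalise(p) for p in (*stolen, *offenders))


def check_plates(radioed, hotlist):
    """Return the radioed plates that should trigger an alert at the far exit."""
    return [p for p in radioed if normalise(p) in hotlist]


# Hypothetical plates, for illustration only.
hotlist = build_hotlist(stolen=["4N-1234"], offenders=["QX 5678"])
alerts = check_plates(["qx5678", "AB-0001", "4n 1234"], hotlist)
```

The lookup treats a stolen car and an unanswered summons identically: once a plate is on the list, the reason it is there no longer matters to the machine.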
Compared with the impressive police gear of today – automatic number plate recognition, CCTV cameras, GPS trackers – Operation Corral looks quaint. And the possibilities for control will only expand. European officials have considered requiring all cars entering the European market to feature a built-in mechanism that allows the police to stop vehicles remotely. Speaking earlier this year, Jim Farley, a senior Ford executive, acknowledged that “we know everyone who breaks the law, we know when you’re doing it. We have GPS in your car, so we know what you’re doing. By the way, we don’t supply that data to anyone.” That last bit didn’t sound very reassuring and Farley retracted his remarks.
As both cars and roads get “smart,” they promise nearly perfect, real-time law enforcement. Instead of waiting for drivers to break the law, authorities can simply prevent the crime. Thus, a 50-mile stretch of the A14 between Felixstowe and Rugby is to be equipped with numerous sensors that would monitor traffic by sending signals to and from mobile phones in moving vehicles. The telecoms watchdog Ofcom envisions that such smart roads connected to a centrally controlled traffic system could automatically impose variable speed limits to smooth the flow of traffic but also direct the cars “along diverted routes to avoid the congestion and even [manage] their speed”.
Other gadgets – from smartphones to smart glasses – promise even more security and safety. In April, Apple patented technology that deploys sensors inside the smartphone to analyse if the car is moving and if the person using the phone is driving; if both conditions are met, it simply blocks the phone’s texting feature. Intel and Ford are working on Project Mobil – a face recognition system that, should it fail to recognise the face of the driver, would not only prevent the car being started but also send the picture to the car’s owner (bad news for teenagers).
The car is emblematic of transformations in many other domains, from smart environments for “ambient assisted living” where carpets and walls detect that someone has fallen, to various masterplans for the smart city, where municipal services dispatch resources only to those areas that need them. Thanks to sensors and internet connectivity, the most banal everyday objects have acquired tremendous power to regulate behaviour. Even public toilets are ripe for sensor-based optimisation: the Safeguard Germ Alarm, a smart soap dispenser developed by Procter & Gamble and used in some public WCs in the Philippines, has sensors monitoring the doors of each stall. Once you leave the stall, the alarm starts ringing – and can only be stopped by a push of the soap-dispensing button.
In this context, Google’s latest plan to push its Android operating system on to smart watches, smart cars, smart thermostats and, one suspects, smart everything, looks rather ominous. In the near future, Google will be the middleman standing between you and your fridge, you and your car, you and your rubbish bin, allowing the National Security Agency to satisfy its data addiction in bulk and via a single window.
This “smartification” of everyday life follows a familiar pattern: there’s primary data – a list of what’s in your smart fridge and your bin – and metadata – a log of how often you open either of these things or when they communicate with one another. Both produce interesting insights: cue smart mattresses – one recent model promises to track respiration and heart rates and how much you move during the night – and smart utensils that provide nutritional advice.
In addition to making our lives more efficient, this smart world also presents us with an exciting political choice. If so much of our everyday behaviour is already captured, analysed and nudged, why stick with unempirical approaches to regulation? Why rely on laws when one has sensors and feedback mechanisms? If policy interventions are to be – to use the buzzwords of the day – “evidence-based” and “results-oriented,” technology is here to help.
This new type of governance has a name: algorithmic regulation. In as much as Silicon Valley has a political programme, this is it. Tim O’Reilly, an influential technology publisher, venture capitalist and ideas man (he is to blame for popularising the term “web 2.0”) has been its most enthusiastic promoter. In a recent essay that lays out his reasoning, O’Reilly makes an intriguing case for the virtues of algorithmic regulation – a case that deserves close scrutiny both for what it promises policymakers and the simplistic assumptions it makes about politics, democracy and power.
To see algorithmic regulation at work, look no further than the spam filter in your email. Instead of confining itself to a narrow definition of spam, the email filter has its users teach it. Even Google can’t write rules to cover all the ingenious innovations of professional spammers. What it can do, though, is teach the system what makes a good rule and spot when it’s time to find another rule for finding a good rule – and so on. An algorithm can do this, but it’s the constant real-time feedback from its users that allows the system to counter threats never envisioned by its designers. And it’s not just spam: your bank uses similar methods to spot credit-card fraud.
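A filter that learns from its users rather than from hand-written rules can be sketched in a few lines. The Python below uses a small naive Bayes classifier as a stand-in. That choice, the class name and the sample messages are assumptions for illustration and say nothing about how Google's filter is actually built.

```python
import math
from collections import Counter


class FeedbackSpamFilter:
    """Tiny naive Bayes classifier that learns only from user reports."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def report(self, text, label):
        """Record user feedback; label must be 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def _log_score(self, words, label):
        counts = self.word_counts[label]
        total = sum(counts.values())
        vocab = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Add-one smoothing keeps unseen words from zeroing out the score.
        prior = (self.message_counts[label] + 1) / (sum(self.message_counts.values()) + 2)
        score = math.log(prior)
        for word in words:
            score += math.log((counts[word] + 1) / (total + vocab + 1))
        return score

    def is_spam(self, text):
        words = text.lower().split()
        return self._log_score(words, "spam") > self._log_score(words, "ham")


spam_filter = FeedbackSpamFilter()
spam_filter.report("win free money now", "spam")
spam_filter.report("meeting agenda for monday", "ham")

before = spam_filter.is_spam("cheap pills")   # a trick nobody has reported yet
spam_filter.report("cheap pills online", "spam")
after = spam_filter.is_spam("cheap pills")    # same message after one report
```

No rule about pills was ever written. The verdict on the new message changes only because a user reported it.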
In his essay, O’Reilly draws broader philosophical lessons from such technologies, arguing that they work because they rely on “a deep understanding of the desired outcome” (spam is bad!) and periodically check if the algorithms are actually working as expected (are too many legitimate emails ending up marked as spam?).
O’Reilly presents such technologies as novel and unique – we are living through a digital revolution after all – but the principle behind “algorithmic regulation” would be familiar to the founders of cybernetics – a discipline that, even in its name (it means “the science of governance”) hints at its great regulatory ambitions. This principle, which allows the system to maintain its stability by constantly learning and adapting itself to the changing circumstances, is what the British psychiatrist Ross Ashby, one of the founding fathers of cybernetics, called “ultrastability”.
To illustrate it, Ashby designed the homeostat. This clever device consisted of four interconnected RAF bomb control units – mysterious looking black boxes with lots of knobs and switches – that were sensitive to voltage fluctuations. If one unit stopped working properly – say, because of an unexpected external disturbance – the other three would rewire and regroup themselves, compensating for its malfunction and keeping the system’s overall output stable.
Ashby’s homeostat achieved “ultrastability” by always monitoring its internal state and cleverly redeploying its spare resources.
Like the spam filter, it didn’t have to specify all the possible disturbances – only the conditions for how and when it must be updated and redesigned. This is no trivial departure from how the usual technical systems, with their rigid, if-then rules, operate: suddenly, there’s no need to develop procedures for governing every contingency, for – or so one hopes – algorithms and real-time, immediate feedback can do a better job than inflexible rules out of touch with reality.
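The logic of ultrastability can be imitated in software: specify only the condition under which the system must change itself, not what it should change into. The Python below is a loose analogy rather than a model of Ashby's circuitry. The single feedback gain, the bounds and the random redraw are all assumptions made for illustration.

```python
import random


def run_homeostat(gain, steps=500, limit=10.0, seed=0):
    """Iterate x <- gain * x, rewiring at random whenever x leaves its bounds.

    Returns (final gain, final x, number of rewirings).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    x = 1.0
    rewires = 0
    for _ in range(steps):
        x = gain * x
        if abs(x) > limit:
            # The only rule: out of bounds means try a different wiring.
            gain = rng.uniform(-2.0, 2.0)
            x = max(-limit, min(limit, x))  # the needle hits its end stop
            rewires += 1
    return gain, x, rewires


# Start from an unstable wiring (|gain| > 1) and let the system sort itself out.
final_gain, final_x, rewires = run_homeostat(gain=1.5)
```

Nothing in the code says which gains are stable. Unstable ones simply get thrown away the moment they push the variable out of range, so the system ends up on a setting that works without anyone having described it in advance.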
Algorithmic regulation could certainly make the administration of existing laws more efficient. If it can fight credit-card fraud, why not tax fraud? Italian bureaucrats have experimented with the redditometro, or income meter, a tool for comparing people’s spending patterns – recorded thanks to an arcane Italian law – with their declared income, so that authorities know when you spend more than you earn. Spain has expressed interest in a similar tool.
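The core comparison in such an income meter is simple to state. The Python below is a minimal sketch under assumed inputs: the tolerance value, the field order and the sample figures are hypothetical and are not taken from the Italian tool.

```python
def flag_discrepancies(taxpayers, tolerance=0.2):
    """Return names whose recorded spending exceeds declared income by more
    than `tolerance` (a hypothetical threshold, expressed as a fraction)."""
    flagged = []
    for name, declared_income, recorded_spending in taxpayers:
        if recorded_spending > declared_income * (1 + tolerance):
            flagged.append(name)
    return flagged


# Invented figures, for illustration only.
sample = [
    ("A", 30_000, 31_000),   # spends slightly more than declared: within tolerance
    ("B", 20_000, 45_000),   # spends more than double the declared income
    ("C", 50_000, 55_000),
]
flagged = flag_discrepancies(sample)
```

A check of this kind sees only spending that is recorded against a person's name, so money that never passes through those records stays outside its view.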
Such systems, however, are toothless against the real culprits of tax evasion – the super-rich families who profit from various offshoring schemes or simply write outrageous tax exemptions into the law. Algorithmic regulation is perfect for enforcing the austerity agenda while leaving those responsible for the fiscal crisis off the hook. To understand whether such systems are working as expected, we need to modify O’Reilly’s question: for whom are they working? If it’s just the tax-evading plutocrats, the global financial institutions interested in balanced national budgets and the companies developing income-tracking software, then it’s hardly a democratic success.
With his belief that algorithmic regulation is based on “a deep understanding of the desired outcome”, O’Reilly cunningly disconnects the means of doing politics from its ends. But the how of politics is as important as the what of politics – in fact, the former often shapes the latter. Everybody agrees that education, health, and security are all “desired outcomes”, but how do we achieve them? In the past, when we faced the stark political choice of delivering them through the market or the state, the lines of the ideological debate were clear. Today, when the presumed choice is between the digital and the analog or between the dynamic feedback and the static law, that ideological clarity is gone – as if the very choice of how to achieve those “desired outcomes” was apolitical and didn’t force us to choose between different and often incompatible visions of communal living.
By assuming that the utopian world of infinite feedback loops is so efficient that it transcends politics, the proponents of algorithmic regulation fall into the same trap as the technocrats of the past. Yes, these systems are terrifyingly efficient – in the same way that Singapore is terrifyingly efficient (O’Reilly, unsurprisingly, praises Singapore for its embrace of algorithmic regulation). And while Singapore’s leaders might believe that they, too, have transcended politics, it doesn’t mean that their regime cannot be assessed outside the linguistic swamp of efficiency and innovation – by using political, not economic benchmarks.
As Silicon Valley keeps corrupting our language with its endless glorification of disruption and efficiency – concepts at odds with the vocabulary of democracy – our ability to question the “how” of politics is weakened. Silicon Valley’s default answer to the how of politics is what I call solutionism: problems are to be dealt with via apps, sensors, and feedback loops – all provided by startups. Earlier this year Google’s Eric Schmidt even promised that startups would provide the solution to the problem of economic inequality: the latter, it seems, can also be “disrupted”. And where the innovators and the disruptors lead, the bureaucrats follow.
The intelligence services embraced solutionism before other government agencies. Thus, they reduced the topic of terrorism from a subject that had some connection to history and foreign policy to an informational problem of identifying emerging terrorist threats via constant surveillance. They urged citizens to accept that instability is part of the game, that its root causes are neither traceable nor reparable, that the threat can only be pre-empted by out-innovating and out-surveilling the enemy with better communications.
Speaking in Athens last November, the Italian philosopher Giorgio Agamben discussed an epochal transformation in the idea of government, “whereby the traditional hierarchical relation between causes and effects is inverted, so that, instead of governing the causes – a difficult and expensive undertaking – governments simply try to govern the effects”.
For Agamben, this shift is emblematic of modernity. It also explains why the liberalisation of the economy can co-exist with the growing proliferation of control – by means of soap dispensers and remotely managed cars – into everyday life. “If government aims for the effects and not the causes, it will be obliged to extend and multiply control. Causes demand to be known, while effects can only be checked and controlled.” Algorithmic regulation is an enactment of this political programme in technological form.
The true politics of algorithmic regulation become visible once its logic is applied to the social nets of the welfare state. There are no calls to dismantle them, but citizens are nonetheless encouraged to take responsibility for their own health. Consider how Fred Wilson, an influential US venture capitalist, frames the subject. “Health… is the opposite side of healthcare,” he said at a conference in Paris last December. “It’s what keeps you out of the healthcare system in the first place.” Thus, we are invited to start using self-tracking apps and data-sharing platforms and monitor our vital indicators, symptoms and discrepancies on our own.
This goes nicely with recent policy proposals to save troubled public services by encouraging healthier lifestyles. Consider a 2013 report by Westminster council and the Local Government Information Unit, a thinktank, calling for the linking of housing and council benefits to claimants’ visits to the gym – with the help of smartcards. They might not be needed: many smartphones are already tracking how many steps we take every day (Google Now, the company’s virtual assistant, keeps score of such data automatically and periodically presents it to users, nudging them to walk more).
The numerous possibilities that tracking devices offer to health and insurance industries are not lost on O’Reilly. “You know the way that advertising turned out to be the native business model for the internet?” he wondered at a recent conference. “I think that insurance is going to be the native business model for the internet of things.” Things do seem to be heading that way: in June, Microsoft struck a deal with American Family Insurance, the eighth-largest home insurer in the US, in which both companies will fund startups that want to put sensors into smart homes and smart cars for the purposes of “proactive protection”.
An insurance company would gladly subsidise the costs of installing yet another sensor in your house – as long as it can automatically alert the fire department or make front porch lights flash in case your smoke detector goes off. For now, accepting such tracking systems is framed as an extra benefit that can save us some money. But when do we reach a point where not using them is seen as a deviation – or, worse, an act of concealment – that ought to be punished with higher premiums?
Or consider a May 2014 report from 2020health, another thinktank, proposing to extend tax rebates to Britons who give up smoking, stay slim or drink less. “We propose ‘payment by results’, a financial reward for people who become active partners in their health, whereby if you, for example, keep your blood sugar levels down, quit smoking, keep weight off, [or] take on more self-care, there will be a tax rebate or an end-of-year bonus,” they state. Smart gadgets are the natural allies of such schemes: they document the results and can even help achieve them – by constantly nagging us to do what’s expected.
The unstated assumption of most such reports is that the unhealthy are not only a burden to society but that they deserve to be punished (fiscally for now) for failing to be responsible. For what else could possibly explain their health problems but their personal failings? It’s certainly not the power of food companies or class-based differences or various political and economic injustices. One can wear a dozen powerful sensors, own a smart mattress and even do a close daily reading of one’s poop – as some self-tracking aficionados are wont to do – but those injustices would still be nowhere to be seen, for they are not the kind of stuff that can be measured with a sensor. The devil doesn’t wear data. Social injustices are much harder to track than the everyday lives of the individuals whose lives they affect.
In shifting the focus of regulation from reining in institutional and corporate malfeasance to perpetual electronic guidance of individuals, algorithmic regulation offers us a good-old technocratic utopia of politics without politics. Disagreement and conflict, under this model, are seen as unfortunate byproducts of the analog era – to be solved through data collection – and not as inevitable results of economic or ideological conflicts.
However, a politics without politics does not mean a politics without control or administration. As O’Reilly writes in his essay: “New technologies make it possible to reduce the amount of regulation while actually increasing the amount of oversight and production of desirable outcomes.” Thus, it’s a mistake to think that Silicon Valley wants to rid us of government institutions. Its dream state is not the small government of libertarians – a small state, after all, needs neither fancy gadgets nor massive servers to process the data – but the data-obsessed and data-obese state of behavioural economists.
The nudging state is enamoured of feedback technology, for its key founding principle is that while we behave irrationally, our irrationality can be corrected – if only the environment acts upon us, nudging us towards the right option. Unsurprisingly, one of the three lonely references at the end of O’Reilly’s essay is to a 2012 speech entitled “Regulation: Looking Backward, Looking Forward” by Cass Sunstein, the prominent American legal scholar who is the chief theorist of the nudging state.
And while the nudgers have already captured the state by making behavioural psychology the favourite idiom of government bureaucracy – Daniel Kahneman is in, Machiavelli is out – the algorithmic regulation lobby advances in more clandestine ways. They create innocuous non-profit organisations like Code for America which then co-opt the state – under the guise of encouraging talented hackers to tackle civic problems.
Such initiatives aim to reprogramme the state and make it feedback-friendly, crowding out other means of doing politics. For all those tracking apps, algorithms and sensors to work, databases need interoperability – which is what such pseudo-humanitarian organisations, with their ardent belief in open data, demand. And when the government is too slow to move at Silicon Valley’s speed, they simply move inside the government. Thus, Jennifer Pahlka, the founder of Code for America and a protege of O’Reilly, became the deputy chief technology officer of the US government – while pursuing a one-year “innovation fellowship” from the White House.
Cash-strapped governments welcome such colonisation by technologists – especially if it helps to identify and clean up datasets that can be profitably sold to companies who need such data for advertising purposes. Recent clashes over the sale of student and health data in the UK are just a precursor of battles to come: after all state assets have been privatised, data is the next target. For O’Reilly, open data is “a key enabler of the measurement revolution”.
This “measurement revolution” seeks to quantify the efficiency of various social programmes, as if the rationale behind the social nets that some of them provide was to achieve perfection of delivery. The actual rationale, of course, was to enable a fulfilling life by suppressing certain anxieties, so that citizens can pursue their life projects relatively undisturbed. This vision did spawn a vast bureaucratic apparatus and the critics of the welfare state from the left – most prominently Michel Foucault – were right to question its disciplining inclinations. Nonetheless, neither perfection nor efficiency were the “desired outcome” of this system. Thus, to compare the welfare state with the algorithmic state on those grounds is misleading.
But we can compare their respective visions for human fulfilment – and the role they assign to markets and the state. Silicon Valley’s offer is clear: thanks to ubiquitous feedback loops, we can all become entrepreneurs and take care of our own affairs! As Brian Chesky, the chief executive of Airbnb, told the Atlantic last year, “What happens when everybody is a brand? When everybody has a reputation? Every person can become an entrepreneur.”
Under this vision, we will all code (for America!) in the morning, drive Uber cars in the afternoon, and rent out our kitchens as restaurants – courtesy of Airbnb – in the evening. As O’Reilly writes of Uber and similar companies, “these services ask every passenger to rate their driver (and drivers to rate their passenger). Drivers who provide poor service are eliminated. Reputation does a better job of ensuring a superb customer experience than any amount of government regulation.”
The state behind the “sharing economy” does not wither away; it might be needed to ensure that the reputation accumulated on Uber, Airbnb and other platforms of the “sharing economy” is fully liquid and transferable, creating a world where our every social interaction is recorded and assessed, erasing whatever differences exist between social domains. Someone, somewhere will eventually rate you as a passenger, a house guest, a student, a patient, a customer. Whether this ranking infrastructure will be decentralised, provided by a giant like Google or rest with the state is not yet clear but the overarching objective is: to make reputation into a feedback-friendly social net that could protect the truly responsible citizens from the vicissitudes of deregulation.
Admiring the reputation models of Uber and Airbnb, O’Reilly wants governments to be “adopting them where there are no demonstrable ill effects”. But what counts as an “ill effect” and how to demonstrate it is a key question that belongs to the how of politics that algorithmic regulation wants to suppress. It’s easy to demonstrate “ill effects” if the goal of regulation is efficiency but what if it is something else? Surely, there are some benefits – fewer visits to the psychoanalyst, perhaps – in not having your every social interaction ranked?
The imperative to evaluate and demonstrate “results” and “effects” already presupposes that the goal of policy is the optimisation of efficiency. However, as long as democracy is irreducible to a formula, its composite values will always lose this battle: they are much harder to quantify.
For Silicon Valley, though, the reputation-obsessed algorithmic state of the sharing economy is the new welfare state. If you are honest and hardworking, your online reputation would reflect this, producing a highly personalised social net. It is “ultrastable” in Ashby’s sense: while the welfare state assumes the existence of specific social evils it tries to fight, the algorithmic state makes no such assumptions. The future threats can remain fully unknowable and fully addressable – on the individual level.
Silicon Valley, of course, is not alone in touting such ultrastable individual solutions. Nassim Taleb, in his best-selling 2012 book Antifragile, makes a similar, if more philosophical, plea for maximising our individual resourcefulness and resilience: don’t get one job but many, don’t take on debt, count on your own expertise. It’s all about resilience, risk-taking and, as Taleb puts it, “having skin in the game”. As Julian Reid and Brad Evans write in their new book, Resilient Life: The Art of Living Dangerously, this growing cult of resilience masks a tacit acknowledgement that no collective project could even aspire to tame the proliferating threats to human existence – we can only hope to equip ourselves to tackle them individually. “When policy-makers engage in the discourse of resilience,” write Reid and Evans, “they do so in terms which aim explicitly at preventing humans from conceiving of danger as a phenomenon from which they might seek freedom and even, in contrast, as that to which they must now expose themselves.”
What, then, is the progressive alternative? “The enemy of my enemy is my friend” doesn’t work here: just because Silicon Valley is attacking the welfare state doesn’t mean that progressives should defend it to the very last bullet (or tweet). First, even leftist governments have limited space for fiscal manoeuvres, as the kind of discretionary spending required to modernise the welfare state would never be approved by the global financial markets. And it’s the ratings agencies and bond markets – not the voters – who are in charge today.
Second, the leftist critique of the welfare state has become only more relevant today when the exact borderlines between welfare and security are so blurry. When Google’s Android powers so much of our everyday life, the government’s temptation to govern us through remotely controlled cars and alarm-operated soap dispensers will be all too great. This will expand government’s hold over areas of life previously free from regulation.
With so much data, the government’s favourite argument in fighting terror – if only the citizens knew as much as we do, they too would impose all these legal exceptions – easily extends to other domains, from health to climate change. Consider a recent academic paper that used Google search data to study obesity patterns in the US, finding significant correlation between search keywords and body mass index levels. “Results suggest great promise of the idea of obesity monitoring through real-time Google Trends data”, note the authors, which would be “particularly attractive for government health institutions and private businesses such as insurance companies.”
If Google senses a flu epidemic somewhere, it’s hard to challenge its hunch – we simply lack the infrastructure to process so much data at this scale. Google can be proven wrong after the fact – as has recently been the case with its flu trends data, which was shown to overestimate the number of infections, possibly because of its failure to account for the intense media coverage of flu – but so is the case with most terrorist alerts. It’s the immediate, real-time nature of computer systems that makes them perfect allies of an infinitely expanding and pre-emption-obsessed state.
Perhaps, the case of Gloria Placente and her failed trip to the beach was not just a historical oddity but an early omen of how real-time computing, combined with ubiquitous communication technologies, would transform the state. One of the few people to have heeded that omen was a little-known American advertising executive called Robert MacBride, who pushed the logic behind Operation Corral to its ultimate conclusions in his unjustly neglected 1967 book, The Automated State.
At the time, America was debating the merits of establishing a national data centre to aggregate various national statistics and make them available to government agencies. MacBride attacked his contemporaries’ inability to see how the state would exploit the metadata accrued as everything was being computerised. Instead of “a large scale, up-to-date Austro-Hungarian empire”, modern computer systems would produce “a bureaucracy of almost celestial capacity” that can “discern and define relationships in a manner which no human bureaucracy could ever hope to do”.
“Whether one bowls on a Sunday or visits a library instead is [of] no consequence since no one checks those things,” he wrote. Not so when computer systems can aggregate data from different domains and spot correlations. “Our individual behaviour in buying and selling an automobile, a house, or a security, in paying our debts and acquiring new ones, and in earning money and being paid, will be noted meticulously and studied exhaustively,” warned MacBride. Thus, a citizen will soon discover that “his choice of magazine subscriptions… can be found to indicate accurately the probability of his maintaining his property or his interest in the education of his children.” This sounds eerily similar to the recent case of a hapless father who found that his daughter was pregnant from a coupon that Target, a retailer, sent to their house. Target’s hunch was based on its analysis of products – for example, unscented lotion – usually bought by other pregnant women.
For MacBride the conclusion was obvious. “Political rights won’t be violated but will resemble those of a small stockholder in a giant enterprise,” he wrote. “The mark of sophistication and savoir-faire in this future will be the grace and flexibility with which one accepts one’s role and makes the most of what it offers.” In other words, since we are all entrepreneurs first – and citizens second – we might as well make the most of it.
What, then, is to be done? Technophobia is no solution. Progressives need technologies that would stick with the spirit, if not the institutional form, of the welfare state, preserving its commitment to creating ideal conditions for human flourishing. Even some ultrastability is welcome. Stability was a laudable goal of the welfare state before it had encountered a trap: in specifying the exact protections that the state was to offer against the excesses of capitalism, it could not easily deflect new, previously unspecified forms of exploitation.
How do we build welfarism that is both decentralised and ultrastable? A form of guaranteed basic income – whereby some welfare services are replaced by direct cash transfers to citizens – fits the two criteria.
Creating the right conditions for the emergence of political communities around causes and issues they deem relevant would be another good step. Full compliance with the principle of ultrastability dictates that such issues cannot be anticipated or dictated from above – by political parties or trade unions – and must be left unspecified.
What can be specified is the kind of communications infrastructure needed to abet this cause: it should be free to use, hard to track, and open to new, subversive uses. Silicon Valley’s existing infrastructure is great for fulfilling the needs of the state, not of self-organising citizens. It can, of course, be redeployed for activist causes – and it often is – but there’s no reason to accept the status quo as either ideal or inevitable.
Why, after all, appropriate what should belong to the people in the first place? While many of the creators of the internet bemoan how low their creature has fallen, their anger is misdirected. The fault is not with that amorphous entity but, first of all, with the absence of robust technology policy on the left – a policy that can counter the pro-innovation, pro-disruption, pro-privatisation agenda of Silicon Valley. In its absence, all these emerging political communities will operate with their wings clipped. Whether the next Occupy Wall Street would be able to occupy anything in a truly smart city remains to be seen: most likely, they would be out-censored and out-droned.
To his credit, MacBride understood all of this in 1967. “Given the resources of modern technology and planning techniques,” he warned, “it is really no great trick to transform even a country like ours into a smoothly running corporation where every detail of life is a mechanical function to be taken care of.” MacBride’s fear is O’Reilly’s master plan: the government, he writes, ought to be modelled on the “lean startup” approach of Silicon Valley, which is “using data to constantly revise and tune its approach to the market”. It’s this very approach that Facebook has recently deployed to maximise user engagement on the site: if showing users more happy stories does the trick, so be it.
Algorithmic regulation, whatever its immediate benefits, will give us a political regime where technology corporations and government bureaucrats call all the shots. The Polish science fiction writer Stanislaw Lem, in a pointed critique of cybernetics published, as it happens, roughly at the same time as The Automated State, put it best: “Society cannot give up the burden of having to decide about its own fate by sacrificing this freedom for the sake of the cybernetic regulator.”
When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data — more information than all of the world’s libraries combined — every second. After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10^15 bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.
The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade. Since Intel co-founder Gordon E. Moore first defined it in 1965, the so-called Moore’s law — which predicts that the number of transistors on integrated circuits will double every two years — has dominated the computer industry. While that growth rate has proved remarkably resilient, for now, at least, “Moore’s law has basically crapped out; the transistors have gotten as small as people know how to make them economically with existing technologies,” said Scott Aaronson, a theoretical computer scientist at the Massachusetts Institute of Technology.
Instead, since 2005, many of the gains in computing power have come from adding more parallelism via multiple cores, with multiple levels of memory. The preferred architecture no longer features a single central processing unit (CPU) augmented with random access memory (RAM) and a hard drive for long-term storage. Even the big, centralized parallel supercomputers that dominated the 1980s and 1990s are giving way to distributed data centers and cloud computing, often networked across many organizations and vast geographical distances.
These days, “People talk about a computing fabric,” said Stanford University electrical engineer Stephen Boyd. These changes in computer architecture translate into the need for a different computational approach when it comes to handling big data, which is not only grander in scope than the large data sets of yore but also intrinsically different from them.
The demand for ever-faster processors, while important, isn’t the primary focus anymore. “Processing speed has been completely irrelevant for five years,” Boyd said. “The challenge is not how to solve problems with a single, ultra-fast processor, but how to solve them with 100,000 slower processors.” Aaronson points out that many problems in big data can’t be adequately addressed by simply adding more parallel processing. These problems are “more sequential, where each step depends on the outcome of the preceding step,” he said. “Sometimes, you can split up the work among a bunch of processors, but other times, that’s harder to do.” And often the software isn’t written to take full advantage of the extra processors. “If you hire 20 people to do something, will it happen 20 times faster?” Aaronson said. “Usually not.”
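Aaronson’s hiring analogy is commonly formalized as Amdahl’s law. The article does not name that law; it is brought in here only as a standard illustration. The sketch below assumes a hypothetical workload in which a fraction `parallel_fraction` of the work splits evenly across `workers` processors while the remainder must run step by step.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal speedup when only part of a job can be spread across workers.

    parallel_fraction and workers are illustrative parameters, not
    figures taken from the article.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    sequential_fraction = 1.0 - parallel_fraction
    # The sequential part takes the same time no matter how many workers exist.
    return 1.0 / (sequential_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 1.0):
        print(fraction, round(amdahl_speedup(fraction, 20), 2))
```

Under these assumptions, 20 workers deliver a 20-fold speedup only when the whole job is parallel. If a tenth of the work is sequential, the gain is already below sevenfold.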
Researchers also face challenges in integrating very differently structured data sets, as well as the difficulty of moving large amounts of data efficiently through a highly distributed network.
Those issues will become more pronounced as the size and complexity of data sets continue to grow faster than computing resources, according to California Institute of Technology physicist Harvey Newman, whose team developed the LHC’s grid of data centers and trans-Atlantic network. He estimates that if current trends hold, the computational needs of big data analysis will place considerable strain on the computing fabric. “It requires us to think about a different kind of system,” he said.
Memory and Movement
Emmanuel Candes, an applied mathematician at Stanford University, was once able to crunch big data problems on his desktop computer. But last year, when he joined a collaboration of radiologists developing dynamic magnetic resonance imaging — whereby one could record a patient’s heartbeat in real time using advanced algorithms to create high-resolution videos from limited MRI measurements — he found that the data no longer fit into his computer’s memory, making it difficult to perform the necessary analysis.
Addressing the storage-capacity challenges of big data is not simply a matter of building more memory, which has never been more plentiful. It is also about managing the movement of data. That’s because, increasingly, the desired data is no longer at people’s fingertips, stored in a single computer; it is distributed across multiple computers in a large data center or even in the “cloud.”
There is a hierarchy to data storage, ranging from the slowest, cheapest and most abundant memory to the fastest and most expensive, with the least available space. At the bottom of this hierarchy is so-called “slow memory” such as hard drives and flash drives, the cost of which continues to drop. There is more space on hard drives, compared to the other kinds of memory, but saving and retrieving the data takes longer. Next up this ladder comes RAM, which is much faster than slow memory but offers less space and is more expensive. Then there is cache memory — another trade-off of space and price in exchange for faster retrieval speeds — and finally the registers on the microchip itself, which are the fastest of all but the priciest to build, with the least available space. If memory storage were like real estate, a hard drive would be a sprawling upstate farm, RAM would be a medium-sized house in the suburbs, cache memory would be a townhouse on the outskirts of a big city, and the register memory would be a tiny studio in a prime urban location.
Longer commutes for stored data translate into processing delays. “When computers are slow today, it’s not because of the microprocessor,” Aaronson said. “The microprocessor is just treading water waiting for the disk to come back with the data.” Big data researchers prefer to minimize how much data must be moved back and forth from slow memory to fast memory. The problem is exacerbated when the data is distributed across a network or in the cloud, because it takes even longer to move the data back and forth, depending on bandwidth capacity, so that it can be analyzed.
One possible solution to this dilemma is to embrace the new paradigm. In addition to distributed storage, why not analyze the data in a distributed way as well, with each unit (or node) in a network of computers performing a small piece of a computation? Each partial solution is then integrated to find the full result. This approach is similar in concept to the LHC’s, in which one complete copy of the raw data (after filtering) is stored at the CERN research facility in Switzerland that is home to the collider. A second copy is divided into batches that are then distributed to data centers around the world. Each center analyzes its chunk of data and transmits the results to regional computers before moving on to the next batch.
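The idea of each node computing a small piece and the pieces then being combined can be sketched in a few lines. This is a toy illustration, not the LHC’s software: the function names, the round-robin batching and the choice of a mean as the quantity being computed are all assumptions made for the example.

```python
from typing import List, Tuple


def split_into_batches(records: List[float], n_nodes: int) -> List[List[float]]:
    """Deal records out to n_nodes hypothetical nodes, round-robin."""
    return [records[i::n_nodes] for i in range(n_nodes)]


def analyze_batch(batch: List[float]) -> Tuple[float, int]:
    """Work done locally on one node: a partial sum and a count."""
    return sum(batch), len(batch)


def combine(partials: List[Tuple[float, int]]) -> float:
    """Merge the small partial results into the full answer (a mean)."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    if count == 0:
        raise ValueError("no records to combine")
    return total / count


if __name__ == "__main__":
    records = [float(x) for x in range(1, 101)]
    partials = [analyze_batch(b) for b in split_into_batches(records, 4)]
    print(combine(partials))
```

Only the two-number summaries travel between nodes, not the records themselves, which is the point of analyzing the data where it is stored.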
Alon Halevy, a computer scientist at Google, says the biggest breakthroughs in big data are likely to come from data integration. Image: Peter DaSilva for Quanta Magazine
Boyd’s system is based on so-called consensus algorithms. “It’s a mathematical optimization problem,” he said of the algorithms. “You are using past data to train the model in hopes that it will work on future data.” Such algorithms are useful for creating an effective SPAM filter, for example, or for detecting fraudulent bank transactions.
This can be done on a single computer, with all the data in one place. Machine learning typically uses many processors, each handling a little bit of the problem. But when the problem becomes too large for a single machine, a consensus optimization approach might work better, in which the data set is chopped into bits and distributed across 1,000 “agents” that analyze their bit of data and each produce a model based on the data they have processed. The key is to require a critical condition to be met: although each agent’s model can be different, all the models must agree in the end — hence the term “consensus algorithms.”
The process by which 1,000 individual agents arrive at a consensus model is similar in concept to the Mechanical Turk crowd-sourcing methodology employed by Amazon — with a twist. With the Mechanical Turk, a person or a business can post a simple task, such as determining which photographs contain a cat, and ask the crowd to complete the task in exchange for gift certificates that can be redeemed for Amazon products, or for cash awards that can be transferred to a personal bank account. It may seem trivial to the human user, but the program learns from this feedback, aggregating all the individual responses into its working model, so it can make better predictions in the future.
In Boyd’s system, the process is iterative, creating a feedback loop. The initial consensus is shared with all the agents, which update their models in light of the new information and reach a second consensus, and so on. The process repeats until all the agents agree. Using this kind of distributed optimization approach significantly cuts down on how much data needs to be transferred at any one time.
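The loop described above (local models, a shared consensus, repeat until agreement) can be sketched with a consensus form of the alternating direction method of multipliers (ADMM). The article does not say which algorithm Boyd’s system uses, so treat ADMM, the penalty parameter `rho`, and the deliberately trivial model (each agent estimates a single number, the mean of its data) as assumptions chosen to keep the example short.

```python
from typing import List


def consensus_mean(agent_data: List[List[float]], rho: float = 1.0,
                   iterations: int = 200) -> float:
    """Consensus ADMM sketch: agents agree on one value without pooling data."""
    n_agents = len(agent_data)
    local = [0.0] * n_agents   # each agent's own model (theta_i)
    dual = [0.0] * n_agents    # each agent's running disagreement (u_i)
    consensus = 0.0            # the shared model (z)
    for _ in range(iterations):
        # Local step: fit own data while being pulled toward the consensus.
        for i, data in enumerate(agent_data):
            local[i] = (sum(data) + rho * (consensus - dual[i])) / (len(data) + rho)
        # Consensus step: only one number per agent is exchanged.
        consensus = sum(t + u for t, u in zip(local, dual)) / n_agents
        # Dual step: remember how far each agent still is from the consensus.
        for i in range(n_agents):
            dual[i] += local[i] - consensus
    return consensus


if __name__ == "__main__":
    # Made-up data held by three hypothetical agents.
    data = [[1.0, 2.0], [3.0, 4.0, 5.0], [10.0]]
    print(consensus_mean(data))
```

In this sketch the agents end up agreeing on the mean of all the data, even though no agent ever sees another agent’s records; each round moves a single number per agent.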
The Quantum Question
Late one night, during a swanky Napa Valley conference last year, MIT physicist Seth Lloyd found himself soaking in a hot tub across from Google’s Sergey Brin and Larry Page — any aspiring technology entrepreneur’s dream scenario. Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.
It is easy to grasp why Google might be interested in a quantum computer capable of rapidly searching enormous data sets. A quantum computer, in principle, could offer enormous increases in processing power, running algorithms significantly faster than a classical (non-quantum) machine for certain problems. Indeed, the company just purchased a reportedly $15 million prototype from a Canadian firm called D-Wave Systems, although the jury is still out on whether D-Wave’s product is truly quantum.
“This is not about trying all the possible answers in parallel. It is fundamentally different from parallel processing,” said Aaronson. Whereas a classical computer stores information as bits that can be either 0s or 1s, a quantum computer could exploit an unusual property: the superposition of states. If you flip a regular coin, it will land on heads or tails. There is zero probability that it will be both heads and tails. But if it is a quantum coin, technically, it exists in an indeterminate state of both heads and tails until you look to see the outcome.
A true quantum computer could encode information in so-called qubits that can be 0 and 1 at the same time. Doing so could reduce the time required to solve a difficult problem that would otherwise take several years of computation to mere seconds. But that is easier said than done, not least because such a device would be highly sensitive to outside interference: The slightest perturbation would be equivalent to looking to see if the coin landed heads or tails, and thus undo the superposition.
Data from a seemingly simple query about coffee production across the globe can be surprisingly difficult to integrate. Image: Peter DaSilva for Quanta Magazine
However, Aaronson cautions against placing too much hope in quantum computing to solve big data’s computational challenges, insisting that if and when quantum computers become practical, they will be best suited to very specific tasks, most notably to simulate quantum mechanical systems or to factor large numbers to break codes in classical cryptography. Yet there is one way that quantum computing might be able to assist big data: by searching very large, unsorted data sets — for example, a phone directory in which the names are arranged randomly instead of alphabetically.
It is certainly possible to do so with sheer brute force, using a massively parallel computer to comb through every record. But a quantum computer could accomplish the task in a fraction of the time. That is the thinking behind Grover’s algorithm, which was devised by Bell Labs’ Lov Grover in 1996. However, “to really make it work, you’d need a quantum memory that can be accessed in a quantum superposition,” Aaronson said, but it would need to do so in such a way that the very act of accessing the memory didn’t destroy the superposition, “and that is tricky as hell.”
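The mechanics of Grover’s algorithm can be followed on an ordinary computer by tracking the amplitudes directly. The sketch below is a classical simulation under stated assumptions: it stores every amplitude in a list, so it gains no speed at all and says nothing about how a quantum memory would be built. The function name and the eight-item “directory” are hypothetical.

```python
import math
from typing import List


def grover_probabilities(n_items: int, marked: int) -> List[float]:
    """Simulate Grover's search and return each item's measurement probability."""
    if not 0 <= marked < n_items:
        raise ValueError("marked index out of range")
    # Start in an equal superposition over all items.
    state = [1.0 / math.sqrt(n_items)] * n_items
    # About (pi / 4) * sqrt(N) rounds are used, rather than N lookups.
    rounds = int(math.floor(math.pi / 4 * math.sqrt(n_items)))
    for _ in range(rounds):
        # Oracle: flip the sign of the marked item's amplitude.
        state[marked] = -state[marked]
        # Diffusion: reflect every amplitude about the mean amplitude.
        mean = sum(state) / n_items
        state = [2 * mean - a for a in state]
    return [a * a for a in state]


if __name__ == "__main__":
    probs = grover_probabilities(8, marked=5)
    print(max(range(8), key=lambda i: probs[i]), round(probs[5], 3))
```

The number of rounds grows with the square root of the list size, which is where the advantage over checking every record comes from.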
In short, you need quantum RAM (Q-RAM), and Lloyd has developed a conceptual prototype, along with an accompanying program he calls a Q-App (pronounced “quapp”) targeted to machine learning. He thinks his system could find patterns within data without actually looking at any individual records, thereby preserving the quantum superposition (and the users’ privacy). “You can effectively access all billion items in your database at the same time,” he explained, adding that “you’re not accessing any one of them, you’re accessing common features of all of them.”
For example, if there is ever a giant database storing the genome of every human being on Earth, “you could search for common patterns among different genes” using Lloyd’s quantum algorithm, with Q-RAM and a small 70-qubit quantum processor while still protecting the privacy of the population, Lloyd said. The person doing the search would have access to only a tiny fraction of the individual records, he said, and the search could be done in a short period of time. With the cost of sequencing human genomes dropping and commercial genotyping services rising, it is quite possible that such a database might one day exist, Lloyd said. It could be the ultimate big data set, considering that a single genome is equivalent to 6 billion bits.
Lloyd thinks quantum computing could work well for powerhouse machine-learning algorithms capable of spotting patterns in huge data sets — determining what clusters of data are associated with a keyword, for example, or what pieces of data are similar to one another in some way. “It turns out that many machine-learning algorithms actually work quite nicely in quantum computers, assuming you have a big enough Q-RAM,” he said. “These are exactly the kinds of mathematical problems people try to solve, and we think we could do very well with the quantum version of that.”
The Future Is Integration
Google’s Alon Halevy believes that the real breakthroughs in big data analysis are likely to come from integration — specifically, integrating across very different data sets. “No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level,” he said. For example, a raw data set could include thousands of different tables scattered around the Web, each one listing crime rates in New York, but each may use different terminology and column headers, known as “schema.” A header of “New York” can describe the state, the five boroughs of New York City, or just Manhattan. You must understand the relationship between the schemas before the data in all those tables can be integrated.
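The easy half of this problem, reconciling different column headers once their meaning is known, can be sketched as follows. Everything here is hypothetical: the header names, the `CANONICAL_HEADERS` mapping and the placeholder values are invented for illustration. The hard half, working out whether a given “New York” means the state, the city or Manhattan, is exactly what a lookup table like this cannot do.

```python
from typing import Dict, List

# Hypothetical mapping from headers seen in the wild to one shared schema.
CANONICAL_HEADERS: Dict[str, str] = {
    "area": "area",
    "borough": "area",
    "district": "area",
    "crime_count": "crime_count",
    "crimes": "crime_count",
    "incidents": "crime_count",
}


def normalize_row(row: Dict[str, object]) -> Dict[str, object]:
    """Rename one row's headers to the shared schema."""
    normalized = {}
    for header, value in row.items():
        key = header.strip().lower().replace(" ", "_")
        if key not in CANONICAL_HEADERS:
            # Unknown headers need a human (or a better model) to resolve.
            raise KeyError(f"no mapping for header: {header!r}")
        normalized[CANONICAL_HEADERS[key]] = value
    return normalized


def integrate(tables: List[List[Dict[str, object]]]) -> List[Dict[str, object]]:
    """Merge rows from differently labeled tables into one data set."""
    return [normalize_row(row) for table in tables for row in table]


if __name__ == "__main__":
    # Placeholder values, not real statistics.
    table_a = [{"Borough": "X", "Crimes": 10}]
    table_b = [{"District": "Y", "Incidents": 20}]
    print(integrate([table_a, table_b]))
```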
That, in turn, requires breakthroughs in techniques to analyze the semantics of natural language. It is one of the toughest problems in artificial intelligence — if your machine-learning algorithm aspires to perfect understanding of nearly every word. But what if your algorithm needs to understand only enough of the surrounding text to determine whether, for example, a table includes data on coffee production in various countries so that it can then integrate the table with other, similar tables into one common data set? According to Halevy, a researcher could first use a coarse-grained algorithm to parse the underlying semantics of the data as best it could and then adopt a crowd-sourcing approach like a Mechanical Turk to refine the model further through human input. “The humans are training the system without realizing it, and then the system can answer many more questions based on what it has learned,” he said.
Chris Mattmann, a senior computer scientist at NASA’s Jet Propulsion Laboratory and director at the Apache Software Foundation, faces just such a complicated scenario with a research project that seeks to integrate two different sources of climate information: remote-sensing observations of the Earth made by satellite instrumentation and computer-simulated climate model outputs. The Intergovernmental Panel on Climate Change would like to be able to compare the various climate models against the hard remote-sensing data to determine which models provide the best fit. But each of those sources stores data in different formats, and there are many different versions of those formats.
Many researchers emphasize the need to develop a broad spectrum of flexible tools that can deal with many different kinds of data. For example, many users are shifting from traditional highly structured relational databases, broadly known as SQL, which represent data in a conventional tabular format, to a more flexible format dubbed NoSQL. “It can be as structured or unstructured as you need it to be,” said Matt LeMay, a product and communications consultant and the former head of consumer products at URL shortening and bookmarking service Bitly, which uses both SQL and NoSQL formats for data storage, depending on the application.
Mattmann cites an Apache software program called Tika that allows the user to integrate data across 1,200 of the most common file formats. But in some cases, some human intervention is still required. Ultimately, Mattmann would like to fully automate this process via intelligent software that can integrate differently structured data sets, much like the Babel Fish in Douglas Adams’ “Hitchhiker’s Guide to the Galaxy” book series enabled someone to understand any language.
Integration across data sets will also require a well-coordinated distributed network system comparable to the one conceived of by Newman’s group at Caltech for the LHC, which monitors tens of thousands of processors and more than 10 major network links. Newman foresees a computational future for big data that relies on this type of automation through well-coordinated armies of intelligent agents that track the movement of data from one point in the network to another, identifying bottlenecks and scheduling processing tasks. Each might only record what is happening locally but would share the information in such a way as to shed light on the network’s global situation.
“Thousands of agents at different levels are coordinating to help human beings understand what’s going on in a complex and very distributed system,” Newman said. The scale would be even greater in the future, when there would be billions of such intelligent agents, or actors, making up a vast global distributed intelligent entity. “It’s the ability to create those things and have them work on one’s behalf that will reduce the complexity of these operational problems,” he said. “At a certain point, when there’s a complicated problem in such a system, no set of human beings can really understand it all and have access to all the information.”
I have been asked by my superiors to give a brief demonstration of the surprising effectiveness of even the simplest techniques of the new-fangled Social Networke Analysis in the pursuit of those who would seek to undermine the liberty enjoyed by His Majesty’s subjects. This is in connection with the discussion of the role of “metadata” in certain recent events and the assurances of various respectable parties that the government was merely “sifting through this so-called metadata” and that the “information acquired does not include the content of any communications”. I will show how we can use this “metadata” to find key persons involved in terrorist groups operating within the Colonies at the present time. I shall also endeavour to show how these methods work in what might be called a relational manner.
The analysis in this report is based on information gathered by our field agent Mr David Hackett Fischer and published in an Appendix to his lengthy report to the government. As you may be aware, Mr Fischer is an expert and respected field Agent with a broad and deep knowledge of the colonies. I, on the other hand, have made my way from Ireland with just a little quantitative training—I placed several hundred rungs below the Senior Wrangler during my time at Cambridge—and I am presently employed as a junior analytical scribe at ye olde National Security Administration. Sorry, I mean the Royal Security Administration. And I should emphasize again that I know nothing of current affairs in the colonies. However, our current Eighteenth Century beta of PRISM has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.
Rest assured that we only collected metadata on these people, and no actual conversations were recorded or meetings transcribed. All I know is whether someone was a member of an organization or not. Surely this is but a small encroachment on the freedom of the Crown’s subjects. I have been asked, on the basis of this poor information, to present some names for our field agents in the Colonies to work with. It seems an unlikely task.
The organizations are listed in the columns, and the names in the rows. As you can see, membership is represented by a “1”. So this Samuel Adams person (whoever he is), belongs to the North Caucus, the Long Room Club, the Boston Committee, and the London Enemies List. I must say, these organizational names sound rather belligerent.
Anyway, what can we get from these meagre metadata? This table is large and cumbersome. I am a pretty low-level operative at ye olde RSA, so I have to keep it simple. My superiors, I am quite sure, have far more sophisticated analytical techniques at their disposal. I will simply start at the very beginning and follow a technique laid out in a beautiful paper by my brilliant former colleague, Mr Ron Breiger, called “The Duality of Persons and Groups.” He wrote it as a graduate student at Harvard, some thirty-five years ago. (Harvard, you may recall, is what passes for a university in the Colonies. No matter.) The paper describes what we now think of as a basic way to represent information about links between people and some other kind of thing, like attendance at various events, or membership in various groups. The foundational papers in this new science of social networke analysis, in fact, are almost all about what you can tell about people and their social lives based on metadata only, without much reference to the actual content of what they say.
Mr Breiger’s insight was that our table of 254 rows and seven columns is an adjacency matrix, and that a bit of matrix multiplication can bring out information that is in the table but perhaps hard to see. Take this adjacency matrix of people and groups and transpose it—that is, flip it over on its side, so that the rows are now the columns and vice versa. Now we have two tables, or matrices, a 254×7 one showing “People by Groups” and the other a 7×254 one showing “Groups by People”. Call the first one the adjacency matrix A and the second one its transpose, A^T. Now, as you will recall there are rules for multiplying matrices together. If you multiply out A(A^T), you will get a big matrix with 254 rows and 254 columns. That is, it will be a 254×254 “Person by Person” matrix, where both the rows and columns are people (in the same order) and the cells show the number of organizations any particular pair of people both belonged to. Is that not marvelous? I have always thought this operation is somewhat akin to magick, especially as it involves moving one hand down and the other one across in a manner not wholly removed from an incantation.
I cannot show you the whole Person by Person matrix, because I would have to kill you. I jest, I jest! It is just because it is rather large. But here is a little snippet of it. At this point in the eighteenth century, a 254×254 matrix is what we call “Bigge Data”. I have an upcoming EDWARDx talk about it. You should come. Anyway:
You can see here that Mr Appleton and Mr John Adams were connected through both being a member of one group, while Mr John Adams and Mr Samuel Adams shared memberships in two of our seven groups. Mr Ash, meanwhile, was not connected through organization membership to any of the first four men on our list. The rest of the table stretches out in both directions.
Notice again, I beg you, what we did there. We did not start with a “social networke” as you might ordinarily think of it, where individuals are connected to other individuals. We started with a list of memberships in various organizations. But now suddenly we do have a social networke of individuals, where a tie is defined by co-membership in an organization. This is a powerful trick.
We are just getting started, however. A thing about multiplying matrices is that the order matters. It is not like multiplying two numbers. If instead of multiplying A(A^T) we put the transposed matrix first, and do A^T(A), then we get a different result. This time, the result is a 7×7 “Organization by Organization” matrix, where the numbers in the cells represent how many people each organization has in common. Here’s what that looks like. Because it is small we can see the whole table.
Again, interesting! (I beg to venture.) Instead of seeing how (and which) people are linked by their shared membership in organizations, we see which organizations are linked through the people that belong to them both. People are linked through the groups they belong to. Groups are linked through the people they share. This is the “duality of persons and groups” in the title of Mr Breiger’s article.
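For the reader who would rather see the trick than take it on trust, both multiplications can be worked on a table small enough to check by hand. The membership table below is invented for the purpose: four unnamed persons and three unnamed groups, and not a row of it comes from Mr Fischer’s appendix.

```python
from typing import List

Matrix = List[List[int]]


def transpose(m: Matrix) -> Matrix:
    """Flip a matrix so that its rows become its columns."""
    return [list(column) for column in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two matrices: one hand across a row, the other down a column."""
    return [[sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
            for row in a]


# Hypothetical "People by Groups" table: 4 persons (rows), 3 groups (columns).
A: Matrix = [
    [1, 1, 0],  # person 0 belongs to groups 0 and 1
    [1, 0, 0],  # person 1 belongs to group 0
    [0, 1, 1],  # person 2 belongs to groups 1 and 2
    [0, 0, 1],  # person 3 belongs to group 2
]

person_by_person = matmul(A, transpose(A))  # 4x4: groups shared by each pair
group_by_group = matmul(transpose(A), A)    # 3x3: members shared by each pair

if __name__ == "__main__":
    for row in person_by_person:
        print(row)
    for row in group_by_group:
        print(row)
```

The off-diagonal cells count what each pair has in common. The diagonal of the first result counts how many groups each person belongs to, and the diagonal of the second counts how many members each group has.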
Rather than relying on tables, we can make a picture of the relationship between the groups, using the number of shared members as an index of the strength of the link between the seditious groups. Here’s what that looks like.
And, of course, we can also do that for the links between the people, using our 254×254 “Person by Person” table. Here is what that looks like.
What a nice picture! The analytical engine has arranged everyone neatly, picking out clusters of individuals and also showing both peripheral individuals and—more intriguingly—people who seem to bridge various groups in ways that might perhaps be relevant to national security. Look at that person right in the middle there. Zoom in if you wish. He seems to bridge several groups in an unusual (though perhaps not unique) way. His name is Paul Revere.
Once again, I remind you that I know nothing of Mr Revere, or his conversations, or his habits or beliefs, his writings (if he has any) or his personal life. All I know is this bit of metadata, based on membership in some organizations. And yet my analytical engine, on the basis of absolutely the most elementary of operations in Social Networke Analysis, seems to have picked him out of our 254 names as being of unusual interest. We do not have to stop here, with just a picture. Now that we have used our simple “Person by Event” table to generate a “Person by Person” matrix, we can do things like calculate centrality scores, or figure out whether there are cliques, or investigate other patterns. For example, we could calculate a betweenness centrality measure for everyone in our matrix, which is roughly the number of “shortest paths” between any two people in our network that pass through the person of interest. It is a way of asking “If I have to get from person a to person z, how likely is it that the quickest way is through person x?” Here are the top betweenness scores for our list of suspected terrorists:
Perhaps I should not say “terrorists” so rashly. But you can see how tempting it is. Anyway, look—there he is again, this Mr Revere! Very interesting. There are fancier ways to measure importance in a network besides this one. There is something called eigenvector centrality, which my friends in Natural Philosophy tell me is a bit of mathematics unlikely ever to have any practical application in the wider world. You can think of it as a measure of centrality weighted by one’s connection to other central people. Here are our top scorers on that measure:
Here our Mr Revere appears to score highly alongside a few other persons of interest. And for one last demonstration, a calculation of Bonacich Power Centrality, another more sophisticated measure. Here the lower score indicates a more central location.
And here again, Mr Revere, along with Messrs Urann, Proctor, and Barber, appears towards the top of our list.
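Should anyone wish to see how such scores come about, here is a minimal sketch of two of the measures in Python. It assumes a made-up network of seven hypothetical persons (two tight trios joined by a single go-between, `x`), not the real data. Betweenness is computed with Brandes' algorithm and eigenvector centrality by simple repeated multiplication; both are choices of mine for illustration, and Bonacich power centrality is left aside.

```python
from collections import deque

# Hypothetical toy network: two trios (a, b, c) and (d, e, f) bridged by x.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "x"],
    "x": ["c", "d"],
    "d": ["x", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}


def betweenness(g):
    """Unnormalized betweenness for an undirected, unweighted graph (Brandes)."""
    score = {v: 0.0 for v in g}
    for s in g:
        order = []                      # nodes in order of distance from s
        preds = {v: [] for v in g}      # predecessors on shortest paths
        paths = {v: 0 for v in g}       # number of shortest paths from s
        dist = {v: -1 for v in g}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in g[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        dep = {v: 0.0 for v in g}
        for w in reversed(order):       # accumulate dependencies, farthest first
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1 + dep[w])
            if w != s:
                score[w] += dep[w]
    # Each unordered pair was counted once from each end, so halve the totals.
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(g, iterations=200):
    """Power iteration: each score becomes the sum of its neighbours' scores."""
    score = {v: 1.0 for v in g}
    for _ in range(iterations):
        new = {v: sum(score[w] for w in g[v]) for v in g}
        top = max(new.values())
        score = {v: s / top for v, s in new.items()}  # rescale so the max is 1
    return score


between = betweenness(graph)
eigen = eigenvector_centrality(graph)
for v in sorted(graph, key=between.get, reverse=True):
    print(v, between[v], round(eigen[v], 3))
```

In this toy case the go-between `x` has the highest betweenness, since every shortest path from one trio to the other runs through him, while `c` and `d` lead on the eigenvector measure because they are better connected to well-connected persons. The two measures need not agree on who matters most.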
So, there you have it. From a table of membership in different groups we have gotten a picture of a kind of social network between individuals, a sense of the degree of connection between organizations, and some strong hints of who the key players are in this world. And all this (all of it!) from the merest sliver of metadata about a single modality of relationship between people. I do not wish to overstep the remit of my memorandum but I must ask you to imagine what might be possible if we were but able to collect information on very many more people, and also synthesize information from different kinds of ties between people! For the simple methods I have described are quite generalizable in these ways, and their capability only becomes more apparent as the size and scope of the information they are given increases. We would not need to know what was being whispered between individuals, only that they were connected in various ways. The analytical engine would do the rest! I daresay the shape of the real structure of social relations would emerge from our calculations gradually, first in outline only, but eventually with ever-increasing clarity and, at last, in beautiful detail, like a great, silent ship coming out of the gray New England fog.
I admit that, in addition to the possibilities for finding something interesting, there may also be the prospect of discovering suggestive but ultimately incorrect or misleading patterns. But I feel this problem would surely be greatly ameliorated by more and better metadata. At the present time, alas, the technology required to automatically collect the required information is beyond our capacity. But I say again, if a mere scribe such as I—one who knows nearly nothing—can use the very simplest of these methods to pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine, then just think what weapons we might wield in the defense of liberty one or two centuries from now.
Note: After I posted this, Michael Chwe emailed to tell me that Shin-Kap Han has published an article analyzing Fischer’s Revere data in rather more detail. I first came across Fischer’s data when I read Paul Revere’s Ride some years ago. I transcribed it and worked on it a little (making the graphs shown here) when I was asked to give a presentation on the usefulness of Sociological methods to graduate students in Duke’s History department. It’s very nice to see Han’s much fuller published analysis, as he’s an SNA specialist, unlike me.