Tag archive: Big data

Can Big Data Tell Us What Clinical Trials Don’t? (New York Times)

Credit: Illustration by Christopher Brand

When a helicopter rushed a 13-year-old girl showing symptoms suggestive of kidney failure to Stanford’s Packard Children’s Hospital, Jennifer Frankovich was the rheumatologist on call. She and a team of other doctors quickly diagnosed lupus, an autoimmune disease. But as they hurried to treat the girl, Frankovich thought that something about the patient’s particular combination of lupus symptoms — kidney problems, inflamed pancreas and blood vessels — rang a bell. In the past, she’d seen lupus patients with these symptoms develop life-threatening blood clots. Her colleagues in other specialties didn’t think there was cause to give the girl anti-clotting drugs, so Frankovich deferred to them. But she retained her suspicions. “I could not forget these cases,” she says.

Back in her office, she found that the scientific literature had no studies on patients like this to guide her. So she did something unusual: She searched a database of all the lupus patients the hospital had seen over the previous five years, singling out those whose symptoms matched her patient’s, and ran an analysis to see whether they had developed blood clots. “I did some very simple statistics and brought the data to everybody that I had met with that morning,” she says. The change in attitude was striking. “It was very clear, based on the database, that she could be at an increased risk for a clot.”
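
In database terms, what she ran was a simple retrospective cohort comparison. Below is a minimal sketch of that kind of analysis in Python, assuming a hypothetical de-identified extract with illustrative column names; nothing here reflects the hospital’s actual records system.

    import pandas as pd

    # Hypothetical, de-identified extract of lupus patients seen over five years.
    # All column names here are illustrative, not the hospital's real schema.
    records = pd.read_csv("lupus_patients.csv")

    # Patients whose presentation matches the index case: kidney involvement
    # plus inflammation of the pancreas and blood vessels.
    similar = records[(records["nephritis"] == 1)
                      & (records["pancreatitis"] == 1)
                      & (records["vasculitis"] == 1)]

    # "Very simple statistics": compare clot rates in matching vs. other patients.
    clot_rate_similar = similar["thrombosis"].mean()
    clot_rate_others = records.loc[~records.index.isin(similar.index), "thrombosis"].mean()

    print(f"matching patients: {len(similar)}, clot rate {clot_rate_similar:.1%}")
    print(f"other lupus patients: clot rate {clot_rate_others:.1%}")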

The girl was given the drug, and she did not develop a clot. “At the end of the day, we don’t know whether it was the right decision,” says Chris Longhurst, a pediatrician and the chief medical information officer at Stanford Children’s Health, who is a colleague of Frankovich’s. But they felt that it was the best they could do with the limited information they had.

A large, costly and time-consuming clinical trial with proper controls might someday prove Frankovich’s hypothesis correct. But large, costly and time-consuming clinical trials are rarely carried out for uncommon complications of this sort. In the absence of such focused research, doctors and scientists are increasingly dipping into enormous troves of data that already exist — namely, the aggregated medical records of thousands or even millions of patients — to uncover patterns that might help steer care.

The Tatonetti Laboratory at Columbia University is a nexus in this search for signal in the noise. There, Nicholas Tatonetti, an assistant professor of biomedical informatics — an interdisciplinary field that combines computer science and medicine — develops algorithms to trawl medical databases and turn up correlations. For his doctoral thesis, he mined the F.D.A.’s records of adverse drug reactions to identify pairs of medications that seemed to cause problems when taken together. He found an interaction between two very commonly prescribed drugs: The antidepressant paroxetine (marketed as Paxil) and the cholesterol-lowering medication pravastatin were connected to higher blood-sugar levels. Taken individually, the drugs didn’t affect glucose levels. But taken together, the side-effect was impossible to ignore. “Nobody had ever thought to look for it,” Tatonetti says, “and so nobody had ever found it.”
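
Tatonetti’s published methods are considerably more careful, but the basic shape of the search (flag drug pairs whose adverse-event reports show an effect that neither drug shows alone) can be sketched crudely. The flat table layout, column names and threshold below are assumptions for illustration only.

    import pandas as pd

    # Hypothetical flat extract of adverse-event reports: one row per report,
    # with 0/1 columns for the drugs taken and for the event of interest.
    reports = pd.read_csv("adverse_event_reports.csv")

    def event_rate(mask):
        """Share of reports in the masked subset that mention hyperglycemia."""
        subset = reports[mask]
        return subset["hyperglycemia"].mean() if len(subset) else 0.0

    on_paroxetine = reports["paroxetine"] == 1
    on_pravastatin = reports["pravastatin"] == 1

    rate_a_only = event_rate(on_paroxetine & ~on_pravastatin)
    rate_b_only = event_rate(~on_paroxetine & on_pravastatin)
    rate_both = event_rate(on_paroxetine & on_pravastatin)

    # A crude signal: the pair reports the event far more often than either drug
    # alone. Real pharmacovigilance methods correct for many reporting biases.
    if rate_both > 2 * max(rate_a_only, rate_b_only):
        print("possible interaction signal:", rate_a_only, rate_b_only, rate_both)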

The potential for this practice extends far beyond drug interactions. In the past, researchers noticed that being born in certain months or seasons appears to be linked to a higher risk of some diseases. In the Northern Hemisphere, people with multiple sclerosis tend to be born in the spring, while in the Southern Hemisphere they tend to be born in November; people with schizophrenia tend to have been born during the winter. There are numerous correlations like this, and the reasons for them are still foggy — a problem Tatonetti and a graduate assistant, Mary Boland, hope to solve by parsing the data on a vast array of outside factors. Tatonetti describes it as a quest to figure out “how these diseases could be dependent on birth month in a way that’s not just astrology.” Other researchers think data-mining might also be particularly beneficial for cancer patients, because so few types of cancer are represented in clinical trials.
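
A standard first pass at such a question, not necessarily the one Tatonetti and Boland take, is to test whether diagnoses are spread evenly across birth months. A sketch, assuming a hypothetical patient table with a birth-month column and a diagnosis flag:

    import pandas as pd
    from scipy.stats import chi2_contingency

    # Hypothetical table: one row per patient, with a birth month (1-12) and a
    # diagnosis flag (here, multiple sclerosis). Column names are invented.
    patients = pd.read_csv("diagnoses.csv")

    # Contingency table: birth month versus diagnosed / not diagnosed.
    table = pd.crosstab(patients["birth_month"], patients["has_ms"])

    chi2, p_value, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.1f}, p = {p_value:.3g}")
    # A small p-value says the monthly pattern is unlikely to be chance alone;
    # it says nothing about why, which is where the hard work begins.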

As with so much network-enabled data-tinkering, this research is freighted with serious privacy concerns. If these analyses are considered part of treatment, hospitals may allow them on the grounds of doing what is best for a patient. But if they are considered medical research, then everyone whose records are being used must give permission. In practice, the distinction can be fuzzy and often depends on the culture of the institution. After Frankovich wrote about her experience in The New England Journal of Medicine in 2011, her hospital warned her not to conduct such analyses again until a proper framework for using patient information was in place.

In the lab, ensuring that the data-mining conclusions hold water can also be tricky. By definition, a medical-records database contains information only on sick people who sought help, so it is inherently incomplete. Such databases also lack the controls of a clinical study and are full of confounding factors that might trip up unwary researchers. Daniel Rubin, a professor of bioinformatics at Stanford, also warns that there have been no studies of data-driven medicine to determine whether it leads to positive outcomes more often than not. Because historical evidence is of “inferior quality,” he says, it has the potential to lead care astray.

Yet despite the pitfalls, developing a “learning health system” — one that can incorporate lessons from its own activities in real time — remains tantalizing to researchers. Stefan Thurner, a professor of complexity studies at the Medical University of Vienna, and his researcher, Peter Klimek, are working with a database of millions of people’s health-insurance claims, building networks of relationships among diseases. As they fill in the network with known connections and new ones mined from the data, Thurner and Klimek hope to be able to predict the health of individuals or of a population over time. On the clinical side, Longhurst has been advocating for a button in electronic medical-record software that would allow doctors to run automated searches for patients like theirs when no other sources of information are available.

With time, and with some crucial refinements, this kind of medicine may eventually become mainstream. Frankovich recalls a conversation with an older colleague. “She told me, ‘Research this decade benefits the next decade,’ ” Frankovich says. “That was how it was. But I feel like it doesn’t have to be that way anymore.”


The rise of data and the death of politics (The Guardian)

Tech pioneers in the US are advocating a new data-based approach to governance – ‘algorithmic regulation’. But if technology provides the answers to society’s problems, what happens to governments?

The Observer, Sunday 20 July 2014


Government by social network? US president Barack Obama with Facebook founder Mark Zuckerberg. Photograph: Mandel Ngan/AFP/Getty Images

On 24 August 1965 Gloria Placente, a 34-year-old resident of Queens, New York, was driving to Orchard Beach in the Bronx. Clad in shorts and sunglasses, the housewife was looking forward to quiet time at the beach. But the moment she crossed the Willis Avenue bridge in her Chevrolet Corvair, Placente was surrounded by a dozen patrolmen. There were also 125 reporters, eager to witness the launch of the New York police department’s Operation Corral – an acronym for Computer Oriented Retrieval of Auto Larcenists.

Fifteen months earlier, Placente had driven through a red light and neglected to answer the summons, an offence that Corral was going to punish with a heavy dose of techno-Kafkaesque. It worked as follows: a police car stationed at one end of the bridge radioed the licence plates of oncoming cars to a teletypist miles away, who fed them to a Univac 490 computer, an expensive $500,000 toy ($3.5m in today’s dollars) on loan from the Sperry Rand Corporation. The computer checked the numbers against a database of 110,000 cars that were either stolen or belonged to known offenders. In case of a match the teletypist would alert a second patrol car at the bridge’s other exit. It took, on average, just seven seconds.

Compared with the impressive police gear of today – automatic number plate recognition, CCTV cameras, GPS trackers – Operation Corral looks quaint. And the possibilities for control will only expand. European officials have considered requiring all cars entering the European market to feature a built-in mechanism that allows the police to stop vehicles remotely. Speaking earlier this year, Jim Farley, a senior Ford executive, acknowledged that “we know everyone who breaks the law, we know when you’re doing it. We have GPS in your car, so we know what you’re doing. By the way, we don’t supply that data to anyone.” That last bit didn’t sound very reassuring and Farley retracted his remarks.

As both cars and roads get “smart,” they promise nearly perfect, real-time law enforcement. Instead of waiting for drivers to break the law, authorities can simply prevent the crime. Thus, a 50-mile stretch of the A14 between Felixstowe and Rugby is to be equipped with numerous sensors that would monitor traffic by sending signals to and from mobile phones in moving vehicles. The telecoms watchdog Ofcom envisions that such smart roads connected to a centrally controlled traffic system could automatically impose variable speed limits to smooth the flow of traffic but also direct the cars “along diverted routes to avoid the congestion and even [manage] their speed”.

Other gadgets – from smartphones to smart glasses – promise even more security and safety. In April, Apple patented technology that deploys sensors inside the smartphone to analyse if the car is moving and if the person using the phone is driving; if both conditions are met, it simply blocks the phone’s texting feature. Intel and Ford are working on Project Mobil – a face recognition system that, should it fail to recognise the face of the driver, would not only prevent the car being started but also send the picture to the car’s owner (bad news for teenagers).

The car is emblematic of transformations in many other domains, from smart environments for “ambient assisted living” where carpets and walls detect that someone has fallen, to various masterplans for the smart city, where municipal services dispatch resources only to those areas that need them. Thanks to sensors and internet connectivity, the most banal everyday objects have acquired tremendous power to regulate behaviour. Even public toilets are ripe for sensor-based optimisation: the Safeguard Germ Alarm, a smart soap dispenser developed by Procter & Gamble and used in some public WCs in the Philippines, has sensors monitoring the doors of each stall. Once you leave the stall, the alarm starts ringing – and can only be stopped by a push of the soap-dispensing button.

In this context, Google’s latest plan to push its Android operating system on to smart watches, smart cars, smart thermostats and, one suspects, smart everything, looks rather ominous. In the near future, Google will be the middleman standing between you and your fridge, you and your car, you and your rubbish bin, allowing the National Security Agency to satisfy its data addiction in bulk and via a single window.

This “smartification” of everyday life follows a familiar pattern: there’s primary data – a list of what’s in your smart fridge and your bin – and metadata – a log of how often you open either of these things or when they communicate with one another. Both produce interesting insights: cue smart mattresses – one recent model promises to track respiration and heart rates and how much you move during the night – and smart utensils that provide nutritional advice.

In addition to making our lives more efficient, this smart world also presents us with an exciting political choice. If so much of our everyday behaviour is already captured, analysed and nudged, why stick with unempirical approaches to regulation? Why rely on laws when one has sensors and feedback mechanisms? If policy interventions are to be – to use the buzzwords of the day – “evidence-based” and “results-oriented,” technology is here to help.

This new type of governance has a name: algorithmic regulation. In as much as Silicon Valley has a political programme, this is it. Tim O’Reilly, an influential technology publisher, venture capitalist and ideas man (he is to blame for popularising the term “web 2.0”) has been its most enthusiastic promoter. In a recent essay that lays out his reasoning, O’Reilly makes an intriguing case for the virtues of algorithmic regulation – a case that deserves close scrutiny both for what it promises policymakers and the simplistic assumptions it makes about politics, democracy and power.

To see algorithmic regulation at work, look no further than the spam filter in your email. Instead of confining itself to a narrow definition of spam, the email filter has its users teach it. Even Google can’t write rules to cover all the ingenious innovations of professional spammers. What it can do, though, is teach the system what makes a good rule and spot when it’s time to find another rule for finding a good rule – and so on. An algorithm can do this, but it’s the constant real-time feedback from its users that allows the system to counter threats never envisioned by its designers. And it’s not just spam: your bank uses similar methods to spot credit-card fraud.
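
A toy version of such a filter, with users’ “mark as spam” clicks folded back in as fresh training data, might look like the sketch below. The tiny corpus and the naive Bayes model are stand-ins chosen for brevity, not a description of how any real mail provider works.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # A tiny labelled corpus standing in for a real mailbox.
    emails = ["win a free prize now", "meeting moved to 3pm",
              "cheap pills online", "lunch tomorrow?"]
    labels = [1, 0, 1, 0]   # 1 = spam, 0 = legitimate

    vectorizer = CountVectorizer()
    model = MultinomialNB()
    model.fit(vectorizer.fit_transform(emails), labels)

    def user_marks(message, is_spam):
        """The feedback loop: every 'mark as spam' click becomes training data."""
        emails.append(message)
        labels.append(1 if is_spam else 0)
        model.fit(vectorizer.fit_transform(emails), labels)

    # No engineer wrote a rule about "limited time offers"; the users taught it.
    user_marks("limited time offer, click here", True)
    print(model.predict(vectorizer.transform(["limited time prize offer"])))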

In his essay, O’Reilly draws broader philosophical lessons from such technologies, arguing that they work because they rely on “a deep understanding of the desired outcome” (spam is bad!) and periodically check if the algorithms are actually working as expected (are too many legitimate emails ending up marked as spam?).

O’Reilly presents such technologies as novel and unique – we are living through a digital revolution after all – but the principle behind “algorithmic regulation” would be familiar to the founders of cybernetics – a discipline that, even in its name (it means “the science of governance”) hints at its great regulatory ambitions. This principle, which allows the system to maintain its stability by constantly learning and adapting itself to the changing circumstances, is what the British psychiatrist Ross Ashby, one of the founding fathers of cybernetics, called “ultrastability”.

To illustrate it, Ashby designed the homeostat. This clever device consisted of four interconnected RAF bomb control units – mysterious looking black boxes with lots of knobs and switches – that were sensitive to voltage fluctuations. If one unit stopped working properly – say, because of an unexpected external disturbance – the other three would rewire and regroup themselves, compensating for its malfunction and keeping the system’s overall output stable.

Ashby’s homeostat achieved “ultrastability” by always monitoring its internal state and cleverly redeploying its spare resources.

Like the spam filter, it didn’t have to specify all the possible disturbances – only the conditions for how and when it must be updated and redesigned. This is no trivial departure from how the usual technical systems, with their rigid, if-then rules, operate: suddenly, there’s no need to develop procedures for governing every contingency, for – or so one hopes – algorithms and real-time, immediate feedback can do a better job than inflexible rules out of touch with reality.
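
Ashby built his device out of surplus hardware, but the ultrastability principle is easy to caricature in code: each unit keeps its output inside a safe band, and whenever a disturbance pushes it out, the unit rewires its own connections at random until the system as a whole settles again. This is a loose sketch of the principle, not a model of the actual homeostat circuitry.

    import numpy as np

    rng = np.random.default_rng(0)

    n_units = 4                       # Ashby's homeostat had four coupled units
    weights = rng.uniform(-1, 1, (n_units, n_units))
    state = np.zeros(n_units)
    SAFE = 1.0                        # each essential variable must stay in this band
    rewirings = 0

    for _ in range(2000):
        # Coupled dynamics plus a small random disturbance.
        state = weights @ state + rng.normal(0, 0.05, n_units)
        for i in range(n_units):
            if abs(state[i]) > SAFE:
                # Ultrastability: a unit pushed out of its safe band rewires its
                # incoming connections at random and resets, until the system as
                # a whole settles into a configuration that stays bounded.
                weights[i] = rng.uniform(-1, 1, n_units)
                state[i] = 0.0
                rewirings += 1

    print(f"rewirings: {rewirings}, final state: {np.round(state, 3)}")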

Algorithmic regulation could certainly make the administration of existing laws more efficient. If it can fight credit-card fraud, why not tax fraud? Italian bureaucrats have experimented with the redditometro, or income meter, a tool for comparing people’s spending patterns – recorded thanks to an arcane Italian law – with their declared income, so that authorities know when you spend more than you earn. Spain has expressed interest in a similar tool.
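
The logic of such a tool is almost embarrassingly simple; a sketch, with invented field names and an invented tolerance threshold:

    import pandas as pd

    # Hypothetical per-taxpayer records: declared income plus spending observed
    # through banks, card payments and property registries. Fields are invented.
    taxpayers = pd.read_csv("taxpayers.csv")

    TOLERANCE = 1.2   # flag anyone spending more than 20% above declared income

    flagged = taxpayers[taxpayers["observed_spending"]
                        > TOLERANCE * taxpayers["declared_income"]]
    print(f"{len(flagged)} taxpayers flagged for review")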

Such systems, however, are toothless against the real culprits of tax evasion – the super-rich families who profit from various offshoring schemes or simply write outrageous tax exemptions into the law. Algorithmic regulation is perfect for enforcing the austerity agenda while leaving those responsible for the fiscal crisis off the hook. To understand whether such systems are working as expected, we need to modify O’Reilly’s question: for whom are they working? If it’s just the tax-evading plutocrats, the global financial institutions interested in balanced national budgets and the companies developing income-tracking software, then it’s hardly a democratic success.

With his belief that algorithmic regulation is based on “a deep understanding of the desired outcome”, O’Reilly cunningly disconnects the means of doing politics from its ends. But the how of politics is as important as the what of politics – in fact, the former often shapes the latter. Everybody agrees that education, health, and security are all “desired outcomes”, but how do we achieve them? In the past, when we faced the stark political choice of delivering them through the market or the state, the lines of the ideological debate were clear. Today, when the presumed choice is between the digital and the analog or between the dynamic feedback and the static law, that ideological clarity is gone – as if the very choice of how to achieve those “desired outcomes” was apolitical and didn’t force us to choose between different and often incompatible visions of communal living.

By assuming that the utopian world of infinite feedback loops is so efficient that it transcends politics, the proponents of algorithmic regulation fall into the same trap as the technocrats of the past. Yes, these systems are terrifyingly efficient – in the same way that Singapore is terrifyingly efficient (O’Reilly, unsurprisingly, praises Singapore for its embrace of algorithmic regulation). And while Singapore’s leaders might believe that they, too, have transcended politics, it doesn’t mean that their regime cannot be assessed outside the linguistic swamp of efficiency and innovation – by using political, not economic benchmarks.

As Silicon Valley keeps corrupting our language with its endless glorification of disruption and efficiency – concepts at odds with the vocabulary of democracy – our ability to question the “how” of politics is weakened. Silicon Valley’s default answer to the how of politics is what I call solutionism: problems are to be dealt with via apps, sensors, and feedback loops – all provided by startups. Earlier this year Google’s Eric Schmidt even promised that startups would provide the solution to the problem of economic inequality: the latter, it seems, can also be “disrupted”. And where the innovators and the disruptors lead, the bureaucrats follow.

The intelligence services embraced solutionism before other government agencies. Thus, they reduced the topic of terrorism from a subject that had some connection to history and foreign policy to an informational problem of identifying emerging terrorist threats via constant surveillance. They urged citizens to accept that instability is part of the game, that its root causes are neither traceable nor reparable, that the threat can only be pre-empted by out-innovating and out-surveilling the enemy with better communications.

Speaking in Athens last November, the Italian philosopher Giorgio Agamben discussed an epochal transformation in the idea of government, “whereby the traditional hierarchical relation between causes and effects is inverted, so that, instead of governing the causes – a difficult and expensive undertaking – governments simply try to govern the effects”.


Governments’ current favourite psychologist, Daniel Kahneman. Photograph: Richard Saker for the Observer

For Agamben, this shift is emblematic of modernity. It also explains why the liberalisation of the economy can co-exist with the growing proliferation of control – by means of soap dispensers and remotely managed cars – into everyday life. “If government aims for the effects and not the causes, it will be obliged to extend and multiply control. Causes demand to be known, while effects can only be checked and controlled.” Algorithmic regulation is an enactment of this political programme in technological form.

The true politics of algorithmic regulation become visible once its logic is applied to the social nets of the welfare state. There are no calls to dismantle them, but citizens are nonetheless encouraged to take responsibility for their own health. Consider how Fred Wilson, an influential US venture capitalist, frames the subject. “Health… is the opposite side of healthcare,” he said at a conference in Paris last December. “It’s what keeps you out of the healthcare system in the first place.” Thus, we are invited to start using self-tracking apps and data-sharing platforms and monitor our vital indicators, symptoms and discrepancies on our own.

This goes nicely with recent policy proposals to save troubled public services by encouraging healthier lifestyles. Consider a 2013 report by Westminster council and the Local Government Information Unit, a thinktank, calling for the linking of housing and council benefits to claimants’ visits to the gym – with the help of smartcards. They might not be needed: many smartphones are already tracking how many steps we take every day (Google Now, the company’s virtual assistant, keeps score of such data automatically and periodically presents it to users, nudging them to walk more).

The numerous possibilities that tracking devices offer to health and insurance industries are not lost on O’Reilly. “You know the way that advertising turned out to be the native business model for the internet?” he wondered at a recent conference. “I think that insurance is going to be the native business model for the internet of things.” Things do seem to be heading that way: in June, Microsoft struck a deal with American Family Insurance, the eighth-largest home insurer in the US, in which both companies will fund startups that want to put sensors into smart homes and smart cars for the purposes of “proactive protection”.

An insurance company would gladly subsidise the costs of installing yet another sensor in your house – as long as it can automatically alert the fire department or make front porch lights flash in case your smoke detector goes off. For now, accepting such tracking systems is framed as an extra benefit that can save us some money. But when do we reach a point where not using them is seen as a deviation – or, worse, an act of concealment – that ought to be punished with higher premiums?

Or consider a May 2014 report from 2020health, another thinktank, proposing to extend tax rebates to Britons who give up smoking, stay slim or drink less. “We propose ‘payment by results’, a financial reward for people who become active partners in their health, whereby if you, for example, keep your blood sugar levels down, quit smoking, keep weight off, [or] take on more self-care, there will be a tax rebate or an end-of-year bonus,” they state. Smart gadgets are the natural allies of such schemes: they document the results and can even help achieve them – by constantly nagging us to do what’s expected.

The unstated assumption of most such reports is that the unhealthy are not only a burden to society but that they deserve to be punished (fiscally for now) for failing to be responsible. For what else could possibly explain their health problems but their personal failings? It’s certainly not the power of food companies or class-based differences or various political and economic injustices. One can wear a dozen powerful sensors, own a smart mattress and even do a close daily reading of one’s poop – as some self-tracking aficionados are wont to do – but those injustices would still be nowhere to be seen, for they are not the kind of stuff that can be measured with a sensor. The devil doesn’t wear data. Social injustices are much harder to track than the everyday lives of the individuals whose lives they affect.

In shifting the focus of regulation from reining in institutional and corporate malfeasance to perpetual electronic guidance of individuals, algorithmic regulation offers us a good-old technocratic utopia of politics without politics. Disagreement and conflict, under this model, are seen as unfortunate byproducts of the analog era – to be solved through data collection – and not as inevitable results of economic or ideological conflicts.

However, a politics without politics does not mean a politics without control or administration. As O’Reilly writes in his essay: “New technologies make it possible to reduce the amount of regulation while actually increasing the amount of oversight and production of desirable outcomes.” Thus, it’s a mistake to think that Silicon Valley wants to rid us of government institutions. Its dream state is not the small government of libertarians – a small state, after all, needs neither fancy gadgets nor massive servers to process the data – but the data-obsessed and data-obese state of behavioural economists.

The nudging state is enamoured of feedback technology, for its key founding principle is that while we behave irrationally, our irrationality can be corrected – if only the environment acts upon us, nudging us towards the right option. Unsurprisingly, one of the three lonely references at the end of O’Reilly’s essay is to a 2012 speech entitled “Regulation: Looking Backward, Looking Forward” by Cass Sunstein, the prominent American legal scholar who is the chief theorist of the nudging state.

And while the nudgers have already captured the state by making behavioural psychology the favourite idiom of government bureaucracy – Daniel Kahneman is in, Machiavelli is out – the algorithmic regulation lobby advances in more clandestine ways. They create innocuous non-profit organisations like Code for America which then co-opt the state – under the guise of encouraging talented hackers to tackle civic problems.


Airbnb: part of the reputation-driven economy.

Such initiatives aim to reprogramme the state and make it feedback-friendly, crowding out other means of doing politics. For all those tracking apps, algorithms and sensors to work, databases need interoperability – which is what such pseudo-humanitarian organisations, with their ardent belief in open data, demand. And when the government is too slow to move at Silicon Valley’s speed, they simply move inside the government. Thus, Jennifer Pahlka, the founder of Code for America and a protege of O’Reilly, became the deputy chief technology officer of the US government – while pursuing a one-year “innovation fellowship” from the White House.

Cash-strapped governments welcome such colonisation by technologists – especially if it helps to identify and clean up datasets that can be profitably sold to companies who need such data for advertising purposes. Recent clashes over the sale of student and health data in the UK are just a precursor of battles to come: after all state assets have been privatised, data is the next target. For O’Reilly, open data is “a key enabler of the measurement revolution”.

This “measurement revolution” seeks to quantify the efficiency of various social programmes, as if the rationale behind the social nets that some of them provide was to achieve perfection of delivery. The actual rationale, of course, was to enable a fulfilling life by suppressing certain anxieties, so that citizens can pursue their life projects relatively undisturbed. This vision did spawn a vast bureaucratic apparatus and the critics of the welfare state from the left – most prominently Michel Foucault – were right to question its disciplining inclinations. Nonetheless, neither perfection nor efficiency were the “desired outcome” of this system. Thus, to compare the welfare state with the algorithmic state on those grounds is misleading.

But we can compare their respective visions for human fulfilment – and the role they assign to markets and the state. Silicon Valley’s offer is clear: thanks to ubiquitous feedback loops, we can all become entrepreneurs and take care of our own affairs! As Brian Chesky, the chief executive of Airbnb, told the Atlantic last year, “What happens when everybody is a brand? When everybody has a reputation? Every person can become an entrepreneur.”

Under this vision, we will all code (for America!) in the morning, drive Uber cars in the afternoon, and rent out our kitchens as restaurants – courtesy of Airbnb – in the evening. As O’Reilly writes of Uber and similar companies, “these services ask every passenger to rate their driver (and drivers to rate their passenger). Drivers who provide poor service are eliminated. Reputation does a better job of ensuring a superb customer experience than any amount of government regulation.”

The state behind the “sharing economy” does not wither away; it might be needed to ensure that the reputation accumulated on Uber, Airbnb and other platforms of the “sharing economy” is fully liquid and transferable, creating a world where our every social interaction is recorded and assessed, erasing whatever differences exist between social domains. Someone, somewhere will eventually rate you as a passenger, a house guest, a student, a patient, a customer. Whether this ranking infrastructure will be decentralised, provided by a giant like Google or rest with the state is not yet clear but the overarching objective is: to make reputation into a feedback-friendly social net that could protect the truly responsible citizens from the vicissitudes of deregulation.

Admiring the reputation models of Uber and Airbnb, O’Reilly wants governments to be “adopting them where there are no demonstrable ill effects”. But what counts as an “ill effect” and how to demonstrate it is a key question that belongs to the how of politics that algorithmic regulation wants to suppress. It’s easy to demonstrate “ill effects” if the goal of regulation is efficiency but what if it is something else? Surely, there are some benefits – fewer visits to the psychoanalyst, perhaps – in not having your every social interaction ranked?

The imperative to evaluate and demonstrate “results” and “effects” already presupposes that the goal of policy is the optimisation of efficiency. However, as long as democracy is irreducible to a formula, its composite values will always lose this battle: they are much harder to quantify.

For Silicon Valley, though, the reputation-obsessed algorithmic state of the sharing economy is the new welfare state. If you are honest and hardworking, your online reputation would reflect this, producing a highly personalised social net. It is “ultrastable” in Ashby’s sense: while the welfare state assumes the existence of specific social evils it tries to fight, the algorithmic state makes no such assumptions. The future threats can remain fully unknowable and fully addressable – on the individual level.

Silicon Valley, of course, is not alone in touting such ultrastable individual solutions. Nassim Taleb, in his best-selling 2012 book Antifragile, makes a similar, if more philosophical, plea for maximising our individual resourcefulness and resilience: don’t get one job but many, don’t take on debt, count on your own expertise. It’s all about resilience, risk-taking and, as Taleb puts it, “having skin in the game”. As Julian Reid and Brad Evans write in their new book, Resilient Life: The Art of Living Dangerously, this growing cult of resilience masks a tacit acknowledgement that no collective project could even aspire to tame the proliferating threats to human existence – we can only hope to equip ourselves to tackle them individually. “When policy-makers engage in the discourse of resilience,” write Reid and Evans, “they do so in terms which aim explicitly at preventing humans from conceiving of danger as a phenomenon from which they might seek freedom and even, in contrast, as that to which they must now expose themselves.”

What, then, is the progressive alternative? “The enemy of my enemy is my friend” doesn’t work here: just because Silicon Valley is attacking the welfare state doesn’t mean that progressives should defend it to the very last bullet (or tweet). First, even leftist governments have limited space for fiscal manoeuvres, as the kind of discretionary spending required to modernise the welfare state would never be approved by the global financial markets. And it’s the ratings agencies and bond markets – not the voters – who are in charge today.

Second, the leftist critique of the welfare state has become only more relevant today when the exact borderlines between welfare and security are so blurry. When Google’s Android powers so much of our everyday life, the government’s temptation to govern us through remotely controlled cars and alarm-operated soap dispensers will be all too great. This will expand government’s hold over areas of life previously free from regulation.

With so much data, the government’s favourite argument in fighting terror – if only the citizens knew as much as we do, they too would impose all these legal exceptions – easily extends to other domains, from health to climate change. Consider a recent academic paper that used Google search data to study obesity patterns in the US, finding significant correlation between search keywords and body mass index levels. “Results suggest great promise of the idea of obesity monitoring through real-time Google Trends data”, note the authors, which would be “particularly attractive for government health institutions and private businesses such as insurance companies.”

If Google senses a flu epidemic somewhere, it’s hard to challenge its hunch – we simply lack the infrastructure to process so much data at this scale. Google can be proven wrong after the fact – as has recently been the case with its flu trends data, which was shown to overestimate the number of infections, possibly because of its failure to account for the intense media coverage of flu – but so is the case with most terrorist alerts. It’s the immediate, real-time nature of computer systems that makes them perfect allies of an infinitely expanding and pre-emption‑obsessed state.

Perhaps, the case of Gloria Placente and her failed trip to the beach was not just a historical oddity but an early omen of how real-time computing, combined with ubiquitous communication technologies, would transform the state. One of the few people to have heeded that omen was a little-known American advertising executive called Robert MacBride, who pushed the logic behind Operation Corral to its ultimate conclusions in his unjustly neglected 1967 book, The Automated State.

At the time, America was debating the merits of establishing a national data centre to aggregate various national statistics and make them available to government agencies. MacBride attacked his contemporaries’ inability to see how the state would exploit the metadata accrued as everything was being computerised. Instead of “a large scale, up-to-date Austro-Hungarian empire”, modern computer systems would produce “a bureaucracy of almost celestial capacity” that can “discern and define relationships in a manner which no human bureaucracy could ever hope to do”.

“Whether one bowls on a Sunday or visits a library instead is [of] no consequence since no one checks those things,” he wrote. Not so when computer systems can aggregate data from different domains and spot correlations. “Our individual behaviour in buying and selling an automobile, a house, or a security, in paying our debts and acquiring new ones, and in earning money and being paid, will be noted meticulously and studied exhaustively,” warned MacBride. Thus, a citizen will soon discover that “his choice of magazine subscriptions… can be found to indicate accurately the probability of his maintaining his property or his interest in the education of his children.” This sounds eerily similar to the recent case of a hapless father who found that his daughter was pregnant from a coupon that Target, a retailer, sent to their house. Target’s hunch was based on its analysis of products – for example, unscented lotion – usually bought by other pregnant women.

For MacBride the conclusion was obvious. “Political rights won’t be violated but will resemble those of a small stockholder in a giant enterprise,” he wrote. “The mark of sophistication and savoir-faire in this future will be the grace and flexibility with which one accepts one’s role and makes the most of what it offers.” In other words, since we are all entrepreneurs first – and citizens second – we might as well make the most of it.

What, then, is to be done? Technophobia is no solution. Progressives need technologies that would stick with the spirit, if not the institutional form, of the welfare state, preserving its commitment to creating ideal conditions for human flourishing. Even some ultrastability is welcome. Stability was a laudable goal of the welfare state before it had encountered a trap: in specifying the exact protections that the state was to offer against the excesses of capitalism, it could not easily deflect new, previously unspecified forms of exploitation.

How do we build welfarism that is both decentralised and ultrastable? A form of guaranteed basic income – whereby some welfare services are replaced by direct cash transfers to citizens – fits the two criteria.

Creating the right conditions for the emergence of political communities around causes and issues they deem relevant would be another good step. Full compliance with the principle of ultrastability dictates that such issues cannot be anticipated or dictated from above – by political parties or trade unions – and must be left unspecified.

What can be specified is the kind of communications infrastructure needed to abet this cause: it should be free to use, hard to track, and open to new, subversive uses. Silicon Valley’s existing infrastructure is great for fulfilling the needs of the state, not of self-organising citizens. It can, of course, be redeployed for activist causes – and it often is – but there’s no reason to accept the status quo as either ideal or inevitable.

Why, after all, appropriate what should belong to the people in the first place? While many of the creators of the internet bemoan how low their creature has fallen, their anger is misdirected. The fault is not with that amorphous entity but, first of all, with the absence of robust technology policy on the left – a policy that can counter the pro-innovation, pro-disruption, pro-privatisation agenda of Silicon Valley. In its absence, all these emerging political communities will operate with their wings clipped. Whether the next Occupy Wall Street would be able to occupy anything in a truly smart city remains to be seen: most likely, they would be out-censored and out-droned.

To his credit, MacBride understood all of this in 1967. “Given the resources of modern technology and planning techniques,” he warned, “it is really no great trick to transform even a country like ours into a smoothly running corporation where every detail of life is a mechanical function to be taken care of.” MacBride’s fear is O’Reilly’s master plan: the government, he writes, ought to be modelled on the “lean startup” approach of Silicon Valley, which is “using data to constantly revise and tune its approach to the market”. It’s this very approach that Facebook has recently deployed to maximise user engagement on the site: if showing users more happy stories does the trick, so be it.

Algorithmic regulation, whatever its immediate benefits, will give us a political regime where technology corporations and government bureaucrats call all the shots. The Polish science fiction writer Stanislaw Lem, in a pointed critique of cybernetics published, as it happens, roughly at the same time as The Automated State, put it best: “Society cannot give up the burden of having to decide about its own fate by sacrificing this freedom for the sake of the cybernetic regulator.”

How Quantum Computers and Machine Learning Will Revolutionize Big Data (Wired)

BY JENNIFER OUELLETTE, QUANTA MAGAZINE

10.14.13

Image: infocux Technologies/Flickr

When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data — more information than all of the world’s libraries combined — every second. After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10¹⁵ bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.

The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade. Since Intel co-founder Gordon E. Moore first defined it in 1965, the so-called Moore’s law — which predicts that the number of transistors on integrated circuits will double every two years — has dominated the computer industry. While that growth rate has proved remarkably resilient, for now, at least, “Moore’s law has basically crapped out; the transistors have gotten as small as people know how to make them economically with existing technologies,” said Scott Aaronson, a theoretical computer scientist at the Massachusetts Institute of Technology.

Instead, since 2005, many of the gains in computing power have come from adding more parallelism via multiple cores, with multiple levels of memory. The preferred architecture no longer features a single central processing unit (CPU) augmented with random access memory (RAM) and a hard drive for long-term storage. Even the big, centralized parallel supercomputers that dominated the 1980s and 1990s are giving way to distributed data centers and cloud computing, often networked across many organizations and vast geographical distances.

These days, “People talk about a computing fabric,” said Stanford University electrical engineer Stephen Boyd. These changes in computer architecture translate into the need for a different computational approach when it comes to handling big data, which is not only grander in scope than the large data sets of yore but also intrinsically different from them.

The demand for ever-faster processors, while important, isn’t the primary focus anymore. “Processing speed has been completely irrelevant for five years,” Boyd said. “The challenge is not how to solve problems with a single, ultra-fast processor, but how to solve them with 100,000 slower processors.” Aaronson points out that many problems in big data can’t be adequately addressed by simply adding more parallel processing. These problems are “more sequential, where each step depends on the outcome of the preceding step,” he said. “Sometimes, you can split up the work among a bunch of processors, but other times, that’s harder to do.” And often the software isn’t written to take full advantage of the extra processors. “If you hire 20 people to do something, will it happen 20 times faster?” Aaronson said. “Usually not.”
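
Aaronson’s 20-people question is usually formalised as Amdahl’s law: if a fraction p of a job can be spread across n workers and the rest is inherently sequential, the best possible speedup is 1 / ((1 − p) + p/n). A quick illustration:

    def amdahl_speedup(parallel_fraction: float, n_workers: int) -> float:
        """Best-case speedup when only part of a job can be parallelised."""
        serial_fraction = 1.0 - parallel_fraction
        return 1.0 / (serial_fraction + parallel_fraction / n_workers)

    # Even if 90% of the work parallelises perfectly, 20 workers give about
    # 6.9x, not 20x, and no number of workers can ever push it past 10x.
    print(round(amdahl_speedup(0.90, 20), 1))       # 6.9
    print(round(amdahl_speedup(0.90, 100_000), 1))  # 10.0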

Researchers also face challenges in integrating very differently structured data sets, as well as the difficulty of moving large amounts of data efficiently through a highly distributed network.

Those issues will become more pronounced as the size and complexity of data sets continue to grow faster than computing resources, according to California Institute of Technology physicist Harvey Newman, whose team developed the LHC’s grid of data centers and trans-Atlantic network. He estimates that if current trends hold, the computational needs of big data analysis will place considerable strain on the computing fabric. “It requires us to think about a different kind of system,” he said.

Memory and Movement

Emmanuel Candes, an applied mathematician at Stanford University, was once able to crunch big data problems on his desktop computer. But last year, when he joined a collaboration of radiologists developing dynamic magnetic resonance imaging — whereby one could record a patient’s heartbeat in real time using advanced algorithms to create high-resolution videos from limited MRI measurements — he found that the data no longer fit into his computer’s memory, making it difficult to perform the necessary analysis.

Addressing the storage-capacity challenges of big data is not simply a matter of building more memory, which has never been more plentiful. It is also about managing the movement of data. That’s because, increasingly, the desired data is no longer at people’s fingertips, stored in a single computer; it is distributed across multiple computers in a large data center or even in the “cloud.”

There is a hierarchy to data storage, ranging from the slowest, cheapest and most abundant memory to the fastest and most expensive, with the least available space. At the bottom of this hierarchy is so-called “slow memory” such as hard drives and flash drives, the cost of which continues to drop. There is more space on hard drives, compared to the other kinds of memory, but saving and retrieving the data takes longer. Next up the ladder comes RAM, which is much faster than slow memory but offers less space and is more expensive. Then there is cache memory — another trade-off of space and price in exchange for faster retrieval speeds — and finally the registers on the microchip itself, which are the fastest of all but the priciest to build, with the least available space.

If memory storage were like real estate, a hard drive would be a sprawling upstate farm, RAM would be a medium-sized house in the suburbs, cache memory would be a townhouse on the outskirts of a big city, and the register memory would be a tiny studio in a prime urban location.
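
The real-estate analogy can be made concrete with rough access times. The figures below are order-of-magnitude ballpark numbers, not measurements of any particular machine:

    # Approximate access latencies in nanoseconds; order-of-magnitude figures
    # only, not measurements of any particular machine.
    latency_ns = {
        "CPU register":            0.5,
        "cache":                   5,
        "RAM":                     100,
        "flash drive (SSD)":       100_000,        # ~0.1 ms
        "spinning hard drive":     10_000_000,     # ~10 ms
        "remote data center":      150_000_000,    # ~150 ms network round trip
    }

    register = latency_ns["CPU register"]
    for tier, ns in latency_ns.items():
        print(f"{tier:>20}: ~{ns / register:>12,.0f}x the cost of a register access")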

Longer commutes for stored data translate into processing delays. “When computers are slow today, it’s not because of the microprocessor,” Aaronson said. “The microprocessor is just treading water waiting for the disk to come back with the data.” Big data researchers prefer to minimize how much data must be moved back and forth from slow memory to fast memory. The problem is exacerbated when the data is distributed across a network or in the cloud, because it takes even longer to move the data back and forth, depending on bandwidth capacity, so that it can be analyzed.

One possible solution to this dilemma is to embrace the new paradigm. In addition to distributed storage, why not analyze the data in a distributed way as well, with each unit (or node) in a network of computers performing a small piece of a computation? Each partial solution is then integrated to find the full result. This approach is similar in concept to the LHC’s, in which one complete copy of the raw data (after filtering) is stored at the CERN research facility in Switzerland that is home to the collider. A second copy is divided into batches that are then distributed to data centers around the world. Each center analyzes its chunk of data and transmits the results to regional computers before moving on to the next batch.
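
The pattern, each node summarising only the slice of data it holds and a final step combining the partial results, can be shown with something as small as a distributed mean, using Python’s multiprocessing module as a stand-in for a real computing grid:

    from multiprocessing import Pool

    def partial_summary(chunk):
        """Each 'node' summarises only the slice of data it holds locally."""
        return sum(chunk), len(chunk)

    if __name__ == "__main__":
        data = list(range(1_000_000))            # stand-in for a big data set
        chunks = [data[i::8] for i in range(8)]  # split across 8 "nodes"

        with Pool(processes=8) as pool:
            partials = pool.map(partial_summary, chunks)   # distributed phase

        sums, counts = zip(*partials)                      # integration phase
        print("mean:", sum(sums) / sum(counts))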

Alon Halevy, a computer scientist at Google, says the biggest breakthroughs in big data are likely to come from data integration. Image: Peter DaSilva for Quanta Magazine

Boyd’s system is based on so-called consensus algorithms. “It’s a mathematical optimization problem,” he said of the algorithms. “You are using past data to train the model in hopes that it will work on future data.” Such algorithms are useful for creating an effective spam filter, for example, or for detecting fraudulent bank transactions.

This can be done on a single computer, with all the data in one place. Machine learning typically uses many processors, each handling a little bit of the problem. But when the problem becomes too large for a single machine, a consensus optimization approach might work better, in which the data set is chopped into bits and distributed across 1,000 “agents” that analyze their bit of data and each produce a model based on the data they have processed. The key is to require a critical condition to be met: although each agent’s model can be different, all the models must agree in the end — hence the term “consensus algorithms.”

The process by which 1,000 individual agents arrive at a consensus model is similar in concept to the Mechanical Turk crowd-sourcing methodology employed by Amazon — with a twist. With the Mechanical Turk, a person or a business can post a simple task, such as determining which photographs contain a cat, and ask the crowd to complete the task in exchange for gift certificates that can be redeemed for Amazon products, or for cash awards that can be transferred to a personal bank account. It may seem trivial to the human user, but the program learns from this feedback, aggregating all the individual responses into its working model, so it can make better predictions in the future.

In Boyd’s system, the process is iterative, creating a feedback loop. The initial consensus is shared with all the agents, which update their models in light of the new information and reach a second consensus, and so on. The process repeats until all the agents agree. Using this kind of distributed optimization approach significantly cuts down on how much data needs to be transferred at any one time.
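
Boyd’s published consensus methods (ADMM and its relatives) carry more machinery, but the loop he describes, local fits alternating with a shared consensus, can be sketched in simplified form. Here each “agent” fits a least-squares model to its own synthetic slice of data and is repeatedly pulled toward the average of all the models; only the models, never the raw data, are shared.

    import numpy as np

    rng = np.random.default_rng(1)
    true_weights = np.array([2.0, -1.0, 0.5])

    def make_agent_data():
        """Each agent holds only its own noisy slice of the measurements."""
        X = rng.normal(size=(200, 3))
        y = X @ true_weights + rng.normal(scale=0.1, size=200)
        return X, y

    agents = [make_agent_data() for _ in range(10)]
    models = [np.zeros(3) for _ in agents]
    consensus = np.zeros(3)
    rho = 0.5   # how strongly each local fit is pulled toward the consensus

    for _ in range(50):
        for i, (X, y) in enumerate(agents):
            # Local step: least squares on local data, regularised toward consensus.
            A = X.T @ X + rho * np.eye(3)
            b = X.T @ y + rho * consensus
            models[i] = np.linalg.solve(A, b)
        # Consensus step: only the models are shared, never the raw data.
        consensus = np.mean(models, axis=0)

    print("consensus model:", np.round(consensus, 3), "true:", true_weights)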

The Quantum Question

Late one night, during a swanky Napa Valley conference last year, MIT physicist Seth Lloyd found himself soaking in a hot tub across from Google’s Sergey Brin and Larry Page — any aspiring technology entrepreneur’s dream scenario. Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.

It is easy to grasp why Google might be interested in a quantum computer capable of rapidly searching enormous data sets. A quantum computer, in principle, could offer enormous increases in processing power, running algorithms significantly faster than a classical (non-quantum) machine for certain problems. Indeed, the company just purchased a reportedly $15 million prototype from a Canadian firm called D-Wave Systems, although the jury is still out on whether D-Wave’s product is truly quantum.

“This is not about trying all the possible answers in parallel. It is fundamentally different from parallel processing,” said Aaronson. Whereas a classical computer stores information as bits that can be either 0s or 1s, a quantum computer could exploit an unusual property: the superposition of states. If you flip a regular coin, it will land on heads or tails. There is zero probability that it will be both heads and tails. But if it is a quantum coin, technically, it exists in an indeterminate state of both heads and tails until you look to see the outcome.

A true quantum computer could encode information in so-called qubits that can be 0 and 1 at the same time. Doing so could reduce the time required to solve a difficult problem that would otherwise take several years of computation to mere seconds. But that is easier said than done, not least because such a device would be highly sensitive to outside interference: The slightest perturbation would be equivalent to looking to see if the coin landed heads or tails, and thus undo the superposition.
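
The coin analogy can be made slightly more concrete with a two-amplitude state vector. The following is a classical simulation of a single qubit and its measurement, useful only as an illustration of superposition and collapse:

    import numpy as np

    rng = np.random.default_rng(42)

    # A qubit is described by two complex amplitudes (alpha, beta) with
    # |alpha|^2 + |beta|^2 = 1. This state is an equal superposition of 0 and 1.
    state = np.array([1, 1], dtype=complex) / np.sqrt(2)

    probabilities = np.abs(state) ** 2           # Born rule: [0.5, 0.5]
    outcome = rng.choice([0, 1], p=probabilities)

    # Measurement destroys the superposition: the state collapses to the outcome.
    state = np.eye(2, dtype=complex)[outcome]
    print("measured:", outcome, "post-measurement state:", state)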

Data from a seemingly simple query about coffee production across the globe can be surprisingly difficult to integrate. Image: Peter DaSilva for Quanta Magazine

However, Aaronson cautions against placing too much hope in quantum computing to solve big data’s computational challenges, insisting that if and when quantum computers become practical, they will be best suited to very specific tasks, most notably to simulate quantum mechanical systems or to factor large numbers to break codes in classical cryptography. Yet there is one way that quantum computing might be able to assist big data: by searching very large, unsorted data sets — for example, a phone directory in which the names are arranged randomly instead of alphabetically.

It is certainly possible to do so with sheer brute force, using a massively parallel computer to comb through every record. But a quantum computer could accomplish the task in a fraction of the time. That is the thinking behind Grover’s algorithm, which was devised by Bell Labs’ Lov Grover in 1996. However, “to really make it work, you’d need a quantum memory that can be accessed in a quantum superposition,” Aaronson said, but it would need to do so in such a way that the very act of accessing the memory didn’t destroy the superposition, “and that is tricky as hell.”
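
The scale of the speed-up is easy to state: a classical search of an unsorted list of N entries needs about N/2 lookups on average, while Grover’s algorithm needs roughly (π/4)·√N quantum queries. For a directory of a million names:

    import math

    N = 1_000_000                      # entries in the unsorted "phone directory"

    classical_lookups = N / 2          # expected records examined classically
    grover_queries = math.ceil(math.pi / 4 * math.sqrt(N))

    print(f"classical: ~{classical_lookups:,.0f} lookups")   # ~500,000
    print(f"Grover:    ~{grover_queries:,} queries")         # ~786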

In short, you need quantum RAM (Q-RAM), and Lloyd has developed a conceptual prototype, along with an accompanying program he calls a Q-App (pronounced “quapp”) targeted to machine learning. He thinks his system could find patterns within data without actually looking at any individual records, thereby preserving the quantum superposition (and the users’ privacy). “You can effectively access all billion items in your database at the same time,” he explained, adding that “you’re not accessing any one of them, you’re accessing common features of all of them.”

For example, if there is ever a giant database storing the genome of every human being on Earth, “you could search for common patterns among different genes” using Lloyd’s quantum algorithm, with Q-RAM and a small 70-qubit quantum processor while still protecting the privacy of the population, Lloyd said. The person doing the search would have access to only a tiny fraction of the individual records, he said, and the search could be done in a short period of time. With the cost of sequencing human genomes dropping and commercial genotyping services rising, it is quite possible that such a database might one day exist, Lloyd said. It could be the ultimate big data set, considering that a single genome is equivalent to 6 billion bits.

Lloyd thinks quantum computing could work well for powerhouse machine-learning algorithms capable of spotting patterns in huge data sets — determining what clusters of data are associated with a keyword, for example, or what pieces of data are similar to one another in some way. “It turns out that many machine-learning algorithms actually work quite nicely in quantum computers, assuming you have a big enough Q-RAM,” he said. “These are exactly the kinds of mathematical problems people try to solve, and we think we could do very well with the quantum version of that.”

The Future Is Integration


Google’s Alon Halevy believes that the real breakthroughs in big data analysis are likely to come from integration — specifically, integrating across very different data sets. “No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level,” he said. For example, a raw data set could include thousands of different tables scattered around the Web, each one listing crime rates in New York, but each may use different terminology and column headers, known as “schema.” A header of “New York” can describe the state, the five boroughs of New York City, or just Manhattan. You must understand the relationship between the schemas before the data in all those tables can be integrated.
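
In practice the first, mundane step is a mapping from each table’s idiosyncratic headers onto one shared schema. A toy version follows; the header variants, file names and canonical names are invented, and a static dictionary like this sidesteps exactly the hard part Halevy describes, deciding what “New York” actually refers to.

    import pandas as pd

    # Invented header variants for the same underlying columns.
    SCHEMA_MAP = {
        "NY": "region", "New York": "region", "Borough": "region",
        "Crime rate": "crimes_per_100k", "crime_rate_per_100000": "crimes_per_100k",
        "Yr": "year", "Year": "year",
    }

    def harmonise(table: pd.DataFrame) -> pd.DataFrame:
        """Rename known header variants onto the canonical schema."""
        return table.rename(columns=SCHEMA_MAP)

    # Hypothetical scraped tables that report the same statistics differently.
    tables = [pd.read_csv(path) for path in ("crime_a.csv", "crime_b.csv")]
    combined = pd.concat([harmonise(t) for t in tables], ignore_index=True)
    print(combined.columns.tolist())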

That, in turn, requires breakthroughs in techniques to analyze the semantics of natural language. It is one of the toughest problems in artificial intelligence — if your machine-learning algorithm aspires to perfect understanding of nearly every word. But what if your algorithm needs to understand only enough of the surrounding text to determine whether, for example, a table includes data on coffee production in various countries so that it can then integrate the table with other, similar tables into one common data set? According to Halevy, a researcher could first use a coarse-grained algorithm to parse the underlying semantics of the data as best it could and then adopt a crowd-sourcing approach like a Mechanical Turk to refine the model further through human input. “The humans are training the system without realizing it, and then the system can answer many more questions based on what it has learned,” he said.

Chris Mattmann, a senior computer scientist at NASA’s Jet Propulsion Laboratory and director at the Apache Software Foundation, faces just such a complicated scenario with a research project that seeks to integrate two different sources of climate information: remote-sensing observations of the Earth made by satellite instrumentation and computer-simulated climate model outputs. The Intergovernmental Panel on Climate Change would like to be able to compare the various climate models against the hard remote-sensing data to determine which models provide the best fit. But each of those sources stores data in different formats, and there are many different versions of those formats.

Many researchers emphasize the need to develop a broad spectrum of flexible tools that can deal with many different kinds of data. For example, many users are shifting from traditional highly structured relational databases, broadly known as SQL, which represent data in a conventional tabular format, to a more flexible format dubbed NoSQL. “It can be as structured or unstructured as you need it to be,” said Matt LeMay, a product and communications consultant and the former head of consumer products at URL shortening and bookmarking service Bitly, which uses both SQL and NoSQL formats for data storage, depending on the application.
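
The contrast is easiest to see side by side: the same record as a row in a fixed relational table and as a schema-free document. A small sketch using Python’s built-in sqlite3 for the SQL side and a plain JSON document standing in for the NoSQL side:

    import json
    import sqlite3

    # SQL: the schema is declared up front and every row must fit it.
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE clicks (user_id TEXT, url TEXT, ts INTEGER)")
    db.execute("INSERT INTO clicks VALUES (?, ?, ?)",
               ("u42", "http://example.com", 1400000000))
    print(db.execute("SELECT * FROM clicks").fetchall())

    # NoSQL-style document: as structured or unstructured as the record needs;
    # a new field can appear on one record without altering any schema.
    document = {
        "user_id": "u42",
        "url": "http://example.com",
        "ts": 1400000000,
        "referrer": {"site": "twitter", "campaign": None},   # optional, nested
    }
    print(json.dumps(document, indent=2))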

Mattmann cites an Apache software program called Tika that allows the user to integrate data across 1,200 of the most common file formats. But in some cases, some human intervention is still required. Ultimately, Mattmann would like to fully automate this process via intelligent software that can integrate differently structured data sets, much like the Babel Fish in Douglas Adams’ “Hitchhiker’s Guide to the Galaxy” book series enabled someone to understand any language.
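
Tika’s surface is deliberately small. A sketch using the tika-python bindings, which drive the Java library behind the scenes; the file name here is hypothetical and the metadata keys returned vary by format.

    # Sketch using the tika-python bindings (pip install tika), which call the
    # Java Tika library behind the scenes; the file name is hypothetical.
    from tika import parser

    parsed = parser.from_file("report.pdf")   # any of ~1,200 supported formats

    print(parsed.get("metadata", {}).get("Content-Type"))   # detected format
    text = parsed.get("content") or ""
    print(text[:500])                                        # extracted plain text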

Integration across data sets will also require a well-coordinated distributed network system comparable to the one conceived of by Newman’s group at Caltech for the LHC, which monitors tens of thousands of processors and more than 10 major network links. Newman foresees a computational future for big data that relies on this type of automation through well-coordinated armies of intelligent agents that track the movement of data from one point in the network to another, identifying bottlenecks and scheduling processing tasks. Each might only record what is happening locally but would share the information in such a way as to shed light on the network’s global situation.

“Thousands of agents at different levels are coordinating to help human beings understand what’s going on in a complex and very distributed system,” Newman said. The scale would be even greater in the future, when there would be billions of such intelligent agents, or actors, making up a vast global distributed intelligent entity. “It’s the ability to create those things and have them work on one’s behalf that will reduce the complexity of these operational problems,” he said. “At a certain point, when there’s a complicated problem in such a system, no set of human beings can really understand it all and have access to all the information.”