Tag archive: Computing

What Did Neanderthals Leave to Modern Humans? Some Surprises (New York Times)

Geneticists tell us that somewhere between 1 and 5 percent of the genome of modern Europeans and Asians consists of DNA inherited from Neanderthals, our prehistoric cousins.

At Vanderbilt University, John Anthony Capra, an evolutionary genomics professor, has been combining high-powered computation and a medical records databank to learn what a Neanderthal heritage — even a fractional one — might mean for people today.

We spoke for two hours when Dr. Capra, 35, recently passed through New York City. An edited and condensed version of the conversation follows.

Q. Let’s begin with an indiscreet question. How did contemporary people come to have Neanderthal DNA on their genomes?

A. We hypothesize that roughly 50,000 years ago, when the ancestors of modern humans migrated out of Africa and into Eurasia, they encountered Neanderthals. Matings must have occurred then. And later.

One reason we deduce this is because the descendants of those who remained in Africa — present-day Africans — don’t have Neanderthal DNA.

What does that mean for people who have it? 

At my lab, we’ve been doing genetic testing on the blood samples of 28,000 patients at Vanderbilt and eight other medical centers across the country. Computers help us pinpoint where on the human genome this Neanderthal DNA is, and we run that against information from the patients’ anonymized medical records. We’re looking for associations.

What we’ve been finding is that Neanderthal DNA has a subtle influence on risk for disease. It affects our immune system and how we respond to different immune challenges. It affects our skin. You’re slightly more prone to a condition where you can get scaly lesions after extreme sun exposure. There’s an increased risk for blood clots and tobacco addiction.

To our surprise, it appears that some Neanderthal DNA can increase the risk for depression; however, there are other Neanderthal bits that decrease the risk. Roughly 1 to 2 percent of one’s risk for depression is determined by Neanderthal DNA. It all depends on where on the genome it’s located.

Was there ever an upside to having Neanderthal DNA?

It probably helped our ancestors survive in prehistoric Europe. When humans migrated into Eurasia, they encountered unfamiliar hazards and pathogens. By mating with Neanderthals, they gave their offspring needed defenses and immunities.

That trait for blood clotting helped wounds close up quickly. In the modern world, however, this trait means greater risk for stroke and pregnancy complications. What helped us then doesn’t necessarily now.

Did you say earlier that Neanderthal DNA increases susceptibility to nicotine addiction?

Yes. Neanderthal DNA can mean you’re more likely to get hooked on nicotine, even though there were no tobacco plants in archaic Europe.

We think this might be because there’s a bit of Neanderthal DNA right next to a human gene for a neurotransmitter that’s implicated in a generalized risk for addiction. In this case and probably others, we think the Neanderthal bits on the genome may serve as switches that turn human genes on or off.

Aside from the Neanderthals, do we know if our ancestors mated with other hominids?

We think they did. Sometimes when we’re examining genomes, we can see the genetic afterimages of hominids who haven’t even been identified yet.

A few years ago, the Swedish geneticist Svante Paabo received an unusual fossilized bone fragment from Siberia. He extracted the DNA, sequenced it and realized it was neither human nor Neanderthal. What Paabo found was a previously unknown hominid he named Denisovan, after the cave where it had been discovered. It turned out that Denisovan DNA can be found on the genomes of modern Southeast Asians and New Guineans.

Have you long been interested in genetics?

Growing up, I was very interested in history, but I also loved computers. I ended up majoring in computer science at college and going to graduate school in it; however, during my first year in graduate school, I realized I wasn’t very motivated by the problems that computer scientists worked on.

Fortunately, around that time — the early 2000s — it was becoming clear that people with computational skills could have a big impact in biology and genetics. The human genome had just been mapped. What an accomplishment! We now had the code to what makes you, you, and me, me. I wanted to be part of that kind of work.

So I switched over to biology. And it was there that I heard about a new field where you used computation and genetics research to look back in time — evolutionary genomics.

There may be no written records from prehistory, but genomes are a living record. If we can find ways to read them, we can discover things we couldn’t know any other way.

Not long ago, the two top editors of The New England Journal of Medicine published an editorial questioning “data sharing,” a common practice in which scientists reuse, for their own studies, raw data other researchers have collected. They labeled some of these researchers “data parasites.” How did you feel when you read that?

I was upset. The data sets we used were not originally collected to specifically study Neanderthal DNA in modern humans. Thousands of patients at Vanderbilt consented to have their blood and their medical records deposited in a “biobank” to find genetic diseases.

Three years ago, when I set up my lab at Vanderbilt, I saw the potential of the biobank for studying both genetic diseases and human evolution. I wrote special computer programs so that we could mine existing data for these purposes.

That’s not being a “parasite.” That’s moving knowledge forward. I suspect that most of the patients who contributed their information are pleased to see it used in a wider way.

What has been the response to your Neanderthal research since you published it last year in the journal Science?

Some of it’s very touching. People are interested in learning about where they came from. Some of it is a little silly. “I have a lot of hair on my legs — is that from Neanderthals?”

But I received racist inquiries, too. I got calls from all over the world from people who thought that since Africans didn’t interbreed with Neanderthals, this somehow justified their ideas of white superiority.

It was illogical. Actually, Neanderthal DNA is mostly bad for us — though that didn’t bother them.

As you do your studies, do you ever wonder about what the lives of the Neanderthals were like?

It’s hard not to. Genetics has taught us a tremendous amount about that, and there’s a lot of evidence that they were much more human than apelike.

They’ve gotten a bad rap. We tend to think of them as dumb and brutish. There’s no reason to believe that. Maybe those of us of European heritage should be thinking, “Let’s improve their standing in the popular imagination. They’re our ancestors, too.”


Quantum algorithm shown to be more efficient than any classical analogue (Revista Fapesp)

December 11, 2015

José Tadeu Arantes | Agência FAPESP – The quantum computer may stop being a dream and become reality within the next 10 years. The expectation is that it will bring a drastic reduction in processing time, since quantum algorithms offer more efficient solutions for certain computational tasks than any corresponding classical algorithms.

Until now, the key to quantum computing was believed to be the correlations between two or more systems. An example of quantum correlation is “entanglement,” which occurs when pairs or groups of particles are generated or interact in such a way that the quantum state of each particle cannot be described independently, since it depends on the whole set.

A recent study showed, however, that even an isolated quantum system, that is, one with no correlations with other systems, is enough to implement a quantum algorithm faster than its classical analogue. An article describing the study, Computational speed-up with a single qudit, was published in early October of this year in Scientific Reports, a Nature group journal.

The work, at once theoretical and experimental, started from an idea put forward by the physicist Mehmet Zafer Gedik of Sabanci Üniversitesi in Istanbul, Turkey, and was carried out through a collaboration between Turkish and Brazilian researchers. Felipe Fernandes Fanchini, of the Faculdade de Ciências of the Universidade Estadual Paulista (Unesp), Bauru campus, is one of the article’s authors. His participation in the study took place under the project “Controle quântico em sistemas dissipativos” (Quantum control in dissipative systems), supported by FAPESP.

“This work makes an important contribution to the debate over which resource is responsible for the superior processing power of quantum computers,” Fanchini told Agência FAPESP.

“Starting from Gedik’s idea, we carried out an experiment in Brazil using the nuclear magnetic resonance (NMR) facility of the Universidade de São Paulo (USP) in São Carlos. It was thus a collaboration among researchers from three universities: Sabanci, Unesp and USP. And we demonstrated that a quantum circuit built on a single physical system, with three or more energy levels, can determine the parity of a numerical permutation by evaluating the function only once. That is unthinkable in a classical protocol.”

According to Fanchini, what Gedik proposed was a very simple quantum algorithm that, in essence, determines the parity of a sequence. The concept of parity indicates whether a sequence is in a given order or not. For example, if we take the digits 1, 2 and 3 and establish that the sequence 1-2-3 is in order, the sequences 2-3-1 and 3-1-2, obtained by cyclic permutations of the digits, are in the same order.

This is easy to see if we imagine the digits arranged on a circle. Given the first sequence, one rotation in one direction yields the next sequence, and one more rotation yields the third. The sequences 1-3-2, 3-2-1 and 2-1-3, however, require acyclic permutations to be produced. So, if we agree to call the first three sequences “even,” the other three are “odd.”
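As an illustration only (this sketch is not from the paper), the cyclic/acyclic parity rule can be written out in a few lines of JavaScript, including a check of why a single classical observation can never settle the parity:

```javascript
// Classical parity of a permutation of [1, 2, 3].
// The "even" sequences are the cyclic rotations of 1-2-3; the other
// three orderings are "odd".
const EVEN = new Set(["1,2,3", "2,3,1", "3,1,2"]);

function parity(seq) {
  // Reads the whole sequence, i.e. more than one observation.
  return EVEN.has(seq.join(",")) ? "even" : "odd";
}

// Knowing a single position is never enough: for any digit seen in any
// position, both an even and an odd sequence remain possible.
function parityFromOneObservation(position, digit) {
  const all = ["1,2,3", "2,3,1", "3,1,2", "1,3,2", "3,2,1", "2,1,3"]
    .map((s) => s.split(",").map(Number));
  const candidates = all.filter((seq) => seq[position] === digit);
  const parities = new Set(candidates.map((seq) => parity(seq)));
  return parities.size === 1 ? [...parities][0] : "undetermined";
}

console.log(parity([2, 3, 1])); // "even"
console.log(parity([3, 2, 1])); // "odd"
console.log(parityFromOneObservation(0, 2)); // "undetermined"
```

Running `parityFromOneObservation` for every position and digit always returns "undetermined," which is the classical limitation the quantum algorithm sidesteps.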

“In classical terms, observing a single digit, that is, making a single measurement, does not allow one to say whether the sequence is even or odd. At least two observations are needed. What Gedik showed was that, in quantum terms, a single measurement is enough to determine the parity. That is why the quantum algorithm is faster than any classical equivalent. And this algorithm can be realized with a single particle, which means its efficiency does not depend on any kind of quantum correlation,” Fanchini said.

The algorithm in question does not say what the sequence is, only whether it is even or odd. This is possible only when there are three or more levels, because with just two levels, something like 1-2 or 2-1, an even or odd sequence cannot be defined. “Recently, the quantum computing community has been exploring a key concept of quantum theory, the concept of ‘contextuality.’ Since contextuality also operates only with three or more levels, we suspect it may be behind the efficiency of our algorithm,” the researcher added.

The concept of contextuality

“The concept of ‘contextuality’ is best understood by comparing the ideas of measurement in classical and quantum physics. In classical physics, measurement is assumed to do nothing more than reveal characteristics the measured system already possessed, such as a given length or a given mass. In quantum physics, the result of a measurement depends not only on the characteristic being measured but also on how the measurement was set up and on all previous measurements. That is, the result depends on the context of the experiment, and ‘contextuality’ is the quantity that describes that context,” Fanchini explained.

In the history of physics, “contextuality” was recognized as a necessary feature of quantum theory through the famous Bell’s theorem. According to this theorem, published in 1964 by the Northern Irish physicist John Stewart Bell (1928–1990), no physical theory based on local variables can reproduce all the predictions of quantum mechanics. In other words, physical phenomena cannot be described in strictly local terms, since they express the whole.

“It is important to stress that another article [Contextuality supplies the ‘magic’ for quantum computation], published in Nature in June 2014, points to contextuality as the possible source of the power of quantum computing. Our study goes in the same direction, presenting a concrete algorithm more efficient than anything conceivable along classical lines.”

Full-scale architecture for a quantum computer in silicon (Science Daily)

Scalable 3-D silicon chip architecture based on single atom quantum bits provides a blueprint to build operational quantum computers

October 30, 2015
University of New South Wales
Researchers have designed a full-scale architecture for a quantum computer in silicon. The new concept provides a pathway for building an operational quantum computer with error correction.

This picture shows from left to right Dr Matthew House, Sam Hile (seated), Scientia Professor Sven Rogge and Scientia Professor Michelle Simmons of the ARC Centre of Excellence for Quantum Computation and Communication Technology at UNSW. Credit: Deb Smith, UNSW Australia

Australian scientists have designed a 3D silicon chip architecture based on single atom quantum bits, which is compatible with atomic-scale fabrication techniques — providing a blueprint to build a large-scale quantum computer.

Scientists and engineers from the Australian Research Council Centre of Excellence for Quantum Computation and Communication Technology (CQC2T), headquartered at the University of New South Wales (UNSW), are leading the world in the race to develop a scalable quantum computer in silicon — a material well-understood and favoured by the trillion-dollar computing and microelectronics industry.

Teams led by UNSW researchers have already demonstrated a unique fabrication strategy for realising atomic-scale devices and have developed the world’s most efficient quantum bits in silicon using either the electron or nuclear spins of single phosphorus atoms. Quantum bits — or qubits — are the fundamental data components of quantum computers.

One of the final hurdles to scaling up to an operational quantum computer is the architecture. Here it is necessary to figure out how to precisely control multiple qubits in parallel, across an array of many thousands of qubits, and constantly correct for ‘quantum’ errors in calculations.

Now, the CQC2T collaboration, involving theoretical and experimental researchers from the University of Melbourne and UNSW, has designed such a device. In a study published today in Science Advances, the CQC2T team describes a new silicon architecture, which uses atomic-scale qubits aligned to control lines — which are essentially very narrow wires — inside a 3D design.

“We have demonstrated we can build devices in silicon at the atomic-scale and have been working towards a full-scale architecture where we can perform error correction protocols — providing a practical system that can be scaled up to larger numbers of qubits,” says UNSW Scientia Professor Michelle Simmons, study co-author and Director of the CQC2T.

“The great thing about this work, and architecture, is that it gives us an endpoint. We now know exactly what we need to do in the international race to get there.”

In the team’s conceptual design, they have moved from a one-dimensional array of qubits, positioned along a single line, to a two-dimensional array, positioned on a plane that is far more tolerant to errors. This qubit layer is “sandwiched” in a three-dimensional architecture, between two layers of wires arranged in a grid.

By applying voltages to a sub-set of these wires, multiple qubits can be controlled in parallel, performing a series of operations using far fewer controls. Importantly, with their design, they can perform the 2D surface code error correction protocols in which any computational errors that creep into the calculation can be corrected faster than they occur.

“Our Australian team has developed the world’s best qubits in silicon,” says University of Melbourne Professor Lloyd Hollenberg, Deputy Director of the CQC2T who led the work with colleague Dr Charles Hill. “However, to scale up to a full operational quantum computer we need more than just many of these qubits — we need to be able to control and arrange them in such a way that we can correct errors quantum mechanically.”

“In our work, we’ve developed a blueprint that is unique to our system of qubits in silicon, for building a full-scale quantum computer.”

In their paper, the team proposes a strategy to build the device, which leverages the CQC2T’s internationally unique capability of atomic-scale device fabrication. They have also modelled the required voltages applied to the grid wires, needed to address individual qubits, and make the processor work.

“This architecture gives us the dense packing and parallel operation essential for scaling up the size of the quantum processor,” says Scientia Professor Sven Rogge, Head of the UNSW School of Physics. “Ultimately, the structure is scalable to millions of qubits, required for a full-scale quantum processor.”


In classical computers, data is rendered as binary bits, which are always in one of two states: 0 or 1. However, a qubit can exist in both of these states at once, a condition known as a superposition. A qubit operation exploits this quantum weirdness by allowing many computations to be performed in parallel (a two-qubit system performs the operation on 4 values, a three-qubit system on 8, and so on).
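As a toy illustration (not from the study), the bookkeeping behind that claim can be sketched in JavaScript: an n-qubit register is described by 2^n amplitudes, and one operation acts on all of them at once. Real amplitudes are used here for simplicity.

```javascript
// An equal superposition over n qubits: 2^n basis states, each with
// amplitude 1/sqrt(2^n) so that the outcome probabilities sum to 1.
function uniformSuperposition(nQubits) {
  const size = 2 ** nQubits;        // 2 qubits -> 4 values, 3 -> 8, ...
  const amp = 1 / Math.sqrt(size);  // equal weight on every basis state
  return new Array(size).fill(amp);
}

const state = uniformSuperposition(3);
console.log(state.length); // 8 basis states tracked simultaneously

// Probabilities (amplitude squared) of all outcomes still sum to 1:
const total = state.reduce((sum, a) => sum + a * a, 0);
console.log(Math.abs(total - 1) < 1e-12); // true
```

The exponential growth of this array with the number of qubits is exactly what makes classical simulation of quantum computers intractable, and parallel operation on it is the source of the speed-up described above.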

As a result, quantum computers will far exceed today’s most powerful supercomputers, and offer enormous advantages for a range of complex problems, such as rapidly scouring vast databases, modelling financial markets, optimising huge metropolitan transport networks, and modelling complex biological molecules.


How Silicon Valley controls our future (Fear and the Technopanic)

How Silicon Valley controls our future

Jeff Jarvis

Oh, My!

Just 12 hours ago, I posted a brief piece about the continuing Europtechnopanic in Germany and the effort of publishers there to blame their every trouble on Google—even the so-called sin of free content and the price of metaphoric wurst.

Now Germany one-ups even itself with the most amazing specimen of Europtechnopanic I have yet seen. The cover of Der Spiegel, the country’s most important news outlet, makes the titans of Silicon Valley look dark, wicked, and, well—I just don’t know how else to say it—all too much like this.

This must be Spiegel’s Dystopian Special Issue. Note the additional cover billing: “Michel Houellebecq: ‘Humanism and enlightenment are dead.’”

I bought the issue online—you’re welcome—so you can read along with me (and correct my translations, please).

The cover story gets right to the point. Inside, the opening headline warns: “Tomorrowland: In Silicon Valley, a new elite doesn’t just want to determine what we consume but how we live. They want to change the world and accept no regulation. Must we stop them?”

Ah, yes, German publishers want to regulate Google—and now, watch out, Facebook, Apple, Uber, and Yahoo! (Yahoo?), they’re gunning for you next.

Turn the page and the first thing you read is this: “By all accounts, Travis Kalanick, founder and head of Uber, is an asshole.”

Oh, my.

It continues: “Uber is not the only company with plans for such world conquest. That’s how they all think: Google and Facebook, Apple and Airbnb, all those digital giants and thousands of smaller firms in their neighborhood. Their goal is never the niche but always the whole world. They don’t follow delusional fantasies but have thoroughly realistic goals in sight. It’s all made possible by a Dynamic Duo almost unique in economic history: globalization coupled with digitization.”

Digitalization, you see, is not just a spectre haunting Europe but a dark force overcoming the world. Must it be stopped? We’re merely asking.

Spiegel’s editors next fret that “progress will be faster and bigger, like an avalanche:” iPhone, self-driving cars, the world’s knowledge now digital and retrievable, 70% of stock trading controlled by algorithms, commercial drones, artificial intelligence, robots. “Madness but everyday madness,” Spiegel cries. “No longer science fiction.”

What all this means is misunderstood, Spiegel says, “above all by politicians,” who must decide whether to stand by as spectators while “others organize a global revolution. Because what is happening is much more than the triumph of new technology, much more than an economic phenomenon. It’s not just about ‘the internet’ or ‘social networks,’ not about intelligence and Edward Snowden and the question of what Google does with data.” It’s not just about newspapers shutting down and jobs lost to software. We are in the path of social change, “which in the end no one can escape.” Distinct from the industrial revolution, this time “digitization doesn’t just change industries but how we think and how we live. Only this time the change is controlled centrally by a few hundred people…. They aren’t stumbling into the future, they are ideologues with a clear agenda…. a high-tech doctrine of salvation.”


Oh, fuck!

The article then takes us on a tour of our new world capital, home to our “new Masters of the Universe,” who—perversely, apparently—are not concerned primarily about money. “Power through money isn’t enough for them.” It examines the roots of their philosophy from the “tradition of radical thinkers such as Noam Chomsky, Ayn Rand, and Friedrich Hayek,” leading to a “strange mixture of esoteric hippie-thinking and bare-knuckled capitalism.” Spiegel calls it their Menschheitsbeglückungswerks. I had to ask Twitter WTF that means.

Aha. So must we just go along with having this damned happiness shoved down our throats? “Is now the time for regulation before the world is finally dominated by digital monopolies?” Spiegel demands — I mean, merely asks? “Is this the time for democratic societies to defend themselves?”

Spiegel then visits four Silicon Valley geniuses: singularity man Ray Kurzweil; the conveniently German Sebastian Thrun, he of the self-driving car and online university; the always-good-for-a-WTF Peter Thiel (who was born in Germany but moved away after a year); and Airbnb’s Joe Gebbia. It recounts German President Joachim Gauck telling Thrun, “you scare me.” And it allows Thrun to respond that it’s the optimists, not the naysayers, who change the world.

I feared that these hapless four would be presented as ugly caricatures of the frightening, alien tribe of dark-bearded technopeople. You know what I’m getting at. But I’m relieved to say that’s not the case. What follows all the fear-mongering bluster of the cover story’s start is actual reporting. That is to say, a newsmagazine did what a newsmagazine does: It tops off its journalism with its agenda: frosting on the cupcake. And the agenda here is that of German publishers—some of them, which I explored last night and earlier. They attack Google and enlist politicians to do their bidding with new regulations to disadvantage their big, new, American, technological competitors.

And you know what? The German publishers’ strategy is working. German lawmakers passed a new ancillary copyright (nevermind that Google won that round when publishers gave it permission to quote their snippets) and EU politicians are talking not just about creating new copyright and privacy law but even about breaking up Google. The publishers are bringing Google to heel. The company waited far too long to empathize with publishers’ plight—albeit self-induced—and to recognize their political clout (a dangerous combination: desperation and power, as Google now knows). Now see how Matt Brittin, the head of EMEA for Google, drops birds at Europe’s feet like a willing hund, showing all the good that Google does indeed bring them.

I have also noted that Google is working on initiatives with European publishers to find mutual benefit and I celebrate that. That is why—ever helpful as I am—I wrote this post about what Google could do for news and this one about what news could do for Google. I see real opportunity for enlightened self-interest to take hold both inside Google and among publishers and for innovation and investment to come to news. But I’m one of those silly and apparently dangerous American optimists.

As I’ve often said, the publishers—led by Mathias Döpfner of Axel Springer and Paul-Bernhard Kallen of Burda—are smart. I admire them both. They know what they’re doing, using the power of their presses and thus their political clout to box in even big, powerful Google. It’s a game to them. It’s negotiation. It’s just business. I don’t agree with or much like their message or the tactic. But I get it.

Then comes this Scheißebombe from Der Spiegel. It goes far beyond the publishers’ game. It is nothing less than prewar propaganda, trying to stir up a populace against a boogeyman enemy in hopes of goading politicians to action to stop these people. If anyone would know better, you’d think they would. Schade.

Problem: Your brain (Medium)

I will be talking mainly about development for the web.

Ilya Dorman, Feb 15, 2015

Our puny brain can handle a very limited amount of logic at a time. While programmers proclaim logic as their domain, they are only sometimes and slightly better at managing complexity than the rest of us, mortals. The more logic our app has, the harder it is to change it or introduce new people to it.

The most common mistake programmers make is assuming they write code for a machine to read. While technically that is true, this mindset leads to the hell that is other people’s code.

I have worked in several start-up companies, some of them even considered “lean.” In each, it took me between a few weeks and a few months to fully understand their code-base, and I have about 6 years of experience with JavaScript. This does not seem reasonable to me at all.

If the code is not easy to read, its structure is already a monument—you can change small things, but major changes—the kind every start-up undergoes on an almost monthly basis—are as fun as a root canal. Once the code reaches a state where, for a proficient programmer, it is harder to read than this article, doom and suffering are upon you.

Why does the code become unreadable? Let’s compare code to plain text: the longer a sentence is, the easier it is for our mind to lose track of its beginning, and by the time we reach the end we have forgotten how it started and lost the meaning of the whole sentence. You had to read the previous sentence twice because it was too long to get in one grasp? Exactly! Same with code. Worse, actually—the logic of code can be far more complex than any sentence from a book or a blog post, and each programmer has their own logic, which can be total gibberish to another. Not to mention that we also need to remember the logic. Sometimes we come back to it the same day and sometimes after two months. Nobody remembers anything about their code after not looking at it for two months.

To make code readable to other humans we rely on three things:

1. Conventions

Conventions are good, but they are very limited: enforce them too little and the programmer becomes coupled to the code—no one will ever understand what they meant once they are gone. Enforce too much and you will have hour-long debates about every space and colon (true story). The “habitable zone” is very narrow and easy to miss.


2. Comments

They are probably the most helpful, if done right. Unfortunately many programmers write their comments in the same spirit they write their code—very idiosyncratic. I do not belong to the school claiming good code needs no comments, but even beautifully commented code can still be extremely complicated.

3. “Other people know this programming language as much as I do, so they must understand my writings.”

Well… This is JavaScript:


4. Tests

Tests are a devil in disguise. “How do we make sure our code is good and readable? We write more code!” I know many of you might quit this post right here, but bear with me for a few more lines: regardless of their benefit, tests are another layer of logic. They are more code to be read and understood. Tests try to solve this exact problem: your code is too complicated to calculate its result in your brain? So you say, “well, this is what should happen in the end.” And when it doesn’t, you go digging for the problem. Your code should be simple enough that you can read a function or a line and understand the result of running it.

Your life as a programmer could be so much easier!

Solution: Radical Minimalism

I will break down this approach into practical points, but the main idea is: use LESS logic.

  • Cut 80% of your product’s features

Yes! Just like that. Simplicity, first of all, comes from the product. Make it easy for people to understand and use. Make it do one thing well, and only then add more (if there is still a need).

  • Use nothing but what you absolutely must

Do not include a single line of code (especially from libraries) that you are not 100% sure you will use and that it is the simplest, most straightforward solution available. Need a simple chat app and use Angular.js because it’s nice with the two-way binding? You deserve those hours and days of debugging and debating about services vs. providers.

Side note: The JavaScript browser API is event-driven; it is made to respond when stuff (usually user input) happens. This means that events change data. Many new frameworks (Angular, Meteor) reverse this direction and make data changes trigger events. If your app is simple, you might live happily with the new mysterious layer, but if not — you get a whole new layer of complexity that you need to understand, and your life will get exponentially more miserable. Unless your app constantly manages large amounts of data, avoid those frameworks.
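The two directions can be sketched without any real framework. The `observable` helper below is a hypothetical stand-in for what data-binding frameworks do internally, not any framework’s actual API:

```javascript
// Direction 1 (plain browser style): an event updates the data directly.
const model = { count: 0 };
function onClick() {  // imagine this wired up via addEventListener
  model.count += 1;   // event -> data: one visible, traceable step
}
onClick();
console.log(model.count); // 1

// Direction 2 (framework style): changing data triggers hidden events.
// Even this tiny hand-rolled version adds an extra layer to understand.
function observable(data, onChange) {
  return new Proxy(data, {
    set(target, key, value) {
      target[key] = value;
      onChange(key, value); // data -> event: the "mysterious layer"
      return true;
    },
  });
}

let rendered = "";
const viewModel = observable({ count: 0 }, (key, value) => {
  rendered = `count is ${value}`; // re-render fires as a side effect
});
viewModel.count = 5;
console.log(rendered); // "count is 5"
```

In the first direction the data flow is visible at the call site; in the second, assigning `viewModel.count` silently runs code elsewhere, which is exactly the extra layer of logic the side note warns about.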

  • Use simplest logic possible

Say you need to show different HTML on different occasions. You can use client-side routing, with controllers and data passed to each controller that renders the HTML from a template. Or you can just use static HTML pages with normal browser navigation and update the HTML manually. Use the second.

  • Make short JavaScript files

Limit the length of your JS files to a single editor page, and make each file do one thing. Can’t cram all your glorious logic into small modules? Good, that means you should have less of it, so that other humans will understand your code in reasonable time.

  • Avoid pre-compilers and task-runners like AIDS

The more layers there are between what you write and what you see, the more logic your mind needs to remember. You might think grunt or gulp help you simplify stuff, but then you have 30 tasks, and you need to remember what each one does to your code, how to use them, how to update them, and how to teach them to any new coder. Not to mention compiling.

Side note #1: CSS pre-compilers are OK because they have very little logic but they help a lot in terms of readable structure, compared to plain CSS. I barely used HTML pre-compilers so you’ll have to decide for yourself.

Side note #2: Task-runners could save you time, so if you do use them, do it wisely keeping the minimalistic mindset.

  • Use JavaScript everywhere

This one is quite specific, and I am not absolutely sure about it, but having the same language on the client and the server can simplify the data management between them.
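A minimal sketch of that benefit, with hypothetical names: one validation rule defined once and reused on both sides, so it is written (and fixed) in exactly one place.

```javascript
// validation.js -- hypothetical shared module: the same rule runs in the
// browser before submitting a form and on the Node server before saving.
function isValidUsername(name) {
  return typeof name === 'string' && /^[a-z0-9_]{3,16}$/.test(name);
}

// Browser:  if (!isValidUsername(input.value)) { /* show an error */ }
// Server:   if (!isValidUsername(req.body.name)) { /* reject with 400 */ }
module.exports = { isValidUsername };
```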

  • Write more human code

Give your non-trivial variables (and functions) descriptive names. Prefer shorter lines, but only when that does not compromise readability.

Treat your code like poetry and take it to the edge of the bare minimum.

Quantum computers could revolutionize information theory (Fapesp)

January 30, 2015

By Diego Freire

Agência FAPESP – The prospect of quantum computers, with processing power far beyond that of today's machines, has been driving progress in one of the most versatile areas of science, one with applications across virtually every field of knowledge: information theory. To discuss this and other prospects, the Institute of Mathematics, Statistics and Scientific Computing (Imecc) of the University of Campinas (Unicamp) held the SPCoding School from January 19 to 30.

The event took place under FAPESP's São Paulo School of Advanced Science (ESPCA) program, which funds short courses on advanced topics in science and technology in the state of São Paulo.

The information processed by today's widely used computers is based on the bit, the smallest unit of data that can be stored or transmitted. Quantum computers, by contrast, work with qubits, which follow the rules of quantum mechanics, the branch of physics concerned with phenomena at or below the atomic scale. Because of this, such machines can carry out a vastly larger number of calculations simultaneously.

"This quantum understanding of information adds a whole new layer of complexity to its encoding. But while complex analyses that would take decades, centuries or even thousands of years on ordinary computers could be run in minutes on quantum computers, the same technology would threaten the secrecy of information that has not been properly protected against this kind of development," Sueli Irene Rodrigues Costa, a professor at Imecc, told Agência FAPESP.

The greatest threat quantum computers pose to current cryptography lies in their ability to break the codes used to protect important information, such as credit card data. Avoiding that risk requires developing cryptographic systems designed with the power of quantum computing in mind.

"Information and coding theory need to stay one step ahead of the commercial use of quantum computing," said Rodrigues Costa, who coordinates the Thematic Project "Security and reliability of information: theory and practice," supported by FAPESP.

"This is post-quantum cryptography. As was demonstrated in the late 1990s, today's cryptographic procedures will not survive quantum computers because they are not secure enough. And this urgency to develop solutions ready for the power of quantum computing is also pushing information theory forward in many directions," she said.

Some of those solutions were covered during the SPCoding School program, many of them aimed at more efficient systems for classical computing, such as the use of error-correcting codes and of lattices in cryptography. For Rodrigues Costa, the advance of information theory alongside the development of quantum computing will trigger revolutions in several fields of knowledge.

"Just as information theory has multiple applications today, quantum coding would also lift several areas of science to new levels by enabling even more precise computational simulations of the physical world, handling an exponentially larger number of variables than classical computers," said Rodrigues Costa.

Information theory concerns the quantification of information and spans fields such as mathematics, electrical engineering and computer science. Its pioneer was the American Claude Shannon (1916-2001), the first to treat communication as a mathematical problem.

Revolutions under way

While preparing for quantum computers, information theory is already driving major changes in how information is encoded and transmitted. At the SPCoding School, Amin Shokrollahi of the École Polytechnique Fédérale de Lausanne, Switzerland, presented new coding techniques for tackling problems such as noise in information and the high energy consumption of data processing, including in chip-to-chip communication inside devices.

Shokrollahi is known in the field for inventing Raptor codes and co-inventing Tornado codes, which are used in mobile transmission standards, with implementations in wireless systems, satellites and IPTV, the method of delivering television content over the Internet Protocol (IP).

"The growth in the volume of digital data and the need for ever faster communication increase both the susceptibility to various kinds of noise and the energy consumed. We have to look for new solutions in this scenario," he said.

Shokrollahi also presented innovations developed at the Swiss company Kandou Bus, where he is director of research. "We use special algorithms to encode the signals, which are all transferred simultaneously until a decoder recovers the original signals. All of this is done while preventing neighboring wires from interfering with one another, which yields a significantly lower noise level. The systems also reduce chip size, increase transmission speed and cut energy consumption," he explained.

According to Rodrigues Costa, similar solutions are being developed for many technologies in wide use across society.

"Mobile phones, for example, have gained enormously in processing power and versatility, yet one of users' most frequent complaints is that the battery doesn't last. One strategy is to find more efficient ways of encoding in order to save energy," she said.

Biological applications

It is not only technological problems that can be addressed or solved through information theory. Vinay Vaishampayan, a professor at the City University of New York, in the United States, chaired the SPCoding School panel "Information Theory, Coding Theory and the Real World," which covered a range of applications of codes in society, among them biological ones.

"There is no single information theory, and its approaches, from computational to probabilistic, can be applied to virtually every field of knowledge. In the panel we addressed the many research opportunities open to anyone interested in studying these interfaces between codes and the real world," he told Agência FAPESP.

Vaishampayan singled out biology as an area of great potential in this scenario. "Neuroscience raises important questions that information theory can help answer. We still do not understand in depth how neurons communicate with one another or how the brain works as a whole, and neural networks are a very rich field of study from a mathematical point of view as well, as is molecular biology," he said.

That is because, according to Max Costa, a professor at Unicamp's School of Electrical and Computer Engineering and one of the speakers, living beings are also made of information.

"We are encoded in the DNA of our cells. Uncovering the secret of that code, the mechanism behind the mappings that are made and recorded in that context, is a problem of enormous interest for a deeper understanding of the process of life," he said.

For Marcelo Firer, a professor at Imecc and coordinator of the SPCoding School, the event opened up new research possibilities for students and researchers from many fields.

"Participants shared opportunities for engagement around these and many other applications of information and coding theory. The offerings ranged from introductory courses, aimed at students with a solid mathematical background but not necessarily familiar with coding, to more advanced courses, as well as lectures and discussion panels," said Firer, a member of the FAPESP area panel for Computer Science and Engineering.

About 120 students from 70 universities and 25 countries took part. The foreign speakers included researchers from the California Institute of Technology (Caltech), the University of Maryland and Princeton University, in the United States; the Chinese University of Hong Kong, in China; Nanyang Technological University, in Singapore; the Technische Universiteit Eindhoven, in the Netherlands; the Universidade do Porto, in Portugal; and Tel Aviv University, in Israel.

More information at

Data monitoring and analysis – The crisis in São Paulo's water sources (Probabit)

Status on 25.1.2015

4.2 millimeters of rain on 24.1.2015 over São Paulo's reservoirs (weighted average).

305 billion liters (13.60%) of water in storage. Over 24 hours the volume rose by 4.4 billion liters (0.19%).

134 days until all stored water runs out, with rainfall of 996 mm/year and the system's current efficiency maintained.

66% is the reduction in consumption needed to balance the system under current conditions, with 33% losses in distribution.

Understanding the crisis

How to read this chart

The points in the chart show 4,040 one-year intervals of accumulated rainfall versus the change in total water storage (from January 1 of 2003/2004 up to today). The pattern shows that more rain pushes the storage up and less rain pushes it down, as one would expect.

This and the other charts on this page always consider the total water storage capacity in São Paulo (2.24 trillion liters), that is, the combined reservoirs of the Cantareira, Alto Tietê, Guarapiranga, Cotia, Rio Grande and Rio Claro systems. Want to explore the data?

The band of accumulated rainfall between 1,400 mm and 1,600 mm a year contains most of the points observed since 2003. This is the usual rainfall pattern the system was designed for. In that band, the system operates without straying far from equilibrium: at most 15% up or down over a year. By taking the one-year variation as its reference, this view of the data removes the seasonal rainfall cycle and highlights climatic swings of larger amplitude. See the year-by-year patterns.

A second layer of information in the same chart is the risk zones. The red zone is bounded by the current water stock in %. Every point inside that area (with its frequency indicated on the right) therefore represents a situation that, if repeated, will lead the system to collapse in less than a year. The yellow zone shows the incidence of cases that, if repeated, will shrink the stock. The system will only truly recover if new points appear above the yellow band.

To put the present moment in context and give a sense of trend, points connected in blue highlight the reading added today (accumulated rainfall and the change between today and the same day last year) and the readings from 30, 60 and 90 days ago (in progressively lighter shades).

Discussion based on a simple model

Fitting a linear model to the observed cases shows a reasonable correlation between accumulated rainfall and the change in water storage, as expected.
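The fit described here is presumably an ordinary least-squares line; a minimal sketch in JavaScript, using invented sample points rather than the actual Sabesp series:

```javascript
// Ordinary least-squares fit of stockChange = a + b * rainfall.
// The input arrays stand in for the 1-year windows computed from the daily series.
function fitLine(xs, ys) {
  const n = xs.length;
  const meanX = xs.reduce((s, x) => s + x, 0) / n;
  const meanY = ys.reduce((s, y) => s + y, 0) / n;
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (xs[i] - meanX) * (ys[i] - meanY);
    den += (xs[i] - meanX) ** 2;
  }
  const b = num / den;                   // slope: stock change per extra mm of rain
  const a = meanY - b * meanX;           // intercept
  return { a, b, equilibrium: -a / b };  // rainfall at which stock change is zero
}
```

The `equilibrium` value is the model's estimate of the rainfall level at which the stock neither grows nor shrinks, the quantity the page's equilibrium-point discussion revolves around.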

At the same time, the wide dispersion in the system's behavior is clear, especially in the rainfall band between 1,400 mm and 1,500 mm. Above 1,600 mm there are two well-separated paths; the lower one corresponds to the period between 2009 and 2010, when the reservoirs were full and the excess rain could not be stored.

Besides more or less deliberate management of the available water, combined variations in consumption, in losses and in the effectiveness of water capture may all contribute to the observed fluctuations. There are, however, no data that would let us examine the effect of each of these variables separately.

Simulation 1: The effect of increasing the water stock

In this simulation, the additional reserve of the Billings reservoir was hypothetically added to the supply system, with a volume of 998 billion liters (already discounting the "potable" arm of the Rio Grande reservoir).

Increasing the available stock does not change the equilibrium point, but it does change the slope of the line relating rainfall to the change in storage. The difference in slope between the blue line (simulated) and the red one (actual) shows the effect of the enlarged stock.

If Billings were not a giant sewage deposit today, we might be out of the critical situation. It is worth stressing, however, that increasing storage alone cannot stave off scarcity indefinitely if rainfall stays below the equilibrium point.

Simulation 2: The effect of improved efficiency

The only way to keep the stock stable when rain becomes scarcer is to change the system's "efficiency curve". In other words, it is necessary to consume less and adapt to less water entering the system.

The blue line in the chart alongside marks the axis around which the points would need to fluctuate for the system to balance on an annual supply of 1,200 mm of rain.

Efficiency can be improved by reducing consumption, reducing losses and improving water-capture technology (for example, by restoring the riparian forests and springs around the water sources).

If the situation sketched from 2013 to 2015 persists, with rainfall around 1,000 mm, it will be necessary to reach an efficiency curve far beyond anything ever practiced, above even the best cases observed so far.

With the "design" equilibrium around 1,500 mm, the arithmetic goes roughly like this: Sabesp loses 500 mm (33% of the distributed water) and the population consumes 1,000 mm. To reach equilibrium quickly at 1,000 mm, consumption would have to be 500 mm, since the losses cannot be quickly avoided and occur before consumption.

If one third of the distributed water were not systematically lost, there would be no crisis. The 500 mm of rain wasted every year by the precariousness of the distribution system are not missed when 1,500 mm fall, but at 1,000 mm every liter thrown away on one side is a liter that must be saved on the other.

Simulation 3: Current efficiency and the savings needed

The current efficiency is estimated from the last 120 observations of the system's behavior.

The current efficiency curve makes it possible to estimate the system's present equilibrium point (the highlighted red dot).

The blue dot marks the latest reading of annual accumulated rainfall. The gap between the two measures the size of the imbalance.

Merely to stop the system's loss of water, the withdrawal flow must be reduced by 49%. Since that flow includes all losses, if everything depends on reduced consumption alone, the savings must be 66% if losses are 33%, or 56% if losses are 17%.
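One way to reproduce that arithmetic is to assume losses are fixed in the short term, so the whole cut in withdrawals must come out of consumption. Treating the loss rate as measured against consumption is an assumption here, since the text does not spell out its base, but it gives numbers close to those quoted:

```javascript
// If losses cannot be reduced, the required cut in total withdrawals must be
// absorbed entirely by consumption, amplified by the distribution-loss share.
// Assumption: lossRate is expressed relative to consumption.
function consumptionCut(withdrawalCut, lossRate) {
  return withdrawalCut * (1 + lossRate);
}

consumptionCut(0.49, 0.33); // roughly 0.65, close to the 66% quoted above
consumptionCut(0.49, 0.17); // roughly 0.57, close to the 56% quoted above
```

The small gaps to the quoted figures presumably come from rounding in the published inputs.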

It seems incredible that the system's efficiency should be so low in the middle of such a serious crisis. Is the attempt to curb consumption actually increasing consumption? Do smaller, shallower volumes evaporate faster? Have people still not grasped the size of the disaster?

Assuming no new water stocks are added in the short term, the prognosis for whether and when the water will run out depends on the amount of rain and on the system's efficiency.

The chart shows how many days of water remain as a function of accumulated rainfall, considering two efficiency curves: the average and the current one (estimated from the last 120 days).

The highlighted point takes the most recent reading of rainfall accumulated over the year and shows how many days of water remain if current rainfall and efficiency conditions persist.

The prognosis is a reference that changes with new observations and has no defined probability. It is a projection meant to make the conditions needed to escape collapse easier to visualize.

Remember, though, that the historical average rainfall in São Paulo is 1,441 mm a year; a curve crossing that threshold means a system with a better than 50% chance of collapsing in less than a year. Are we capable of averting the disaster?

The data

The starting point is the data released daily by Sabesp. The updated original data series is available here.

There are, however, two important limitations in these data that can distort any reading of reality: 1) Sabesp uses only percentages to refer to reservoirs with very different total volumes; 2) the entry of new volumes does not change the base over which those percentages are calculated.

It was therefore necessary to correct the percentages of the original series against the current total volume, since volumes that were once inaccessible became accessible and, let's be honest, were in the reservoirs all along. The corrected series can be obtained here. It contains an additional column with the actual volumes (in billions of liters: hm³).

Beyond that, we decided to treat the data in consolidated form, as if all the water were in one single large reservoir. The data series used to generate the charts on this page contains only the weighted sum of the daily stock (%) and rainfall (mm), and is also available.

These corrections eliminate the spikes caused by the entry of the dead volumes and make the pattern of falling storage in 2014 much clearer.

Year-by-year patterns

Mean and quartiles of storage over the year

About this study

Worried about water scarcity, I began studying the problem at the end of 2014. I sought a concise and consistent way of presenting the data, highlighting the three variables that really matter: rainfall, total storage and the system's efficiency. The site went live on January 16, 2015. Every day the models and charts are rebuilt with the new information.

I hope this page helps convey the real dimension of São Paulo's water crisis and encourages more action to confront it.

Mauro Zackiewicz

scientia probabit | essential data laboratory

The Cathedral of Computation (The Atlantic)

We’re not living in an algorithmic culture so much as a computational theocracy.

Algorithms are everywhere, supposedly. We are living in an “algorithmic culture,” to use the author and communication scholar Ted Striphas’s name for it. Google’s search algorithms determine how we access information. Facebook’s News Feed algorithms determine how we socialize. Netflix’s and Amazon’s collaborative filtering algorithms choose products and media for us. You hear it everywhere. “Google announced a change to its algorithm,” a journalist reports. “We live in a world run by algorithms,” a TED talk exhorts. “Algorithms rule the world,” a news report threatens. Another upgrades rule to dominion: “The 10 Algorithms that Dominate Our World.”

Here’s an exercise: The next time you hear someone talking about algorithms, replace the term with “God” and ask yourself if the meaning changes. Our supposedly algorithmic culture is not a material phenomenon so much as a devotional one, a supplication made to the computers people have allowed to replace gods in their minds, even as they simultaneously claim that science has made us impervious to religion.

It’s part of a larger trend. The scientific revolution was meant to challenge tradition and faith, particularly a faith in religious superstition. But today, Enlightenment ideas like reason and science are beginning to flip into their opposites. Science and technology have become so pervasive and distorted, they have turned into a new type of theology.

The worship of the algorithm is hardly the only example of the theological reversal of the Enlightenment—for another sign, just look at the surfeit of nonfiction books promising insights into “The Science of…” anything, from laughter to marijuana. But algorithms hold a special station in the new technological temple because computers have become our favorite idols.

In fact, our purported efforts to enlighten ourselves about algorithms’ role in our culture sometimes offer an unexpected view into our zealous devotion to them. The media scholar Lev Manovich had this to say about “The Algorithms of Our Lives”:

Software has become a universal language, the interface to our imagination and the world. What electricity and the combustion engine were to the early 20th century, software is to the early 21st century. I think of it as a layer that permeates contemporary societies.

This is a common account of algorithmic culture, that software is a fundamental, primary structure of contemporary society. And like any well-delivered sermon, it seems convincing at first. Until we think a little harder about the historical references Manovich invokes, such as electricity and the engine, and how selectively those specimens characterize a prior era. Yes, they were important, but is it fair to call them paramount and exceptional?

It turns out that we have a long history of explaining the present via the output of industry. These rationalizations are always grounded in familiarity, and thus they feel convincing. But mostly they are metaphors. Here's Nicholas Carr's take on metaphorizing progress in terms of contemporary technology, from the 2008 Atlantic cover story that he expanded into his bestselling book The Shallows:

The process of adapting to new intellectual technologies is reflected in the changing metaphors we use to explain ourselves to ourselves. When the mechanical clock arrived, people began thinking of their brains as operating “like clockwork.” Today, in the age of software, we have come to think of them as operating “like computers.”

Carr’s point is that there’s a gap between the world and the metaphors people use to describe that world. We can see how erroneous or incomplete or just plain metaphorical these metaphors are when we look at them in retrospect.

Take the machine. In his book Images of Organization, Gareth Morgan describes the way businesses are seen in terms of different metaphors, among them the organization as machine, an idea that forms the basis for Taylorism.

Gareth Morgan’s metaphors of organization (Venkatesh Rao/Ribbonfarm)

We can find similar examples in computing. For Larry Lessig, the accidental homophony between “code” as the text of a computer program and “code” as the text of statutory law becomes the fulcrum on which his argument that code is an instrument of social control balances.

Each generation, we reset a belief that we’ve reached the end of this chain of metaphors, even though history always proves us wrong precisely because there’s always another technology or trend offering a fresh metaphor. Indeed, an exceptionalism that favors the present is one of the ways that science has become theology.

In fact, Carr fails to heed his own lesson about the temporariness of these metaphors. Just after having warned us that we tend to render current trends into contingent metaphorical explanations, he offers a similar sort of definitive conclusion:

Today, in the age of software, we have come to think of them as operating “like computers.” But the changes, neuroscience tells us, go much deeper than metaphor. Thanks to our brain’s plasticity, the adaptation occurs also at a biological level.

As with the machinic and computational metaphors that he critiques, Carr settles on another seemingly transparent, truth-yielding one. The real firmament is neurological, and computers are futzing with our minds, a fact provable by brain science. And actually, software and neuroscience enjoy a metaphorical collaboration thanks to artificial intelligence's idea that computing describes or mimics the brain. Computing-as-thought reaches the rank of religious fervor when we choose to believe, as some do, that we can simulate cognition through computation and achieve the singularity.

* * *

The metaphor of mechanical automation has always been misleading anyway, with or without the computation. Take manufacturing. The goods people buy from Walmart appear safely ensconced in their blister packs, as if magically stamped out by unfeeling, silent machines (robots—those original automata—themselves run by the tinier, immaterial robots we call algorithms).

But the automation metaphor breaks down once you bother to look at how even the simplest products are really produced. The photographer Michael Wolf’s images of Chinese factory workers and the toys they fabricate show that finishing consumer goods to completion requires intricate, repetitive human effort.

Michael Wolf Photography

Eyelashes must be glued onto dolls’ eyelids. Mickey Mouse heads must be shellacked. Rubber ducky eyes must be painted white. The same sort of manual work is required to create more complex goods too. Like your iPhone—you know, the one that’s designed in California but “assembled in China.” Even though injection-molding machines and other automated devices help produce all the crap we buy, the metaphor of the factory-as-automated machine obscures the fact that manufacturing isn’t as machinic or as automated as we think it is.

The algorithmic metaphor is just a special version of the machine metaphor, one specifying a particular kind of machine (the computer) and a particular way of operating it (via a step-by-step procedure for calculation). And when left unseen, we are able to invent a transcendental ideal for the algorithm. The canonical algorithm is not just a model sequence but a concise and efficient one. In its ideological, mythic incarnation, the ideal algorithm is thought to be some flawless little trifle of lithe computer code, processing data into tapestry like a robotic silkworm. A perfect flower, elegant and pristine, simple and singular. A thing you can hold in your palm and caress. A beautiful thing. A divine one.

But just as the machine metaphor gives us a distorted view of automated manufacture as prime mover, so the algorithmic metaphor gives us a distorted, theological view of computational action.

“The Google search algorithm” names something with an initial coherence that quickly scurries away once you really look for it. Googling isn’t a matter of invoking a programmatic subroutine—not on its own, anyway. Google is a monstrosity. It’s a confluence of physical, virtual, computational, and non-computational stuffs—electricity, data centers, servers, air conditioners, security guards, financial markets—just like the rubber ducky is a confluence of vinyl plastic, injection molding, the hands and labor of Chinese workers, the diesel fuel of ships and trains and trucks, the steel of shipping containers.

Once you start looking at them closely, every algorithm betrays the myth of unitary simplicity and computational purity. You may remember the Netflix Prize, a million dollar competition to build a better collaborative filtering algorithm for film recommendations. In 2009, the company closed the book on the prize, adding a faux-machined “completed” stamp to its website.

But as it turns out, that method didn’t really improve Netflix’s performance very much. The company ended up downplaying the ratings and instead using something different to manage viewer preferences: very specific genres like “Emotional Hindi-Language Movies for Hopeless Romantics.” Netflix calls them “altgenres.”

An example of a Netflix altgenre in action (tumblr/Genres of Netflix)

While researching an in-depth analysis of altgenres published a year ago at The Atlantic, Alexis Madrigal scraped the Netflix site, downloading all 76,000+ micro-genres using not an algorithm but a hackneyed, long-running screen-scraping apparatus. After acquiring the data, Madrigal and I organized and analyzed it (by hand), and I built a generator that allowed our readers to fashion their own altgenres based on different grammars (like “Deep Sea Forbidden Love Mockumentaries” or “Coming-of-Age Violent Westerns Set in Europe About Cats”).

Netflix VP Todd Yellin explained to Madrigal why the process of generating altgenres is no less manual than our own process of reverse engineering them. Netflix trains people to watch films, and those viewers laboriously tag the films with lots of metadata, including ratings of factors like sexually suggestive content or plot closure. These tailored altgenres are then presented to Netflix customers based on their prior viewing habits.

One of the hypothetical, “gonzo” altgenres created by The Atlantic‘s Netflix Genre Generator (The Atlantic)

Despite the initial promise of the Netflix Prize and the lurid appeal of a “million dollar algorithm,” Netflix operates by methods that look more like the Chinese manufacturing processes Michael Wolf’s photographs document. Yes, there’s a computer program matching viewing habits to a database of film properties. But the overall work of the Netflix recommendation system is distributed amongst so many different systems, actors, and processes that only a zealot would call the end result an algorithm.

The same could be said for data, the material algorithms operate upon. Data has become just as theologized as algorithms, especially “big data,” whose name is meant to elevate information to the level of celestial infinity. Today, conventional wisdom would suggest that mystical, ubiquitous sensors are collecting data by the terabyteful without our knowledge or intervention. Even if this is true to an extent, examples like Netflix’s altgenres show that data is created, not simply aggregated, and often by means of laborious, manual processes rather than anonymous vacuum-devices.

Once you adopt skepticism toward the algorithmic- and the data-divine, you can no longer construe any computational system as merely algorithmic. Think about Google Maps, for example. It’s not just mapping software running via computer—it also involves geographical information systems, geolocation satellites and transponders, human-driven automobiles, roof-mounted panoramic optical recording systems, international recording and privacy law, physical- and data-network routing systems, and web/mobile presentational apparatuses. That’s not algorithmic culture—it’s just, well, culture.

* * *

If algorithms aren’t gods, what are they instead? Like metaphors, algorithms are simplifications, or distortions. They are caricatures. They take a complex system from the world and abstract it into processes that capture some of that system’s logic and discard others. And they couple to other processes, machines, and materials that carry out the extra-computational part of their work.

Unfortunately, most computing systems don’t want to admit that they are burlesques. They want to be innovators, disruptors, world-changers, and such zeal requires sectarian blindness. The exception is games, which willingly admit that they are caricatures—and which suffer the consequences of this admission in the court of public opinion. Games know that they are faking it, which makes them less susceptible to theologization. SimCity isn’t an urban planning tool; it’s a cartoon of urban planning. Imagine the folly of thinking otherwise! Yet, that’s precisely the belief people hold of Google and Facebook and the like.

A Google Maps Street View vehicle roams the streets of Washington D.C. Google Maps entails algorithms, but also other things, like internal combustion engine automobiles. (justgrimes/Flickr)

Just as it’s not really accurate to call the manufacture of plastic toys “automated,” it’s not quite right to call Netflix recommendations or Google Maps “algorithmic.” Yes, true, there are algorithms involved, insofar as computers are involved, and computers run software that processes information. But that’s just a part of the story, a theologized version of the diverse, varied array of people, processes, materials, and machines that really carry out the work we shorthand as “technology.” The truth is as simple as it is uninteresting: The world has a lot of stuff in it, all bumping and grinding against one another.

I don’t want to downplay the role of computation in contemporary culture. Striphas and Manovich are right—there are computers in and around everything these days. But the algorithm has taken on a particularly mythical role in our technology-obsessed era, one that has allowed it to wear the garb of divinity. Concepts like “algorithm” have become sloppy shorthands, slang terms for the act of mistaking multipart complex systems for simple, singular ones. Of treating computation theologically rather than scientifically or culturally.

This attitude blinds us in two ways. First, it allows us to chalk up any kind of computational social change as pre-determined and inevitable. It gives us an excuse not to intervene in the social shifts wrought by big corporations like Google or Facebook or their kindred, to see their outcomes as beyond our influence. Second, it makes us forget that particular computational systems are abstractions, caricatures of the world, one perspective among many. The first error turns computers into gods, the second treats their outputs as scripture.

Computers are powerful devices that have allowed us to mimic countless other machines all at once. But when pushed to their limits, that capacity to simulate anything reverses into an inability or unwillingness to distinguish one thing from another. In its Enlightenment incarnation, the rise of reason represented not only the ascendancy of science but also the rise of skepticism, of incredulity at simplistic, totalizing answers, especially answers that made appeals to unseen movers. But today even as many scientists and technologists scorn traditional religious practice, they unwittingly invoke a new theology in so doing.

Algorithms aren’t gods. We need not believe that they rule the world in order to admit that they influence it, sometimes profoundly. Let’s bring algorithms down to earth again. Let’s keep the computer around without fetishizing it, without bowing down to it or shrugging away its inevitable power over us, without melting everything down into it as a new name for fate. I don’t want an algorithmic culture, especially if that phrase just euphemizes a corporate, computational theocracy.

But a culture with computers in it? That might be all right.

Latour on digital methods (Installing [social] order)


In a fascinating, apparently not-peer-reviewed non-article available free online here, Tommaso Venturini and Bruno Latour discuss the potential of “digital methods” for the contemporary social sciences.

The paper summarizes, quite nicely, the split of sociological methods between statistical aggregates studied with quantitative methods (capturing supposedly macro-phenomena) and irreducibly basic interactions studied with qualitative methods (capturing supposedly micro-phenomena). The problem is that neither aids the sociologist in capturing emergent phenomena: that is, capturing controversies and events as they happen, rather than estimating them after they have emerged (quantitative macro structures) or capturing them divorced from non-local influences (qualitative micro phenomena).

The solution, they claim, is to adopt digital methods in the social sciences. The paper is not exactly a methodological outline of how to accomplish these methods, but there is something of a justification available for it, and it sounds something like this:

Thanks to digital traceability, researchers no longer need to choose between precision and scope in their observations: it is now possible to follow a multitude of interactions and, simultaneously, to distinguish the specific contribution that each one makes to the construction of social phenomena. Born in an era of scarcity, the social sciences are entering an age of abundance. In the face of the richness of these new data, nothing justifies keeping old distinctions. Endowed with a quantity of data comparable to the natural sciences, the social sciences can finally correct their lazy eyes and simultaneously maintain the focus and scope of their observations.

The Creepy New Wave of the Internet (NY Review of Books)

Sue Halpern


The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism
by Jeremy Rifkin
Palgrave Macmillan, 356 pp., $28.00

Enchanted Objects: Design, Human Desire, and the Internet of Things
by David Rose
Scribner, 304 pp., $28.00

Age of Context: Mobile, Sensors, Data and the Future of Privacy
by Robert Scoble and Shel Israel, with a foreword by Marc Benioff
Patrick Brewster, 225 pp., $14.45 (paper)

More Awesome Than Money: Four Boys and Their Heroic Quest to Save Your Privacy from Facebook
by Jim Dwyer
Viking, 374 pp., $27.95

A detail of Penelope Umbrico’s Sunset Portraits from 11,827,282 Flickr Sunsets on 1/7/13, 2013. For the project, Umbrico searched the website Flickr for scenes of sunsets in which the sun, not the subject, predominated. The installation, consisting of two thousand 4 x 6 C-prints, explores the idea that ‘the individual assertion of “being here” is ultimately read as a lack of individuality when faced with so many assertions that are more or less all the same.’ A collection of her work, Penelope Umbrico (photographs), was published in 2011 by Aperture.

Every day a piece of computer code is sent to me by e-mail from a website to which I subscribe called IFTTT. Those letters stand for the phrase “if this then that,” and the code is in the form of a “recipe” that has the power to animate it. Recently, for instance, I chose to enable an IFTTT recipe that read, “if the temperature in my house falls below 45 degrees Fahrenheit, then send me a text message.” It’s a simple command that heralds a significant change in how we will be living our lives when much of the material world is connected—like my thermostat—to the Internet.
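An IFTTT “recipe,” as the author describes it, is simply a trigger condition paired with an action. The sketch below is a hypothetical illustration of that pattern in Python, not IFTTT’s actual service or API; the `Recipe` class, the `send_text` stand-in, and the temperature reading are all invented for the example.

```python
# Minimal sketch of an "if this then that" recipe: a trigger condition
# paired with an action. Everything here is a hypothetical stand-in;
# the real IFTTT service wires triggers to channels (SMS, e-mail) for you.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Recipe:
    trigger: Callable[[], bool]   # "if this": a condition to check
    action: Callable[[], None]    # "then that": what to do when it fires

    def run(self) -> None:
        if self.trigger():
            self.action()

messages = []

def send_text(msg: str) -> None:
    """Stand-in for an SMS channel; just records the message."""
    messages.append(msg)

house_temperature_f = 42  # pretend thermostat reading

recipe = Recipe(
    trigger=lambda: house_temperature_f < 45,
    action=lambda: send_text("House temperature below 45 degrees F!"),
)
recipe.run()
print(messages)  # → ['House temperature below 45 degrees F!']
```

The point of the pattern is its generality: any sensor reading can serve as the trigger, and any connected device or service as the action, which is what makes the connected home described below possible.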

It is already possible to buy Internet-enabled light bulbs that turn on when your car signals your home that you are a certain distance away and coffeemakers that sync to the alarm on your phone, as well as WiFi washer-dryers that know you are away and periodically fluff your clothes until you return, and Internet-connected slow cookers, vacuums, and refrigerators. “Check the morning weather, browse the web for recipes, explore your social networks or leave notes for your family—all from the refrigerator door,” reads the ad for one.

Welcome to the beginning of what is being touted as the Internet’s next wave by technologists, investment bankers, research organizations, and the companies that stand to rake in some of an estimated $14.4 trillion by 2022—what they call the Internet of Things (IoT). Cisco Systems, which is one of those companies, and whose CEO came up with that multitrillion-dollar figure, takes it a step further and calls this wave “the Internet of Everything,” which is both aspirational and telling. The writer and social thinker Jeremy Rifkin, whose consulting firm is working with businesses and governments to hurry this new wave along, describes it like this:

The Internet of Things will connect every thing with everyone in an integrated global network. People, machines, natural resources, production lines, logistics networks, consumption habits, recycling flows, and virtually every other aspect of economic and social life will be linked via sensors and software to the IoT platform, continually feeding Big Data to every node—businesses, homes, vehicles—moment to moment, in real time. Big Data, in turn, will be processed with advanced analytics, transformed into predictive algorithms, and programmed into automated systems to improve thermodynamic efficiencies, dramatically increase productivity, and reduce the marginal cost of producing and delivering a full range of goods and services to near zero across the entire economy.

In Rifkin’s estimation, all this connectivity will bring on the “Third Industrial Revolution,” poised as he believes it is to not merely redefine our relationship to machines and their relationship to one another, but to overtake and overthrow capitalism once the efficiencies of the Internet of Things undermine the market system, dropping the cost of producing goods to, basically, nothing. His recent book, The Zero Marginal Cost Society: The Internet of Things, the Collaborative Commons, and the Eclipse of Capitalism, is a paean to this coming epoch.

It is also deeply wishful, as many prospective arguments are, even when they start from fact. And the fact is, the Internet of Things is happening, and happening quickly. Rifkin notes that in 2007 there were ten million sensors of all kinds connected to the Internet, a number he says will increase to 100 trillion by 2030. A lot of these are small radio-frequency identification (RFID) microchips attached to goods as they crisscross the globe, but there are also sensors on vending machines, delivery trucks, cattle and other farm animals, cell phones, cars, weather-monitoring equipment, NFL football helmets, jet engines, and running shoes, among other things, generating data meant to streamline, inform, and increase productivity, often by bypassing human intervention. Additionally, the number of autonomous Internet-connected devices such as cell phones—devices that communicate directly with one another—now doubles every five years, growing from 12.5 billion in 2010 to an estimated 25 billion next year and 50 billion by 2020.

For years, a cohort of technologists, most notably Ray Kurzweil, the writer, inventor, and director of engineering at Google, have been predicting the day when computer intelligence surpasses human intelligence and merges with it in what they call the Singularity. We are not there yet, but a kind of singularity is already upon us as we swallow pills embedded with microscopic computer chips, activated by stomach acids, that will be able to report compliance with our doctor’s orders (or not) directly to our electronic medical records. Then there is the singularity that occurs when we outfit our bodies with “wearable technology” that sends data about our physical activity, heart rate, respiration, and sleep patterns to a database in the cloud as well as to our mobile phones and computers (and to Facebook and our insurance company and our employer).

Cisco Systems, for instance, which is already deep into wearable technology, is working on a platform called “the Connected Athlete” that “turns the athlete’s body into a distributed system of sensors and network intelligence…[so] the athlete becomes more than just a competitor—he or she becomes a Wireless Body Area Network, or WBAN.” Wearable technology, which generated $800 million in 2013, is expected to make nearly twice that this year. These are numbers that not only represent sales, but the public’s acceptance of, and habituation to, becoming one of the things connected to and through the Internet.

One reason that it has been easy to miss the emergence of the Internet of Things, and therefore miss its significance, is that much of what is presented to the public as its avatars seems superfluous and beside the point. An alarm clock that emits the scent of bacon, a glow ball that signals if it is too windy to go out sailing, and an “egg minder” that tells you how many eggs are in your refrigerator no matter where you are in the (Internet-connected) world, revolutionary as they may be, hardly seem the stuff of revolutions; because they are novelties, they obscure what is novel about them.

And then there is the creepiness factor. In the weeks before the general release of Google Glass, Google’s $1,500 see-through eyeglass computer that lets the wearer record what she is seeing and hearing, the press reported a number of incidents in which early adopters were physically accosted by people offended by the product’s intrusiveness. Enough is enough, the Glass opponents were saying.

Why a small cohort of people encountering Google Glass for the first time found it disturbing is the same reason that David Rose, an instructor at MIT and the founder of a company that embeds Internet connectivity into everyday devices like umbrellas and medicine vials, celebrates it and waxes nearly poetic on the potential of “heads up displays.” As he writes in Enchanted Objects: Design, Human Desire, and the Internet of Things, such devices have the potential to radically transform human encounters. Rose imagines a party where

Wearing your fashionable [heads up] display, you will instruct the device to display the people’s names and key biographical info above their heads. In the business meeting, you will call up information about previous meetings and agenda items. The HUD display will call up useful websites, tap into social networks, and dig into massive info sources…. You will fact-check your friends and colleagues…. You will also engage in real-time messaging, including videoconferencing with friends or colleagues who will participate, coach, consult, or lurk.
Whether this scenario excites or repels you, it represents the vision of more than one of the players moving us in the direction of pervasive connectivity. Rose’s company, Ambient Devices, has been at the forefront of what he calls “enchanting” objects—that is, connecting them to the Internet to make them “extraordinary.” This is a task that Glenn Lurie, the CEO of AT&T Mobility, believes is “spot on.” Among these enchanted objects are the Google Latitude Doorbell that “lets you know where your family members are and when they are approaching home,” an umbrella that turns blue when it is about to rain so you might be inspired to take it with you, and a jacket that gives you a hug every time someone likes your Facebook post.

Rose envisions “an enchanted wall in your kitchen that could display, through lines of colored light, the trends and patterns of your loved ones’ moods,” because it will offer “a better understanding of [the] hidden thoughts and emotions that are relevant to us….” If his account of a mood wall seems unduly fanciful (and nutty), it should be noted that this summer, British Airways gave passengers flying from New York to London blankets embedded with neurosensors to track how they were feeling. Apparently this was more scientific than simply asking them. According to one report:

When the fiber optics woven into the blanket turned red, flight attendants knew that the passengers were feeling stressed and anxious. Blue blankets were a sign that the passenger was feeling calm and relaxed.
Thus the airline learned that passengers were happiest when eating and drinking, and most relaxed when sleeping.

While, arguably, this “finding” is as trivial as an umbrella that turns blue when it’s going to rain, there is nothing trivial about collecting personal data, as innocuous as that data may seem. It takes very little imagination to foresee how the kitchen mood wall could lead to advertisements for antidepressants that follow you around the Web, or trigger an alert to your employer, or show up on your Facebook page because, according to Robert Scoble and Shel Israel in Age of Context: Mobile, Sensors, Data and the Future of Privacy, Facebook “wants to build a system that anticipates your needs.”

It takes even less imagination to foresee how information about your comings and goings obtained from the Google Latitude Doorbell could be used in a court of law. Cars are now outfitted with scores of sensors, including ones in the seats that determine how many passengers are in them, as well as with an “event data recorder” (EDR), which is the automobile equivalent of an airplane’s black box. As Scoble and Israel report in Age of Context, “the general legal consensus is that police will be able to subpoena car logs the same way they now subpoena phone records.”

Meanwhile, cars themselves are becoming computers on wheels, with operating system updates coming wirelessly over the air, and with increasing capacity to “understand” their owners. As Scoble and Israel tell it:

They not only adjust seat positions and mirrors automatically, but soon they’ll also know your preferences in music, service stations, dining spots and hotels…. They know when you are headed home, and soon they’ll be able to remind you to stop at the market to get a dessert for dinner.
Recent revelations from the journalist Glenn Greenwald put the number of Americans under government surveillance at a colossal 1.2 million people. Once the Internet of Things is in place, that number might easily expand to include everyone else, because a system that can remind you to stop at the market for dessert is a system that knows who you are and where you are and what you’ve been doing and with whom you’ve been doing it. And this is information we give out freely, or unwittingly, and largely without question or complaint, trading it for convenience, or what passes for convenience.

Michael Cogliantry
The journalist A.J. Jacobs wearing data-collecting sensors to keep track of his health and fitness; from Rick Smolan and Jennifer Erwitt’s The Human Face of Big Data, published in 2012 by Against All Odds
In other words, as human behavior is tracked and merchandized on a massive scale, the Internet of Things creates the perfect conditions to bolster and expand the surveillance state. In the world of the Internet of Things, your car, your heating system, your refrigerator, your fitness apps, your credit card, your television set, your window shades, your scale, your medications, your camera, your heart rate monitor, your electric toothbrush, and your washing machine—to say nothing of your phone—generate a continuous stream of data that resides largely out of reach of the individual but not of those willing to pay for it or in other ways commandeer it.

That is the point: the Internet of Things is about the “dataization” of our bodies, ourselves, and our environment. As a post on the tech website Gigaom put it, “The Internet of Things isn’t about things. It’s about cheap data.” Lots and lots of it. “The more you tell the world about yourself, the more the world can give you what you want,” says Sam Lessin, the head of Facebook’s Identity Product Group. It’s a sentiment shared by Scoble and Israel, who write:

The more the technology knows about you, the more benefits you will receive. That can leave you with the chilling sensation that big data is watching you. In the vast majority of cases, we believe the coming benefits are worth that trade-off.
So, too, does Jeremy Rifkin, who dismisses our legal, social, and cultural affinity for privacy as, essentially, a bourgeois affectation—a remnant of the enclosure laws that spawned capitalism:

Connecting everyone and everything in a neural network brings the human race out of the age of privacy, a defining characteristic of modernity, and into the era of transparency. While privacy has long been considered a fundamental right, it has never been an inherent right. Indeed, for all of human history, until the modern era, life was lived more or less publicly….
In virtually every society that we know of before the modern era, people bathed together in public, often urinated and defecated in public, ate at communal tables, frequently engaged in sexual intimacy in public, and slept huddled together en masse. It wasn’t until the early capitalist era that people began to retreat behind locked doors.
As anyone who has spent any time on Facebook knows, transparency is a fiction—literally. Social media is about presenting a curated self; it is opacity masquerading as transparency. In a sense, then, it is about preserving privacy. So when Rifkin claims that for young people, “privacy has lost much of its appeal,” he is either confusing sharing (as in sharing pictures of a vacation in Spain) with openness, or he is acknowledging that young people, especially, have become inured to the trade-offs they are making to use services like Facebook. (But they are not completely inured to it, as demonstrated by both Jim Dwyer’s painstaking book More Awesome Than Money, about the failed race to build a noncommercial social media site called Diaspora in 2010, as well as the overwhelming response—as many as 31,000 requests an hour for invitations—to the recent announcement that there soon will be a Facebook alternative, Ello, that does not collect or sell users’ data.)

These trade-offs will only increase as the quotidian becomes digitized, leaving fewer and fewer opportunities to opt out. It’s one thing to edit the self that is broadcast on Facebook and Twitter, but the Internet of Things, which knows our viewing habits, grooming rituals, medical histories, and more, allows no such interventions—unless it is our behaviors and curiosities and idiosyncrasies themselves that end up on the cutting room floor.

Even so, no matter what we do, the ubiquity of the Internet of Things is putting us squarely in the path of hackers, who will have almost unlimited portals into our digital lives. When, last winter, cybercriminals broke into more than 100,000 Internet-enabled appliances including refrigerators and sent out 750,000 spam e-mails to their users, they demonstrated just how vulnerable Internet-connected machines are.

Not long after that, Forbes reported that security researchers had come up with a $20 tool that was able to remotely control a car’s steering, brakes, acceleration, locks, and lights. It was an experiment that, again, showed how simple it is to manipulate and sabotage the smartest of machines, even though—but really because—a car is now, in the words of a Ford executive, a “cognitive device.”

More recently, a study of ten popular IoT devices by the computer company Hewlett-Packard uncovered a total of 250 security flaws among them. As Jerry Michalski, a former tech industry analyst and founder of the REX think tank, observed in a recent Pew study: “Most of the devices exposed on the internet will be vulnerable. They will also be prone to unintended consequences: they will do things nobody designed for beforehand, most of which will be undesirable.”

Breaking into a home system so that the refrigerator will send out spam that will flood your e-mail and hacking a car to trigger a crash are, of course, terrible and real possibilities, yet as bad as they may be, they are limited in scope. As IoT technology is adopted in manufacturing, logistics, and energy generation and distribution, the vulnerabilities do not have to scale up for the stakes to soar. In a New York Times article last year, Matthew Wald wrote:

If an adversary lands a knockout blow [to the energy grid]…it could black out vast areas of the continent for weeks; interrupt supplies of water, gasoline, diesel fuel and fresh food; shut down communications; and create disruptions of a scale that was only hinted at by Hurricane Sandy and the attacks of Sept. 11.
In that same article, Wald noted that though government officials, law enforcement personnel, National Guard members, and utility workers had been brought together to go through a worst-case scenario practice drill, they often seemed to be speaking different languages, which did not bode well for an effective response to what is recognized as a near inevitability. (Last year the Department of Homeland Security responded to 256 cyberattacks, half of them directed at the electrical grid. This was double the number for 2012.)

This Babel problem dogs the whole Internet of Things venture. After the “things” are connected to the Internet, they need to communicate with one another: your smart TV to your smart light bulbs to your smart door locks to your smart socks (yes, they exist). And if there is no lingua franca—which there isn’t so far—then when that television breaks or becomes obsolete (because soon enough there will be an even smarter one), your choices will be limited by what language is connecting all your stuff. Though there are industry groups trying to unify the platform, in September Apple offered a glimpse of how the Internet of Things actually might play out, when it introduced the company’s new smart watch, mobile payment system, health apps, and other, seemingly random, additions to its product line. As Mat Honan virtually shouted in Wired:

Apple is building a world in which there is a computer in your every interaction, waking and sleeping. A computer in your pocket. A computer on your body. A computer paying for all your purchases. A computer opening your hotel room door. A computer monitoring your movements as you walk though the mall. A computer watching you sleep. A computer controlling the devices in your home. A computer that tells you where you parked. A computer taking your pulse, telling you how many steps you took, how high you climbed and how many calories you burned—and sharing it all with your friends…. THIS IS THE NEW APPLE ECOSYSTEM. APPLE HAS TURNED OUR WORLD INTO ONE BIG UBIQUITOUS COMPUTER.
The ecosystem may be lush, but it will be, by design, limited. Call it the Internet of Proprietary Things.

For many of us, it is difficult to imagine smart watches and WiFi-enabled light bulbs leading to a new world order, whether that new world order is a surveillance state that knows more about us than we do about ourselves or the techno-utopia envisioned by Jeremy Rifkin, where people can make much of what they need on 3-D printers powered by solar panels and unleashed human creativity. Because home automation is likely to be expensive—it will take a lot of eggs before the egg minder pays for itself—it is unlikely that those watches and light bulbs will be the primary driver of the Internet of Things, though they will be its showcase.

Rather, the Internet’s third wave will be propelled by businesses that are able to rationalize their operations by replacing people with machines, using sensors to simplify distribution patterns and reduce inventories, deploying algorithms that eliminate human error, and so on. Those business savings are crucial to Rifkin’s vision of the Third Industrial Revolution, not simply because they have the potential to bring down the price of consumer goods, but because, for the first time, a central tenet of capitalism—that increased productivity requires increased human labor—will no longer hold. And once productivity is unmoored from labor, he argues, capitalism will not be able to support itself, either ideologically or practically.

What will rise in place of capitalism is what Rifkin calls the “collaborative commons,” where goods and property are shared, and the distinction between those who own the means of production and those who are beholden to those who own the means of production disappears. “The old paradigm of owners and workers, and of sellers and consumers, is beginning to break down,” he writes.

Consumers are becoming their own producers, eliminating the distinction. Prosumers will increasingly be able to produce, consume, and share their own goods…. The automation of work is already beginning to free up human labor to migrate to the evolving social economy…. The Internet of Things frees human beings from the market economy to pursue nonmaterial shared interests on the Collaborative Commons.
Rifkin’s vision that people will occupy themselves with more fulfilling activities like making music and self-publishing novels once they are freed from work, while machines do the heavy lifting, is offered at a moment when a new kind of structural unemployment born of robotics, big data, and artificial intelligence takes hold globally, and traditional ways of making a living disappear. Rifkin’s claims may be comforting, but they are illusory and misleading. (We’ve also heard this before, in 1845, when Marx wrote in The German Ideology that under communism people would be “free to hunt in the morning, fish in the afternoon, rear cattle in the evening, [and] criticize after dinner.”)

As an example, Rifkin points to Etsy, the online marketplace where thousands of “prosumers” sell their crafts, as a model for what he dubs the new creative economy. “Currently 900,000 small producers of goods advertise at no cost on the Etsy website,” he writes.

Nearly 60 million consumers per month from around the world browse the website, often interacting personally with suppliers…. This form of laterally scaled marketing puts the small enterprise on a level playing field with the big boys, allowing them to reach a worldwide user market at a fraction of the cost.
All that may be accurate and yet largely irrelevant if the goal is for those 900,000 small producers to make an actual living. As Amanda Hess wrote last year in Slate:

Etsy says its crafters are “thinking and acting like entrepreneurs,” but they’re not thinking or acting like very effective ones. Seventy-four percent of Etsy sellers consider their shop a “business,” including 65 percent of sellers who made less than $100 last year.
While it is true that a do-it-yourself subculture is thriving, and sharing cars, tools, houses, and other property is becoming more common, it is also true that much of this activity is happening under duress as steady employment disappears. As an article in The New York Times this past summer made clear, employment in the sharing economy, also known as the gig economy, where people piece together an income by driving for Uber and delivering groceries for Instacart, leaves them little time for hunting and fishing, unless it’s hunting for work and fishing under a shared couch for loose change.

So here comes the Internet’s Third Wave. In its wake jobs will disappear, work will morph, and a lot of money will be made by the companies, consultants, and investment banks that saw it coming. Privacy will disappear, too, and our intimate spaces will become advertising platforms—last December Google sent a letter to the SEC explaining how it might run ads on home appliances—and we may be too busy trying to get our toaster to communicate with our bathroom scale to notice. Technology, which allows us to augment and extend our native capabilities, tends to evolve haphazardly, and the future that is imagined for it—good or bad—is almost always historical, which is to say, naive.

Weather history ‘time machine’ created (Science Daily)

Date: October 15, 2014

Source: San Diego State University

Summary: A software program that allows climate researchers to access historical climate data for the entire global surface (excluding the poles) has been developed. The software includes the oceans, and is based on statistical research into historical climates.

During the 1930s, North America endured the Dust Bowl, a prolonged era of dryness that withered crops and dramatically altered where the population settled. Land-based precipitation records from the years leading up to the Dust Bowl are consistent with the telltale drying-out period associated with a persistent dry weather pattern, but they can’t explain why the drought was so pronounced and long-lasting.

The mystery lies in the fact that land-based precipitation tells only part of the climate story. Building accurate computer reconstructions of historical global precipitation is tricky business. The statistical models are very complicated, the historical data is often full of holes, and researchers invariably have to make educated guesses at correcting for sampling errors.

Hard science

The high degree of difficulty and expertise required means that relatively few climate scientists have been able to base their research on accurate models of historical precipitation. Now, a new software program developed by a research team including San Diego State University Distinguished Professor of Mathematics and Statistics Samuel Shen will democratize this ability, allowing far more researchers access to these models.

“In the past, only a couple dozen scientists could do these reconstructions,” Shen said. “Now, anybody can play with this user-friendly software, use it to inform their research, and develop new models and hypotheses. This new tool brings historical precipitation reconstruction from a ‘rocket science’ to a ‘toy science.'”

The National Science Foundation-funded project is a collaboration between Shen, University of Maryland atmospheric scientist Phillip A. Arkin and National Oceanic and Atmospheric Administration climatologist Thomas M. Smith.

Predicting past patterns

Oceanic precipitation patterns are useful for predicting large weather anomalies: they can reliably indicate whether, for instance, North America will undergo prolonged dry or wet spells associated with oceanic patterns such as El Niño or La Niña. The problem for historical models is that reliable data exists for only a small percentage of Earth’s surface. About eighty-four percent of all rain falls in the middle of the ocean, with no one there to record it. Satellite weather tracking is only a few decades old, so for historical models, researchers must fill in the gaps based on the data that does exist.

Shen, who co-directs SDSU’s Center for Climate and Sustainability Studies Area of Excellence, is an expert in minimizing error in model simulations. In the case of climate science, that means making the historical fill-in-the-gap estimates as accurate as possible. Shen and his SDSU graduate students Nancy Tafolla and Barbara Sperberg produced a user-friendly, technologically advanced piece of software that does the statistical heavy lifting for researchers. The program, known as SOGP 1.0, is based on research published last month in the Journal of the Atmospheric Sciences. The group released SOGP 1.0 to the public last week, available by request.

SOGP 1.0, named for the statistical technique it employs, spectral optimal gridding of precipitation, is based on the MATLAB programming language, commonly used in science and engineering. It reconstructs precipitation records for the entire globe (excluding the polar regions) between the years 1900 and 2011 and allows researchers to zoom in on particular regions and time frames.

New tool for climate change models

For example, Shen referenced a region in the middle of the Pacific Ocean that sometimes glows bright red on the computer model, indicating extreme dryness, and sometimes dark blue, indicating an unusually wet year. When either of these climate events occurs, he said, it’s almost certain that North American weather will respond to these patterns, sometimes in a way that lasts several years.

“The tropical Pacific is the engine of climate,” Shen explained.

In the Dust Bowl example, the SOGP program shows extreme dryness in the tropical Pacific in the late 1920s and early 1930s — a harbinger of a prolonged dry weather event in North America. Combining this data with land-record data, the model can retroactively demonstrate the Dust Bowl’s especially brutal dry spell.

“If you include the ocean’s precipitation signal, the drought signal is amplified,” Shen said. “We can understand the 1930s Dust Bowl better by knowing the oceanic conditions.”

The program isn’t a tool meant to look exclusively at the past, though. Shen hopes that its ease of use will encourage climate scientists to incorporate this historical data into their own models, improving our future predictions of climate change.

Researchers interested in using SOGP 1.0 can request the software package, as well as the digital datasets used by the program, by e-mail with the subject line “SOGP precipitation product request,” followed by their name, affiliation, position, and the purpose for which they intend to use the program.

Journal Reference:

  1. Samuel S. P. Shen, Nancy Tafolla, Thomas M. Smith, Phillip A. Arkin. Multivariate Regression Reconstruction and Its Sampling Error for the Quasi-Global Annual Precipitation from 1900 to 2011. Journal of the Atmospheric Sciences, 2014; 71 (9): 3250 DOI: 10.1175/JAS-D-13-0301.1

Can Big Data Tell Us What Clinical Trials Don’t? (New York Times)

Credit: Illustration by Christopher Brand

When a helicopter rushed a 13-year-old girl showing symptoms suggestive of kidney failure to Stanford’s Packard Children’s Hospital, Jennifer Frankovich was the rheumatologist on call. She and a team of other doctors quickly diagnosed lupus, an autoimmune disease. But as they hurried to treat the girl, Frankovich thought that something about the patient’s particular combination of lupus symptoms — kidney problems, inflamed pancreas and blood vessels — rang a bell. In the past, she’d seen lupus patients with these symptoms develop life-threatening blood clots. Her colleagues in other specialties didn’t think there was cause to give the girl anti-clotting drugs, so Frankovich deferred to them. But she retained her suspicions. “I could not forget these cases,” she says.

Back in her office, she found that the scientific literature had no studies on patients like this to guide her. So she did something unusual: She searched a database of all the lupus patients the hospital had seen over the previous five years, singling out those whose symptoms matched her patient’s, and ran an analysis to see whether they had developed blood clots. “I did some very simple statistics and brought the data to everybody that I had met with that morning,” she says. The change in attitude was striking. “It was very clear, based on the database, that she could be at an increased risk for a clot.”
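
In database terms, Frankovich’s search was a simple retrospective cohort query: select past patients whose features match the index case and compare their outcome rate with the baseline. A minimal sketch of that kind of query, with invented field names and synthetic records:

```python
# Synthetic lupus patient records; the field names and values are invented.
patients = [
    {"kidney": True,  "pancreatitis": True,  "vasculitis": True,  "clot": True},
    {"kidney": True,  "pancreatitis": True,  "vasculitis": False, "clot": True},
    {"kidney": True,  "pancreatitis": True,  "vasculitis": True,  "clot": False},
    {"kidney": False, "pancreatitis": False, "vasculitis": False, "clot": False},
    {"kidney": True,  "pancreatitis": False, "vasculitis": False, "clot": False},
]

def clot_rate(records, **criteria):
    """Clot rate among past patients matching the index case's features."""
    cohort = [r for r in records
              if all(r[k] == v for k, v in criteria.items())]
    return sum(r["clot"] for r in cohort) / len(cohort), len(cohort)

matched_rate, n = clot_rate(patients, kidney=True, pancreatitis=True)
overall_rate = sum(r["clot"] for r in patients) / len(patients)
print(f"matched cohort (n={n}): {matched_rate:.0%} vs overall {overall_rate:.0%}")
# → matched cohort (n=3): 67% vs overall 40%
```

The “very simple statistics” Frankovich describes amount to exactly this comparison, run against the hospital’s five years of lupus records rather than a toy list.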

The girl was given the drug, and she did not develop a clot. “At the end of the day, we don’t know whether it was the right decision,” says Chris Longhurst, a pediatrician and the chief medical information officer at Stanford Children’s Health, who is a colleague of Frankovich’s. But they felt that it was the best they could do with the limited information they had.

A large, costly and time-consuming clinical trial with proper controls might someday prove Frankovich’s hypothesis correct. But large, costly and time-consuming clinical trials are rarely carried out for uncommon complications of this sort. In the absence of such focused research, doctors and scientists are increasingly dipping into enormous troves of data that already exist, namely the aggregated medical records of thousands or even millions of patients, to uncover patterns that might help steer care.

The Tatonetti Laboratory at Columbia University is a nexus in this search for signal in the noise. There, Nicholas Tatonetti, an assistant professor of biomedical informatics — an interdisciplinary field that combines computer science and medicine — develops algorithms to trawl medical databases and turn up correlations. For his doctoral thesis, he mined the F.D.A.’s records of adverse drug reactions to identify pairs of medications that seemed to cause problems when taken together. He found an interaction between two very commonly prescribed drugs: The antidepressant paroxetine (marketed as Paxil) and the cholesterol-lowering medication pravastatin were connected to higher blood-sugar levels. Taken individually, the drugs didn’t affect glucose levels. But taken together, the side-effect was impossible to ignore. “Nobody had ever thought to look for it,” Tatonetti says, “and so nobody had ever found it.”
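
The pair-mining step can be illustrated with a toy disproportionality screen: for each drug pair, compare the rate of an adverse event among reports mentioning both drugs with the rate for either drug alone. This is a hedged sketch of the general approach, not Tatonetti’s actual algorithm, and the report data below is synthetic.

```python
from itertools import combinations

# Synthetic adverse-event reports: (set of drugs, whether high glucose was reported).
reports = [
    ({"paroxetine", "pravastatin"}, True),
    ({"paroxetine", "pravastatin"}, True),
    ({"paroxetine"}, False),
    ({"paroxetine"}, False),
    ({"pravastatin"}, False),
    ({"pravastatin"}, True),
    ({"aspirin"}, False),
]

def event_rate(reports, drugs):
    """Event rate among reports that mention all of the given drugs."""
    hits = [ev for ds, ev in reports if drugs <= ds]
    return sum(hits) / len(hits) if hits else 0.0

all_drugs = set().union(*(ds for ds, _ in reports))
for pair in combinations(sorted(all_drugs), 2):
    pair_rate = event_rate(reports, set(pair))
    solo = max(event_rate(reports, {d}) for d in pair)
    if pair_rate > solo:   # the event is over-represented when the drugs are combined
        print(pair, f"pair rate {pair_rate:.0%} vs best single-drug rate {solo:.0%}")
# → ('paroxetine', 'pravastatin') pair rate 100% vs best single-drug rate 75%
```

A real screen over F.D.A. records would use proper disproportionality statistics and confounder adjustment; the toy only shows why a pairwise sweep can surface interactions nobody thought to look for.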

The potential for this practice extends far beyond drug interactions. In the past, researchers noticed that being born in certain months or seasons appears to be linked to a higher risk of some diseases. In the Northern Hemisphere, people with multiple sclerosis tend to be born in the spring, while in the Southern Hemisphere they tend to be born in November; people with schizophrenia tend to have been born during the winter. There are numerous correlations like this, and the reasons for them are still foggy — a problem Tatonetti and a graduate assistant, Mary Boland, hope to solve by parsing the data on a vast array of outside factors. Tatonetti describes it as a quest to figure out “how these diseases could be dependent on birth month in a way that’s not just astrology.” Other researchers think data-mining might also be particularly beneficial for cancer patients, because so few types of cancer are represented in clinical trials.

As with so much network-enabled data-tinkering, this research is freighted with serious privacy concerns. If these analyses are considered part of treatment, hospitals may allow them on the grounds of doing what is best for a patient. But if they are considered medical research, then everyone whose records are being used must give permission. In practice, the distinction can be fuzzy and often depends on the culture of the institution. After Frankovich wrote about her experience in The New England Journal of Medicine in 2011, her hospital warned her not to conduct such analyses again until a proper framework for using patient information was in place.

In the lab, ensuring that data-mining conclusions hold water can also be tricky. By definition, a medical-records database contains information only on sick people who sought help, so it is inherently incomplete. Such databases also lack the controls of a clinical study and are full of confounding factors that might trip up unwary researchers. Daniel Rubin, a professor of bioinformatics at Stanford, also warns that there have been no studies of data-driven medicine to determine whether it leads to positive outcomes more often than not. Because historical evidence is of “inferior quality,” he says, it has the potential to lead care astray.

Yet despite the pitfalls, developing a “learning health system” — one that can incorporate lessons from its own activities in real time — remains tantalizing to researchers. Stefan Thurner, a professor of complexity studies at the Medical University of Vienna, and his researcher, Peter Klimek, are working with a database of millions of people’s health-insurance claims, building networks of relationships among diseases. As they fill in the network with known connections and new ones mined from the data, Thurner and Klimek hope to be able to predict the health of individuals or of a population over time. On the clinical side, Longhurst has been advocating for a button in electronic medical-record software that would allow doctors to run automated searches for patients like theirs when no other sources of information are available.
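
At its simplest, building the kind of disease network Thurner and Klimek describe means counting how often pairs of diagnoses co-occur across patients’ claims and keeping the pairs that co-occur more often than independence would predict. A toy sketch with invented diagnosis codes (not their actual method or data):

```python
from itertools import combinations
from collections import Counter

# Synthetic insurance claims: each patient maps to a set of diagnosis codes.
claims = {
    "p1": {"diabetes", "hypertension", "retinopathy"},
    "p2": {"diabetes", "hypertension"},
    "p3": {"diabetes", "retinopathy"},
    "p4": {"asthma"},
    "p5": {"hypertension"},
}

n = len(claims)
disease_count = Counter(d for ds in claims.values() for d in ds)
pair_count = Counter(
    pair for ds in claims.values() for pair in combinations(sorted(ds), 2)
)

# Keep an edge when observed co-occurrence exceeds the independence baseline.
edges = {
    (a, b): c
    for (a, b), c in pair_count.items()
    if c / n > (disease_count[a] / n) * (disease_count[b] / n)
}
print(sorted(edges))
```

In the toy data, diabetes links to both hypertension and retinopathy, while the weaker hypertension–retinopathy pair falls below the independence threshold and is filtered out; a real network over millions of claims needs significance testing on top of this counting.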

With time, and with some crucial refinements, this kind of medicine may eventually become mainstream. Frankovich recalls a conversation with an older colleague. “She told me, ‘Research this decade benefits the next decade,’ ” Frankovich says. “That was how it was. But I feel like it doesn’t have to be that way anymore.”

Strays under control (Fapesp)

September 22, 2014

By Yuri Vasconcelos

Software estimates the population of abandoned dogs and cats and simulates strategies that benefit animal and human health (photo: Wikimedia)

Revista Pesquisa FAPESP – No one knows for certain the size of the canine or feline populations in Brazil, whether of supervised animals – those that have owners and live in households – or of strays.

The demographic characterization of dogs and cats is an important step toward defining population management strategies for these animals, as well as contributing to the control of zoonoses such as rabies and visceral leishmaniasis, which cause 55,000 deaths and 500,000 cases worldwide, respectively.

To better address this problem, a group of researchers at the School of Veterinary Medicine (FMVZ) of the University of São Paulo (USP), in the state capital, created software capable of estimating with a high degree of precision how many owned dogs and cats live in Brazilian cities. Soon, the program will be freely available to agencies of the Ministry of Health and to municipal governments.

“Knowing the street population is essential. It is the result of animal abandonment,” says veterinarian Fernando Ferreira, professor and coordinator of FMVZ’s graduate program.

Brazil leads Latin America in the incidence of visceral leishmaniasis, with about 3,000 people infected per year, or 90% of the continent’s total. Rabies, although it can be controlled through vaccination, still occurs in the country: there were 50 human cases in 1990, a figure that ranged from zero to two cases between 2007 and 2013.

Abandoned animals represent a public health problem because they are the main reservoirs and transmitters of these diseases. At the same time, they are victims of road accidents, abuse and cruelty.

The most reliable technique for sizing and classifying the street dog population was created by the Instituto Pasteur in 2002; it indicates that these animals amount to about 5% of the owned population.

“Thus, knowing how many supervised dogs live in a given region, it is possible to estimate how many exist on the streets of that same place,” says Ferreira. “Since there is a direct relationship between these two populations, strategies for controlling abandoned dogs involve reproductive control of owned animals,” explains the researcher, who was joined on the project by Professor Marcos Amaku, also of FMVZ.

Named with the acronym capm – the initials, in English, of companion animal population management – the software was developed by doctoral student Oswaldo Santos Baquero, a FAPESP fellow.

“In my study, I evaluate the validity of a complex sampling design for estimating the size of the owned-dog population in Brazilian municipalities. I also built a mathematical model of population dynamics to simulate scenarios and set intervention priorities,” says Baquero.

For him, mathematical modeling makes it easier to understand, for example, that the main expected effect of sterilization is growth of the infertile population, not a reduction in the size of the population as a whole.
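
That effect of sterilization can be seen in a toy two-compartment model: fertile animals reproduce and may be sterilized, while sterilized animals only age out. The rates below are illustrative assumptions, not values from Baquero’s capm model.

```python
# Toy two-compartment dynamics: fertile vs sterilized dogs, yearly time steps.
# The rates are illustrative guesses, not estimates from the capm software.
birth, death, sterilization = 0.4, 0.2, 0.15

fertile, sterile = 1000.0, 0.0
for year in range(20):
    births = birth * fertile
    moved = sterilization * fertile          # animals sterilized this year
    fertile += births - moved - death * fertile
    sterile += moved - death * sterile

print(f"after 20 years: fertile={fertile:.0f}, sterilized={sterile:.0f}")
# With these rates the total population keeps growing; sterilization mainly
# enlarges the infertile compartment rather than shrinking the whole population.
```

Raising the sterilization rate above the net growth rate (births minus deaths) is what it would take, in this toy, to actually shrink the fertile pool over time.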

“Mathematical models of rabies transmission in China suggest that the best way to control the disease is to reduce the canine birth rate and increase immunization. These two actions combined proved more effective than culling animals.”

Read the full report at:

Forming consensus in social networks (Science Daily)

Date: September 3, 2014

Source: University of Miami

Summary: To understand the process through which we operate as a group, and to explain why we do what we do, researchers have developed a novel computational model and the corresponding conditions for reaching consensus in a wide range of situations.

Social networks have become a dominant force in society. Family, friends, peers, community leaders and media communicators are all part of people’s social networks. Individuals within a network may have different opinions on important issues, but it’s their collective actions that determine the path society takes.

To understand the process through which we operate as a group, and to explain why we do what we do, researchers have developed a novel computational model and the corresponding conditions for reaching consensus in a wide range of situations. The findings are published in the August 2014 issue on Signal Processing for Social Networks of the IEEE Journal of Selected Topics in Signal Processing.

“We wanted to provide a new method for studying the exchange of opinions and evidence in networks,” said Kamal Premaratne, professor of electrical and computer engineering, at the University of Miami (UM) and principal investigator of the study. “The new model helps us understand the collective behavior of adaptive agents–people, sensors, data bases or abstract entities–by analyzing communication patterns that are characteristic of social networks.”

The model addresses some fundamental questions: What is a good way to model opinions? How are those opinions updated? And when is consensus reached?

One key feature of the new model is its capacity to handle the uncertainties associated with soft data (such as opinions of people) in combination with hard data (facts and numbers).

“Human-generated opinions are more nuanced than physical data and require rich models to capture them,” said Manohar N. Murthi, associate professor of electrical and computer engineering at UM and co-author of the study. “Our study takes into account the difficulties associated with the unstructured nature of the network,” he adds. “By using a new ‘belief updating mechanism,’ our work establishes the conditions under which agents can reach a consensus, even in the presence of these difficulties.”

The agents exchange and revise their beliefs through their interaction with other agents. The interaction is usually local, in the sense that only neighboring agents in the network exchange information, for the purpose of updating one’s belief or opinion. The goal is for the group of agents in a network to arrive at a consensus that is somehow ‘similar’ to the ground truth — what has been confirmed by the gathering of objective data.

In previous work, the consensus achieved by the agents depended entirely on how they updated their beliefs: depending on the updating scheme used, one could arrive at different consensus states. The consensus in the current model is more rational and meaningful.

“In our work, the consensus is consistent with a reliable estimate of the ground truth, if it is available,” Premaratne said. “This consistency is very important, because it allows us to estimate how credible each agent is.”

According to the model, if the consensus opinion is closer to an agent’s opinion, then one can say that this agent is more credible. On the other hand, if the consensus opinion is very different from an agent’s opinion, then it can be inferred that this agent is less credible.
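
The local-exchange dynamic and the credibility reading can be illustrated with classic DeGroot averaging, a far simpler updating scheme than the belief-revision mechanism in the paper: each agent repeatedly replaces its opinion with a weighted average of its neighbours’, and credibility is then read off as each agent’s initial distance from the consensus. The weights and opinions below are invented.

```python
import numpy as np

# Four agents on a ring; each trusts itself and its two neighbours.
W = np.array([
    [0.50, 0.25, 0.25, 0.00],
    [0.25, 0.50, 0.00, 0.25],
    [0.25, 0.00, 0.50, 0.25],
    [0.00, 0.25, 0.25, 0.50],
])                                   # row-stochastic trust weights
opinions = np.array([0.2, 0.3, 0.25, 0.9])   # agent 3 is the outlier

x = opinions.copy()
for _ in range(200):                 # repeated local averaging
    x = W @ x

consensus = x[0]                     # every entry converges to the same value
distance = np.abs(opinions - consensus)
most_credible = int(np.argmin(distance))     # closest to consensus = most credible
print(f"consensus={consensus:.4f}, most credible agent={most_credible}")
# → consensus=0.4125, most credible agent=1
```

Because this trust matrix happens to be doubly stochastic, the consensus is the plain average of the initial opinions, and the outlier agent 3 comes out least credible, exactly the inference described above.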

“The fact that the same strategy can be used even in the absence of a ground truth is of immense importance because, in practice, we often have to determine if an agent is credible or not when we don’t have knowledge of the ground truth,” Murthi said.

In the future, the researchers would like to expand their model to include the formation of opinion clusters, where each cluster of agents share similar opinions. Clustering can be seen in the emergence of extremism, minority opinion spreading, the appearance of political affiliations, or affinity for a particular product, for example.


Journal Reference:

  1. Thanuka L. Wickramarathne, Kamal Premaratne, Manohar N. Murthi, Nitesh V. Chawla. Convergence Analysis of Iterated Belief Revision in Complex Fusion Environments. IEEE Journal of Selected Topics in Signal Processing, 2014; 8 (4): 598 DOI: 10.1109/JSTSP.2014.2314854

Deadly Algorithms (Radical Philosophy)

Can legal codes hold software accountable for code that kills?

RP 187 (Sept/Oct 2014)

Susan Schuppli

Algorithms have long adjudicated over vital processes that help to ensure our well-being and survival, from pacemakers that maintain the natural rhythms of the heart, and genetic algorithms that optimise emergency response times by cross-referencing ambulance locations with demographic data, to early warning systems that track approaching storms, detect seismic activity, and even seek to prevent genocide by monitoring ethnic conflict with orbiting satellites. [1] However, algorithms are also increasingly being tasked with instructions to kill: executing coding sequences that quite literally execute.

Guided by the Obama presidency’s conviction that the War on Terror can be won by ‘out-computing’ its enemies and pre-empting terrorists’ threats using predictive software, a new generation of deadly algorithms is being designed that will both control and manage the ‘kill-list,’ and along with it decisions to strike. [2] Indeed, the recently terminated practice of ‘signature strikes’, in which data analytics was used to determine emblematic ‘terrorist’ behaviour and match these patterns to potential targets on the ground, already points to a future in which intelligence-gathering, assessment and military action, including the calculation of who can legally be killed, will largely be performed by machines based upon an ever-expanding database of aggregated information. As such, this transition to execution by algorithm is not simply a continuation of killing at ever greater distances inaugurated by the invention of the bow and arrow that separated warrior and foe, as many have suggested. [3] It is also a consequence of the ongoing automation of warfare, which can be traced back to the cybernetic coupling of Claude Shannon’s mathematical theory of information with Norbert Wiener’s wartime research into feedback loops and communication control systems. [4] As this new era of intelligent weapons systems progresses, operational control and decision-making are increasingly being outsourced to machines.

Computing terror

In 2011 the US Department of Defense (DOD) released its ‘roadmap’ forecasting the expanded use of unmanned technologies, of which unmanned aircraft systems – drones – are but one aspect of an overall strategy directed towards the implementation of fully autonomous Intelligent Agents. It projects its future as follows:

The Department of Defense’s vision for unmanned systems is the seamless integration of diverse unmanned capabilities that provide flexible options for Joint Warfighters while exploiting the inherent advantages of unmanned technologies, including persistence, size, speed, maneuverability, and reduced risk to human life. DOD envisions unmanned systems seamlessly operating with manned systems while gradually reducing the degree of human control and decision making required for the unmanned portion of the force structure. [5]

The document is a strange mix of Cold War caricature and Fordism set against the backdrop of contemporary geopolitical anxieties, which sketches out two imaginary vignettes to provide ‘visionary’ examples of the ways in which autonomy can improve efficiencies through inter-operability across military domains, aimed at enhancing capacities and flexibility between manned and unmanned sectors of the US Army, Air Force and Navy. In these future scenarios, the scripting and casting are strikingly familiar, pitting the security of hydrocarbon energy supplies against rogue actors equipped with Russian technology. One concerns an ageing Russian nuclear submarine deployed by a radicalized Islamic nation-state that is beset by an earthquake in the Pacific, thus contaminating the coastal waters of Alaska and threatening its oil energy reserves. The other involves the sabotaging of an underwater oil pipeline in the Gulf of Guinea off the coast of Africa, complicated by the approach of a hostile surface vessel capable of launching a Russian short-range air-to-surface missile. [6]

These Hollywood-style action film vignettes – fully elaborated across five pages of the report – provide an odd counterpoint to the claims being made throughout the document as to the sober science, political prudence and economic rationalizations that guide the move towards fully unmanned systems. On what grounds are we to be convinced by these visions and strategies? On the basis of a collective cultural imaginary that finds its politics within the CGI labs of the infotainment industry? Or via an evidence-based approach to solving the complex problems posed by changing global contexts? Not surprisingly, the level of detail (and techno-fetishism) used to describe unmanned responses to these risk scenarios is far more exhaustive than that devoted to the three primary challenges which the report identifies as specific to the growing reliance upon and deployment of automated and autonomous systems:

1. Investment in science and technology (S&T) to enable more capable autonomous operations.

2. Development of policies and guidelines on what decisions can be safely and ethically delegated and under what conditions.

3. Development of new Verification and Validation (V&V) and T&E techniques to enable verifiable ‘trust’ in autonomy. [7]

As the second of these ‘challenges’ indicates, the delegation of decision-making to computational regimes is particularly crucial here, in so far as it provokes a number of significant ethical dilemmas but also urgent questions regarding whether existing legal frameworks are capable of attending to the emergence of these new algorithmic actors. This is especially concerning when the logic of precedent that organizes much legal decision-making (within common law systems) has followed the same logic that organized the drone programme in the first place: namely, the justification of an action based upon a pattern of behaviour that was established by prior events.

The legal aporia intersects with a parallel discourse around moral responsibility; a much broader debate that has tended to structure arguments around the deployment of armed drones as an antagonism between humans and machines. As the authors of the entry on ‘Computing and Moral Responsibility’ in the Stanford Encyclopedia of Philosophy put it:

Traditionally philosophical discussions on moral responsibility have focused on the human components in moral action. Accounts of how to ascribe moral responsibility usually describe human agents performing actions that have well-defined, direct consequences. In today’s increasingly technological society, however, human activity cannot be properly understood without making reference to technological artifacts, which complicates the ascription of moral responsibility. [8]

When one poses the question, under what conditions is it morally acceptable to deliberately kill a human being, one is not, in this case, asking whether the law permits such an act for reasons of imminent threat, self-defence or even empathy for someone who is in extreme pain or in a non-responsive vegetative state. The moral register around the decision to kill operates according to a different ethical framework: one that doesn’t necessarily bind the individual to a contract enacted between the citizen and the state. Moral positions can be specific to individual values and beliefs whereas legal frameworks permit actions in our collective name as citizens contracted to a democratically elected body that acts on our behalf but with which we might be in political disagreement. While it is, then, much easier to take a moral stance towards events that we might oppose – US drone strikes in Pakistan – than to justify a claim as to their specific illegality given the anti-terror legislation that has been put in place since 9/11, assigning moral responsibility, proving criminal negligence or demonstrating legal liability for the outcomes of deadly events becomes even more challenging when humans and machines interact to make decisions together, a complication that will only intensify as unmanned systems become more sophisticated and act as increasingly independent legal agents. Moreover, the outsourcing of decision-making to the judiciary as regards the validity of scientific evidence, which followed the 1993 Daubert ruling – in the context of a case brought against Merrell Dow Pharmaceuticals – has, in addition, made it difficult for the law to take an activist stance when confronted with the limitations of its own scientific understandings of technical innovation. At present it would obviously be unreasonable to take an algorithm to court when things go awry, let alone when they are executed perfectly, as in the case of a lethal drone strike.

By focusing upon the legal dimension of algorithmic liability as opposed to more wide-ranging moral questions I do not want to suggest that morality and law should be consigned to separate spheres. However, it is worth making a preliminary effort to think about the ways in which algorithms are not simply reordering the fundamental principles that govern our lives, but might also be asked to provide alternate ethical arrangements derived out of mathematical axioms.

Algorithmic accountability

Law, which has already expanded the category of ‘legal personhood’ to include non-human actors such as corporations, also offers ways, then, to think about questions of algorithmic accountability. [9] Of course many would argue that legal methods are not the best frameworks for resolving moral dilemmas. But then again nor are the objectives of counter-terrorism necessarily best serviced by algorithmic oversight. Shifting the emphasis towards a juridical account of algorithmic reasoning might, at any rate, prove useful when confronted with the real possibility that the kill list and other emergent matrices for managing the war on terror will be algorithmically derived as part of a techno-social assemblage in which it becomes impossible to isolate human from non-human agents. It does, however, raise the ‘bar’ for what we would now need to ask the law to do. The degree to which legal codes can maintain their momentum alongside rapid technological change and submit ‘complicated algorithmic systems to the usual process of checks-and-balances that is generally imposed on powerful items that affect society on a large scale’ is of considerable concern. [10] Nonetheless, the stage has already been set for the arrival of a new cast of juridical actors endowed not so much with free will in the classical sense (that would provide the conditions for criminal liability), but intelligent systems which are wilfully free in the sense that they have been programmed to make decisions based upon their own algorithmic logic.[11] While armed combat drones are the most publicly visible of the automated military systems that the DOD is rolling out, they are only one of the many remote-controlled assets that will gather, manage, analyse and act on the data that they acquire and process.

Proponents of algorithmic decision-making laud the near instantaneous response time that allows Intelligent Agents – what some have called ‘moral predators’ – to make micro-second adjustments to avert a lethal drone strike should, for example, children suddenly emerge out of a house that is being targeted as a militant hideout. [12] Indeed robotic systems have long been argued to decrease the error margin of civilian casualties that are often the consequence of actions made by tired soldiers in the field. Nor are machines overly concerned with their own self-preservation, which might likewise cloud judgement under conditions of duress. Yet, as Sabine Gless and Herbert Zech ask, if these ‘Intelligent Agents are often used in areas where the risk of failure and error can be reduced by relying on machines rather than humans … the question arises: Who is liable if things go wrong?’[13]

Typically when injury and death occur to humans, the legal debate focuses upon the degree to which such an outcome was foreseeable and thus adjudicates on the basis of whether all reasonable efforts and pre-emptive protocols had been built into the system to mitigate against such an occurrence. However, programmers cannot of course run all the variables that combine to produce machinic decisions, especially when the degree of uncertainty as to conditions and knowledge of events on the ground is as variable as the shifting contexts of conflict and counter-terrorism. Werner Dahm, chief scientist at the United States Air Force, typically stresses the difficulty of designing error-free systems: ‘You have to be able to show that the system is not going to go awry – you have to disprove a negative.’ [14] Given that highly automated decision-making processes involve complex and rapidly changing contexts mediated by multiple technologies, can we then reasonably expect to build a form of ethical decision-making into these unmanned systems? And would an algorithmic approach to managing the ethical dimensions of drone warfare – for example, whether to strike 16-year-old Abdulrahman al-Awlaki in Yemen because his father was a radicalized cleric; a role that he might inherit – entail the same logics that characterized signature strikes, namely that of proximity to militant-like behaviour or activity? [15] The euphemistically rebranded kill list known as the ‘disposition matrix’ suggests that such determinations can indeed be arrived at computationally. As Greg Miller notes: ‘The matrix contains the names of terrorism suspects arrayed against an accounting of the resources being marshaled to track them down, including sealed indictments and clandestine operations.’ [16]

Intelligent systems are arguably legal agents but not as of yet legal persons, although precedents pointing to this possibility have already been set in motion. The idea that an actual human being or ‘legal person’ – one who might ultimately be found responsible when things go wrong, or even when they go right – stands behind the invention of every machine is no longer tenable, and it obfuscates the fact that complex systems are rarely, if ever, the product of single authorship; nor do humans and machines operate in autonomous realms. Indeed, both are so thoroughly entangled with each other that the notion of a sovereign human agent functioning outside the realm of machinic mediation seems wholly improbable. Consider for a moment only one aspect of conducting drone warfare in Pakistan – that of US flight logistics – in which we find that upwards of 165 people are required just to keep a Predator drone in the air for twenty-four hours, the half-life of an average mission. These personnel requirements are themselves embedded in multiple techno-social systems composed of military contractors, intelligence officers, data analysts, lawyers, engineers, programmers, as well as hardware, software, satellite communication, and operation centres (CAOC), and so on. This does not take into account the R&D infrastructure that engineered the unmanned system, designed its operating procedures and beta-tested it. Nor does it acknowledge the administrative apparatus that brought all of these actors together to create the event we call a drone strike. [17]

In the case of a fully automated system, decision-making is reliant upon feedback loops that continually pump new information into the system in order to recalibrate it. But perhaps more significantly in terms of legal liability, decision-making is also governed by the system’s innate ability to self-educate: the capacity of algorithms to learn and modify their coding sequences independent of human oversight. Isolating the singular agent who is directly responsible – legally – for the production of a deadly harm (as currently required by criminal law) suggests, then, that no one entity beyond the Executive Office of the President might ultimately be held accountable for the aggregate conditions that conspire to produce a drone strike and with it the possibility of civilian casualties. Given that the USA doesn’t accept the jurisdiction of the International Criminal Court and Article 25 of the Rome Statute governing individual criminal responsibility, what new legal formulations could, then, be created that would be able to account for indirect and aggregate causality born out of a complex chain of events including so called digital perpetrators? American tort law, which adjudicates over civil wrongs, might be one such place to look for instructive models. In particular, legal claims regarding the use of environmental toxins, which are highly distributed events whose lethal effects often take decades to appear, and involve an equally complex array of human and non-human agents, have been making their way into court, although not typically with successful outcomes for the plaintiffs. The most notable of these litigations have been the mass toxic tort regarding the use of Agent Orange as a defoliant in Vietnam and the Bhopal disaster in India. 
[18] Ultimately, however, the efficacy of such an approach has to be considered in light of the intended outcome of assigning liability, which in the cases mentioned was not so much deterrence or punishment, but, rather, compensation for damages.

Recoding the law

While machines can be designed with a high degree of intentional behaviour and will out-perform humans in many instances, the development of unmanned systems will need to take into account a far greater range of variables, including shifting geopolitical contexts and murky legal frameworks, when making the calculation that conditions have been met to execute someone. Building in fail-safe procedures that abort when human subjects of a specific size (children) or age and gender (males under the age of 18) appear, sets the stage for a proto-moral decision-making regime. But is the design of ethical constraints really where we wish to push back politically when it comes to the potential for execution by algorithm? Or can we work to complicate the impunity that certain techno-social assemblages currently enjoy? As a 2009 report by the Royal Academy of Engineering on autonomous systems argues,

Legal and regulatory models based on systems with human operators may not transfer well to the governance of autonomous systems. In addition, the law currently distinguishes between human operators and technical systems and requires a human agent to be responsible for an automated or autonomous system. However, technologies which are used to extend human capabilities or compensate for cognitive or motor impairment may give rise to hybrid agents … Without a legal framework for autonomous technologies, there is a risk that such essentially human agents could not be held legally responsible for their actions – so who should be responsible? [19]

Implicating a larger set of agents including algorithmic ones that aid and abet such an act might well be a more effective legal strategy, even if expanding the limits of criminal liability proves unwieldy. As the 2009 ECCHR Study on Criminal Accountability in Sri Lanka put it: ‘Individuals, who exercise the power to organise the pattern of crimes that were later committed, can be held criminally liable as perpetrators. These perpetrators can usually be found in civil ministries such as the ministry of defense or the office of the president.’ [20] Moving down the chain of command and focusing upon those who participate in the production of violence by carrying out orders has been effective in some cases (Sri Lanka), but also problematic in others (Abu Ghraib) where the indictment of low-level officers severed the chain of causal relations that could implicate more powerful actors. Of course prosecuting an algorithm alone for executing lethal orders that the system is in fact designed to make is fairly nonsensical if the objective is punishment. The move must, then, be part of an overall strategy aimed at expanding the field of causality and thus broadening the reach of legal responsibility.

My own work as a researcher on the Forensic Architecture project, alongside Eyal Weizman and several others, in developing new methods of spatial and visual investigation for the UN inquiry into the use of armed drones, provides one specific vantage point for considering how machinic capacities are reordering the field of political action and thus calling forth new legal strategies.[21] In taking seriously the agency of things, we must also take seriously the agency of things whose productive capacities are enlisted in the specific decision to kill. Computational regimes, in operating largely beyond the thresholds of human perception, have produced informatic conjunctions that have redistributed and transformed the spaces in which action occurs, as well as the nature of such consequential actions themselves. When algorithms are being enlisted to out-compute terrorism and calculate who can and should be killed, do we not need to produce a politics appropriate to these radical modes of calculation and a legal framework that is sufficiently agile to deliberate over such events?

Decision-making by automated systems will produce new relations of power for which we have as yet inadequate legal frameworks or modes of political resistance – and, perhaps even more importantly, insufficient collective understanding as to how such decisions will actually be made and upon what grounds. Scientific knowledge about technical processes does not belong to the domain of science alone, as the Daubert ruling implies. However, demands for public accountability and oversight will require much greater participation in the epistemological frameworks that organize and manage these new techno-social systems, and that may be a formidable challenge for all of us. What sort of public assembly will be able to prevent the premature closure of a certain ‘epistemology of facts’, as Bruno Latour would say, that are at present cloaked under a veil of secrecy called ‘national security interests’ – the same order of facts that scripts the current DOD roadmap for unmanned systems?

In a recent ABC Radio interview, Sarah Knuckey, director of the Project on Extrajudicial Executions at New York University Law School, emphasized the degree to which drone warfare has strained the limits of international legal conventions and with it the protection of civilians. [22] The ‘rules of warfare’ are ‘already hopelessly out-dated’, she says, and will require ‘new rules of engagement to be drawn up’: ‘There is an enormous amount of concern about the practices the US is conducting right now and the policies that underlie those practices. But from a much longer-term perspective and certainly from lawyers outside the US there is real concern about not just what’s happening now but what it might mean 10, 15, 20 years down the track.’ [23] Could these new rules of engagement – new legal codes – assume a similarly preemptive character to the software codes and technologies that are being evolved – what I would characterize as a projective sense of the law? Might they take their lead from the spirit of the Geneva Conventions protecting the rights of noncombatants, rather than from those protocols (the Hague Conventions of 1899, 1907) that govern the use of weapons of war, and are thus reactive in their formulation and event-based? If so, this would have to be a set of legal frameworks that is not so much determined by precedent – by what has happened in the past – but, instead, by what may take place in the future.


1. ^ See, for example, the satellite monitoring and atrocity evidence programmes ‘Eyes on Darfur’ and ‘The Sentinel Project for Genocide Prevention’.

2. ^ Cori Crider, ‘Killing in the Name of Algorithms: How Big Data Enables the Obama Administration’s Drone War’, Al Jazeera America, 2014; accessed 18 May 2014. See also the flow chart in Daniel Byman and Benjamin Wittes, ‘How Obama Decides Your Fate if He Thinks You’re a Terrorist’, The Atlantic, 3 January 2013.

3. ^ For a recent account of the multiple and compound geographies through which drone operations are executed, see Derek Gregory, ‘Drone Geographies’, Radical Philosophy 183 (January/February 2014), pp. 7–19.

4. ^ Contemporary information theorists would argue that the second-order cybernetic model of feedback and control, in which external data is used to adjust the system, doesn’t take into account the unpredictability of evolutive data internal to the system resulting from crunching ever-larger datasets. See Luciana Parisi’s Introduction to Contagious Architecture: Computation, Aesthetics, and Space, MIT Press, Cambridge MA, 2013. For a discussion of Weiner’s cybernetics in this context, see Reinhold Martin, ‘The Organizational Complex: Cybernetics, Space, Discourse’, Assemblage 37, 1998, p. 110.

5. ^ DOD, Unmanned Systems Integrated Roadmap FY2011–2036, Office of the Undersecretary of Defense for Acquisition, Technology, & Logistics, Washington, DC, 2011, p. 3,

6. ^ Ibid., pp. 1–10.

7. ^ Ibid., p. 27.

8. ^ Merel Noorman and Edward N. Zalta, ‘Computing and Moral Responsibility’, The Stanford Encyclopedia of Philosophy (2014),

9. ^ See John Dewey, ‘The Historic Background of Corporate Legal Personality’, Yale Law Journal, vol. 35, no. 6, 1926, pp. 656, 669.

10. ^ Data & Society Research Institute, ‘Workshop Primer: Algorithmic Accountability’, The Social, Cultural & Ethical Dimensions of ‘Big Data’ workshop, 2014, p. 3.

11. ^ See Gunther Teubner, ‘Rights of Non-Humans? Electronic Agents and Animals as New Actors in Politics and Law’, Journal of Law & Society, vol. 33, no. 4, 2006, pp. 497–521.

12. ^ See Bradley Jay Strawser, ‘Moral Predators: The Duty to Employ Uninhabited Aerial Vehicles,’ Journal of Military Ethics, vol. 9, no. 4, 2010, pp. 342–68.

13. ^ Sabine Gless and Herbert Zech, ‘Intelligent Agents: International Perspectives on New Challenges for Traditional Concepts of Criminal, Civil Law and Data Protection’, text for ‘Intelligent Agents’ workshop, 7–8 February 2014, University of Basel, Faculty of Law,

14. ^ Agence-France Presse, ‘The Next Wave in U.S. Robotic War: Drones on Their Own’, Defense News, 28 September 2012, p. 2,

15. ^ When questioned about the drone strike that killed 16-year old American-born Abdulrahman al-Awlaki, teenage son of radicalized cleric Anwar Al-Awlaki, in Yemen in 2011, Robert Gibbs, former White House press secretary and senior adviser to President Obama’s re-election campaign, replied that the boy should have had ‘a more responsible father’.

16. ^ Greg Miller, ‘Plan for Hunting Terrorists Signals U.S. Intends to Keep Adding Names to Kill Lists’, Washington Post, 23 October 2012.

17. ^ ‘While it might seem counterintuitive, it takes significantly more people to operate unmanned aircraft than it does to fly traditional warplanes. According to the Air Force, it takes a jaw-dropping 168 people to keep just one Predator aloft for twenty-four hours! For the larger Global Hawk surveillance drone, that number jumps to 300 people. In contrast, an F-16 fighter aircraft needs fewer than one hundred people per mission.’ Medea Benjamin, Drone Warfare: Killing by Remote Control, Verso, London and New York, 2013, p. 21.

18. ^ See Peter H. Schuck, Agent Orange on Trial: Mass Toxic Disasters in the Courts, Belknap Press of Harvard University Press, Cambridge MA, 1987.

19. ^ Royal Academy of Engineering, Autonomous Systems: Social, Legal and Ethical Issues, RAE, London, 2009, p. 3,

20. ^ European Center for Constitutional and Human Rights, Study on Criminal Accountability in Sri Lanka as of January 2009, ECCHR, Berlin, 2010, p. 88.

21. ^ Other members of the Forensic Architecture drone investigative team included Jacob Burns, Steffen Kraemer, Francesco Sebregondi and SITU Research.

22. ^ Bureau of Investigative Journalism, ‘Get the Data: Drone Wars’,

23. ^ Annabelle Quince, ‘Future of Drone Strikes Could See Execution by Algorithm’, Rear Vision, ABC Radio, edited transcript, pp. 2–3.

Life in Code and Software


Mediated Life in a Complex Computational Ecology
ISBN: 978-1-60785-283-4
edited by David M. Berry


Introduction: What is Code and Software?

This book explores the relationship between living, code and software. Technologies of code and software increasingly make up an important part of our urban environment. Indeed, their reach stretches to even quite remote areas of the world. Life in Code and Software introduces and explores the way in which code and software are becoming the conditions of possibility for human living, crucially forming a computational ecology, made up of disparate software ecologies, that we inhabit. As such we need to take account of this new computational environment and think about how today we live in a highly mediated, code-based world. That is, we live in a world where computational concepts and ideas are foundational, or ontological, which I call computationality, and within which code and software become the paradigmatic forms of knowing and doing, such that other candidates for this role – air, the economy, evolution, the environment, satellites, and so on – are understood and explained through computational concepts and categories.

Thinking Software

Eric W. Weisstein 
What is a Turing Machine?
David Barker-Plummer 
Turing Machines
Achim Jung 
A Short Introduction to the Lambda Calculus
Luciana Parisi & Stamatia Portanova 
Soft Thought (in architecture and choreography)
David M. Berry 
Understanding Digital Humanities
Edsger W. Dijkstra 
Go To Statement Considered Harmful
Alan M. Turing 
Computing Machinery and Intelligence
Martin Gardner 
The Fantastic Combinations of John Conway’s New Solitaire Game ‘Life’
David Golumbia 
Computation, Gender, and Human Thinking
Alan M. Turing 
Extract from On Computable Numbers, with an Application to the Entscheidungsproblem

Video of a Turing Machine – Overview

Kevin Slavin 
How Algorithms Shape Our World

Video shows how these complex computer programs determine espionage tactics, stock prices, movie scripts, and architecture.

Code Literacy (‘iteracy’)

David M. Berry 
Iteracy: Reading, Writing and Running Code
Ian Bogost 
Procedural Literacy: Problem Solving with Programming, Systems, & Play
Cathy Davidson 
Why We Need a 4th R: Reading, wRiting, aRithmetic, algoRithms
Jeannette M. Wing 
Computational Thinking
Stephen Ramsay 
On Building
Edsger W. Dijkstra 
On the Cruelty of Really Teaching Computing Science
Louis McCallum and Davy Smith 
Show Us Your Screens

A short documentary about live coding practice by Louis McCallum and Davy Smith.

Jeannette M. Wing 
Computational Thinking and Thinking About Computing

Wing argues that computational thinking will be a fundamental skill used by everyone in the world. To reading, writing, and arithmetic, she adds computational thinking to everyone’s analytical ability.

why the lucky stiff 
Hackety Hack: Learning to Code

why the lucky stiff (or _why) is a computer programmer, talking about learning to code.

Decoding Code

David M. Berry 
A Contribution Towards a Grammar of Code
Mark C. Marino 
Critical Code Studies
Lev Manovich 
Software Takes Command
Dennis G. Jerz 
Somewhere Nearby is Colossal Cave: Examining Will Crowther’s Original “Adventure” in Code and in Kentucky
Aleksandr Matrosov, Eugene Rodionov, David Harley, and Juraj Malcho 
Stuxnet Under the Microscope
Ralph Langner 
Cracking Stuxnet, a 21st-century Cyber Weapon

A fascinating look inside cyber-forensics and the processes of reading code to understand how it works and what it attacks.

Stephen Ramsay 
Algorithms are Thoughts, Chainsaws are Tools

A short film on livecoding presented as part of the Critical Code Studies Working Group, March 2010, by Stephen Ramsay. Presents a “live reading” of a performance by composer Andrew Sorensen.

Wendy Chun 
Critical Code Studies

Wendy Chun giving a lecture on code studies and reading source code.

Federica Frabetti 
Critical Code Studies

Federica Frabetti giving a lecture on code studies and reading source code.

David M. Berry 
Thinking Software: Realtime Streams and Knowledge in the Digital Age

As software/code increasingly structures the contemporary world, curiously, it also withdraws, and becomes harder and harder for us to focus on as it is embedded, hidden, off-shored or merely forgotten about. The challenge is to bring software/code back into visibility so that we can pay attention to both what it is (ontology/medium), where it has come from (media archaeology/genealogy) but also what it is doing (through a form of mechanology), so we can understand this ‘dynamic of organized inorganic matter’.

Software Ecologies

Gabriella Coleman 
The Anthropology of Hackers
Felix Guattari 
The Three Ecologies
Robert Kitchin 
The Programmable City
Bruno Latour 
The Whole is Always Smaller Than Its Parts- A Digital Test of Gabriel Tarde’s Monads
Mathew Fuller and Sonia Matos 
Feral Computing: From Ubiquitous Calculation to Wild Interactions
Jussi Parikka 
Media Ecologies and Imaginary Media: Transversal Expansions, Contractions, and Foldings
David Gelernter 
Time to Start Taking the Internet Seriously
Adrian Mackenzie 
The Problem of Computer Code: Leviathan or Common Power?
Adrian Mackenzie 
Wirelessness as Experience of Transition
Thomas Goetz 
Harnessing the Power of Feedback Loops
Christian Ulrik Andersen & Søren Pold 
The Scripted Spaces of Urban Ubiquitous Computing: The Experience, Poetics, and Politics of Public Scripted Space
B.J. Fogg, Gregory Cuellar, and David Danielson 
Motivating, Influencing, and Persuading Users
Alexander R. Galloway 
“Deleuze and Computers” – Alexander R. Galloway

“Deleuze and Computers” – a lecture by Alexander R. Galloway at the W.E.B. Du Bois Library at the University of Massachusetts Amherst on December 2nd, 2011.

Gary Wolf 
The Quantified Self

The notion of using computational devices in everyday life to record everything about you.

Gary Kovacs 
Tracking the Trackers

As you surf the Web, information is being collected about you.

Michael Najjar 
How Art Envisions Our Future

Data, information, computation, and technology mediated through art


A ‘Frozen’ PDF Version of this Living Book

Download a ‘frozen’ PDF version of this book as it appeared on 13th July 2012

How Quantum Computers and Machine Learning Will Revolutionize Big Data (Wired)



Image: infocux Technologies/Flickr

When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data — more information than all of the world’s libraries combined — every second. After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10^15 bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.

The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade. Since Intel co-founder Gordon E. Moore first defined it in 1965, the so-called Moore’s law — which predicts that the number of transistors on integrated circuits will double every two years — has dominated the computer industry. While that growth rate has proved remarkably resilient, for now, at least, “Moore’s law has basically crapped out; the transistors have gotten as small as people know how to make them economically with existing technologies,” said Scott Aaronson, a theoretical computer scientist at the Massachusetts Institute of Technology.

Instead, since 2005, many of the gains in computing power have come from adding more parallelism via multiple cores, with multiple levels of memory. The preferred architecture no longer features a single central processing unit (CPU) augmented with random access memory (RAM) and a hard drive for long-term storage. Even the big, centralized parallel supercomputers that dominated the 1980s and 1990s are giving way to distributed data centers and cloud computing, often networked across many organizations and vast geographical distances.

These days, “People talk about a computing fabric,” said Stanford University electrical engineer Stephen Boyd. These changes in computer architecture translate into the need for a different computational approach when it comes to handling big data, which is not only grander in scope than the large data sets of yore but also intrinsically different from them.

The demand for ever-faster processors, while important, isn’t the primary focus anymore. “Processing speed has been completely irrelevant for five years,” Boyd said. “The challenge is not how to solve problems with a single, ultra-fast processor, but how to solve them with 100,000 slower processors.” Aaronson points out that many problems in big data can’t be adequately addressed by simply adding more parallel processing. These problems are “more sequential, where each step depends on the outcome of the preceding step,” he said. “Sometimes, you can split up the work among a bunch of processors, but other times, that’s harder to do.” And often the software isn’t written to take full advantage of the extra processors. “If you hire 20 people to do something, will it happen 20 times faster?” Aaronson said. “Usually not.”
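Aaronson’s caveat about sequential work is the intuition behind Amdahl’s law, which bounds the speedup available from adding processors. The article does not give the formula, so the following is an illustrative sketch rather than anything from the original text:

```python
# Amdahl's law: the speedup from n processors is capped by the fraction
# of the work that must run sequentially, no matter how many are added.
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Ideal speedup when `parallel_fraction` of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_processors)

# "Hiring 20 people" helps little if 20 percent of the job is sequential:
print(round(amdahl_speedup(0.80, 20), 2))      # about 4.17x, not 20x
print(round(amdahl_speedup(0.80, 100000), 2))  # caps out just below 5x
```

With a 20 percent serial portion, even an unbounded number of processors can never deliver more than a fivefold speedup, which is exactly the effect Aaronson describes.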

Researchers also face challenges in integrating very differently structured data sets, as well as the difficulty of moving large amounts of data efficiently through a highly distributed network.

Those issues will become more pronounced as the size and complexity of data sets continue to grow faster than computing resources, according to California Institute of Technology physicist Harvey Newman, whose team developed the LHC’s grid of data centers and trans-Atlantic network. He estimates that if current trends hold, the computational needs of big data analysis will place considerable strain on the computing fabric. “It requires us to think about a different kind of system,” he said.

Memory and Movement

Emmanuel Candes, an applied mathematician at Stanford University, was once able to crunch big data problems on his desktop computer. But last year, when he joined a collaboration of radiologists developing dynamic magnetic resonance imaging — whereby one could record a patient’s heartbeat in real time using advanced algorithms to create high-resolution videos from limited MRI measurements — he found that the data no longer fit into his computer’s memory, making it difficult to perform the necessary analysis.

Addressing the storage-capacity challenges of big data is not simply a matter of building more memory, which has never been more plentiful. It is also about managing the movement of data. That’s because, increasingly, the desired data is no longer at people’s fingertips, stored in a single computer; it is distributed across multiple computers in a large data center or even in the “cloud.”

There is a hierarchy to data storage, ranging from the slowest, cheapest and most abundant memory to the fastest and most expensive, with the least available space. At the bottom of this hierarchy is so-called “slow memory” such as hard drives and flash drives, the cost of which continues to drop. There is more space on hard drives, compared to the other kinds of memory, but saving and retrieving the data takes longer. Next up the ladder comes RAM, which is much faster than slow memory but offers less space and is more expensive. Then there is cache memory — another trade-off of space and price in exchange for faster retrieval speeds — and finally the registers on the microchip itself, which are the fastest of all but the priciest to build, with the least available space. If memory storage were like real estate, a hard drive would be a sprawling upstate farm, RAM would be a medium-sized house in the suburbs, cache memory would be a townhouse on the outskirts of a big city, and the register memory would be a tiny studio in a prime urban location.

Longer commutes for stored data translate into processing delays. “When computers are slow today, it’s not because of the microprocessor,” Aaronson said. “The microprocessor is just treading water waiting for the disk to come back with the data.” Big data researchers prefer to minimize how much data must be moved back and forth from slow memory to fast memory. The problem is exacerbated when the data is distributed across a network or in the cloud, because it takes even longer to move the data back and forth, depending on bandwidth capacity, so that it can be analyzed.

One possible solution to this dilemma is to embrace the new paradigm. In addition to distributed storage, why not analyze the data in a distributed way as well, with each unit (or node) in a network of computers performing a small piece of a computation? Each partial solution is then integrated to find the full result. This approach is similar in concept to the LHC’s, in which one complete copy of the raw data (after filtering) is stored at the CERN research facility in Switzerland that is home to the collider. A second copy is divided into batches that are then distributed to data centers around the world. Each center analyzes its chunk of data and transmits the results to regional computers before moving on to the next batch.
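The distribute-analyze-integrate pattern described above can be sketched in a few lines. This toy example (in no way the actual LHC grid software) reduces each batch to a small partial summary at its “node” and then combines the summaries into a global statistic:

```python
# Toy version of the "distribute, analyze, integrate" pattern: each node
# reduces its batch to a partial summary, and the summaries combine into
# the same answer a centralized computation would give.
def node_analyze(batch):
    """Each node reduces its chunk to a small partial summary."""
    return (sum(batch), len(batch))

def integrate(partials):
    """Combine partial summaries into the global mean."""
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count

data = list(range(1, 101))                            # the full data set
batches = [data[i:i + 25] for i in range(0, 100, 25)] # distributed chunks
partials = [node_analyze(b) for b in batches]
print(integrate(partials))                            # 50.5, same as the centralized mean
```

The point of the design is that only the tiny summaries travel over the network, not the raw batches — the same motive that drives the LHC to analyze each chunk of data where it is stored.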

Alon Halevy, a computer scientist at Google, says the biggest breakthroughs in big data are likely to come from data integration.
Image: Peter DaSilva for Quanta Magazine

Boyd’s system is based on so-called consensus algorithms. “It’s a mathematical optimization problem,” he said of the algorithms. “You are using past data to train the model in hopes that it will work on future data.” Such algorithms are useful for creating an effective spam filter, for example, or for detecting fraudulent bank transactions.

This can be done on a single computer, with all the data in one place. Machine learning typically uses many processors, each handling a little bit of the problem. But when the problem becomes too large for a single machine, a consensus optimization approach might work better, in which the data set is chopped into bits and distributed across 1,000 “agents” that analyze their bit of data and each produce a model based on the data they have processed. The key is to require a critical condition to be met: although each agent’s model can be different, all the models must agree in the end — hence the term “consensus algorithms.”

The process by which 1,000 individual agents arrive at a consensus model is similar in concept to the Mechanical Turk crowd-sourcing methodology employed by Amazon — with a twist. With the Mechanical Turk, a person or a business can post a simple task, such as determining which photographs contain a cat, and ask the crowd to complete the task in exchange for gift certificates that can be redeemed for Amazon products, or for cash awards that can be transferred to a personal bank account. It may seem trivial to the human user, but the program learns from this feedback, aggregating all the individual responses into its working model, so it can make better predictions in the future.

In Boyd’s system, the process is iterative, creating a feedback loop. The initial consensus is shared with all the agents, which update their models in light of the new information and reach a second consensus, and so on. The process repeats until all the agents agree. Using this kind of distributed optimization approach significantly cuts down on how much data needs to be transferred at any one time.
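A minimal caricature of such an iterative consensus loop, assuming a simple ring of agents that repeatedly average with their neighbours (Boyd’s actual system uses more sophisticated optimization updates, such as ADMM, but the convergence-to-agreement behaviour is the same idea):

```python
# Toy consensus loop: each agent holds a local "model" (here just a
# number fitted to its own data). In each round every agent averages
# with its two ring neighbours; the models drift toward agreement on
# the global average without any agent ever seeing all the data.
def gossip_round(models):
    n = len(models)
    return [(models[i - 1] + models[i] + models[(i + 1) % n]) / 3.0
            for i in range(n)]

def run_consensus(models, tolerance=1e-6, max_rounds=10000):
    for rounds in range(max_rounds):
        if max(models) - min(models) < tolerance:   # all agents agree
            return models, rounds
        models = gossip_round(models)
    return models, max_rounds

local_models = [2.0, 8.0, 4.0, 6.0]   # four agents' initial local fits
final, rounds = run_consensus(local_models)
print(round(final[0], 4))             # every agent ends up near 5.0
```

Each round only exchanges a single number per agent, which is the attraction Boyd points to: agreement is reached while moving a tiny fraction of the underlying data.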

The Quantum Question

Late one night, during a swanky Napa Valley conference last year, MIT physicist Seth Lloyd found himself soaking in a hot tub across from Google’s Sergey Brin and Larry Page — any aspiring technology entrepreneur’s dream scenario. Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.

It is easy to grasp why Google might be interested in a quantum computer capable of rapidly searching enormous data sets. A quantum computer, in principle, could offer enormous increases in processing power, running algorithms significantly faster than a classical (non-quantum) machine for certain problems. Indeed, the company just purchased a reportedly $15 million prototype from a Canadian firm called D-Wave Systems, although the jury is still out on whether D-Wave’s product is truly quantum.

“This is not about trying all the possible answers in parallel. It is fundamentally different from parallel processing,” said Aaronson. Whereas a classical computer stores information as bits that can be either 0s or 1s, a quantum computer could exploit an unusual property: the superposition of states. If you flip a regular coin, it will land on heads or tails. There is zero probability that it will be both heads and tails. But if it is a quantum coin, technically, it exists in an indeterminate state of both heads and tails until you look to see the outcome.

A true quantum computer could encode information in so-called qubits that can be 0 and 1 at the same time. Doing so could reduce the time required to solve a difficult problem that would otherwise take several years of computation to mere seconds. But that is easier said than done, not least because such a device would be highly sensitive to outside interference: The slightest perturbation would be equivalent to looking to see if the coin landed heads or tails, and thus undo the superposition.

Data from a seemingly simple query about coffee production across the globe can be surprisingly difficult to integrate. Image: Peter DaSilva for Quanta Magazine

However, Aaronson cautions against placing too much hope in quantum computing to solve big data’s computational challenges, insisting that if and when quantum computers become practical, they will be best suited to very specific tasks, most notably to simulate quantum mechanical systems or to factor large numbers to break codes in classical cryptography. Yet there is one way that quantum computing might be able to assist big data: by searching very large, unsorted data sets — for example, a phone directory in which the names are arranged randomly instead of alphabetically.

It is certainly possible to do so with sheer brute force, using a massively parallel computer to comb through every record. But a quantum computer could accomplish the task in a fraction of the time. That is the thinking behind Grover’s algorithm, which was devised by Bell Labs’ Lov Grover in 1996. However, “to really make it work, you’d need a quantum memory that can be accessed in a quantum superposition,” Aaronson said, but it would need to do so in such a way that the very act of accessing the memory didn’t destroy the superposition, “and that is tricky as hell.”
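
The amplitude dynamics behind Grover's speedup can be seen in a toy classical simulation. This is illustrative only: the 3-qubit problem size and the marked index are arbitrary choices, and a real quantum device would never store the full state vector explicitly the way this sketch does.

```python
import math

# Classical simulation of Grover's search over N = 2**n items:
# repeatedly flip the sign of the marked item's amplitude (oracle)
# and reflect all amplitudes about their mean (diffusion).

def grover_search(n_qubits, marked):
    n = 2 ** n_qubits
    amp = [1 / math.sqrt(n)] * n           # uniform superposition
    rounds = int(math.pi / 4 * math.sqrt(n))
    for _ in range(rounds):                # only ~sqrt(N) iterations
        amp[marked] = -amp[marked]         # oracle marks the target
        mean = sum(amp) / n
        amp = [2 * mean - a for a in amp]  # diffusion amplifies it
    return [a * a for a in amp]            # measurement probabilities

probs = grover_search(3, marked=5)
# After ~sqrt(8) iterations the marked item dominates the distribution.
```

The point of the sketch is the iteration count: roughly the square root of the number of records, versus the linear scan a brute-force search requires.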

In short, you need quantum RAM (Q-RAM), and Lloyd has developed a conceptual prototype, along with an accompanying program he calls a Q-App (pronounced “quapp”) targeted to machine learning. He thinks his system could find patterns within data without actually looking at any individual records, thereby preserving the quantum superposition (and the users’ privacy). “You can effectively access all billion items in your database at the same time,” he explained, adding that “you’re not accessing any one of them, you’re accessing common features of all of them.”

For example, if there is ever a giant database storing the genome of every human being on Earth, “you could search for common patterns among different genes” using Lloyd’s quantum algorithm, with Q-RAM and a small 70-qubit quantum processor while still protecting the privacy of the population, Lloyd said. The person doing the search would have access to only a tiny fraction of the individual records, he said, and the search could be done in a short period of time. With the cost of sequencing human genomes dropping and commercial genotyping services rising, it is quite possible that such a database might one day exist, Lloyd said. It could be the ultimate big data set, considering that a single genome is equivalent to 6 billion bits.

Lloyd thinks quantum computing could work well for powerhouse machine-learning algorithms capable of spotting patterns in huge data sets — determining what clusters of data are associated with a keyword, for example, or what pieces of data are similar to one another in some way. “It turns out that many machine-learning algorithms actually work quite nicely in quantum computers, assuming you have a big enough Q-RAM,” he said. “These are exactly the kinds of mathematical problems people try to solve, and we think we could do very well with the quantum version of that.”

The Future Is Integration

“No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level.”

Google’s Alon Halevy believes that the real breakthroughs in big data analysis are likely to come from integration — specifically, integrating across very different data sets. “No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level,” he said. For example, a raw data set could include thousands of different tables scattered around the Web, each one listing crime rates in New York, but each may use different terminology and column headers, known as “schema.” A header of “New York” can describe the state, the five boroughs of New York City, or just Manhattan. You must understand the relationship between the schemas before the data in all those tables can be integrated.
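
A hand-written toy version of that integration step might look like the following. The headers and the mapping are invented for illustration; the hard problem Halevy describes is inferring such a mapping automatically, which this sketch sidesteps entirely.

```python
# Toy schema integration: the same kind of data under different
# (hypothetical) column headers, mapped onto one canonical schema
# before merging. In practice the mapping must be learned, not typed.

SCHEMA_MAP = {                   # source header -> canonical field
    "New York": "region",
    "Borough": "region",
    "Crime Rate": "crime_rate",
    "rate": "crime_rate",
}

def integrate(tables):
    """Rename each table's columns via SCHEMA_MAP, then concatenate."""
    merged = []
    for rows in tables:
        for row in rows:
            merged.append({SCHEMA_MAP.get(k, k): v for k, v in row.items()})
    return merged

table_a = [{"New York": "Manhattan", "Crime Rate": 2.3}]
table_b = [{"Borough": "Brooklyn", "rate": 3.1}]
merged = integrate([table_a, table_b])
# [{'region': 'Manhattan', 'crime_rate': 2.3},
#  {'region': 'Brooklyn', 'crime_rate': 3.1}]
```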

That, in turn, requires breakthroughs in techniques to analyze the semantics of natural language. It is one of the toughest problems in artificial intelligence — if your machine-learning algorithm aspires to perfect understanding of nearly every word. But what if your algorithm needs to understand only enough of the surrounding text to determine whether, for example, a table includes data on coffee production in various countries so that it can then integrate the table with other, similar tables into one common data set? According to Halevy, a researcher could first use a coarse-grained algorithm to parse the underlying semantics of the data as best it could and then adopt a crowd-sourcing approach like a Mechanical Turk to refine the model further through human input. “The humans are training the system without realizing it, and then the system can answer many more questions based on what it has learned,” he said.

Chris Mattmann, a senior computer scientist at NASA’s Jet Propulsion Laboratory and a director at the Apache Software Foundation, faces just such a complicated scenario with a research project that seeks to integrate two different sources of climate information: remote-sensing observations of the Earth made by satellite instrumentation and computer-simulated climate model outputs. The Intergovernmental Panel on Climate Change would like to be able to compare the various climate models against the hard remote-sensing data to determine which models provide the best fit. But each of those sources stores data in different formats, and there are many different versions of those formats.

Many researchers emphasize the need to develop a broad spectrum of flexible tools that can deal with many different kinds of data. For example, many users are shifting from traditional highly structured relational databases, broadly known as SQL, which represent data in a conventional tabular format, to a more flexible format dubbed NoSQL. “It can be as structured or unstructured as you need it to be,” said Matt LeMay, a product and communications consultant and the former head of consumer products at URL shortening and bookmarking service Bitly, which uses both SQL and NoSQL formats for data storage, depending on the application.
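
The contrast can be made concrete with Python's standard library. This is a deliberately minimal sketch (the field names are made up): a SQL table rejects anything outside its declared columns, while a document-style store lets each record carry whatever fields it needs.

```python
import json
import sqlite3

# SQL side: every row must fit the declared, fixed schema.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE clicks (url TEXT, hits INTEGER)")
db.execute("INSERT INTO clicks VALUES (?, ?)", ("http://example.com", 42))

# NoSQL-style side: schemaless JSON documents. The second record adds
# nested fields the first one lacks, with no schema change required.
documents = [
    json.dumps({"url": "http://example.com", "hits": 42}),
    json.dumps({"url": "http://example.org", "referrer": "twitter",
                "geo": {"country": "US"}}),
]

rows = db.execute("SELECT hits FROM clicks").fetchall()  # [(42,)]
```

Which side wins depends on the application, which is exactly why a service like Bitly runs both.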

Mattmann cites an Apache software program called Tika that allows the user to integrate data across 1,200 of the most common file formats. But in some cases, some human intervention is still required. Ultimately, Mattmann would like to fully automate this process via intelligent software that can integrate differently structured data sets, much like the Babel Fish in Douglas Adams’ “Hitchhiker’s Guide to the Galaxy” book series enabled someone to understand any language.

Integration across data sets will also require a well-coordinated distributed network system comparable to the one conceived of by Newman’s group at Caltech for the LHC, which monitors tens of thousands of processors and more than 10 major network links. Newman foresees a computational future for big data that relies on this type of automation through well-coordinated armies of intelligent agents that track the movement of data from one point in the network to another, identifying bottlenecks and scheduling processing tasks. Each might only record what is happening locally but would share the information in such a way as to shed light on the network’s global situation.

“Thousands of agents at different levels are coordinating to help human beings understand what’s going on in a complex and very distributed system,” Newman said. The scale would be even greater in the future, when there would be billions of such intelligent agents, or actors, making up a vast global distributed intelligent entity. “It’s the ability to create those things and have them work on one’s behalf that will reduce the complexity of these operational problems,” he said. “At a certain point, when there’s a complicated problem in such a system, no set of human beings can really understand it all and have access to all the information.”

Software Uses Cyborg Swarm to Map Unknown Environs (Science Daily)

Oct. 16, 2013 — Researchers from North Carolina State University have developed software that allows them to map unknown environments — such as collapsed buildings — based on the movement of a swarm of insect cyborgs, or “biobots.”

(Credit: Image by Edgar Lobaton.)

“We focused on how to map areas where you have little or no precise information on where each biobot is, such as a collapsed building where you can’t use GPS technology,” says Dr. Edgar Lobaton, an assistant professor of electrical and computer engineering at NC State and senior author of a paper on the research.

“One characteristic of biobots is that their movement can be somewhat random,” Lobaton says. “We’re exploiting that random movement to work in our favor.”

Here’s how the process would work in the field. A swarm of biobots, such as remotely controlled cockroaches, would be equipped with electronic sensors and released into a collapsed building or other hard-to-reach area. The biobots would initially be allowed to move about randomly. Because the biobots couldn’t be tracked by GPS, their precise locations would be unknown. However, the sensors would signal researchers via radio waves whenever biobots got close to each other.

Once the swarm has had a chance to spread out, the researchers would send a signal commanding the biobots to keep moving until they find a wall or other unbroken surface — and then continue moving along the wall. This is called “wall following.”

The researchers would repeat this cycle of random movement and “wall following” several times, continually collecting data from the sensors whenever the biobots came near each other. The new software then uses an algorithm to translate the biobot sensor data into a rough map of the unknown environment.
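
The kind of data such a system collects can be sketched with a toy simulation. Every parameter here (arena size, step size, radio range, agent count) is invented for illustration, and this is only the data-gathering half: the actual topological mapping algorithm in the paper is far more involved.

```python
import random

# Toy simulation of the encounter-logging idea: agents random-walk in a
# bounded arena with unknown positions; the only thing we "observe" is
# which pairs come within radio range, and when.

random.seed(0)
N_AGENTS, STEPS, RADIO_RANGE = 10, 200, 1.5
pos = [[random.uniform(0, 10), random.uniform(0, 10)] for _ in range(N_AGENTS)]
encounters = []   # (time, agent_i, agent_j) tuples

for t in range(STEPS):
    for p in pos:  # random movement, clamped to the 10x10 arena
        p[0] = min(10.0, max(0.0, p[0] + random.uniform(-0.5, 0.5)))
        p[1] = min(10.0, max(0.0, p[1] + random.uniform(-0.5, 0.5)))
    for i in range(N_AGENTS):
        for j in range(i + 1, N_AGENTS):
            d2 = (pos[i][0] - pos[j][0]) ** 2 + (pos[i][1] - pos[j][1]) ** 2
            if d2 < RADIO_RANGE ** 2:
                encounters.append((t, i, j))

# A mapping algorithm would reconstruct the arena's rough topology
# from this encounter log alone, never seeing `pos` directly.
```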

“This would give first responders a good idea of the layout in a previously unmapped area,” Lobaton says.

The software would also allow public safety officials to determine the location of radioactive or chemical threats, if the biobots have been equipped with the relevant sensors.

The researchers have tested the software using computer simulations and are currently testing the program with robots. They plan to work with fellow NC State researcher Dr. Alper Bozkurt to test the program with biobots.

The paper, “Topological Mapping of Unknown Environments using an Unlocalized Robotic Swarm,” will be presented at the International Conference on Intelligent Robots and Systems being held Nov. 3-8 in Tokyo, Japan. Lead author of the paper is Alireza Dirafzoon, a Ph.D. student at NC State. The work was supported by National Science Foundation grant CNS-1239243.

Computer Scientists Suggest New Spin On Origins of Evolvability: Competition to Survive Not Necessary? (Science Daily)

Apr. 26, 2013 — Scientists have long observed that species seem to have become increasingly capable of evolving in response to changes in the environment. But computer science researchers now say that the popular explanation of competition to survive in nature may not actually be necessary for evolvability to increase.

The average evolvability of organisms in each niche at the end of a simulation is shown. The lighter the color, the more evolvable individuals are within that niche. The overall result is that, as in the first model, evolvability increases with increasing distance from the starting niche in the center. (Credit: Joel Lehman, Kenneth O. Stanley. Evolvability Is Inevitable: Increasing Evolvability without the Pressure to Adapt. PLoS ONE, 2013; 8 (4): e62186 DOI: 10.1371/journal.pone.0062186)

In a paper published this week in PLOS ONE, the researchers report that evolvability can increase over generations regardless of whether species are competing for food, habitat or other factors.

Using a simulated model they designed to mimic how organisms evolve, the researchers saw increasing evolvability even without competitive pressure.

“The explanation is that evolvable organisms separate themselves naturally from less evolvable organisms over time simply by becoming increasingly diverse,” said Kenneth O. Stanley, an associate professor at the College of Engineering and Computer Science at the University of Central Florida. He co-wrote the paper about the study along with lead author Joel Lehman, a post-doctoral researcher at the University of Texas at Austin.

The finding could have implications for the origins of evolvability in many species.

“When new species appear in the future, they are most likely descendants of those that were evolvable in the past,” Lehman said. “The result is that evolvable species accumulate over time even without selective pressure.”

During the simulations, the team’s simulated organisms became more evolvable without any pressure from other organisms out-competing them. The simulations were based on a conceptual algorithm.

“The algorithms used for the simulations are abstractly based on how organisms are evolved, but not on any particular real-life organism,” explained Lehman.
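
The intuition can be sketched with an even cruder abstraction than the paper's. In this sketch, which is my own simplification and not the authors' model, lineages differ only in a mutation step size (a stand-in for evolvability), and there is no selection at all: traits just drift. High-evolvability lineages nonetheless end up spread across more of the niche space, so the organisms found far from the starting niche are disproportionately the evolvable ones.

```python
import random

# Neutral-drift sketch: no competition, no fitness, only mutation.
# Lineages with bigger mutation steps diversify faster and therefore
# dominate the frontier far from the starting niche at the origin.

random.seed(1)

def drift(step_size, generations=200):
    """Drift one lineage's trait value with no selection; return its
    final distance from the starting niche at 0."""
    trait = 0.0
    for _ in range(generations):
        trait += random.uniform(-step_size, step_size)  # mutation only
    return abs(trait)

low = sum(drift(0.1) for _ in range(100)) / 100    # low evolvability
high = sum(drift(1.0) for _ in range(100)) / 100   # high evolvability
# `high` lineages end up much farther from the origin than `low` ones,
# without ever out-competing them.
```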

The team’s hypothesis is unique and is in contrast to most popular theories for why evolvability increases.

“An important implication of this result is that traditional selective and adaptive explanations for phenomena such as increasing evolvability deserve more scrutiny and may turn out unnecessary in some cases,” Stanley said.

Stanley is an associate professor at UCF. He has a bachelor’s of science in engineering from the University of Pennsylvania and a doctorate in computer science from the University of Texas at Austin. He serves on the editorial boards of several journals. He has over 70 publications in competitive venues and has secured grants worth more than $1 million. His works in artificial intelligence and evolutionary computation have been cited more than 4,000 times.

Journal Reference:

  1. Joel Lehman, Kenneth O. Stanley. Evolvability Is Inevitable: Increasing Evolvability without the Pressure to Adapt. PLoS ONE, 2013; 8 (4): e62186. DOI: 10.1371/journal.pone.0062186

Challenges of the “data tsunami” (FAPESP)

Launched by the Instituto Microsoft Research-FAPESP de Pesquisas em TI (the Microsoft Research-FAPESP Institute for IT Research), the book O Quarto Paradigma discusses the challenges of eScience, a new field dedicated to handling the immense volume of information that characterizes today's science

By Fábio de Castro

Agência FAPESP – If a few years ago the lack of data limited scientific progress, today the problem has been inverted. New data-capture technologies, across the most varied fields and scales, have generated such an immense volume of information that the excess itself has become a bottleneck for scientific advancement.

In this context, computer scientists have been joining specialists from different fields to develop new concepts and theories capable of handling the flood of data of contemporary science. The result is called eScience.

That is the theme of the book O Quarto Paradigma – Descobertas científicas na era da eScience (the Brazilian edition of The Fourth Paradigm: Data-Intensive Scientific Discovery), launched on November 3 by the Microsoft Research-FAPESP Institute for IT Research.

Edited by Tony Hey, Stewart Tansley and Kristin Tolle, all of Microsoft Research, the book was launched at FAPESP headquarters, at an event attended by the Foundation's scientific director, Carlos Henrique de Brito Cruz.

During the launch, Roberto Marcondes Cesar Jr., of the Institute of Mathematics and Statistics (IME) of the University of São Paulo (USP), gave the talk “eScience in Brazil.” Daniel Fay, director of Earth, Energy and Environment at MSR, spoke on “The Fourth Paradigm: data-intensive computing advancing scientific discovery.”

Brito Cruz highlighted FAPESP's interest in fostering the development of eScience in Brazil. “FAPESP is closely connected to this idea, because many of our projects and programs need greater capacity to manage large data sets. Our great challenge lies in the science behind this capacity to handle large volumes of data,” he said.

Initiatives such as the FAPESP Research Program on Global Climate Change (PFPMCG), BIOTA-FAPESP and the FAPESP Bioenergy Research Program (BIOEN) are examples of programs with a great need to integrate and process immense volumes of data.

“We know that science advances when new instruments become available. On the other hand, scientists do not usually perceive the computer as a great new instrument that revolutionizes science. FAPESP is interested in actions to make the scientific community aware that there are great challenges in the field of eScience,” said Brito Cruz.

The book is a collection of 26 technical essays divided into four sections: “Earth and Environment,” “Health and Wellbeing,” “Scientific Infrastructure” and “Scholarly Communication.”

“The book is about the emergence of a new paradigm for scientific discovery. Thousands of years ago, the prevailing paradigm was experimental science, grounded in the description of natural phenomena. A few hundred years ago, the paradigm of theoretical science emerged, symbolized by Newton's laws. A few decades ago, computational science arose, simulating complex phenomena. Now we arrive at the fourth paradigm, that of data-driven science,” said Fay.

With the advent of the new paradigm, he said, the nature of scientific discovery has changed completely. Complex models have entered the scene, spanning broad spatial and temporal scales and demanding ever more multidisciplinary interaction.

“The data, in incredible quantity, come from different sources and likewise require a multidisciplinary approach and, often, real-time processing. Scientific communities are also more distributed. All of this has transformed the way discoveries are made,” said Fay.

Ecology, one of the fields most affected by large volumes of data, is an example of how scientific progress will increasingly depend on collaboration between academic researchers and computing specialists.

“We live in a storm of remote sensing, cheap ground sensors and data access on the internet. But extracting the variables science requires from this mass of heterogeneous data remains a problem. It requires specialized knowledge of algorithms, file formats and data cleaning, for example, that is not always accessible to people in ecology,” he explained.

The same occurs in fields such as medicine and biology, which benefit from new technologies in recording brain activity or sequencing DNA, for example, or in astronomy and physics, as modern telescopes capture terabytes of information daily and the Large Hadron Collider (LHC) generates petabytes of data each year.

Virtual Institute

According to Cesar Jr., the eScience community in Brazil is growing. The country has 2,167 undergraduate programs in information systems or computer science and engineering. In 2009 there were 45,000 graduates in these fields, and graduate education, between 2007 and 2009, comprised 32 programs, 1,000 advisors, 2,705 master's students and 410 doctoral students.

“Science has shifted from the data-acquisition paradigm to the data-analysis paradigm. We have different technologies producing terabytes in many fields of knowledge, and today we can say those fields focus on analyzing a deluge of data,” said Cesar Jr., a member of FAPESP's Computer Science and Engineering Area Coordination panel.

In 2006, the Brazilian Computer Society (SBC) organized a meeting to identify the key problems and main challenges for the field. This led to several proposals for the National Council for Scientific and Technological Development (CNPq) to create a program dedicated to this type of problem.

“In 2009 we held a series of workshops at FAPESP, bringing together scientists from fields such as agriculture, climate change, medicine, transcriptomics, games, e-government and social networks to discuss the issue. The initiative resulted in excellent collaborations among groups of scientists with similar problems and gave rise to several initiatives,” said Cesar Jr.

The calls for proposals issued by the Microsoft Research-FAPESP Institute for IT Research, he said, have been an important part of the set of initiatives to promote eScience, as has the organization of the São Paulo School of Advanced Science on Computational Image Processing and Visualization. FAPESP has also supported several research projects on the topic.

“The eScience community in São Paulo has been working with professionals from many fields and publishing in the journals of many of them. That is an indication of the quality the community has acquired to face the great challenge of the coming years,” said Cesar Jr., who wrote the preface to the Brazilian edition of the book.

  • O Quarto Paradigma
    Edited by: Tony Hey, Stewart Tansley and Kristin Tolle
    Published: 2011
    Price: R$ 60
    Pages: 263
    More information:

NSF seeks cyber infrastructure to make sense of scientific data (Federal Computer Week)

By Camille Tuutti, Oct 04, 2011

The National Science Foundation has tapped a research team at the University of North Carolina-Chapel Hill to develop a national data infrastructure that would help future scientists and researchers manage the data deluge, share information and fuel innovation in the scientific community.

The UNC group will lead the DataNet Federation Consortium, which includes seven universities. The infrastructure the consortium aims to create would support collaborative, multidisciplinary research and “democratize access to information among researchers and citizen scientists alike,” said Rob Pennington, program director in NSF’s Office of Cyberinfrastructure.

“It means researchers on the cutting edge have access to new, more extensive, multidisciplinary datasets that will enable breakthroughs and the creation of new fields of science and engineering,” he added.

The effort would be a “significant step in the right direction” in solving some of the key problems researchers run into, said Stan Ahalt, director at the Renaissance Computing Institute at UNC-Chapel Hill, which federates the consortium’s data repositories to enable cross-disciplinary research. One of the issues researchers today grapple with is how to best manage data in a way that maximizes its utility to the scientific community, he said. Storing massive quantities of data and the lack of well-designed methods that allow researchers to use unstructured and structured data simultaneously are additional obstacles for researchers, Ahalt added.

The national data infrastructure may not solve everything immediately, he said, “but it will give us a platform to start working meticulously on more long-term, rugged or robust solutions.”

DFC will use iRODS, the integrated Rule Oriented Data System, to implement a data management infrastructure. Multiple federal agencies are already using the technology: the NASA Center for Climate Simulation, for example, imported a Moderate Resolution Imaging Spectroradiometer satellite image dataset onto the environment so academic researchers would have access, said Reagan Moore, principal investigator for the Data Intensive Cyber Environments research group at UNC-Chapel Hill that leads the consortium.

It’s very typical for a scientific community to develop a set of practices around a particular methodology of collecting data, Ahalt explained. For example, hydrologists know where their sensors are and what those locations mean from a geographical perspective. Those hydrologists put their data in a certain format that may not be obvious to someone who is, for example, doing atmospheric studies, he said.

“The long-term goal of this effort is to improve the ability to do research,” Moore said. “If I’m a researcher in any given area, I’d like to be able to access data from other people working in the same area, collaborate with them, and then build a new collection that represents the new research results that are found. To do that, I need access to the old research results, to the observational data, to simulations or analyze what happens using computers, etc. These environments then greatly minimize the effort required to manage and distribute a collection and make it available to research.”

For science research as a whole, Ahalt said the infrastructure could mean a lot more than just managing the data deluge or sharing information within the different research communities.

“Data is the currency of the knowledge economy,” he said. “Right now, a lot of what we do collectively and globally from an economic standpoint is highly dependent on our ability to manipulate and analyze data. Data is also the currency of science; it’s our ability to have a national infrastructure that will allow us to share those scientific assets.”

The bottom line: “We’ll be more efficient at producing new science, new innovation and new innovation knowledge,” he said.

About the Author

Camille Tuutti is a staff writer covering the federal workforce.