Tag archive: Quantum computing

We’re not prepared for the end of Moore’s Law (MIT Technology Review)

technologyreview.com

David Rotman


February 24, 2020

Moore’s argument was an economic one. Integrated circuits, with multiple transistors and other electronic devices interconnected with aluminum metal lines on a tiny square of silicon wafer, had been invented a few years earlier by Robert Noyce at Fairchild Semiconductor. Moore, the company’s R&D director, realized, as he wrote in 1965, that with these new integrated circuits, “the cost per component is nearly inversely proportional to the number of components.” It was a beautiful bargain—in theory, the more transistors you added, the cheaper each one got. Moore also saw that there was plenty of room for engineering advances to increase the number of transistors you could affordably and reliably put on a chip.

Soon these cheaper, more powerful chips would become what economists like to call a general purpose technology—one so fundamental that it spawns all sorts of other innovations and advances in multiple industries. A few years ago, leading economists credited the information technology made possible by integrated circuits with a third of US productivity growth since 1974. Almost every technology we care about, from smartphones to cheap laptops to GPS, is a direct reflection of Moore’s prediction. It has also fueled today’s breakthroughs in artificial intelligence and genetic medicine, by giving machine-learning techniques the ability to chew through massive amounts of data to find answers.

But how did a simple prediction, based on extrapolating from a graph of the number of transistors by year—a graph that at the time had only a few data points—come to define a half-century of progress? In part, at least, because the semiconductor industry decided it would.

The April 1965 issue of Electronics magazine, in which Moore’s article appeared. Image: Wikimedia

Moore wrote that “cramming more components onto integrated circuits,” the title of his 1965 article, would “lead to such wonders as home computers—or at least terminals connected to a central computer—automatic controls for automobiles, and personal portable communications equipment.” In other words, stick to his road map of squeezing ever more transistors onto chips and it would lead you to the promised land. And for the following decades, a booming industry, the government, and armies of academic and industrial researchers poured money and time into upholding Moore’s Law, creating a self-fulfilling prophecy that kept progress on track with uncanny accuracy. Though the pace of progress has slipped in recent years, the most advanced chips today have nearly 50 billion transistors.

Every year since 2001, MIT Technology Review has chosen the 10 most important breakthrough technologies of the year. It’s a list of technologies that, almost without exception, are possible only because of the computation advances described by Moore’s Law.

For some of the items on this year’s list the connection is obvious: consumer devices, including watches and phones, infused with AI; climate-change attribution made possible by improved computer modeling and data gathered from worldwide atmospheric monitoring systems; and cheap, pint-size satellites. Others on the list, including quantum supremacy, molecules discovered using AI, and even anti-aging treatments and hyper-personalized drugs, are due largely to the computational power available to researchers.

But what happens when Moore’s Law inevitably ends? Or what if, as some suspect, it has already died, and we are already running on the fumes of the greatest technology engine of our time?

RIP

“It’s over. This year that became really clear,” says Charles Leiserson, a computer scientist at MIT and a pioneer of parallel computing, in which multiple calculations are performed simultaneously. The newest Intel fabrication plant, meant to build chips with minimum feature sizes of 10 nanometers, was much delayed, delivering chips in 2019, five years after the previous generation of chips with 14-nanometer features. Moore’s Law, Leiserson says, was always about the rate of progress, and “we’re no longer on that rate.” Numerous other prominent computer scientists have also declared Moore’s Law dead in recent years. In early 2019, the CEO of the large chipmaker Nvidia agreed.

In truth, it’s been more a gradual decline than a sudden death. Over the decades, some, including Moore himself at times, fretted that they could see the end in sight, as it got harder to make smaller and smaller transistors. In 1999, an Intel researcher worried that the industry’s goal of making transistors smaller than 100 nanometers by 2005 faced fundamental physical problems with “no known solutions,” like the quantum effects of electrons wandering where they shouldn’t be.

For years the chip industry managed to evade these physical roadblocks. New transistor designs were introduced to better corral the electrons. New lithography methods using extreme ultraviolet radiation were invented when the wavelengths of visible light were too thick to precisely carve out silicon features of only a few tens of nanometers. But progress grew ever more expensive. Economists at Stanford and MIT have calculated that the research effort going into upholding Moore’s Law has risen by a factor of 18 since 1971.

Likewise, the fabs that make the most advanced chips are becoming prohibitively pricey. The cost of a fab is rising at around 13% a year, and is expected to reach $16 billion or more by 2022. Not coincidentally, the number of companies with plans to make the next generation of chips has now shrunk to only three, down from eight in 2010 and 25 in 2002.

Finding successors to today’s silicon chips will take years of research. If you’re worried about what will replace Moore’s Law, it’s time to panic.

Nonetheless, Intel—one of those three chipmakers—isn’t expecting a funeral for Moore’s Law anytime soon. Jim Keller, who took over as Intel’s head of silicon engineering in 2018, is the man with the job of keeping it alive. He leads a team of some 8,000 hardware engineers and chip designers at Intel. When he joined the company, he says, many were anticipating the end of Moore’s Law. If they were right, he recalls thinking, “that’s a drag” and maybe he had made “a really bad career move.”

But Keller found ample technical opportunities for advances. He points out that there are probably more than a hundred variables involved in keeping Moore’s Law going, each of which provides different benefits and faces its own limits. It means there are many ways to keep doubling the number of devices on a chip—innovations such as 3D architectures and new transistor designs.

These days Keller sounds optimistic. He says he has been hearing about the end of Moore’s Law for his entire career. After a while, he “decided not to worry about it.” He says Intel is on pace for the next 10 years, and he will happily do the math for you: 65 billion (number of transistors) times 32 (if chip density doubles every two years) is 2 trillion transistors. “That’s a 30 times improvement in performance,” he says, adding that if software developers are clever, we could get chips that are a hundred times faster in 10 years.
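
Keller’s arithmetic is easy to check. Here is a minimal sketch in Python, using only the figures quoted above (the 30× performance number is his own estimate and is not derived here):

    # Back-of-envelope version of Keller's projection, using the article's figures.
    transistors_now = 65e9          # ~65 billion transistors on a leading-edge chip
    years = 10
    doublings = years / 2           # density doubling every two years
    density_gain = 2 ** doublings   # 2^5 = 32

    transistors_future = transistors_now * density_gain
    print(f"{transistors_future:.2e}")   # ~2.1e12, i.e. roughly 2 trillion transistors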

Still, even if Intel and the other remaining chipmakers can squeeze out a few more generations of even more advanced microchips, the days when you could reliably count on faster, cheaper chips every couple of years are clearly over. That doesn’t, however, mean the end of computational progress.

Time to panic

Neil Thompson is an economist, but his office is at CSAIL, MIT’s sprawling AI and computer center, surrounded by roboticists and computer scientists, including his collaborator Leiserson. In a new paper, the two document ample room for improving computational performance through better software, algorithms, and specialized chip architecture.

One opportunity is in slimming down so-called software bloat to wring the most out of existing chips. When chips could always be counted on to get faster and more powerful, programmers didn’t need to worry much about writing more efficient code. And they often failed to take full advantage of changes in hardware architecture, such as the multiple cores, or processors, seen in chips used today.

Thompson and his colleagues showed that they could get a computationally intensive calculation to run some 47 times faster just by switching from Python, a popular general-purpose programming language, to the more efficient C. That’s because C, while it requires more work from the programmer, greatly reduces the required number of operations, making a program run much faster. Further tailoring the code to take full advantage of a chip with 18 processing cores sped things up even more. In just 0.41 seconds, the researchers got a result that took seven hours with Python code.
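
Their benchmark was a dense matrix multiplication; the sketch below only illustrates that kind of gap (it is not the authors’ code, and the exact speedup depends on the machine): a pure-Python triple loop versus the same product dispatched to compiled, vectorized routines.

    # Illustrative only: interpreted Python loops vs. compiled, vectorized code.
    import time
    import numpy as np

    n = 200
    A = np.random.rand(n, n)
    B = np.random.rand(n, n)

    def matmul_naive(X, Y):
        """Triple-loop multiply in pure Python, one scalar operation at a time."""
        size = len(X)
        C = [[0.0] * size for _ in range(size)]
        for i in range(size):
            for j in range(size):
                s = 0.0
                for k in range(size):
                    s += X[i][k] * Y[k][j]
                C[i][j] = s
        return C

    t0 = time.perf_counter()
    matmul_naive(A.tolist(), B.tolist())
    t_python = time.perf_counter() - t0

    t0 = time.perf_counter()
    A @ B                                # dispatches to optimized, compiled BLAS code
    t_fast = time.perf_counter() - t0

    print(f"pure Python: {t_python:.2f}s, vectorized: {t_fast:.4f}s, "
          f"speedup ~{t_python / t_fast:.0f}x")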

That sounds like good news for continuing progress, but Thompson worries it also signals the decline of computers as a general purpose technology. Rather than “lifting all boats,” as Moore’s Law has, by offering ever faster and cheaper chips that were universally available, advances in software and specialized architecture will now start to selectively target specific problems and business opportunities, favoring those with sufficient money and resources.

Indeed, the move to chips designed for specific applications, particularly in AI, is well under way. Deep learning and other AI applications increasingly rely on graphics processing units (GPUs) adapted from gaming, which can handle parallel operations, while companies like Google, Microsoft, and Baidu are designing AI chips for their own particular needs. AI, particularly deep learning, has a huge appetite for computer power, and specialized chips can greatly speed up its performance, says Thompson.

But the trade-off is that specialized chips are less versatile than traditional CPUs. Thompson is concerned that chips for more general computing are becoming a backwater, slowing “the overall pace of computer improvement,” as he writes in an upcoming paper, “The Decline of Computers as a General Purpose Technology.”

At some point, says Erica Fuchs, a professor of engineering and public policy at Carnegie Mellon, those developing AI and other applications will miss the decreases in cost and increases in performance delivered by Moore’s Law. “Maybe in 10 years or 30 years—no one really knows when—you’re going to need a device with that additional computation power,” she says.

The problem, says Fuchs, is that the successors to today’s general purpose chips are unknown and will take years of basic research and development to create. If you’re worried about what will replace Moore’s Law, she suggests, “the moment to panic is now.” There are, she says, “really smart people in AI who aren’t aware of the hardware constraints facing long-term advances in computing.” What’s more, she says, because application-specific chips are proving hugely profitable, there are few incentives to invest in new logic devices and ways of doing computing.

Wanted: A Marshall Plan for chips

In 2018, Fuchs and her CMU colleagues Hassan Khan and David Hounshell wrote a paper tracing the history of Moore’s Law and identifying the changes behind today’s lack of the industry and government collaboration that fostered so much progress in earlier decades. They argued that “the splintering of the technology trajectories and the short-term private profitability of many of these new splinters” means we need to greatly boost public investment in finding the next great computer technologies.

If economists are right, and much of the growth in the 1990s and early 2000s was a result of microchips—and if, as some suggest, the sluggish productivity growth that began in the mid-2000s reflects the slowdown in computational progress—then, says Thompson, “it follows you should invest enormous amounts of money to find the successor technology. We’re not doing it. And it’s a public policy failure.”

There’s no guarantee that such investments will pay off. Quantum computing, carbon nanotube transistors, even spintronics, are enticing possibilities—but none are obvious replacements for the promise that Gordon Moore first saw in a simple integrated circuit. We need the research investments now to find out, though. Because one prediction is pretty much certain to come true: we’re always going to want more computing power.

This story was part of our March 2020 issue, the predictions issue.

The End of Theory: The Data Deluge Makes the Scientific Method Obsolete (Wired)

wired.com

Chris Anderson, Science, 06.23.2008 12:00 PM


Illustration: Marian Bantjes

“All models are wrong, but some are useful.”

So proclaimed statistician George Box 30 years ago, and he was right. But what choice did we have? Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us. Until now. Today companies like Google, which have grown up in an era of massively abundant data, don’t have to settle for wrong models. Indeed, they don’t have to settle for models at all.

Sixty years ago, digital computers made information readable. Twenty years ago, the Internet made it reachable. Ten years ago, the first search engine crawlers made it a single database. Now Google and like-minded companies are sifting through the most measured age in history, treating this massive corpus as a laboratory of the human condition. They are the children of the Petabyte Age.

The Petabyte Age is different because more is different. Kilobytes were stored on floppy disks. Megabytes were stored on hard disks. Terabytes were stored in disk arrays. Petabytes are stored in the cloud. As we moved along that progression, we went from the folder analogy to the file cabinet analogy to the library analogy to — well, at petabytes we ran out of organizational analogies.

At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google’s founding philosophy is that we don’t know why this page is better than that one: If the statistics of incoming links say it is, that’s good enough. No semantic or causal analysis is required. That’s why Google can translate languages without actually “knowing” them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.
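
The textbook formulation of “rank pages by the statistics of incoming links” is PageRank computed by power iteration; the toy sketch below shows the idea (Google’s production system is, of course, far more elaborate than this):

    # Minimal PageRank by power iteration: a standard textbook formulation of
    # "rank pages by the statistics of incoming links", not Google's production code.
    import numpy as np

    # Toy web: page i links to the pages listed in links[i]
    links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
    n = len(links)
    damping = 0.85

    # Column-stochastic transition matrix
    M = np.zeros((n, n))
    for src, outs in links.items():
        for dst in outs:
            M[dst, src] = 1.0 / len(outs)

    rank = np.full(n, 1.0 / n)
    for _ in range(100):
        rank = (1 - damping) / n + damping * M @ rank

    print(rank)   # page 2, with the most incoming links, scores highest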

Speaking at the O’Reilly Emerging Technology Conference this past March, Peter Norvig, Google’s research director, offered an update to George Box’s maxim: “All models are wrong, and increasingly you can succeed without them.”

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

The big target here isn’t advertising, though. It’s science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.

Scientists are trained to recognize that correlation is not causation, that no conclusions should be drawn simply on the basis of correlation between X and Y (it could just be a coincidence). Instead, you must understand the underlying mechanisms that connect the two. Once you have a model, you can connect the data sets with confidence. Data without a model is just noise.

But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. Consider physics: Newtonian models were crude approximations of the truth (wrong at the atomic level, but still useful). A hundred years ago, statistically based quantum mechanics offered a better picture — but quantum mechanics is yet another model, and as such it, too, is flawed, no doubt a caricature of a more complex underlying reality. The reason physics has drifted into theoretical speculation about n-dimensional grand unified models over the past few decades (the “beautiful story” phase of a discipline starved of data) is that we don’t know how to run the experiments that would falsify the hypotheses — the energies are too high, the accelerators too expensive, and so on.

Now biology is heading in the same direction. The models we were taught in school about “dominant” and “recessive” genes steering a strictly Mendelian process have turned out to be an even greater simplification of reality than Newton’s laws. The discovery of gene-protein interactions and other aspects of epigenetics has challenged the view of DNA as destiny and even introduced evidence that environment can influence inheritable traits, something once considered a genetic impossibility.

In short, the more we learn about biology, the further we find ourselves from a model that can explain it.

There is now a better way. Petabytes allow us to say: “Correlation is enough.” We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms.

If the words “discover a new species” call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn’t know what they look like, how they live, or much of anything else about their morphology. He doesn’t even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.

This sequence may correlate with other sequences that resemble those of species we do know more about. In that case, Venter can make some guesses about the animals — that they convert sunlight into energy in a particular way, or that they descended from a common ancestor. But besides that, he has no better model of this species than Google has of your MySpace page. It’s just data. By analyzing it with Google-quality computing resources, though, Venter has advanced biology more than anyone else of his generation.

This kind of thinking is poised to go mainstream. In February, the National Science Foundation announced the Cluster Exploratory, a program that funds research designed to run on a large-scale distributed computing platform developed by Google and IBM in conjunction with six pilot universities. The cluster will consist of 1,600 processors, several terabytes of memory, and hundreds of terabytes of storage, along with the software, including IBM’s Tivoli and open source versions of Google File System and MapReduce.[1] Early CluE projects will include simulations of the brain and the nervous system and other biological research that lies somewhere between wetware and software.
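
MapReduce itself is a simple pattern: map each record to key-value pairs, group by key, then reduce each group. A toy, single-process sketch of that data flow (not the actual Google or Hadoop API):

    # Toy single-process illustration of the MapReduce pattern (word count).
    # Real MapReduce distributes the map and reduce phases across many machines;
    # this sketch only shows the data flow.
    from collections import defaultdict

    documents = ["more is different", "more data more answers"]

    # Map: emit (key, value) pairs
    mapped = [(word, 1) for doc in documents for word in doc.split()]

    # Shuffle: group values by key
    groups = defaultdict(list)
    for word, count in mapped:
        groups[word].append(count)

    # Reduce: combine the values for each key
    counts = {word: sum(vals) for word, vals in groups.items()}
    print(counts)   # {'more': 3, 'is': 1, 'different': 1, 'data': 1, 'answers': 1}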

Learning to use a “computer” of this scale may be challenging. But the opportunity is great: The new availability of huge amounts of data, along with the statistical tools to crunch these numbers, offers a whole new way of understanding the world. Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.

There’s no reason to cling to our old ways. It’s time to ask: What can science learn from Google?

Chris Anderson (canderson@wired.com) is the editor in chief of Wired.

Related: The Petabyte Age: Sensors everywhere. Infinite storage. Clouds of processors. Our ability to capture, warehouse, and understand massive amounts of data is changing science, medicine, business, and technology. As our collection of facts and figures grows, so will the opportunity to find answers to fundamental questions. Because in the era of big data, more isn’t just more. More is different.

Correction:
[1] This story originally stated that the cluster software would include the actual Google File System.
06.27.08

Quantum algorithm proves more efficient than any classical analogue (Revista Fapesp)

December 11, 2015

José Tadeu Arantes | Agência FAPESP – The quantum computer may stop being a dream and become reality within the next 10 years. The expectation is that this will bring a drastic reduction in processing time, since quantum algorithms offer more efficient solutions for certain computational tasks than any corresponding classical algorithms.

Until now, the key to quantum computing was believed to be the correlations between two or more systems. An example of quantum correlation is the process of “entanglement,” which occurs when pairs or groups of particles are generated or interact in such a way that the quantum state of each particle cannot be described independently, since it depends on the set as a whole (for more information, see agencia.fapesp.br/20553/).

A recent study has shown, however, that even an isolated quantum system, that is, one with no correlations with other systems, is enough to implement a quantum algorithm faster than its classical analogue. A paper describing the study was published in early October of this year in Scientific Reports, a Nature group journal: Computational speed-up with a single qudit.

The work, at once theoretical and experimental, started from an idea put forward by the physicist Mehmet Zafer Gedik of Sabanci Üniversitesi in Istanbul, Turkey, and was carried out through a collaboration between Turkish and Brazilian researchers. Felipe Fernandes Fanchini, of the School of Sciences of São Paulo State University (Unesp) at the Bauru campus, is one of the authors of the paper. His participation in the study took place within the project “Quantum control in dissipative systems,” supported by FAPESP.

“This work makes an important contribution to the debate over which resource is responsible for the superior processing power of quantum computers,” Fanchini told Agência FAPESP.

“Starting from Gedik’s idea, we carried out an experiment in Brazil using the nuclear magnetic resonance (NMR) facility of the University of São Paulo (USP) in São Carlos. It was therefore a collaboration among researchers from three universities: Sabanci, Unesp, and USP. And we demonstrated that a quantum circuit built on a single physical system, with three or more energy levels, can determine the parity of a numerical permutation by evaluating the function only once. That is unthinkable in a classical protocol.”

According to Fanchini, what Gedik proposed was a very simple quantum algorithm that essentially determines the parity of a sequence. The concept of parity indicates whether a sequence is in a given order or not. For example, if we take the digits 1, 2, and 3 and establish that the sequence 1-2-3 is in order, then the sequences 2-3-1 and 3-1-2, produced by cyclic permutations of the digits, are in the same order.

This is easy to understand if we imagine the digits arranged on a circle. Given the first sequence, one rotation in one direction yields the next sequence, and another rotation yields the third. The sequences 1-3-2, 3-2-1, and 2-1-3, however, can only be created by non-cyclic permutations. So, if we agree that the first three sequences are “even,” the other three will be “odd.”

“In classical terms, observing a single digit, that is, making a single measurement, does not allow you to say whether the sequence is even or odd. For that, at least two observations are needed. What Gedik demonstrated was that, in quantum terms, a single measurement is enough to determine the parity. That is why the quantum algorithm is faster than any classical equivalent. And this algorithm can be realized with a single particle, which means its efficiency does not depend on any kind of quantum correlation,” Fanchini said.
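
Fanchini’s classical point can be made concrete with a short Python sketch: for permutations of 1, 2, and 3, one observed digit is always consistent with both an even and an odd arrangement, while two observations settle it.

    # Classical parity check for permutations of (1, 2, 3): one observed digit
    # never determines parity, two always do.
    from itertools import permutations

    EVEN = {(1, 2, 3), (2, 3, 1), (3, 1, 2)}   # cyclic rotations of 1-2-3

    def parity(seq):
        return "even" if tuple(seq) in EVEN else "odd"

    # One observation is ambiguous: seeing "1" in the first position is
    # consistent with (1,2,3), which is even, and (1,3,2), which is odd.
    candidates = [p for p in permutations((1, 2, 3)) if p[0] == 1]
    print({p: parity(p) for p in candidates})   # both parities still possible

    # Two observations settle it: positions 0 and 1 identify the sequence uniquely.
    candidates = [p for p in permutations((1, 2, 3)) if p[0] == 1 and p[1] == 3]
    print({p: parity(p) for p in candidates})   # only (1, 3, 2) -> odd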

The algorithm in question does not say what the sequence is; it only says whether it is even or odd. This is possible only when there are three or more levels, because with just two levels, something like 1-2 or 2-1, an even or odd sequence cannot be defined. “In recent years, the quantum computing community has been exploring a key concept of quantum theory, the concept of ‘contextuality.’ Since ‘contextuality’ also only comes into play with three or more levels, we suspect it may be behind the effectiveness of our algorithm,” the researcher added.

The concept of contextuality

“The concept of ‘contextuality’ is best understood by comparing the ideas of measurement in classical and quantum physics. In classical physics, measurement is assumed to do nothing more than reveal characteristics the measured system already possessed, for example a certain length or a certain mass. In quantum physics, by contrast, the result of a measurement depends not only on the characteristic being measured, but also on how the measurement was arranged and on all previous measurements. In other words, the result depends on the context of the experiment. And ‘contextuality’ is the quantity that describes that context,” Fanchini explained.

In the history of physics, “contextuality” was recognized as a necessary feature of quantum theory through the famous Bell’s theorem. According to this theorem, published in 1964 by the Irish physicist John Stewart Bell (1928–1990), no physical theory based on local variables can reproduce all the predictions of quantum mechanics. In other words, physical phenomena cannot be described in strictly local terms, since they express the whole.

“It is important to stress that another paper [Contextuality supplies the ‘magic’ for quantum computation], published in Nature in June 2014, points to contextuality as a possible source of the power of quantum computing. Our study goes in the same direction, presenting a concrete algorithm that is more efficient than anything conceivable along classical lines.”

How Quantum Computers and Machine Learning Will Revolutionize Big Data (Wired)

BY JENNIFER OUELLETTE, QUANTA MAGAZINE

10.14.13

Image: infocux Technologies/Flickr

When subatomic particles smash together at the Large Hadron Collider in Switzerland, they create showers of new particles whose signatures are recorded by four detectors. The LHC captures 5 trillion bits of data — more information than all of the world’s libraries combined — every second. After the judicious application of filtering algorithms, more than 99 percent of those data are discarded, but the four experiments still produce a whopping 25 petabytes (25×10¹⁵ bytes) of data per year that must be stored and analyzed. That is a scale far beyond the computing resources of any single facility, so the LHC scientists rely on a vast computing grid of 160 data centers around the world, a distributed network that is capable of transferring as much as 10 gigabytes per second at peak performance.

The LHC’s approach to its big data problem reflects just how dramatically the nature of computing has changed over the last decade. Since Intel co-founder Gordon E. Moore first defined it in 1965, the so-called Moore’s law — which predicts that the number of transistors on integrated circuits will double every two years — has dominated the computer industry. While that growth rate has proved remarkably resilient, for now, at least, “Moore’s law has basically crapped out; the transistors have gotten as small as people know how to make them economically with existing technologies,” said Scott Aaronson, a theoretical computer scientist at the Massachusetts Institute of Technology.

Instead, since 2005, many of the gains in computing power have come from adding more parallelism via multiple cores, with multiple levels of memory. The preferred architecture no longer features a single central processing unit (CPU) augmented with random access memory (RAM) and a hard drive for long-term storage. Even the big, centralized parallel supercomputers that dominated the 1980s and 1990s are giving way to distributed data centers and cloud computing, often networked across many organizations and vast geographical distances.

These days, “People talk about a computing fabric,” said Stanford University electrical engineer Stephen Boyd. These changes in computer architecture translate into the need for a different computational approach when it comes to handling big data, which is not only grander in scope than the large data sets of yore but also intrinsically different from them.

The demand for ever-faster processors, while important, isn’t the primary focus anymore. “Processing speed has been completely irrelevant for five years,” Boyd said. “The challenge is not how to solve problems with a single, ultra-fast processor, but how to solve them with 100,000 slower processors.” Aaronson points out that many problems in big data can’t be adequately addressed by simply adding more parallel processing. These problems are “more sequential, where each step depends on the outcome of the preceding step,” he said. “Sometimes, you can split up the work among a bunch of processors, but other times, that’s harder to do.” And often the software isn’t written to take full advantage of the extra processors. “If you hire 20 people to do something, will it happen 20 times faster?” Aaronson said. “Usually not.”
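
Aaronson’s point about sequential steps is usually formalized as Amdahl’s law (the article does not name it): if a fraction of the work cannot be parallelized, that fraction caps the speedup no matter how many processors are added. A minimal sketch:

    # Amdahl's law, which formalizes the point above: if a fraction s of the work
    # is inherently sequential, the best speedup from p processors is 1/(s + (1-s)/p).
    def amdahl_speedup(sequential_fraction, processors):
        return 1.0 / (sequential_fraction + (1.0 - sequential_fraction) / processors)

    # A job that is just 5% sequential tops out near 20x, however many workers join.
    print(amdahl_speedup(0.05, 20))        # ~10.3x from 20 workers, not 20x
    print(amdahl_speedup(0.05, 100_000))   # ~20x, no matter how many more you add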

Researchers also face challenges in integrating very differently structured data sets, as well as the difficulty of moving large amounts of data efficiently through a highly distributed network.

Those issues will become more pronounced as the size and complexity of data sets continue to grow faster than computing resources, according to California Institute of Technology physicist Harvey Newman, whose team developed the LHC’s grid of data centers and trans-Atlantic network. He estimates that if current trends hold, the computational needs of big data analysis will place considerable strain on the computing fabric. “It requires us to think about a different kind of system,” he said.

Memory and Movement

Emmanuel Candes, an applied mathematician at Stanford University, was once able to crunch big data problems on his desktop computer. But last year, when he joined a collaboration of radiologists developing dynamic magnetic resonance imaging — whereby one could record a patient’s heartbeat in real time using advanced algorithms to create high-resolution videos from limited MRI measurements — he found that the data no longer fit into his computer’s memory, making it difficult to perform the necessary analysis.

Addressing the storage-capacity challenges of big data is not simply a matter of building more memory, which has never been more plentiful. It is also about managing the movement of data. That’s because, increasingly, the desired data is no longer at people’s fingertips, stored in a single computer; it is distributed across multiple computers in a large data center or even in the “cloud.”

There is a hierarchy to data storage, ranging from the slowest, cheapest and most abundant memory to the fastest and most expensive, with the least available space. At the bottom of this hierarchy is so-called “slow memory” such as hard drives and flash drives, the cost of which continues to drop. There is more space on hard drives, compared to the other kinds of memory, but saving and retrieving the data takes longer. Next up this ladder comes RAM, which is much faster than slow memory but offers less space and is more expensive. Then there is cache memory — another trade-off of space and price in exchange for faster retrieval speeds — and finally the registers on the microchip itself, which are the fastest of all but the priciest to build, with the least available space. If memory storage were like real estate, a hard drive would be a sprawling upstate farm, RAM would be a medium-sized house in the suburbs, cache memory would be a townhouse on the outskirts of a big city, and the register memory would be a tiny studio in a prime urban location.

Longer commutes for stored data translate into processing delays. “When computers are slow today, it’s not because of the microprocessor,” Aaronson said. “The microprocessor is just treading water waiting for the disk to come back with the data.” Big data researchers prefer to minimize how much data must be moved back and forth from slow memory to fast memory. The problem is exacerbated when the data is distributed across a network or in the cloud, because it takes even longer to move the data back and forth, depending on bandwidth capacity, so that it can be analyzed.

One possible solution to this dilemma is to embrace the new paradigm. In addition to distributed storage, why not analyze the data in a distributed way as well, with each unit (or node) in a network of computers performing a small piece of a computation? Each partial solution is then integrated to find the full result. This approach is similar in concept to the LHC’s, in which one complete copy of the raw data (after filtering) is stored at the CERN research facility in Switzerland that is home to the collider. A second copy is divided into batches that are then distributed to data centers around the world. Each center analyzes its chunk of data and transmits the results to regional computers before moving on to the next batch.
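
The pattern is easy to caricature in a few lines: each node summarizes only its own chunk, and only the small summaries travel over the network to be merged. The sketch below is purely illustrative; no real grid middleware is involved.

    # Sketch of the distribute-then-combine pattern: each "node" summarizes its own
    # chunk of events, and only the small partial summaries are merged centrally.
    import numpy as np

    rng = np.random.default_rng(0)
    chunks = [rng.normal(size=100_000) for _ in range(8)]   # stand-in for 8 data centers

    bins = np.linspace(-8, 8, 161)

    def node_summary(chunk):
        """Work done locally at one node: a histogram of its chunk."""
        counts, _ = np.histogram(chunk, bins=bins)
        return counts

    # Only the 160-number summaries travel over the network, not the raw events.
    partials = [node_summary(c) for c in chunks]
    global_histogram = np.sum(partials, axis=0)
    print(global_histogram.sum())   # 800000: every simulated event is accounted for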

Alon Halevy, a computer scientist at Google, says the biggest breakthroughs in big data are likely to come from data integration. Image: Peter DaSilva for Quanta Magazine

Boyd’s system is based on so-called consensus algorithms. “It’s a mathematical optimization problem,” he said of the algorithms. “You are using past data to train the model in hopes that it will work on future data.” Such algorithms are useful for creating an effective spam filter, for example, or for detecting fraudulent bank transactions.

This can be done on a single computer, with all the data in one place. Machine learning typically uses many processors, each handling a little bit of the problem. But when the problem becomes too large for a single machine, a consensus optimization approach might work better, in which the data set is chopped into bits and distributed across 1,000 “agents” that analyze their bit of data and each produce a model based on the data they have processed. The key is to require a critical condition to be met: although each agent’s model can be different, all the models must agree in the end — hence the term “consensus algorithms.”

The process by which 1,000 individual agents arrive at a consensus model is similar in concept to the Mechanical Turk crowd-sourcing methodology employed by Amazon — with a twist. With the Mechanical Turk, a person or a business can post a simple task, such as determining which photographs contain a cat, and ask the crowd to complete the task in exchange for gift certificates that can be redeemed for Amazon products, or for cash awards that can be transferred to a personal bank account. It may seem trivial to the human user, but the program learns from this feedback, aggregating all the individual responses into its working model, so it can make better predictions in the future.

In Boyd’s system, the process is iterative, creating a feedback loop. The initial consensus is shared with all the agents, which update their models in light of the new information and reach a second consensus, and so on. The process repeats until all the agents agree. Using this kind of distributed optimization approach significantly cuts down on how much data needs to be transferred at any one time.
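
A heavily simplified caricature of that loop, assuming each agent fits a one-parameter model to its own slice of data: the agents’ models are averaged, the average is fed back, and the cycle repeats until they agree. Boyd’s actual consensus methods (ADMM and relatives) add correction terms that this sketch omits.

    # Caricature of consensus optimization: each agent fits a slope to its own data,
    # the models are averaged, and the average is fed back until the agents agree.
    import numpy as np

    rng = np.random.default_rng(1)
    true_w = 3.0

    def make_agent_data():
        x = rng.uniform(-1, 1, 200)
        y = true_w * x + rng.normal(scale=0.1, size=200)
        return x, y

    agents = [make_agent_data() for _ in range(1000)]
    consensus = 0.0

    for _ in range(20):                              # iterate until the models agree
        local = []
        for x, y in agents:
            w_local = np.dot(x, y) / np.dot(x, x)          # each agent's own fit
            local.append(0.5 * w_local + 0.5 * consensus)  # pulled toward the consensus
        new_consensus = np.mean(local)
        if abs(new_consensus - consensus) < 1e-6:
            break
        consensus = new_consensus

    print(consensus)   # ~3.0, recovered without pooling any raw data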

The Quantum Question

Late one night, during a swanky Napa Valley conference last year, MIT physicist Seth Lloyd found himself soaking in a hot tub across from Google’s Sergey Brin and Larry Page — any aspiring technology entrepreneur’s dream scenario. Lloyd made his pitch, proposing a quantum version of Google’s search engine whereby users could make queries and receive results without Google knowing which questions were asked. The men were intrigued. But after conferring with their business manager the next day, Brin and Page informed Lloyd that his scheme went against their business plan. “They want to know everything about everybody who uses their products and services,” he joked.

It is easy to grasp why Google might be interested in a quantum computer capable of rapidly searching enormous data sets. A quantum computer, in principle, could offer enormous increases in processing power, running algorithms significantly faster than a classical (non-quantum) machine for certain problems. Indeed, the company just purchased a reportedly $15 million prototype from a Canadian firm called D-Wave Systems, although the jury is still out on whether D-Wave’s product is truly quantum.

“This is not about trying all the possible answers in parallel. It is fundamentally different from parallel processing,” said Aaronson. Whereas a classical computer stores information as bits that can be either 0s or 1s, a quantum computer could exploit an unusual property: the superposition of states. If you flip a regular coin, it will land on heads or tails. There is zero probability that it will be both heads and tails. But if it is a quantum coin, technically, it exists in an indeterminate state of both heads and tails until you look to see the outcome.

A true quantum computer could encode information in so-called qubits that can be 0 and 1 at the same time. Doing so could reduce the time required to solve a difficult problem that would otherwise take several years of computation to mere seconds. But that is easier said than done, not least because such a device would be highly sensitive to outside interference: The slightest perturbation would be equivalent to looking to see if the coin landed heads or tails, and thus undo the superposition.
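
The coin analogy can be written down directly as a two-component state vector. The snippet below uses plain NumPy rather than any quantum-computing toolkit, and is only meant to show what superposition and collapse look like numerically.

    # A single qubit as a 2-component state vector (plain numpy; no quantum SDK).
    # Superposition means both amplitudes are nonzero; measurement picks one outcome
    # with probability |amplitude|^2 and collapses the state, destroying the rest.
    import numpy as np

    ket0 = np.array([1.0, 0.0])            # "heads"
    ket1 = np.array([0.0, 1.0])            # "tails"
    psi = (ket0 + ket1) / np.sqrt(2)       # equal superposition of both

    probs = np.abs(psi) ** 2
    print(probs)                           # [0.5, 0.5]

    rng = np.random.default_rng()
    outcome = rng.choice([0, 1], p=probs)  # "looking" at the coin
    psi_after = ket0 if outcome == 0 else ket1
    print(outcome, psi_after)              # the superposition is gone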

Data from a seemingly simple query about coffee production across the globe can be surprisingly difficult to integrate. Image: Peter DaSilva for Quanta Magazine

However, Aaronson cautions against placing too much hope in quantum computing to solve big data’s computational challenges, insisting that if and when quantum computers become practical, they will be best suited to very specific tasks, most notably to simulate quantum mechanical systems or to factor large numbers to break codes in classical cryptography. Yet there is one way that quantum computing might be able to assist big data: by searching very large, unsorted data sets — for example, a phone directory in which the names are arranged randomly instead of alphabetically.

It is certainly possible to do so with sheer brute force, using a massively parallel computer to comb through every record. But a quantum computer could accomplish the task in a fraction of the time. That is the thinking behind Grover’s algorithm, which was devised by Bell Labs’ Lov Grover in 1996. However, “to really make it work, you’d need a quantum memory that can be accessed in a quantum superposition,” Aaronson said, but it would need to do so in such a way that the very act of accessing the memory didn’t destroy the superposition, “and that is tricky as hell.”
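
The practical content of Grover’s result is a query count: unstructured search over N items needs about N/2 lookups classically, but only on the order of (π/4)·√N quantum queries, which is the standard figure. A quick comparison:

    # Query counts for unstructured search: classical brute force vs. Grover's
    # algorithm, which needs about (pi/4) * sqrt(N) oracle queries (standard result).
    import math

    for N in [10**6, 10**9, 10**12]:
        classical = N / 2                       # expected lookups, random order
        grover = (math.pi / 4) * math.sqrt(N)   # optimal number of Grover iterations
        print(f"N = {N:,}: classical ~{classical:,.0f} lookups, "
              f"Grover ~{grover:,.0f} queries")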

In short, you need quantum RAM (Q-RAM), and Lloyd has developed a conceptual prototype, along with an accompanying program he calls a Q-App (pronounced “quapp”) targeted to machine learning. He thinks his system could find patterns within data without actually looking at any individual records, thereby preserving the quantum superposition (and the users’ privacy). “You can effectively access all billion items in your database at the same time,” he explained, adding that “you’re not accessing any one of them, you’re accessing common features of all of them.”

For example, if there is ever a giant database storing the genome of every human being on Earth, “you could search for common patterns among different genes” using Lloyd’s quantum algorithm, with Q-RAM and a small 70-qubit quantum processor while still protecting the privacy of the population, Lloyd said. The person doing the search would have access to only a tiny fraction of the individual records, he said, and the search could be done in a short period of time. With the cost of sequencing human genomes dropping and commercial genotyping services rising, it is quite possible that such a database might one day exist, Lloyd said. It could be the ultimate big data set, considering that a single genome is equivalent to 6 billion bits.
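
One way to sanity-check the scale, an assumption on our part rather than anything Lloyd spells out: a Q-RAM needs roughly log₂(N) address qubits to address N memory cells, which for a planet-scale genome database lands in the same ballpark as the 70 qubits quoted.

    # Assumption (not stated in the article): the ~70-qubit figure is roughly what a
    # Q-RAM would need just to *address* a planet-scale genome database, since
    # addressing N cells takes about log2(N) qubits.
    import math

    people = 8e9                 # order-of-magnitude world population
    bits_per_genome = 6e9        # figure quoted in the article
    total_bits = people * bits_per_genome

    address_qubits = math.ceil(math.log2(total_bits))
    print(address_qubits)        # ~66, in the same ballpark as the 70 quoted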

Lloyd thinks quantum computing could work well for powerhouse machine-learning algorithms capable of spotting patterns in huge data sets — determining what clusters of data are associated with a keyword, for example, or what pieces of data are similar to one another in some way. “It turns out that many machine-learning algorithms actually work quite nicely in quantum computers, assuming you have a big enough Q-RAM,” he said. “These are exactly the kinds of mathematical problems people try to solve, and we think we could do very well with the quantum version of that.”

The Future Is Integration

“No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level.”

Google’s Alon Halevy believes that the real breakthroughs in big data analysis are likely to come from integration — specifically, integrating across very different data sets. “No matter how much you speed up the computers or the way you put computers together, the real issues are at the data level,” he said. For example, a raw data set could include thousands of different tables scattered around the Web, each one listing crime rates in New York, but each may use different terminology and column headers, known as “schema.” A header of “New York” can describe the state, the five boroughs of New York City, or just Manhattan. You must understand the relationship between the schemas before the data in all those tables can be integrated.
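
A toy version of the schema problem makes the difficulty concrete: two tables use different headers for the same concepts, and they can only be merged once a mapping between schemas exists. Here the mapping is written by hand; the research challenge Halevy describes is inferring such mappings automatically at Web scale.

    # Toy version of the schema problem: two crime-rate tables use different headers
    # for the same concept. The mapping below is hand-written; inferring it
    # automatically is the hard part.
    table_a = [{"New York": "Manhattan", "Incidents per 100k": 250}]
    table_b = [{"Borough": "Brooklyn", "crime_rate_per_100000": 310}]

    schema_map = {
        "New York": "region",                 # here "New York" names a borough
        "Borough": "region",
        "Incidents per 100k": "crime_rate",
        "crime_rate_per_100000": "crime_rate",
    }

    def normalize(rows):
        return [{schema_map[k]: v for k, v in row.items()} for row in rows]

    merged = normalize(table_a) + normalize(table_b)
    print(merged)   # both tables now share the headers "region" and "crime_rate"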

That, in turn, requires breakthroughs in techniques to analyze the semantics of natural language. It is one of the toughest problems in artificial intelligence — if your machine-learning algorithm aspires to perfect understanding of nearly every word. But what if your algorithm needs to understand only enough of the surrounding text to determine whether, for example, a table includes data on coffee production in various countries so that it can then integrate the table with other, similar tables into one common data set? According to Halevy, a researcher could first use a coarse-grained algorithm to parse the underlying semantics of the data as best it could and then adopt a crowd-sourcing approach like a Mechanical Turk to refine the model further through human input. “The humans are training the system without realizing it, and then the system can answer many more questions based on what it has learned,” he said.

Chris Mattmann, a senior computer scientist at NASA’s Jet Propulsion Laboratory and director at the Apache Software Foundation, faces just such a complicated scenario with a research project that seeks to integrate two different sources of climate information: remote-sensing observations of the Earth made by satellite instrumentation and computer-simulated climate model outputs. The Intergovernmental Panel on Climate Change would like to be able to compare the various climate models against the hard remote-sensing data to determine which models provide the best fit. But each of those sources stores data in different formats, and there are many different versions of those formats.

Many researchers emphasize the need to develop a broad spectrum of flexible tools that can deal with many different kinds of data. For example, many users are shifting from traditional highly structured relational databases, broadly known as SQL, which represent data in a conventional tabular format, to a more flexible format dubbed NoSQL. “It can be as structured or unstructured as you need it to be,” said Matt LeMay, a product and communications consultant and the former head of consumer products at URL shortening and bookmarking service Bitly, which uses both SQL and NoSQL formats for data storage, depending on the application.

Mattmann cites an Apache software program called Tika that allows the user to integrate data across 1,200 of the most common file formats. But in some cases, some human intervention is still required. Ultimately, Mattmann would like to fully automate this process via intelligent software that can integrate differently structured data sets, much like the Babel Fish in Douglas Adams’ “Hitchhiker’s Guide to the Galaxy” book series enabled someone to understand any language.

Integration across data sets will also require a well-coordinated distributed network system comparable to the one conceived of by Newman’s group at Caltech for the LHC, which monitors tens of thousands of processors and more than 10 major network links. Newman foresees a computational future for big data that relies on this type of automation through well-coordinated armies of intelligent agents that track the movement of data from one point in the network to another, identifying bottlenecks and scheduling processing tasks. Each might only record what is happening locally but would share the information in such a way as to shed light on the network’s global situation.

“Thousands of agents at different levels are coordinating to help human beings understand what’s going on in a complex and very distributed system,” Newman said. The scale would be even greater in the future, when there would be billions of such intelligent agents, or actors, making up a vast global distributed intelligent entity. “It’s the ability to create those things and have them work on one’s behalf that will reduce the complexity of these operational problems,” he said. “At a certain point, when there’s a complicated problem in such a system, no set of human beings can really understand it all and have access to all the information.”