SCIENTIFIC METHOD 10:23 AM MAR 7, 2016
What are you trying to say
— Stephen Ziliak, Roosevelt University economics professor
How many statisticians does it take to ensure at least a 50 percent chance of a disagreement about p-values? According to a tongue-in-cheek assessment by statistician George Cobb of Mount Holyoke College, the answer is two … or one. So it’s no surprise that when the American Statistical Association gathered 26 experts to develop a consensus statement on statistical significance and p-values, the discussion quickly became heated.
It may sound crazy to get indignant over a scientific term that few lay people have even heard of, but the consequences matter. The misuse of the p-value can drive bad science (there was no disagreement over that), and the consensus project was spurred by a growing worry that in some scientific fields, p-values have become a litmus test for deciding which studies are worthy of publication. As a result, research that produces p-values that clear an arbitrary threshold is more likely to be published, while studies of greater or equal scientific importance may remain in the file drawer, unseen by the scientific community.
The results can be devastating, said Donald Berry, a biostatistician at the University of Texas MD Anderson Cancer Center. “Patients with serious diseases have been harmed,” he wrote in a commentary published today. “Researchers have chased wild geese, finding too often that statistically significant conclusions could not be reproduced.” Faulty statistical conclusions, he added, have real economic consequences.
“The p-value was never intended to be a substitute for scientific reasoning,” the ASA’s executive director, Ron Wasserstein, said in a press release. On that point, the consensus committee members agreed, but statisticians have deep philosophical differences[1] about the proper way to approach inference and statistics, and “this was taken as a battleground for those different views,” said Steven Goodman, co-director of the Meta-Research Innovation Center at Stanford. Much of the dispute centered around technical arguments over frequentist versus Bayesian methods and possible alternatives or supplements to p-values. “There were huge differences, including profoundly different views about the core problems and practices in need of reform,” Goodman said. “People were apoplectic over it.”
The group debated and discussed the issues for more than a year before finally producing a statement they could all sign. They released that consensus statement on Monday, along with 20 additional commentaries from members of the committee. The ASA statement is intended to address the misuse of p-values and promote a better understanding of them among researchers and science writers, and it marks the first time the association has taken an official position on a matter of statistical practice. The statement outlines some fundamental principles regarding p-values.
Among the committee’s tasks: Selecting a definition of the p-value that nonstatisticians could understand. They eventually settled on this: “Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (for example, the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.” That definition is about as clear as mud (I stand by my conclusion that even scientists can’t easily explain p-values), but the rest of the statement and the ideas it presents are far more accessible.
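To make that definition concrete, here is a minimal sketch of one way to estimate a p-value for the ASA's own example, the difference in sample means between two groups. It is an illustration, not the committee's method: the "specified statistical model" assumed here is that group labels are interchangeable (a permutation test), and the function name, the measurements and the number of shuffles are all invented for the example.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in sample means.

    Assumed model: the group labels are exchangeable, i.e. there is no
    real difference between the groups. The p-value is the share of
    random relabelings whose mean difference is at least as extreme as
    the one actually observed.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    n_b = len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    pooled = list(group_a) + list(group_b)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        # Small tolerance so floating-point rounding does not drop ties.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical measurements, invented for illustration only.
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.8, 5.0, 4.7, 5.2, 4.9, 5.1]

p = permutation_p_value(treated, control)
print(f"estimated p-value: {p:.3f}")
```

The number that comes out answers only one question: if the labels really were interchangeable, how often would shuffling alone produce a gap this large? It says nothing about how likely that assumption is to be true.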
One of the most important messages is that the p-value cannot tell you if your hypothesis is correct. Instead, it’s the probability of your data given your hypothesis. That sounds tantalizingly similar to “the probability of your hypothesis given your data,” but they’re not the same thing, said Stephen Senn, a biostatistician at the Luxembourg Institute of Health. To understand why, consider this example. “Is the pope Catholic? The answer is yes,” said Senn. “Is a Catholic the pope? The answer is probably not. If you change the order, the statement doesn’t survive.”
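Senn's pope example can be put into rough numbers. The sketch below uses Bayes' rule to flip a conditional probability; the population figures are ballpark assumptions chosen for illustration, not numbers from the ASA statement or from Senn.

```python
def invert_conditional(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A | B) = P(B | A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b


# Assumptions: rough, rounded figures used only for illustration.
WORLD_POPULATION = 7.4e9
CATHOLICS = 1.2e9

p_catholic_given_pope = 1.0            # "Is the pope Catholic?" Yes.
p_pope = 1 / WORLD_POPULATION          # one pope among everyone
p_catholic = CATHOLICS / WORLD_POPULATION

p_pope_given_catholic = invert_conditional(
    p_catholic_given_pope, p_pope, p_catholic
)

print(f"P(Catholic | pope) = {p_catholic_given_pope}")
print(f"P(pope | Catholic) = {p_pope_given_catholic:.2e}")
```

Flipping the order requires two extra ingredients, the base rates of popes and of Catholics. A p-value supplies neither, which is why it cannot be read as the probability that a hypothesis is true.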
A common misconception among nonstatisticians is that p-values can tell you the probability that a result occurred by chance. This interpretation is dead wrong, but you see it again and again and again and again. The p-value only tells you something about the probability of seeing your results given a particular hypothetical explanation — it cannot tell you the probability that the results are true or whether they’re due to random chance. The ASA statement’s Principle No. 2: “P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone.”
Nor can a p-value tell you the size of an effect, the strength of the evidence or the importance of a result. Yet despite all these limitations, p-values are often used as a way to separate true findings from spurious ones, and that creates perverse incentives. When the goal shifts from seeking the truth to obtaining a p-value that clears an arbitrary threshold (0.05 or less is considered “statistically significant” in many fields), researchers tend to fish around in their data and keep trying different analyses until they find something with the right p-value, as you can see for yourself in a p-hacking tool we built last year.
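A short simulation shows why fishing works so reliably. The sketch below is not the p-hacking tool mentioned above; it is a simplified stand-in that assumes there is no real effect and that each analysis is independent, so every p-value is just a uniform random number between 0 and 1. Real reanalyses of the same data are correlated, so the exact figures would differ. The function name and parameters are hypothetical.

```python
import random


def chance_of_false_positive(n_analyses, alpha=0.05, n_studies=20_000, seed=1):
    """Share of simulated null studies where at least one analysis has p < alpha.

    Assumption: with no real effect, a valid p-value is uniformly
    distributed on [0, 1], and the analyses are independent.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        smallest_p = min(rng.random() for _ in range(n_analyses))
        if smallest_p < alpha:
            hits += 1
    return hits / n_studies


for k in (1, 5, 20):
    simulated = chance_of_false_positive(k)
    formula = 1 - (1 - 0.05) ** k  # chance that at least one of k tests clears 0.05
    print(f"{k:>2} analyses: simulated {simulated:.3f}, formula {formula:.3f}")
```

Under these assumptions, the chance of turning up at least one "significant" result rises quickly with the number of analyses tried, even though nothing real is there to find.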
Indeed, many of the ASA committee’s members argue in their commentaries that the problem isn’t p-values, just the way they’re used — “failing to adjust them for cherry picking, multiple testing, post-data subgroups and other biasing selection effects,” as Deborah Mayo, a philosopher of statistics at Virginia Tech, puts it. When p-values are treated as a way to sort results into bins labeled significant or not significant, the vast efforts to collect and analyze data are degraded into mere labels, said Kenneth Rothman, an epidemiologist at Boston University.
The 20 commentaries published with the ASA statement present a range of ideas about where to go from here. Some committee members argued that there should be a move to rely more on other measures, such as confidence intervals or Bayesian analyses. Others felt that switching to something else would only shift the problem around. “The solution is not to reform p-values or to replace them with some other statistical summary or threshold,” wrote Columbia University statistician Andrew Gelman, “but rather to move toward a greater acceptance of uncertainty and embracing of variation.”
If there’s one takeaway from the ASA statement, it’s that p-values are not badges of truth and p < 0.05 is not a line that separates real results from false ones. They’re simply one piece of a puzzle that should be considered in the context of other evidence.
This story began with a haiku from one of the p-value document’s companion responses; let’s end it with a limerick by University of Michigan biostatistician Roderick Little.
In statistics, one rule did we cherish:
P point oh five we publish, else perish!
Said Val Johnson, “that’s out of date,
Our studies don’t replicate
P point oh oh five, then null is rubbish!”
CORRECTION (March 7, 11:05 a.m.): An earlier version of this article misstated the university where Deborah Mayo is a professor. She teaches at Virginia Tech, not the University of Pennsylvania.
[1] Even the Supreme Court has weighed in, unanimously ruling in 2011 that statistical significance does not automatically equate to scientific or policy importance.
Christie Aschwanden is FiveThirtyEight’s lead writer for science.
MAR. 27, 2014, 4:22 PM
At a time when we’re debating the value of majoring in the humanities, major companies are increasingly hiring anthropologists.
Google, for example, hired an ethnographer to ferret out the meaning of mobile. Intel has an in-house cultural anthropologist, and Microsoft is reportedly the second-largest employer of anthropologists in the world.
So the question becomes: Why are giant corporations now seeking cultural expertise?
While most execs are masters of analyzing spreadsheets, creating processes, and pitching products, anthropologists — and other practitioners of applied social science — can arrive at customer insights that big data tends to gloss over, especially around the role that products play in people’s lives.
That information is more valuable than you might think. What customers want from a product and what companies think they want can be totally different, but it can take an anthropological lens to learn why.
Take Adidas, for example. The brand has always been associated with elite performance: Jesse Owens, Muhammad Ali, and Zinedine Zidane all wore the brand. Ever since cobbler and athlete Adi Dassler founded the company in 1948, the assumption inside it had been that people bought athletic gear to gain a competitive edge. But in the early 2000s VP James Carnes noticed something strange: He kept running into people who were jogging around the city, headed to the gym, or on their way to yoga.
While they led the active lives of potential customers, these people weren’t training for a competition. “Is yoga a sport?” Carnes asked in an offsite meeting in 2003.
Trying to figure out the disconnect, he brought in a consultancy called Red Associates, which has a client list that includes Intel, Samsung, and Carlsberg, the European beer giant. Unlike elite consulting firms such as McKinsey, Red isn’t in the business of big data and management science. Instead, it focuses on arriving at insights that can only be found through the applied liberal arts, or what it calls “the human sciences,” a strategy that is detailed in its new book “Moment of Clarity: Using the Human Sciences To Solve Your Toughest Business Problems.” That’s why most of Red’s 70-some employees aren’t MBAs; they come from disciplines like philosophy, sociology, and anthropology.
When Red collaborated with Adidas, it trained members of Adidas’s design team in conducting anthropological research. Design staffers spent 24 hours straight with customers, eating breakfast with them, joining them on runs, and asking them why they worked out. As detailed in the Economist, a Red staffer sent disposable cameras to customers, asking them to take a picture of the reason they exercised. Thirty women responded, and 25 of them sent a picture of a little black dress.
A little black dress is quite different than a marathon finish line or gold trophy.
To use a favorite word of Red partner Christian Madsbjerg, the little black dress shows an “asymmetry.” The traditional thinking at Adidas was that people bought their gear to help them win. But after observing their behavior through the lens of anthropology, it became clear that customers wanted products to help them lead healthy lifestyles, not win competitions.
How had Adidas misunderstood its customers for so long? Because Adidas executives thought they understood their customers’ motivations and lives, but they had never observed them closely enough.
Running, mountain biking, hitting the gym, going to yoga — people did these things to live healthier lives. But these “urban sports” weren’t like the traditional competitions that the company was originally organized around.
That was Carnes’s realization: His consumer’s definition of “sport” had changed, and his company had to change along with it. As described in “Moment of Clarity”:
If urban sports are on par with basketball or soccer, Adidas must then deliver on products with functionality, aesthetics, and quality. Adidas must lead, not copy in this whole new category of lifestyle sport …
The company went from being a sports brand exclusively for athletes … to becoming an inclusive brand inviting all of us to join a movement of living a healthier and better life. It went from creating corporate credos aimed at high-performance sports aficionados, such as “Impossible is nothing,” to sending democratic, yet aspirational messages like “All In.”
With the help of Red, Adidas was able to understand the world of its customers. Interestingly, it’s the human sciences — literature, arts, anthropology — that allow for understanding the unique worlds that people live in. By observing people’s daily lives and the ways in which they interact with products, consultancies like Red are able to discern what products mean to customers in a way that big data can’t determine.
Why literature helps you understand customers
“If you look at launches of a new product, most of them fail,” Madsbjerg says. “That’s because people don’t understand the worlds in which we operate.”
The problem with standard corporate research, Madsbjerg says, is that it’s incredibly difficult to get around your own preconceptions. Even if your analytics are fresh, you’ll read old assumptions into them. By applying the humanities, however, you can get around them.
Say, for instance, you read an epic novel by Fyodor Dostoyevsky. In doing so, you’re not just processing words on a page, you’re beginning to understand a character’s world in Russia in a specific place, at a specific time, and from a specific perspective. To hear Red tell it, developing an empathetic understanding of a character in a novel is very much like trying to understand a customer. Ford, after all, would be immensely interested in the world of someone buying a car.
It’s anthropological research, like Red helped the Adidas design team with, that allows for understanding the customer’s world.
This is different from the approach of most corporations, which rely on measures like surveys and focus groups. The problem with those is that people have a terrible time reporting their own preferences, Madsbjerg says. In one Swedish study, for instance, everyone reported that they were an exceptional driver, which is obviously impossible. By the same token, asking customers to tell you why they like a particular vodka doesn’t necessarily reveal their motivations.
That’s why Red emphasizes ethnographic interviewing, where you interview a subject again and again and observe them in a range of environments, looking for patterns of behavior. The long-form, in-depth research helps to reveal the worlds that people live in and their real motivations. Major insights follow — that little black dress told Adidas way more about their customers’ world than a survey ever could.
Finding an industry’s need
In another case, Red consulted for a leading pharmaceutical company specializing in diabetes. Back in the day, it was common practice for sales reps to use a “frequency and reach” strategy, talking to as many doctors as possible and pushing a brand message. The sales reps would get the time with doctors by giving them free flights and concert tickets. But then the law changed, and giving swag to doctors was made illegal. All of a sudden what was once a long courtship turned into a 90-second phone call.
In order to sell drugs in this new situation, they needed to recalibrate the conversation.
During the course of interviewing physicians, Red discovered a major concern that most doctors shared: “How do I get my patients to understand their conditions? How do I change their lifestyles?” Medication, it turned out, was the third most important aspect of treating diabetes — diet and exercise were much more vital.
As a result, Red’s associates worked with doctors to find different ways to help people change their diets, and they worked with sales reps to present that info to doctors. Since so many of the diabetes patients didn’t know how to cook, basic meal preparation became part of the sales material. Correspondingly, the pharmaceutical company became way more resonant: By understanding the world of the doctor, the brand saw a 15% increase in key indicators, like doctors’ trust.
The secret was to understand the world of the physicians and to give them what they needed, even if they didn’t consciously realize it yet.
Feb 11, 2015
The ethnography I will be conducting for ‘Picturing the Social’ will be looking at practices of sharing photographs on social media. So is this to be a visual ethnography? A virtual ethnography? Or some kind of combination? Both of these approaches entail different theoretical and methodological models (Ardévol, 2012), which I will now briefly consider, along with outlining where this ethnography is situated in relation. Looking at these fields separately is not to suggest that they do not overlap; on the contrary, I believe that the visual and the virtual share many similarities. Photography is very much a social technology, in that images are typically created with the intention of sharing (Bourdieu, 1990), to the extent that photography has been termed the ‘original’ social media.
The Internet can be used as a means for collecting data, or as the topic of research in itself (Markham and Baym, 2009). Much of the discussion of virtual ethnography considers this first function, in which the Internet is used to access participants. Studies of computer-mediated communication, on the other hand, focus on the specific features of online spaces, such as virtual worlds and games. This particular ethnography will be a combination of both, in that I am not using social media simply to find people to observe, but rather am interested specifically in their online practices. Therefore this is an ethnography of the virtual, rather than an ethnography which makes use of the virtual.
One of my areas of interest relates to the relationship between online and offline space, and the collapse of the division between the two. For example, how does the online construction of notions of Sheffield affect subjects’ experience of it offline? For some members of the social media groups I am considering, their predominant experience of Sheffield is now online, as they live elsewhere — how perhaps should this be conceptualised in regards to the online/offline divide? Additionally, not all online spaces are to be conceptualised alike, as the aims and objectives of virtual worlds, social networks and discussion forums are markedly different from one another. The photography groups I am looking to study as part of this ethnography are communities of interest, in which various motivations — including sharing memories, discussing contemporary issues and soliciting feedback on creative practice — must be explored and understood as affordances of these online spaces.
Internet ethnography offers a useful opportunity to participate in the same settings as participants, and to use the same tools for interactions and expression. This parity of access means that ethnography of online spaces is “meaningfully different” from the study of offline social practices (Kozinets, 2010: 5). Hine conceptualises this difference in terms of an emphasis on flow and connectivity, in contrast to ethnography’s prior focus on location and boundaries (2000). O’Reilly similarly states that virtual ethnography is challenging assumptions of what constitutes a ‘field site’, in that “instead of thinking in terms of places or locations, our Internet ethnographer looks to connections between things” (O’Reilly, 2009: 217). Pink also stresses the importance of considering connections and the “potential forms of relatedness” constituted online, in which online and offline materials and localities “become interwoven in everyday and research narratives” (Pink, 2012). I am particularly interested to explore how theories of place and space will be useful for this ethnography, in that the groups’ focus on Sheffield as a physical and conceptual place is mediated and constituted through online spaces. How do these different notions of place and space entangle, and how do they affect each other in order to create new notions of what constitutes Sheffield and people’s relationship to it? My early observations have already yielded an interesting example of the online representation of a sensory experience of Sheffield as locality and as history — a video uploaded to one Sheffield-themed social media group documents a walk through the post-industrial landscape, in which the participant draws attention to the shift from Sheffield’s identity as a steel working city, to a collection of vacant lots and empty office buildings. 
The online space is therefore used to provide not just a commentary on contemporary politics, but also to capture a physical experience, and an emotional reaction to it.
Ethnographies frequently use participant-generated photographs to explore the perspectives of those involved, enabling them to ‘speak’ through images (see Mitchell, 2011). As I am not inviting participants to produce materials for this project, but using those that they have made already, this approach is not applicable here. Although I will be considering people’s use of photography to discuss issues that are of relevance to them — relating to history, sport, wildlife, weather and so on — my aim is not to use photography to access those beliefs, but rather to explore the specific role of photographs in this process. Much as I stressed above regarding the virtual, this is not an ethnography that uses the visual, but is rather an ethnography of the visual.
I therefore similarly will not be using images within this ethnography in order to supplement my findings, or to ‘show’ something under the pretence of unmediated communication. This function, in which images act as a kind of supporting evidence, is problematic for numerous reasons, in that it assumes that images can be regarded as objective, but only fragmentary, adjuncts to text. As this ethnography is focused upon the practice and discussion of photography, such an approach to the visual would be inappropriate, as it fails to acknowledge that images must be studied as cultural objects in their own right. Therefore this ethnography of the visual will consider how images — at the level of objects as well as the production of objects — function within broader social relations (Pink, 2012: 5). As such, I will need to employ a range of theoretical approaches, which explore photography as a social process, as a form of identity negotiation, and as a phenomenon that continually remakes its own cultural circumstances of production.
Pauwels (2012) provides a particularly useful overview of conducting visual research, in which the status of the materials, and the extent to which they matter, is of primary concern. This is one of my main topics of investigation — not so much what images are of, but why they matter to people, what they enable viewers to do, say and think, and why they have been shared in the first place. For me, this is the key concern of contemporary visual research: what is it that makes social media photography — from the taking of snaps on Snapchat, to the sharing of photographs on Flickr — so important?
It will be my aim, therefore, to study how the visual and the virtual combine in the notion of ‘photographic sharing’. In particular, the social media communities in which these photographs are circulated will offer an important means for studying how notions of place are negotiated and constituted through the co-presence that is facilitated by looking at images online.
Ardévol, E. (2012) Virtual/Visual Ethnography: Methodological Crossroads at the Intersection of Visual and Internet Research. In: Pink, S. (2012) Advances in Visual Methodology. London: Sage.
Bourdieu, P. (1990) Photography: A Middle-Brow Art. Stanford, CA: Stanford University Press. http://www.sup.org/books/title/?id=2477
Hine, C. (2000) Virtual Ethnography. London: Sage. http://www.uk.sagepub.com/books/Book207267?siteId=sage-uk&prodTypes=any&q=virtual+ethnography&fs=1
Kozinets, R. V. (2010) Netnography: Doing Ethnographic Research Online. London: Sage. http://www.uk.sagepub.com/books/Book233748?siteId=sage-uk&prodTypes=any&q=netnography&fs=1
Markham, A. N. & Baym, N. K. (2009) Internet Inquiry: Conversations about Method. Thousand Oaks, CA: Sage. http://www.uk.sagepub.com/books/Book226985?siteId=sage-uk&prodTypes=any&q=internet+inquiry&fs=1
Mitchell, C. (2011) Doing Visual Research. London: Sage. http://www.uk.sagepub.com/books/Book231677?siteId=sage-uk&prodTypes=any&q=doing+visual+research&fs=1
O’Reilly, K. (2009) Key Concepts in Ethnography. London: Sage. http://www.uk.sagepub.com/booksProdDesc.nav?prodId=Book229834
Pauwels, L. (2012) Contemplating the State of Visual Research. In: Pink, S. (2012) Advances in Visual Methodology. London: Sage.
Pink, S. (2012) Advances in Visual Methodology. London: Sage. http://www.uk.sagepub.com/booksProdDesc.nav?prodId=Book235866
‘Picturing the Social: Transforming our Understanding of Images in Social Media and Big Data research’ is an 18-month research project that started in September 2014 and is based at the University of Sheffield in the United Kingdom. It is funded through an ESRC Transformative Research grant and is focused on transforming the social science research landscape by carving out a more central place for image research within the emerging fields of social media and Big Data research. The project aims to better understand the huge volumes of images that are now routinely shared on social media and what this means for society. This project involves an interdisciplinary team of seven researchers from four universities as well as industry with expertise in: Media and Communication Studies (Farida Vis and Anne Burns, University of Sheffield), Visual Culture (Simon Faulkner and Jim Aulich, Manchester School of Art), Software Studies and Sociology (Olga Goriunova, Warwick University), Computer and Information Science (Francesco D’Orazio, Pulsar and Mike Thelwall, University of Wolverhampton). The project is part of the Visual Social Media Lab.
Oct. 1, 2013 — Academics at Coventry University have uncovered complex social networks within age-old Icelandic sagas, which challenge the stereotypical image of Vikings as unworldly, violent savages.
Pádraig Mac Carron and Ralph Kenna from the University’s Applied Mathematics Research Centre have carried out a detailed analysis of the relationships described in ancient Icelandic manuscripts to shed new light on Viking society.
In a study published in The European Physical Journal B, Mac Carron and Kenna asked whether remnants of reality could lurk within the pages of the documents in which Viking sagas were preserved.
They applied methods from statistical physics to social networks — in which nodes (connection points) represent individuals and links represent interactions between them — to home in on the relationships between the characters and societies depicted therein.
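The node-and-link representation described above can be sketched in a few lines. The example below is an illustration only: the characters and their interactions are invented placeholders, not saga data, and the two measures computed (each character's number of links and the average clustering coefficient) are common network statistics chosen for the sketch, not necessarily the ones used in the study.

```python
from collections import defaultdict
from itertools import combinations


def build_network(interactions):
    """Turn (character, character) pairs into an undirected network."""
    neighbours = defaultdict(set)
    for a, b in interactions:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def clustering(neighbours, node):
    """Fraction of a node's pairs of neighbours that are themselves linked."""
    k = len(neighbours[node])
    if k < 2:
        return 0.0
    linked_pairs = sum(
        1 for u, v in combinations(neighbours[node], 2) if v in neighbours[u]
    )
    return linked_pairs / (k * (k - 1) / 2)


# Hypothetical interactions between placeholder characters A to E.
interactions = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

network = build_network(interactions)
degrees = {name: len(links) for name, links in network.items()}
mean_clustering = sum(clustering(network, name) for name in network) / len(network)

print("links per character:", degrees)
print(f"mean clustering coefficient: {mean_clustering:.3f}")
```

Statistics like these can be computed for any cast of characters, which is what makes it possible to compare a saga's network with networks drawn from real societies or from works of fiction.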
The academics used the Sagas of Icelanders — a unique corpus of medieval literature from the period around the settlement of Iceland a thousand years ago — as the basis for their investigation.
Although the historicity of these tales is often questioned, some believe they may contain fictionalised distortions of real societies, and Mac Carron’s and Kenna’s research bolsters this hypothesis.
They mapped out the interactions between over 1,500 characters that appear in 18 sagas including five particularly famous epic tales. Their analyses show, for example, that although an ‘outlaw tale’ has similar properties to other European heroic epics, and the ‘family sagas’ of Icelandic literature are quite distinct, the overall network of saga society is consistent with real social networks.
Moreover, although it is acknowledged that J. R. R. Tolkien was strongly influenced by Nordic literature, the Viking sagas have a different network structure to the Lord of the Rings and other works of fiction.
Professor Ralph Kenna from Coventry University’s Applied Mathematics Research Centre said: “This quantitative investigation is very different to traditional approaches to comparative studies of ancient texts, which focus on qualitative aspects. Rather than individuals and events, the new approach looks at interactions and reveals new insights: that the Icelandic sagas have similar properties to those of real-world social networks.”
- P. Mac Carron, R. Kenna. Network analysis of the Íslendinga sögur – the Sagas of Icelanders. The European Physical Journal B, 2013; 86 (10) DOI:10.1140/epjb/e2013-40583-3