Companies Are Running Experiments on Millions of Users Every Year. What Are the Ethical Implications?

July 10, 2020

by Stefan Thomke

Leading companies – including Microsoft, Amazon, Booking.com, Facebook, and Google – each conduct more than ten thousand online controlled experiments annually, with many tests engaging millions of users. Startups and companies without digital roots, such as Walmart, State Farm Insurance, Nike, FedEx, the New York Times Company, and the BBC, also run them regularly, though on a smaller scale. These organizations have discovered that an “everything is a test” mentality yields surprisingly large payoffs and competitive benefits, and may even improve stock performance.

Rigorous experiments allow companies to assess not only ideas for websites but also potential business models, strategies, products, services, and marketing campaigns—all relatively inexpensively. Online experiments can transform exploration and optimization into a scientific, evidence-driven process, rather than a guessing game guided by intuition, hierarchy, and commonly held—but often mistaken—beliefs. And it can all be done at huge scale. Without experiments, many breakthroughs might never happen, and many bad ideas would be implemented, only to fail, wasting resources.
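To make that evidence-driven logic concrete, here is a minimal sketch of how an experiment’s outcome might be judged: a two-proportion z-test comparing a challenger variant (B) against the current champion (A). The traffic and conversion numbers are hypothetical, and real experimentation platforms use considerably more sophisticated machinery.

```python
# A minimal sketch of the statistical core of an online controlled experiment:
# compare the conversion rate of a challenger (B) against the champion (A).
# All numbers below are hypothetical, for illustration only.
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference
    between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided test
    return z, p_value

# Hypothetical results: 50,000 users per arm.
z, p = two_proportion_z_test(conv_a=2_400, n_a=50_000, conv_b=2_550, n_b=50_000)
print(f"z = {z:.2f}, p = {p:.4f}")  # ship B only if the evidence is strong
```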

Rigorous experiments should be standard operating procedure for everyone. Yet experimenting on customers raises important ethical questions. Consider what happened to Facebook when, in 2012, it ran a weeklong experiment to study whether emotional states can be transferred to others through online social networks. By 2011, over 4.7 million person-hours per day were being spent on Facebook, not including its mobile app—a profound change in how humans interacted with one another in the short time since social media’s introduction. Not surprisingly, the possible harmful psychological effects on its 1.35 billion users were widely debated in the public sphere, and competing hypotheses emerged. So Facebook decided to investigate. Using the social network’s News Feed—an algorithmically curated list of news (posts, stories, activities) about your Facebook friends—the company tested whether viewing fewer positive news stories led to a reduction in positive posts by users, and whether the opposite happened when users were exposed to fewer negative news stories. The experiment involved 689,003 randomly selected users; about 310,000 (155,000 per condition) unwitting participants were exposed to manipulated emotional expressions in their News Feed, and the remaining users were assigned to control conditions in which a corresponding fraction of stories was randomly omitted.
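The mechanics of assigning users to such conditions are straightforward. The sketch below shows one common, generic approach: deterministic hashing of a user ID, so that each user consistently lands in the same bucket and assignments stay independent across experiments (via a per-experiment salt). It is illustrative only; the experiment name and condition labels are hypothetical, and this is not Facebook’s actual assignment code.

```python
# Illustrative only: a common, generic way online platforms randomly assign
# users to experiment conditions is to hash the user ID together with a
# per-experiment salt. This is NOT Facebook's actual code; the experiment
# name and condition labels below are hypothetical.
import hashlib

CONDITIONS = [
    "reduced_positive",   # fewer positive stories shown
    "reduced_negative",   # fewer negative stories shown
    "control_positive",   # matching fraction of stories omitted at random
    "control_negative",
]

def assign_condition(user_id: str, experiment: str = "emotion_2012") -> str:
    """Deterministically map a user to one of the experiment's conditions."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % len(CONDITIONS)
    return CONDITIONS[bucket]

print(assign_condition("user_42"))  # same user -> same condition every time
```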

In June 2014, researchers from Facebook and Cornell University published the results of the experiment in an academic journal, under the provocative title “Experimental Evidence of Massive-Scale Emotional Contagion through Social Networks.” The public outrage came swiftly. Facebook’s data science team had been running experiments on unsuspecting users for years without controversy, but the idea that the company could manipulate emotions struck a nerve. Critics raised concerns about Facebook inflicting psychological harm on its users and depriving them of the information necessary for informed consent. The Wall Street Journal covered the “scandalous” experiment on its front page (“Facebook Experiments Had Few Limits”), and the editors of the academic journal issued an unusual “editorial expression of concern.” The concern was whether the participants’ consent to Facebook’s general Data Use Policy was ethically meaningful and whether it gave them a genuine opportunity to opt out. From a learning perspective, the experiment was a success—it found that emotional contagion existed, but the effect on users was very small. The experiment wasn’t necessarily deceptive—the posts were real, and informing users in advance that they were part of an experiment would have biased the results. The controversy came from the fact that some users felt they were being harmfully manipulated by the company in the name of science, without concern for their emotions or their willingness to serve as lab rats.

Clearly, the debate over what’s ethically meaningful is important for businesses. For Facebook, the experiment caused a huge backlash, and the company’s management eventually apologized. As a result, Facebook implemented much stricter experimentation guidelines, including review, by a large panel of experts in privacy and data security, of any research that goes beyond routine product testing. But the ethical question of what should and should not be reviewed has to be carefully weighed against the opportunity cost: too much internal scrutiny may slow experimentation to a trickle; too little may lead to another “emotional contagion”–like blowup.

That’s what happened to Amazon in 2000, when it ran experiments that charged different customers different prices for the same product (in this case, DVD titles). The tests created uncertainty for customers, and some even accused the online retailer of price discrimination based on demographics (which Amazon denied). Jeff Bezos admitted that the experiment was “a mistake” and instituted a policy that if Amazon ever tested differential pricing again, all buyers would pay the lowest tested price, no matter what price they were initially offered.

Before conducting a test, stakeholders must agree that the experiment is worth doing. That agreement needs to include the perception of its integrity—the goodness or badness of an experiment. If Amazon never intended to charge different prices for the same product, why invite the public’s wrath? The truth is that experimenters often face a higher standard. Here is why:

A company that compares a new idea (B, the challenger) to the status quo (A, the champion) to learn what does and does not work for customers will face greater scrutiny than a competitor that does not experiment at all. The bioethicist Michelle Meyer calls this dilemma the A/B illusion:

When a practice is implemented across the board, we tend to assume that it has value—that it “works”—even if it has never been compared to alternatives to see whether it works as well as those alternatives, or at all. Attempts to establish safety and efficacy through A/B or similar testing are then seen as depriving some people (those who receive B) of the standard practice. Those under the spell of the A/B illusion—as we all are at some time or another—view the salient moment of moral agency as the moment when an experiment to compare practices A and B was commenced, when it should more properly be recognized as the moment when practice A was unilaterally and uniformly implemented without evidence of its safety or effectiveness.

In other words, people tend to focus on the high-profile experiment in the foreground rather than the status quo in the background, regardless of how ineffective the current practice is. In an intriguing study, Meyer and her collaborators examined sixteen studies of 5,873 participants from three diverse populations, in domains such as health care, vehicle design, and global poverty. Here is what they found: participants considered A/B tests more morally suspect than the universal implementation of an untested practice on the entire population. This suspicion persisted even when there was no objective reason to prefer practice A over practice B.

Facebook could have simply changed its News Feed algorithm (or any other business practice) without subjecting it to an experiment at all. But that would have been neither good management practice nor more ethical. Perhaps Facebook simply fell victim to the A/B illusion and should have managed perceptions more proactively. When companies run experiments at massive scale and high velocity, decisions about an experiment’s integrity are usually made quickly, either by individuals or by teams. That’s why some of the leading experimentation organizations include ethical guidelines (with case studies) as part of their standard employee training.

Reprinted by permission of Harvard Business Review Press. Adapted from EXPERIMENTATION WORKS: The Surprising Power of Business Experiments by Stefan H. Thomke. Copyright 2020 Stefan H. Thomke. All rights reserved.

______________________________________________________________________________

Stefan Thomke, an authority on the management of innovation, is the William Barclay Harding Professor of Business Administration at Harvard Business School. He has worked with global firms on product, process, and technology development, customer experience design, operational improvement, organizational change, and innovation strategy. He is also the author of the books Experimentation Matters and Managing Product and Service Development, as well as his latest release, Experimentation Works: The Surprising Power of Business Experiments.