Introduction to Field Experiments and Randomized Controlled Trials

Have you ever been curious about the methods researchers employ to determine causal relationships among various factors, ultimately leading to significant breakthroughs and progress in numerous fields? In this article, we offer an overview of field experimentation and its importance in discerning cause and effect relationships. We outline how randomized experiments represent an unbiased method for determining what works. Furthermore, we discuss key aspects of experiments, such as intervention, excludability, and non-interference. To illustrate these concepts, we present a hypothetical example of a randomized controlled trial evaluating the efficacy of an experimental drug called Covi-Mapp.

Why experiments?

Every day, we find ourselves faced with questions of cause and effect. Understanding the driving forces behind outcomes is crucial, ranging from personal decisions like parenting strategies to organizational challenges such as effective advertising. This blog aims to provide a systematic introduction to experimentation, igniting enthusiasm for primary research and highlighting the myriad of experimental applications and opportunities available.

The challenge for those who seek to answer causal questions convincingly is to develop a research methodology that doesn't require identifying or measuring all potential confounders. No planned design can rule out every systematic difference between treatment and control groups, which is why random assignment is such a powerful tool: it removes systematic bias without requiring the researcher to know what the confounders are. Random assignment means participants are assigned to different groups or conditions in a study purely by chance. In the simplest design, each participant has an equal chance of being assigned to the control group or the treatment group; more generally, each participant's probability of assignment is known and does not depend on the participant's characteristics.
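As a rough illustration of assignment "purely by chance", the sketch below shuffles a fixed number of treatment and control slots across a hypothetical list of 100 participants. The data frame, its size, and the seed are assumptions made for the example; they are not part of any study described here.

```r
# Minimal sketch of complete random assignment (hypothetical data, base R only).
set.seed(241)                      # arbitrary seed so the sketch is reproducible

n <- 100                           # hypothetical number of participants
participants <- data.frame(id = seq_len(n))

# Shuffle a vector holding n/2 zeros (control) and n/2 ones (treatment).
# Every participant has the same 0.5 probability of landing in treatment,
# and the two groups are guaranteed to be the same size.
participants$treat <- sample(rep(c(0, 1), each = n / 2))

# Group sizes: 50 control, 50 treatment
table(participants$treat)
```

Fixing the group sizes in advance is one design choice among several; flipping an independent coin for each participant is also random assignment, but it can leave the groups unequal in size.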

Field experiments, or randomized studies conducted in real-world settings, can take many forms. While experiments on college campuses are often considered lab studies, certain experiments on campus – such as those examining club participation – may be regarded as field experiments, depending on the experimental design. Ultimately, whether a study is considered a field experiment hinges on the definition of "the field."

Researchers may employ two main scenarios for randomization. The first involves gathering study participants and randomizing them at the time of the experiment. The second capitalizes on naturally occurring randomizations, such as the Vietnam draft lottery. 

Intervention, Excludability, and Non-Interference

Three essential features of any experiment are intervention, excludability, and non-interference. In a general sense, the intervention refers to the treatment or action being tested in an experiment. The excludability principle is satisfied when the only systematic difference between the treatment and control groups is the intervention itself, so that outcomes respond to the treatment and not to some by-product of being assigned to it. The non-interference principle holds when each participant's outcome depends only on that participant's own treatment status, not on the treatment status of other participants. Together, these principles allow the experiment to isolate the causal effect of the intervention under study.

Omitted Variables and Non-Compliance

To obtain unbiased results, researchers rely on random assignment to guard against omitted variable bias. Omitted variables are factors that influence the outcome but are not measured or are difficult to measure. These unmeasured attributes, sometimes called confounding variables or unobserved heterogeneity, cannot be controlled for directly; random assignment instead balances them across the treatment and control groups in expectation.

Non-compliance can also complicate experiments. One-sided non-compliance occurs when some individuals assigned to the treatment group don't receive the treatment (failure to treat). Two-sided non-compliance occurs when, in addition, some individuals assigned to the control group do receive the treatment. Addressing these issues at the design level, for example by implementing a blind or double-blind study, can help mitigate potential biases.

Achieving Precision through Covariate Balance

To make the control and treatment groups comparable in all relevant respects, particularly when the sample size (n) is small, it helps to achieve covariate balance. A covariate is a pre-treatment variable that may predict the outcome (not to be confused with covariance, which measures the association between two variables). When covariates are balanced across groups, we can more accurately isolate the effect of the treatment, leading to improved precision in our findings.
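One simple way to look at covariate balance is to compare covariate means across the two groups. The sketch below does this on simulated data; the variable names mirror the fictional trial discussed later, but the values, the sample size, and the helper `std_diff` are assumptions introduced purely for illustration.

```r
# Minimal sketch of a covariate balance check (simulated data, base R only).
set.seed(241)
n <- 40                                            # deliberately small sample

d_sim <- data.frame(
  temperature_day0 = rnorm(n, mean = 100, sd = 1), # hypothetical baseline temperature
  male             = rbinom(n, size = 1, prob = 0.4)
)
d_sim$treat <- sample(rep(c(0, 1), each = n / 2))  # complete random assignment

# Covariate means by treatment arm: similar means suggest good balance
balance <- aggregate(cbind(temperature_day0, male) ~ treat,
                     data = d_sim, FUN = mean)
print(balance)

# Hypothetical helper: difference in means scaled by the overall standard deviation
std_diff <- function(x, g) {
  (mean(x[g == 1]) - mean(x[g == 0])) / sd(x)
}
std_diff(d_sim$temperature_day0, d_sim$treat)
```

With small samples, chance imbalances of this kind are more likely, which is why checking balance matters most when n is small.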

Fictional Example of Randomized Controlled Trial of Covi-Mapp for COVID-19 Management

Let's explore a fictional example to better understand experiments: a one-week randomized controlled trial of the experimental drug Covi-Mapp for managing COVID-19. The control group receives the standard care for COVID-19 patients, while the treatment group receives the standard care plus Covi-Mapp. The outcome of interest is whether patients have cough symptoms on day 7, as subsiding cough symptoms are an encouraging sign in COVID-19 recovery. We'll measure the presence of cough on day 0 and day 7, as well as temperature on day 0 and day 7. Gender is also tracked.

In this Covi-Mapp example, the intervention is the Covi-Mapp drug, the excludability principle is satisfied if the only difference in patient care between the groups is the drug administration, and the non-interference principle holds if one patient's outcome doesn't affect another's.

First, let's assume we have a dataset containing the relevant information for each patient, including cough status on day 0 and day 7, temperature on day 0 and day 7, treatment assignment, and gender. We'll read the data and explore the dataset:


library(data.table)  # provides fread() and the d[ , ...] syntax used below

d <- fread("../data/COVID_rct.csv")
names(d)

"temperature_day0" "cough_day0" "treat_covid_mapp" "temperature_day7" "cough_day7" "male"

Simple treatment effect of the experimental drug

Without any covariates, let's first look at the estimated effect of the treatment on the presence of cough on day 7. The estimated proportion of patients with a cough on day 7 in the control group (not receiving the experimental drug) is 0.847458. In other words, about 84.7% of patients in the control group had a cough on day 7. The estimated effect of the experimental drug is -0.238: on average, receiving the drug reduces the proportion of patients with a cough on day 7 by about 23.8 percentage points relative to the control group.

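library(lmtest)    # coeftest()
library(sandwich)  # vcovHC(): heteroskedasticity-robust standard errors

# Regress day-7 cough on treatment assignment only
covid_1 <- d[ , lm(cough_day7 ~ treat_covid_mapp)]
coeftest(covid_1, vcovHC)

                  Estimate Std. Error t value Pr(>|t|)
(Intercept)       0.847458   0.047616  17.798  < 2e-16 ***
treat_covid_mapp -0.237702   0.091459  -2.599  0.01079 *

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1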
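With a single binary treatment indicator and no covariates, the regression coefficient on treatment is the difference in group means, and the intercept is the control-group mean. The sketch below checks this on simulated data; the probabilities, sample size, and variable names are assumptions chosen to loosely resemble the example, not the trial data itself.

```r
# Sketch: the no-covariate regression coefficient equals the difference in means.
# Simulated data with hypothetical cough probabilities; base R only.
set.seed(241)
n <- 100
treat <- sample(rep(c(0, 1), each = n / 2))

# Assumed probabilities of a day-7 cough: 0.85 in control, 0.60 in treatment
cough_day7 <- rbinom(n, size = 1, prob = ifelse(treat == 1, 0.60, 0.85))

# Estimate 1: difference in the proportion coughing between the two groups
diff_in_means <- mean(cough_day7[treat == 1]) - mean(cough_day7[treat == 0])

# Estimate 2: slope from a linear regression of the outcome on treatment
fit <- lm(cough_day7 ~ treat)

# The two estimates agree up to floating-point error
all.equal(unname(coef(fit)["treat"]), diff_in_means)
```

This is why the intercept in the output above can be read as the control-group proportion and the treatment coefficient as a change in proportion.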

We know that a patient's initial condition would affect the final outcome. If the patient has a cough and a fever on day 0, they might not fare well with the treatment. To better understand the treatment's effect, let's add these covariates:

# Add pre-treatment covariates: baseline cough and baseline temperature
covid_2 <- d[ , lm(cough_day7 ~ treat_covid_mapp +
                   cough_day0 + temperature_day0)]
coeftest(covid_2, vcovHC)

                   Estimate Std. Error t value Pr(>|t|)
(Intercept)      -19.469655   7.607812 -2.5592 0.012054 *
treat_covid_mapp  -0.165537   0.081976 -2.0193 0.046242 *
cough_day0         0.064557   0.178032  0.3626 0.717689
temperature_day0   0.205548   0.078060  2.6332 0.009859 **

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The output shows the results of a linear regression model estimating the effect of the experimental drug (treat_covid_mapp) on the presence of cough on day 7, adjusting for cough on day 0 and temperature on day 0. The experimental drug reduces the probability of a cough on day 7 by approximately 16.6 percentage points compared to the control group, and the effect is statistically significant at the 5% level (p-value = 0.046242). The presence of cough on day 0 does not significantly predict the presence of cough on day 7 (p-value = 0.717689). A one-degree increase in temperature on day 0 is associated with a 20.6 percentage point increase in the probability of a cough on day 7, and this association is statistically significant (p-value = 0.009859).

Should we add day 7 temperature as a covariate? No. Temperature on day 7 is a post-treatment variable: it could be affected by the treatment itself. If we include it, part of the treatment's effect is absorbed by the covariate, so the treatment coefficient no longer estimates the causal effect and may well lose statistical significance. Only variables measured before treatment should be used as covariates.
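The sketch below illustrates the problem with a simulated data set in which, by assumption, the drug reduces cough only by lowering day-7 temperature. Every number in the data-generating process is hypothetical and chosen for illustration; none comes from the trial output.

```r
# Sketch: adjusting for a post-treatment variable distorts the treatment estimate.
# Simulated data under an assumed data-generating process; base R only.
set.seed(241)
n <- 2000
treat <- sample(rep(c(0, 1), each = n / 2))

# Assumption: the drug lowers day-7 temperature by 1.5 degrees on average
temperature_day7 <- 100 - 1.5 * treat + rnorm(n, sd = 0.5)

# Assumption: cough depends on the drug only through day-7 temperature
p_cough <- pmin(pmax(0.65 + 0.2 * (temperature_day7 - 100), 0), 1)
cough_day7 <- rbinom(n, size = 1, prob = p_cough)

# Correct comparison: treatment only (true effect is about -0.30 by construction)
unadjusted <- coef(lm(cough_day7 ~ treat))["treat"]

# Flawed comparison: also conditions on the post-treatment temperature
adjusted <- coef(lm(cough_day7 ~ treat + temperature_day7))["treat"]

c(unadjusted = unname(unadjusted), post_treatment_adjusted = unname(adjusted))
```

In this setup the adjusted coefficient shrinks toward zero even though the drug works, because the post-treatment covariate soaks up the very pathway through which the drug acts.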

However, we'd like to investigate whether the treatment affects men and women differently. Since we collected gender as part of the study, we can check for a heterogeneous treatment effect (HTE) by interacting treatment with the male indicator. In the model below, the outcome is final temperature. For females (male == 0), the experimental drug lowers temperature by approximately 0.23 degrees, a marginally significant effect (p-value = 0.05391); the interaction term indicates an additional reduction of about 2.08 degrees for males.

# Interact treatment with the male indicator to estimate a heterogeneous effect;
# treat_covid_mapp * male expands to both main effects plus their interaction
covid_4 <- d[ , lm(temperature_day7 ~ treat_covid_mapp * male +
                   cough_day0 + temperature_day0)]
coeftest(covid_4, vcovHC)

t test of coefficients:

                       Estimate Std. Error  t value  Pr(>|t|)
(Intercept)           48.712690  10.194000   4.7786 6.499e-06 ***
treat_covid_mapp      -0.230866   0.118272  -1.9520   0.05391 .
male                   3.085486   0.121773  25.3379 < 2.2e-16 ***
cough_day0             0.041131   0.194539   0.2114   0.83301
temperature_day0       0.504797   0.104511   4.8301 5.287e-06 ***
treat_covid_mapp:male -2.076686   0.198386 -10.4679 < 2.2e-16 ***

Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
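In an interaction model, the treatment coefficient on its own is the estimated effect for the group coded male == 0, and the effect for male == 1 is that coefficient plus the interaction term. The short sketch below does this arithmetic with the two estimates reported in the output above; the object names are made up for the example.

```r
# Sketch: subgroup treatment effects implied by an interaction model.
# The two estimates are copied from the coefficient table above.
b_treat       <- -0.230866   # treatment coefficient (effect when male == 0)
b_interaction <- -2.076686   # treatment-by-male interaction coefficient

effect_female <- b_treat                   # effect for male == 0
effect_male   <- b_treat + b_interaction   # effect for male == 1

round(c(female = effect_female, male = effect_male), 3)
```

Both effects are in the units of the outcome variable. A standard error for the male effect would also need the covariance between the two coefficients, which the table does not report.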

Which group, those coded as male == 0 or male == 1, has better health outcomes (lower final temperature) in control? What about in treatment? How does this help to contextualize any heterogeneous treatment effect that might have been estimated?

Stargazer is a popular R package that enables users to create well-formatted tables and reports for statistical analysis results.

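library(sandwich)   # vcovHC()
library(stargazer)

# Separate regressions of day-7 temperature on treatment for each subgroup
covid_males   <- d[male == 1, lm(temperature_day7 ~ treat_covid_mapp)]
covid_females <- d[male == 0, lm(temperature_day7 ~ treat_covid_mapp)]

stargazer(covid_males, covid_females,
          title = "",
          type = 'text',
          dep.var.caption = 'Outcome Variable:',
          dep.var.labels = c('Temperature on Day 7'),
          # Assumption: the original call is cut off after "se = list(";
          # robust standard errors are supplied here to match the coeftest() calls above
          se = list(sqrt(diag(vcovHC(covid_males))),
                    sqrt(diag(vcovHC(covid_females)))))

                                 Outcome Variable:
                               Temperature on Day 7
                              (1)                   (2)
treat_covid_mapp           -2.591***              -0.323*
                            (0.220)               (0.174)
Constant                  101.692***             98.487***
                            (0.153)               (0.102)
Observations                  37                    63
R2                           0.798                 0.057
Adjusted R2                  0.793                 0.041
Residual Std. Error     0.669 (df = 35)       0.646 (df = 61)
F Statistic         138.636*** (df = 1; 35) 3.660* (df = 1; 61)

Note:                               *p<0.1; **p<0.05; ***p<0.01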

Looking at this regression report, we see that males in control have an average temperature of about 101.7, while females in control average about 98.5 (very nearly a normal temperature). So, in control, males are worse off. In treatment, males average 101.69 - 2.59 = 99.10. While this is closer to a normal temperature, it is still elevated. Females in treatment average 98.49 - 0.32 = 98.16, which is slightly lower than a normal temperature and better than an elevated one. The treatment appears to have a stronger effect among male participants than among female participants because males are *more sick* at baseline.
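The group averages quoted above follow directly from the regression table: the constant is the control-group mean, and the constant plus the treatment coefficient is the treatment-group mean. A small sketch of that arithmetic, using the four estimates from the table (the object names are made up for the example):

```r
# Sketch: implied group means from the two subgroup regressions.
# Estimates are copied from the stargazer table above.
coefs <- list(
  males   = c(constant = 101.692, treat = -2.591),
  females = c(constant = 98.487,  treat = -0.323)
)

# Control mean = constant; treatment mean = constant + treatment coefficient
group_means <- sapply(coefs, function(b) {
  c(control   = unname(b["constant"]),
    treatment = unname(b["constant"] + b["treat"]))
})

round(group_means, 2)
```

Laid out this way, it is easy to see that treated males still end up warmer than untreated females, even though their estimated treatment effect is much larger.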

In conclusion, experimentation offers a fascinating and valuable avenue for primary research, allowing us to address causal questions and enhance our understanding of the world around us. In a randomized experiment, random assignment guards against confounding, and adjusting for pre-treatment covariates helps estimate the treatment effect more precisely. By exploring subgroups in the data, researchers can identify whether the treatment has different effects on different groups, such as men and women or younger and older individuals. This information can be critical for making informed policy decisions and developing targeted interventions that maximize the benefits for specific groups. The ongoing investigation of experimental methodologies and their potential applications represents a compelling and significant area of inquiry.

Field experiments, explained

A field experiment is a research method that uses some controlled elements of traditional lab experiments, but takes place in natural, real-world settings. This type of experiment can help scientists explore questions like: Why do people vote the way they do? Why do schools fail? Why are certain people hired less often or paid less money?

University of Chicago economists were early pioneers in the modern use of field experiments and conducted innovative research that impacts our everyday lives—from policymaking to marketing to farming and agriculture.  

Field experiments bridge the highly controlled lab environment and the messy real world. Social scientists have taken inspiration from traditional medical or physical science lab experiments. In a typical drug trial, for instance, participants are randomly assigned into two groups. The control group gets the placebo—a pill that has no effect. The treatment group will receive the new pill. The scientist can then compare the outcomes for each group.

A field experiment works similarly, just in the setting of real life.

It can be difficult to understand why a person chooses to buy one product over another or how effective a policy is when dozens of variables affect the choices we make each day. “That type of thinking, for centuries, caused economists to believe you can't do field experimentation in economics because the market is really messy,” said Prof. John List, a UChicago economist who has used field experiments to study everything from how people use Uber and Lyft to how to close the achievement gap in Chicago-area schools. “There are a lot of things that are simultaneously moving.”

The key to cleaning up the mess is randomization, or assigning participants randomly to either the control group or the treatment group. “The beauty of randomization is that each group has the same amount of bad stuff, or noise or dirt,” List said. “That gets differenced out if you have large enough samples.”

Though lab experiments are still common in the social sciences, field experiments are now often used by psychologists, sociologists and political scientists. They’ve also become an essential tool in the economist’s toolbox.  

Some issues are too big and too complex to study in a lab or on paper—that’s where field experiments come in.

In a laboratory setting, a researcher wants to control as many variables as possible. These experiments are excellent for testing new medications or measuring brain functions, but they aren’t always great for answering complex questions about attitudes or behavior.

Labs are highly artificial with relatively small sample sizes—it’s difficult to know if results will still apply in the real world. Also, people are aware they are being observed in a lab, which can alter their behavior. This phenomenon, sometimes called the Hawthorne effect, can affect results.

Traditional economics often uses theories or existing data to analyze problems. But, when a researcher wants to study if a policy will be effective or not, field experiments are a useful way to look at how results may play out in real life.

In 2019, UChicago economist Michael Kremer (then at Harvard) was awarded the Nobel Prize alongside Abhijit Banerjee and Esther Duflo of MIT for their groundbreaking work using field experiments to help reduce poverty. In the 1990s and 2000s, Kremer conducted several randomized controlled trials in Kenyan schools testing potential interventions to improve student performance.

In the 1990s, Kremer worked alongside an NGO to figure out if buying students new textbooks made a difference in academic performance. Half the schools got new textbooks; the other half didn’t. The results were unexpected—textbooks had no impact.

“Things we think are common sense, sometimes they turn out to be right, sometimes they turn out to be wrong,” said Kremer on an episode of the Big Brains podcast. “And things that we thought would have minimal impact or no impact turn out to have a big impact.”

In the early 2000s, Kremer returned to Kenya to study a school-based deworming program. He and a colleague found that providing deworming pills to all students reduced absenteeism by more than 25%. After the study, the program was scaled nationwide by the Kenyan government. From there it was picked up by multiple Indian states—and then by the Indian national government.

“Experiments are a way to get at causal impact, but they’re also much more than that,” Kremer said in his Nobel Prize lecture. “They give the researcher a richer sense of context, promote broader collaboration and address specific practical problems.”

Among many other things, field experiments can be used to:

Study bias and discrimination

A 2004 study published by UChicago economists Marianne Bertrand and Sendhil Mullainathan (then at MIT) examined racial discrimination in the labor market. They sent over 5,000 resumes to real job ads in Chicago and Boston. The resumes were exactly the same in all ways but one—the name at the top. Half the resumes bore white-sounding names like Emily Walsh or Greg Baker. The other half sported African American names like Lakisha Washington or Jamal Jones. The study found that applications with white-sounding names were 50% more likely to receive a callback.

Examine voting behavior

Political scientist Harold Gosnell, PhD 1922, pioneered the use of field experiments to examine voting behavior while at UChicago in the 1920s and ’30s. In his study “Getting out the vote,” Gosnell sorted 6,000 Chicagoans across 12 districts into groups. One group received voter registration info for the 1924 presidential election and the control group did not. Voter registration jumped substantially among those who received the informational notices. The study showed not only that get-out-the-vote mailings could have a substantial effect on voter turnout, but also that field experiments were an effective tool in political science.

Test ways to reduce crime and shape public policy

Researchers at UChicago’s Crime Lab use field experiments to gather data on crime as well as policies and programs meant to reduce it. For example, Crime Lab director and economist Jens Ludwig co-authored a 2015 study on the effectiveness of the school mentoring program Becoming a Man. Developed by the non-profit Youth Guidance, Becoming a Man focuses on guiding male students between 7th and 12th grade to help boost school engagement and reduce arrests. In two field experiments, the Crime Lab found that while students participated in the program, total arrests were reduced by 28–35%, violent-crime arrests went down by 45–50% and graduation rates increased by 12–19%.

The earliest field experiments took place—literally—in fields. Starting in the 1800s, European farmers began experimenting with fertilizers to see how they affected crop yields. In the 1920s, two statisticians, Jerzy Neyman and Ronald Fisher, were tasked with assisting with these agricultural experiments. They are credited with identifying randomization as a key element of the method—making sure each plot had the same chance of being treated as the next.

The earliest large-scale field experiments in the U.S. took place in the late 1960s to help evaluate various government programs. Typically, these experiments were used to test minor changes to things like electricity pricing or unemployment programs.

Though field experiments were used in some capacity throughout the 20th century, this method didn’t truly gain popularity in economics until the 2000s. Kremer and List were early pioneers and first began experimenting with the method in the 1990s.

In 2004, List co-authored a seminal paper defining field experiments and arguing for the importance of the method. In 2008, he and UChicago economist Steven Levitt published another study tracing the history of field experiments and their impact on economics.

In the past few decades, the use of field experiments has exploded. Today, economists often work alongside NGOs or nonprofit organizations to study the efficacy of programs or policies. They also partner with companies to test products and understand how people use services.  

There are several ethical discussions happening among scholars as field experiments grow in popularity. Chief among them is the issue of informed consent. All studies that involve human test subjects must be approved by an institutional review board (IRB) to ensure that people are protected.

However, participants in field experiments often don’t know they are in an experiment. While an experiment may be given the stamp of approval in the research community, some argue that taking away peoples’ ability to opt out is inherently unethical. Others advocate for stricter review processes as field experiments continue to evolve.

According to List, another major issue in field experiments is the issue of scale. Many experiments only test small groups, say, dozens to hundreds of people. This may mean the results are not applicable to broader situations. For example, if a scientist runs an experiment at one school and finds their method works there, does that mean it will also work for an entire city? Or an entire country?

List believes that in addition to testing option A and option B, researchers need a third option that accounts for the limitations that come with a larger scale. “Option C is what I call critical scale features. I want you to bring in all of the warts, all of the constraints, whether they're regulatory constraints, or constraints by law,” List said. “Option C is like your reality test, or what I call policy-based evidence.”

This problem isn’t unique to field experiments, but List believes tackling the issue of scale is the next major frontier for a new generation of economists.


Annual Review of Sociology

Volume 43, 2017. Review article: Field Experiments Across the Social Sciences

  • Delia Baldassarri (1) and Maria Abascal (2)
  • Affiliations: (1) Department of Sociology, New York University, New York, New York 10012; (2) Department of Sociology, Columbia University, New York, New York 10027
  • Vol. 43:41-73 (Volume publication date July 2017)
  • First published as a Review in Advance on May 22, 2017

Using field experiments, scholars can identify causal effects via randomization while studying people and groups in their naturally occurring contexts. In light of renewed interest in field experimental methods, this review covers a wide range of field experiments from across the social sciences, with an eye to those that adopt virtuous practices, including unobtrusive measurement, naturalistic interventions, attention to realistic outcomes and consequential behaviors, and application to diverse samples and settings. The review covers four broad research areas of substantive and policy interest: first, randomized controlled trials, with a focus on policy interventions in economic development, poverty reduction, and education; second, experiments on the role that norms, motivations, and incentives play in shaping behavior; third, experiments on political mobilization, social influence, and institutional effects; and fourth, experiments on prejudice and discrimination. We discuss methodological issues concerning generalizability and scalability as well as ethical issues related to field experimental methods. We conclude by arguing that field experiments are well equipped to advance the kind of middle-range theorizing that sociologists value.


National Academies Press: OpenBook

Implementing Randomized Field Trials in Education: Report of a Workshop (2004)

Chapter 1: What Is a Randomized Field Trial?


1 What Is a Randomized Field Trial? P eople behave in widely varying ways, due to many different causes, including their own individual volition (conscious choices). Social scientists often seek to understand whether or not a specific inter- vention may have an influence on human behavior or performance. For example, a researcher might want to examine the effect of a driver safety course on teenage automobile accidents or the effect of a new reading pro- gram on student achievement. But there are many forces that might cause a change in driving or reading skills, so how can the investigator be confident that it was the intervention that made the difference? An effective way to isolate the effect of a specific factor on human behavior and performance is to conduct a randomized field trial, which is a research method used to estimate the effect of an intervention on a particular outcome of interest. As a first step, investigators hypothesize that a particular intervention or "treatment" will cause a change in behavior. Then they seek to test the hypothesis by comparing the average outcome for individuals in the group who were randomly assigned to receive this intervention with the average outcome for individuals in the group who do not. This method helps social scientists to attribute changes in the outcome of interest (e.g., read- ing achievement) to the specific intervention (e.g., the reading program), rather than to the many other possible causes of human behavior and performance. 1

2 IMPLEMENTING RANDOMIZED FIELD TRIALS IN EDUCATION MAJOR FEATURES In this section, we sketch the defining features of randomized field trials. In particular, we focus on the two key concepts of randomization and control and then briefly situate randomized field trials within the broader context of establishing cause-and-effect relationships. A research design is randomized when individuals (or schools or other units of study) are put into an "experimental" group (which receives the intervention) or a "control"1 group (which does not) on the basis of a random process like the toss of a coin.2 The power of this random assign- ment is that, on average, the two groups that result are initially the same, differing only in terms of the intervention.3 This allows researchers to more confidently attribute differences they observe between the two groups to the intervention, rather than to the myriad other factors that influence human behavior and performance. As in any comparative study, research- ers must be careful to observe and account for any other confounding vari- ables that could differentially affect the groups after randomization has taken place. That is, even though randomization creates (statistically) equivalent groups at the outset, once the intervention is under way, other events or programs could take place in one group and not the other, under- mining any attempt to isolate the effect of the intervention. Randomized field trials are also controlled; that is, the investigator controls the process by which individuals (or other entities of study) are assigned to receive the intervention of interest. If the assignment of indi- viduals or entities is outside the investigator's control, then it is generally 1A control group is a comparison group in a randomized field trial that acts as a contrast to the group receiving the intervention of interest. 
In randomized field trials involving hu- mans, research participants in the control group typically either continue to receive existing services or receive a different intervention. 2Tossing a coin is a useful way of explaining the situation in which the participants have a 50-50 chance of being assigned to either of two groups: the experimental or the control group. Randomized field trials can have more than two groups; as long as the assignment process is conducted on the basis of a statistical process that has known probabilities (0.5 or otherwise), the groups will be balanced on observable and unobservable characteristics. 3It is logically possible that differences between the groups may still be due to idiosyn- cratic differences between individuals assigned to receive the intervention or to be part of the control group. However, with randomization, the chances of this occurring (a) can be explic- itly calculated and (b) can be made very small, typically by a straightforward manipulation like increasing the number of individuals assigned to each group.

much more difficult to attribute observed outcomes to the intervention being studied. For example, if teachers assigned some students to experience a novel teaching method and some to a comparison group that did not experience it based on their judgment of which students should experience the method, then other factors (such as student aptitude) may confound or obscure the specific effect of the novel teaching method on student learning outcomes.[4] Thus, randomization and control are the foundation of a systematic and rigorous process that enables researchers estimating the effect of an intervention to be more confident in the internal validity of their results--that is, that differences in outcomes can be attributed to the presence or absence of the intervention, rather than to some other factor. External validity--the extent to which findings of effectiveness (or lack of effectiveness) hold in other times, places, and populations--can be established only when the intervention has been subjected to rigorous study across a variety of settings.

[4] In some cases, an investigator may conduct a randomized field trial when an intervention is allocated to individuals based on a random lottery. As discussed in Chapter 3, some school districts have used randomized lotteries to allocate school vouchers, in order to equitably distribute scarce resources when demand exceeds available funding for vouchers. In these cases, the investigator typically does not directly control the random assignment process, but as long as the process is truly random, the statistically equivalent groups that result isolate the relationship between group membership (treatment or control) and outcome from confounding influences and the essential features of a randomized field trial are retained.

The ultimate aim of randomized field trials is to help establish cause-and-effect relationships. They cannot, however, uncover all of the multiple causes that may affect human behavior. Instead, randomized field trials are designed to isolate the effect of one or more possible treatments that may or may not be the cause(s) of an observed behavioral outcome (such as an increase in student test scores) (Campbell, 1957). Furthermore, a single study--no matter how strong the design--is rarely sufficient to establish causation. Indeed, establishing a causal relationship is a matter of some complexity. In short, it requires that a coherent theory predict the specific relationship among the program, outcome, and context and that the results from several studies in varying circumstances are consistent with that prediction.

A few final clarifications about terminology are in order. Some observers consider the term "randomized field trial" to be limited only to very large medical studies or studies conducted by pharmaceutical companies when testing the safety and efficacy of new drugs. Randomized designs, however, can be part of any research in any field aimed at estimating the effect of an intervention, regardless of the size of the study. In this report, we use the term "randomized field trial" to refer to studies that test the effectiveness of social interventions comparing experimental and control groups that have been created through random assignment. Although most of the workshop discussions focused on large-scale randomized field trials, the key elements for education research do not involve the size of the study, but the focus on questions of causation, use of randomization, and the construction of control groups that do not receive the intervention of interest. Indeed, even small "pilot" studies can use randomization and control groups to determine the feasibility of scaling an intervention.

CURRENT DEBATES AND TRENDS

At the workshop, University of Pennsylvania professor Robert Boruch described how randomized field trials have been used in a range of fields over time. Since World War II, he explained, randomized field trials have been used to test the effectiveness of the Salk polio vaccine and the antibiotic streptomycin, and these designs are now considered the "gold standard" for testing the effects of different interventions in many fields. Boruch went on to describe the growing use of randomized field trials to evaluate social programs since the 1970s (Boruch, de Moya, and Snyder, 2002) and noted that the World Bank, the government of the United Kingdom, the Campbell Collaboration, and the Rockefeller Foundation, all held conferences promoting the use of randomized field trials during 2002 and 2003.

Trends in other fields notwithstanding, scholars of education have long debated the utility of this design in education research.
Those who question its usefulness frequently argue that the model of causation that underlies these designs is too simplistic to capture the complexity of teaching and learning in diverse educational settings (e.g., Cronbach et al., 1980; Bruner, 1996; Willinsky, 2001; Berliner, 2002). Others, in contrast, are enthusiastic about using randomized field trials for addressing causal questions in education, emphasizing the unique ability of the design to isolate the impact of interventions on a specified outcome in an unbiased fashion (e.g., Cook and Payne, 2002; Mosteller and Boruch, 2002; Slavin, 2002).

In the past five years, as calls for evidence-based education have become common, these debates have intensified and expanded beyond academic circles to include policy makers and practitioners. Most visibly, the No Child Left Behind Act, passed by Congress in 2001 and signed by the President in 2002, includes many references to "scientifically based" educational programs and services. The law defines scientifically based research as including research that "is evaluated using experimental or quasi-experimental designs in which individuals, entities, programs or activities are assigned to different conditions and with appropriate controls to evaluate the effects of the condition of interest, with a preference for random-assignment experiments, or other designs to the extent that those designs contain within-condition or across-condition controls."

Furthermore, in its strategic plan for 2002-2007, the U.S. Department of Education has established as its chief goal to "create a culture of achievement" by, among other steps, encouraging the use of "scientifically based methods in federal education programs." The strategic plan also aims to "transform education into an evidence-based field" (U.S. Department of Education, 2002, pp. 14-15). The Institute of Education Sciences, the Department of Education's primary research arm, has established the What Works Clearinghouse to help reach these goals by identifying "interventions or approaches in education that have a demonstrated beneficial causal relationship to important student outcomes" (What Works Clearinghouse [2003]). The expert technical advisory group guiding the clearinghouse has established quality standards to review available research on such critical education problems as improving early reading and reducing high school dropout rates.
These standards place high priority on randomized field trials, which are seen as "among the most appropriate research designs for identifying the impact or effect of an educational program or practice" (What Works Clearinghouse [2003]). The standards also acknowledge that there are circumstances in which such trials are not feasible, suggesting that quasi-experiments (which are comparative studies that attempt to isolate the effect of an intervention by means other than randomization) may be useful under such circumstances.[5]

[5] In a quasi-experimental study, researchers may compare naturally existing groups that appear similar except for the intervention being studied. In this research design, investigators often use statistical techniques to attempt to adjust for known confounding variables that are associated with both the intervention and the outcome of interest, thus invoking additional assumptions about the causal effects of the intervention. While these statistical techniques can address known differences between study groups, they may inadequately adjust for unknown confounding variables. The major drawback of quasi-experimental designs is the possibility that the groups are systematically different (a problem known as "selection bias"), and thus investigators may be less confident about conclusions reached using these methods (National Research Council, 2002, p. 113). In contrast, randomization theoretically creates groups that are not systematically influenced by both known and unknown confounding variables.

The National Research Council report Scientific Research in Education (National Research Council, 2002) was designed to help clarify the nature of scientific inquiry in education in this rapidly changing policy context. That report links design to the research question and, for addressing causal questions (i.e., "what works") about specified outcomes, highlights randomized field trials as the most appropriate research designs when they are feasible and ethical. This report summarizes a workshop in which participants addressed the question: When randomized field trials are conducted in social settings like schools and school districts, how can they be implemented and what procedures should be used in implementation?

The central idea of evidence-based education--that education policy and practice ought to be fashioned based on what is known from rigorous research--offers a compelling way to approach reform efforts. Recent federal trends reflect a growing enthusiasm for such change. Most visibly, the 2002 No Child Left Behind Act requires that "scientifically based [education] research" drive the use of federal education funds at the state and local levels. This emphasis is also reflected in a number of government and nongovernment initiatives across the country. As consensus builds around the goals of evidence-based education, consideration of what it will take to make it a reality becomes the crucial next step.

In this context, the Center for Education of the National Research Council (NRC) has undertaken a series of activities to address issues related to the quality of scientific education research. In 2002, the NRC released Scientific Research in Education (National Research Council, 2002), a report designed to articulate the nature of scientific education research and to guide efforts aimed at improving its quality. Building on this work, the Committee on Research in Education was convened to advance an improved understanding of a scientific approach to addressing education problems; to engage the field of education research in action-oriented dialogue about how to further the accumulation of scientific knowledge; and to coordinate, support, and promote cross-fertilization among NRC efforts in education research. The main locus of activity undertaken to meet these objectives was a year-long series of workshops. This report is a summary of the third workshop in the series, on the implementation and implications of randomized field trials in education.


A Refresher on Randomized Controlled Experiments

by Amy Gallo

In order to make smart decisions at work, we need data. Where that data comes from and how we analyze it depend on a lot of factors: for example, what we’re trying to do with the results, how accurate we need the findings to be, and how much of a budget we have. There is a spectrum of experiments that managers can run, from quick, informal ones to pilot studies, field experiments, and lab research. One of the more structured experiments is the randomized controlled experiment.

Handbook of Field Experiments

The last 15 years have seen an explosion in the number, scope, quality, and creativity of field experiments. To take stock of this remarkable progress, we were invited to edit a Handbook of Field Experiments, published by Elsevier. We were fortunate to assemble a volume of wonderful papers by the best experts in the field. Some chapters are more methodological, while others are focused on results. All of them provide thoughtful reflections on the advances and issues in the field, useful research tips, and insights into what the next steps need to be, all of which should be very useful for graduate students. Taken together, these papers offer an incredibly rich overview of the state of the literature. This page collects all the working paper versions of the chapters, and will also link to the final versions as they become available. We hope you enjoy it.

—Abhijit Banerjee and Esther Duflo


An Introduction to the "Handbook of Field Experiments" Abhijit Banerjee and Esther Duflo

Many (though by no means all) of the questions that economists and policymakers ask themselves are causal in nature: What would be the impact of adding computers in classrooms? What is the price elasticity of demand for preventive health products? Would increasing interest rates lead to an increase in default rates? Decades ago, the statistician Fisher (Fisher, 1925) proposed a method to answer such causal questions: Randomized Controlled Trials (RCTs). In an RCT, the assignment of different units to different treatment groups is chosen randomly. This ensures that no unobservable characteristics of the units are reflected in the assignment, and hence that any difference between treatment and control units reflects the impact of the treatment. While the idea is simple, the implementation in the field can be more involved, and it took some time before randomization was considered to be a practical tool for answering questions in economics.
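The logic of that paragraph can be illustrated with a small simulation. The sketch below is not taken from the Handbook: the data-generating process, the `TRUE_EFFECT` value, and the `simulate` function are all assumptions made up for illustration. It compares a treated-minus-control difference in means when treatment is assigned by coin flip with the same comparison when units with a higher unobserved "ability" are more likely to select into treatment.

```python
import random
import statistics

TRUE_EFFECT = 2.0  # assumed treatment effect, chosen only for illustration


def simulate(assign_randomly, n=20_000, seed=0):
    """Return the treated-minus-control difference in mean outcomes."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        ability = rng.gauss(0.0, 1.0)  # characteristic the analyst never observes
        if assign_randomly:
            # RCT: a coin flip decides treatment, independent of ability.
            gets_treatment = rng.random() < 0.5
        else:
            # Self-selection: higher-ability units opt in more often.
            gets_treatment = ability + rng.gauss(0.0, 1.0) > 0.0
        effect = TRUE_EFFECT if gets_treatment else 0.0
        outcome = ability + rng.gauss(0.0, 1.0) + effect
        (treated if gets_treatment else control).append(outcome)
    return statistics.mean(treated) - statistics.mean(control)


if __name__ == "__main__":
    print("random assignment:", round(simulate(True), 2))
    print("self-selection:   ", round(simulate(False), 2))
```

Under these assumptions, the random-assignment estimate should land close to the assumed effect, while the self-selected comparison overstates it because ability differs systematically between the two groups.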

Some Historical Background

The Politics and Practice of Social Experiments: Seeds of a Revolution Judy Gueron

Between 1970 and the early 2000s, there was a revolution in support for the use of randomized experiments to evaluate social programs. Focusing on the welfare reform studies that helped to speed that transformation in the United States, this chapter describes the major challenges to randomized controlled trials (RCTs), how they emerged and were overcome, and how initial conclusions about conditions necessary to success — strong financial incentives, tight operational control, and small scale — proved to be wrong. The final section discusses lessons from this experience for other fields.

Methodology and Practice of RCTs

The Econometrics of Randomized Experiments Susan Athey and  Guido Imbens

Randomized experiments have a long tradition in agricultural and biomedical settings. In economics they have a much shorter history. Although there have been notable experiments over the years, such as the RAND health care experiment (Manning, Newhouse, Duan, Keeler and Leibowitz, 1987, see the general discussion in Rothstein and von Wachter, 2016) and the Negative Income Tax experiments (e.g., Robins, 1985), it is only recently that there has been a large number of randomized experiments in economics, and development economics in particular. See Duflo, Glennerster, and Kremer (2006) for a survey. In this chapter we discuss some of the statistical methods that are important for the analysis and design of randomized experiments. A major theme of the chapter is the focus on statistical methods directly justified by randomization, in the spirit of Freedman who wrote “Experiments should be analyzed as experiments, not as observational studies. A simple comparison of rates might be just the right tool, with little value added by ‘sophisticated’ models” (Freedman, 2006, p. 691). We draw from a variety of literatures. This includes the statistical literature on the analysis and design of experiments, e.g., Wu and Hamada (2009), Cox and Reid (2000), Altman (1991), Cook and DeMets (2008), Kempthorne (1952, 1955), Cochran and Cox (1957), Davies (1954), and Hinkelman and Kempthorne (2005, 2008). We also draw on the literature on causal inference, both in experimental and observational settings, Rosenbaum (1995, 2002, 2009), Rubin (2006), Cox (1992), Morgan and Winship (2007), Morton and Williams (2010) and Lee (2005), and Imbens and Rubin (2015). In the economics literature we build on recent guides to practice in randomized experiments in development economics, e.g., Duflo, Glennerster, and Kremer (2006), Glennerster (2016), and Glennerster and Takavarasha (2013) as well as the general empirical micro literature (Angrist and Pischke, 2008).
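To make the idea of a "simple comparison of rates" concrete, here is a minimal sketch of a difference in success rates with a permutation (randomization) p-value. This is one common randomization-based approach, not necessarily the procedure the chapter recommends. The function names and the example counts are hypothetical.

```python
import random


def rate(outcomes):
    """Share of units with a successful (1) outcome."""
    return sum(outcomes) / len(outcomes)


def difference_in_rates(treated, control):
    """Treated success rate minus control success rate."""
    return rate(treated) - rate(control)


def permutation_p_value(treated, control, draws=5_000, seed=0):
    """Two-sided randomization p-value under the sharp null of no effect."""
    rng = random.Random(seed)
    observed = abs(difference_in_rates(treated, control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(draws):
        # Re-draw the assignment: under the null, labels are exchangeable.
        rng.shuffle(pooled)
        diff = abs(difference_in_rates(pooled[:n_treated], pooled[n_treated:]))
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / draws


if __name__ == "__main__":
    # Hypothetical data: 90 of 200 treated and 60 of 200 control units succeed.
    treated = [1] * 90 + [0] * 110
    control = [1] * 60 + [0] * 140
    print("difference in rates:", difference_in_rates(treated, control))
    print("permutation p-value:", permutation_p_value(treated, control))
```

The p-value here relies only on the random assignment itself: it asks how often a re-shuffled assignment would produce a gap at least as large as the observed one if the treatment had no effect on any unit.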

Decision Theoretic Approaches to Experiment Design and External Validity Abhijit Banerjee, Sylvain Chassang,  and Erik Snowberg

A modern, decision-theoretic framework can help clarify important practical questions of experimental design. Building on our recent work, this chapter begins by summarizing our framework for understanding the goals of experimenters, and applying this to re-randomization.  We then use this framework to shed light on questions related to experimental registries, pre-analysis plans, and most importantly, external validity. Our framework implies that even when large samples can be collected, external decisionmaking remains inherently subjective. We embrace this conclusion, and argue that in order to improve external validity, experimental research needs to create a space for structured speculation.

The Practicalities of Running Randomized Evaluations: Partnerships, Measurement, Ethics, and Transparency Rachel Glennerster

Economists have known for a long time that randomization could help identify causal connections by solving the problem of selection bias. Chapter 1 in this book and Gueron and Rolston (2013) describe the effort in the US to move experiments out of the laboratory into the policy world in the 1960s and 1970s.  This experience was critical in proving the feasibility of field experiments, working through some of the important ethical questions involved, showing how researchers and practitioners could work together, and demonstrating that the results of field experiments were often very different from those generated by observational studies. Interestingly, there was relatively limited academic support for this first wave of field experiments (Gueron and Rolston 2013), most of which were carried out by research groups such as MDRC, Abt, and Mathematica, to evaluate US government programs, and they primarily used individual-level randomization. In contrast, a more recent wave of field experiments starting in the mid-1990s was driven by academics, initially was focused on developing countries, often worked with nongovernmental organizations, and frequently used clustered designs.
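The contrast between individual-level randomization and clustered designs can be sketched in a few lines. The example below is only an illustration under assumed names (`assign_individuals`, `assign_clusters`, and the student and school labels are hypothetical). In the first design each unit is randomized on its own. In the second, whole clusters such as schools or villages are randomized and every unit inherits its cluster's arm.

```python
import random


def assign_individuals(unit_ids, seed=0):
    """Individual-level randomization: half of the units get treatment."""
    rng = random.Random(seed)
    shuffled = list(unit_ids)
    rng.shuffle(shuffled)
    treated = set(shuffled[: len(shuffled) // 2])
    return {u: ("treatment" if u in treated else "control") for u in unit_ids}


def assign_clusters(unit_to_cluster, seed=0):
    """Clustered randomization: whole clusters are assigned together."""
    clusters = sorted(set(unit_to_cluster.values()))
    cluster_arm = assign_individuals(clusters, seed=seed)
    # Every unit inherits the arm drawn for its cluster.
    return {u: cluster_arm[c] for u, c in unit_to_cluster.items()}


if __name__ == "__main__":
    # Hypothetical roster: 12 students spread over 4 schools.
    roster = {f"student{i}": f"school{i % 4}" for i in range(12)}
    print(assign_individuals(list(roster), seed=1))
    print(assign_clusters(roster, seed=1))
```

Clustered assignment is often chosen when an intervention is delivered to a whole group at once, or when treated and control individuals in the same group would otherwise influence each other. The analysis then has to account for units within a cluster sharing an assignment.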

The Psychology of Construal in the Design of Field Experiments Elizabeth Levy Paluck and Eldar Shafir

Why might you be interested in this chapter? A fair assumption is that you are reading because you care about good experimental design. To create strong experimental designs that test people’s responses to an intervention, researchers typically consider the classically recognized motivations presumed to drive human behavior.  It does not take extensive psychological training to recognize that several types of motivations could affect an individual’s engagement with and honesty during your experimental paradigm. Such motivations include strategic self-presentation, suspicion, lack of trust, level of education or mastery, and simple utilitarian motives such as least effort and optimization. For example, minimizing the extent to which your findings are attributable to high levels of suspicion among participants, or to their decision to do the least amount possible, is important for increasing the generalizability and reliability of your results.

Understanding Preferences and Preference Change

Field Experiments in Markets Omar Al-Ubaydli and  John List

This is a review of the literature of field experimental studies of markets. The main results covered by the review are as follows: (1) Generally speaking, markets organize the efficient exchange of commodities; (2) There are some behavioral anomalies that impede efficient exchange; (3) Many behavioral anomalies disappear when traders are experienced.

Field Experiments on Discrimination Marianne Bertrand and Esther Duflo

This article reviews the existing field experimentation literature on the prevalence of discrimination, the consequences of such discrimination, and possible approaches to undermine it. We highlight key gaps in the literature and ripe opportunities for future field work.  Section 1 reviews the various experimental methods that have been employed to measure the prevalence of discrimination, most notably audit and correspondence studies; it also describes several other measurement tools commonly used in lab-based work that deserve greater consideration in field research. Section 2 provides an overview of the literature on the costs of being stereotyped or discriminated against, with a focus on self-expectancy effects and self-fulfilling prophecies; section 2 also discusses the thin field-based literature on the consequences of limited diversity in organizations and groups. The final section of the paper, Section 3, reviews the evidence for policies and interventions aimed at weakening discrimination, covering role model and intergroup contact effects, as well as socio-cognitive and technological de-biasing strategies.

Field Experiments on Voter Mobilization: An Overview of a Burgeoning Literature Alan Gerber and Donald Green

In recent years the focus of empirical work in political science has begun to shift from description to an increasing emphasis on the credible estimation of causal effects. A key feature of this change has been the increasing prominence of experimental methods, and especially field experiments. In this chapter we review the use of field experiments to study political participation.  Although several important experiments address political phenomena other than voter participation (Bergan 2009; Butler and Broockman 2015; Butler and Nickerson 2011; Broockman 2013, 2014; Grose 2014), the literature measuring the effect of various interventions on voter turnout is the largest and most fully developed, and it provides a good illustration of how the use of field experiments in political science has proceeded. From an initial focus on the relative effects of different modes of communication, scholars began to explore how theoretical insights from social psychology and behavioral economics might be used to craft messages and how voter mobilization experiments could be employed to test the real world effects of theoretical claims. The existence of a large number of experimental turnout studies was essential, because it provided the background against which unusual and important results could be easily discerned.

Lab in the Field: Measuring Preferences in the Wild Uri Gneezy and Alex Imas

In this chapter, we discuss the “lab-in-the-field” methodology, which combines elements of both lab and field experiments in using standardized, validated paradigms from the lab in targeting relevant populations in naturalistic settings. We begin by examining how the methodology has been used to test economic models with populations of theoretical interest. Next, we outline how lab-in-the-field studies can be used to complement traditional Randomized Control Trials in collecting covariates to test theoretical predictions and explore behavioral mechanisms. We proceed to discuss how the methodology can be utilized to compare behavior across cultures and contexts, and test for the external validity of results obtained in the lab. The chapter concludes with an overview of lessons on how to use the methodology effectively.

Field Experiments in Marketing Duncan Simester

Marketing is a diverse field that draws from a rich array of disciplines and a broad assortment of empirical and theoretical methods. One of those disciplines is economics and one of the methods used to investigate economic questions is field experiments. The history of field experiments in the marketing literature is surprisingly long. Early examples include Curhan (1974) and Eskin and Baron (1977), who vary prices, newspaper advertising, and display variables in grocery stores.  This chapter reviews the recent history of field experiments in marketing by identifying papers published in the last 20 years (between 1995 and 2014). We report how the number of papers published has increased during this period, and evaluate different explanations for this increase. We then group the papers into five topics and review the papers by topic. The chapter concludes by reflecting on the design of field experiments used in marketing, and proposing topics for future research.

The Challenge of Improving Human Capital

Impacts and Determinants of Health Levels in Low-Income Countries Pascaline Dupas and Ted Miguel

Improved health in low-income countries could considerably improve wellbeing and possibly promote economic growth. The last decade has seen a surge in field experiments designed to understand the barriers that households and governments face in investing in health and how these barriers can be overcome, and to assess the impacts of subsequent health gains. This chapter first discusses the methodological pitfalls that field experiments in the health sector are particularly susceptible to, then reviews the evidence that rigorous field experiments have generated so far.  While the link from in utero and child health to later outcomes has increasingly been established, few experiments have estimated the impacts of health on contemporaneous productivity among adults, and few experiments have explored the potential for infrastructural programs to impact health outcomes. Many more studies have examined the determinants of individual health behavior, on the side of consumers as well as among providers of health products and services.

The Production of Human Capital in Developed Countries: Evidence from 196 Randomized Field Experiments Roland Fryer

Randomized field experiments designed to better understand the production of human capital have increased exponentially over the past several decades. This chapter summarizes what we have learned about various partial derivatives of the human capital production function, what important partial derivatives are left to be estimated, and what – together – our collective efforts have taught us about how to produce human capital in developed countries. The chapter concludes with a back of the envelope simulation of how much of the racial wage gap in America might be accounted for if human capital policy focused on best practices gleaned from randomized field experiments.

Field Experiments in Education in Developing Countries Karthik Muralidharan

Perhaps no field in development economics in the past decade has benefited as much from the use of experimental methods as the economics of education. The rapid growth in high-quality studies on education in developing countries (many of which use randomized experiments) is perhaps best highlighted by noting that there have been several systematic reviews of this evidence aiming to synthesize findings for research and policy in just the past three years. These include Muralidharan 2013 (focused on India), Glewwe et al. 2014 (focused on school inputs), Kremer et al. 2013, Krishnaratne et al. 2013, Conn 2014 (focused on sub-Saharan Africa), McEwan 2014, Ganimian and Murnane (2016), Evans and Popova (2015), and Glewwe and Muralidharan (2016). While these are not all restricted to experimental studies, they typically provide greater weight to evidence from randomized controlled trials (RCTs).

Designing Effective Social Programs

Social Policy: Mechanism Experiments and Policy Evaluations Bill Congdon,  Jeffrey Kling, Jens Ludwig, and Sendhil Mullainathan

Policymakers and researchers are increasingly interested in using experimental methods to inform the design of social policy. The most common approach, at least in developed countries, is to carry out large-scale randomized trials of the policies of interest, or what we call here policy evaluations. In this chapter we argue that in some circumstances the best way to generate information about the policy of interest may be to test an intervention that is different from the policy being considered, but which can shed light on one or more key mechanisms through which that policy may operate.  What we call mechanism experiments can help address the key external validity challenge that confronts all policy-oriented work in two ways. First, mechanism experiments sometimes generate more policy-relevant information per dollar of research funding than can policy evaluations, which in turn makes it more feasible to test how interventions work in different contexts. Second, mechanism experiments can also help improve our ability to forecast effects by learning more about the way in which local context moderates policy effects, or expand the set of policies for which we can forecast effects. We discuss how mechanism experiments and policy evaluations can complement one another, and provide examples from a range of social policy areas including health insurance, education, labor market policy, savings and retirement, housing, criminal justice, redistribution, and tax policy. Examples focus on the U.S. context.

Field Experiments in Developing Country Agriculture Alain de Janvry, Elisabeth Sadoulet, and Tavneet Suri

This chapter provides a review of the role of field experiments in answering research questions in agriculture that ultimately let us better understand how policy can improve productivity and farmer welfare in developing economies. We first review recent field experiments in this area, highlighting the contributions experiments have already made to this area of research. We then outline areas where experiments can further fill existing gaps in our knowledge on agriculture and how future experiments can address the specific complexities in agriculture.

The Personnel Economics of the State Frederico Finan, Ben Olken, and Rohini Pande

Governments play a central role in facilitating economic development. Yet while economists have long emphasized the importance of government quality, historically they have paid less attention to the internal workings of the state and the individuals who provide the public services. This chapter reviews a nascent but growing body of field experiments that explores the personnel economics of the state. To place the experimental findings in context, we begin by documenting some stylized facts about how public sector employment differs from that in the private sector. In particular, we show that in most countries throughout the world, public sector employees enjoy a significant wage premium over their private sector counterparts. Moreover, this wage gap is largest among low-income countries, which tend to be precisely where governance issues are most severe. These differences in pay, together with significant information asymmetries within government organizations in low-income countries, provide a prima facie rationale for the emphasis of the recent field experiments on three aspects of the state–employee relationship: selection, incentive structures, and monitoring. We review the findings on all three dimensions and then conclude this survey with directions for future research.

Designing Social Protection Programs: Using Theory and Experimentation to Understand how to Help Combat Poverty Rema Hanna and Dean Karlan

“Anti-poverty” programs come in many varieties, ranging from multi-faceted, complex programs to simpler cash transfers. Articulating and understanding the root problem motivating government and nongovernmental organization intervention is critical for choosing amongst many anti-poverty policies, or combinations thereof. Policies should differ depending on whether the underlying problem is about uninsured shocks, liquidity constraints, information failures, or some combination of all of the above. Experimental designs and thoughtful data collection can help diagnose the root problems better, thus providing better predictions for what anti-poverty programs to employ in specific conditions and contexts. However, the more complex theories are likewise more challenging to test, requiring larger samples, and often more nuanced experimental designs, as well as detailed data on many aspects of household and community behavior and outcomes. We provide guidance on these design and testing issues for social protection programs, from how to target programs, to who should implement the program, to whether and what conditions to require for program participation. In short, carefully designed experimental testing can help provide a stronger conceptual understanding of why programs do or do not work, thereby allowing one to ultimately make stronger policy prescriptions that further the goal of poverty reduction.

Social Experiments in the Labor Market Jesse Rothstein and  Till von Wachter

Power to the People: Evidence from a Randomized Field Experiment on Community-Based Monitoring in Uganda

Martina Björkman, Jakob Svensson, Power to the People: Evidence from a Randomized Field Experiment on Community-Based Monitoring in Uganda, The Quarterly Journal of Economics, Volume 124, Issue 2, May 2009, Pages 735–769.

This paper presents a randomized field experiment on community-based monitoring of public primary health care providers in Uganda. Through two rounds of village meetings, localized nongovernmental organizations encouraged communities to be more involved with the state of health service provision and strengthened their capacity to hold their local health providers to account for performance. A year after the intervention, treatment communities are more involved in monitoring the provider, and the health workers appear to exert higher effort to serve the community. We document large increases in utilization and improved health outcomes—reduced child mortality and increased child weight—that compare favorably to some of the more successful community-based intervention trials reported in the medical literature.

