
Quasi-Experimental Design | Definition, Types & Examples

Published on July 31, 2020 by Lauren Thomas. Revised on January 22, 2024.

Like a true experiment, a quasi-experimental design aims to establish a cause-and-effect relationship between an independent and dependent variable.

However, unlike a true experiment, a quasi-experiment does not rely on random assignment. Instead, subjects are assigned to groups based on non-random criteria.

Quasi-experimental design is a useful tool in situations where true experiments cannot be used for ethical or practical reasons.


Table of contents

  • Differences between quasi-experiments and true experiments
  • Types of quasi-experimental designs
  • When to use quasi-experimental design
  • Advantages and disadvantages
  • Other interesting articles
  • Frequently asked questions about quasi-experimental designs

There are several common differences between true and quasi-experimental designs.

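Assignment to treatment

  • True experimental design: The researcher randomly assigns subjects to control and treatment groups.
  • Quasi-experimental design: Some other, non-random method is used to assign subjects to groups.

Control over treatment

  • True experimental design: The researcher usually designs the treatment.
  • Quasi-experimental design: The researcher often does not have control over the treatment, but instead studies pre-existing groups that received different treatments after the fact.

Use of control groups

  • True experimental design: Requires the use of control and treatment groups.
  • Quasi-experimental design: Control groups are not required (although they are commonly used).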

Example of a true experiment vs a quasi-experiment

However, for ethical reasons, the directors of the mental health clinic may not give you permission to randomly assign their patients to treatments. In this case, you cannot run a true experiment.

Instead, you can use a quasi-experimental design.

You can use these pre-existing groups to study the symptom progression of the patients treated with the new therapy versus those receiving the standard course of treatment.


Many types of quasi-experimental designs exist. Here we explain three of the most common types: nonequivalent groups design, regression discontinuity, and natural experiments.

Nonequivalent groups design

In a nonequivalent groups design, the researcher chooses existing groups that appear similar, but where only one of the groups experiences the treatment.

In a true experiment with random assignment, the control and treatment groups are considered equivalent in every way other than the treatment. But in a quasi-experiment where the groups are not random, they may differ in other ways: they are nonequivalent groups.

When using this kind of design, researchers try to account for any confounding variables by controlling for them in their analysis or by choosing groups that are as similar as possible.

This is the most common type of quasi-experimental design.

Regression discontinuity

Many potential treatments that researchers wish to study are designed around an essentially arbitrary cutoff, where those above the threshold receive the treatment and those below it do not.

Near this threshold, the differences between the two groups are often so minimal as to be nearly nonexistent. Therefore, researchers can use individuals just below the threshold as a control group and those just above as a treatment group.

However, since the exact cutoff score is arbitrary, the students near the threshold—those who just barely pass the exam and those who fail by a very small margin—tend to be very similar, with the small differences in their scores mostly due to random chance. You can therefore conclude that any outcome differences must come from the school they attended.
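The comparison just above and just below a threshold can be sketched in a few lines of Python. This is a minimal illustration rather than a full regression discontinuity analysis: the data, the cutoff of 50, the bandwidth of 10, and the function names `fit_line` and `rd_estimate` are all made up for the example, and a straight line on each side of the cutoff is an assumption that real data may not satisfy.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def rd_estimate(scores, outcomes, cutoff, bandwidth):
    """Estimate the jump in the outcome at the cutoff.

    Fits one line to units just below the cutoff and another to units
    at or above it, then compares the two fitted values at the cutoff.
    """
    below = [(s, y) for s, y in zip(scores, outcomes)
             if cutoff - bandwidth <= s < cutoff]
    above = [(s, y) for s, y in zip(scores, outcomes)
             if cutoff <= s <= cutoff + bandwidth]
    b0, b1 = fit_line(*zip(*below))
    a0, a1 = fit_line(*zip(*above))
    return (a0 + a1 * cutoff) - (b0 + b1 * cutoff)


# Made-up data: the outcome rises by 0.5 per score point, plus a jump
# of 3.0 for units at or above a hypothetical cutoff of 50.
scores = list(range(40, 61))
outcomes = [0.5 * s + (3.0 if s >= 50 else 0.0) for s in scores]
effect = rd_estimate(scores, outcomes, cutoff=50, bandwidth=10)
print(round(effect, 6))
```

A narrower bandwidth keeps the two groups more alike but leaves fewer observations, so the choice of bandwidth is a judgment call in any real analysis.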

Natural experiments

In both laboratory and field experiments, researchers normally control which group the subjects are assigned to. In a natural experiment, an external event or situation (“nature”) results in the random or random-like assignment of subjects to the treatment group.

Even though some use random assignments, natural experiments are not considered to be true experiments because they are observational in nature.

Although the researchers have no control over the independent variable, they can exploit this event after the fact to study the effect of the treatment.

However, as they could not afford to cover everyone who they deemed eligible for the program, they instead allocated spots in the program based on a random lottery.

Although true experiments have higher internal validity, you might choose to use a quasi-experimental design for ethical or practical reasons.

Sometimes it would be unethical to provide or withhold a treatment on a random basis, so a true experiment is not feasible. In this case, a quasi-experiment can allow you to study the same causal relationship without the ethical issues.

The Oregon Health Study is a good example. It would be unethical to randomly provide some people with health insurance but purposely prevent others from receiving it solely for the purposes of research.

However, since the Oregon government faced financial constraints and decided to provide health insurance via lottery, studying this event after the fact is a much more ethical approach to studying the same problem.

True experimental design may be infeasible to implement or simply too expensive, particularly for researchers without access to large funding streams.

At other times, too much work is involved in recruiting and properly designing an experimental intervention for an adequate number of subjects to justify a true experiment.

In either case, quasi-experimental designs allow you to study the question by taking advantage of data that has previously been paid for or collected by others (often the government).

Quasi-experimental designs have various pros and cons compared to other types of studies.

  • Higher external validity than most true experiments, because they often involve real-world interventions instead of artificial laboratory settings.
  • Higher internal validity than other non-experimental types of research, because they allow you to better control for confounding variables than other types of studies do.
  • Lower internal validity than true experiments—without randomization, it can be difficult to verify that all confounding variables have been accounted for.
  • The use of retrospective data that has already been collected for other purposes can be inaccurate, incomplete or difficult to access.


A quasi-experiment is a type of research design that attempts to establish a cause-and-effect relationship. The main difference with a true experiment is that the groups are not randomly assigned.

In experimental research, random assignment is a way of placing participants from your sample into different groups using randomization. With this method, every member of the sample has a known or equal chance of being placed in a control group or an experimental group.
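As a small illustration of random assignment, the sketch below shuffles a sample and splits it into a control group and a treatment group. The participant labels, the `randomly_assign` name, and the fixed seed are assumptions made for the example; the seed is there only so the split can be reproduced.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into two groups of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}


# Twenty hypothetical participants labelled P01 to P20.
groups = randomly_assign([f"P{i:02d}" for i in range(1, 21)], seed=42)
print(len(groups["control"]), len(groups["treatment"]))
```

Because the shuffle gives every ordering the same probability, each participant has an equal chance of ending up in either group, which is what a quasi-experiment lacks.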

Quasi-experimental design is most useful in situations where it would be unethical or impractical to run a true experiment.

Quasi-experiments have lower internal validity than true experiments, but they often have higher external validity as they can use real-world interventions instead of artificial laboratory settings.

Cite this Scribbr article


Thomas, L. (2024, January 22). Quasi-Experimental Design | Definition, Types & Examples. Scribbr. Retrieved September 23, 2024, from https://www.scribbr.com/methodology/quasi-experimental-design/


Regression based quasi-experimental approach when randomisation is not an option: interrupted time series analysis

  • Evangelos Kontopantelis , senior research fellow in biostatistics and health services research 1 2 ,
  • Tim Doran , professor of public health 3 ,
  • David A Springate , research fellow in health informatics 2 4 ,
  • Iain Buchan , professor of health informatics 1 ,
  • David Reeves , reader in statistics 2 4
  • 1 Centre for Health Informatics, Institute of Population Health, University of Manchester, Manchester M13 9GB, UK
  • 2 NIHR School for Primary Care Research, Centre for Primary Care, Institute of Population Health, University of Manchester, UK
  • 3 Department of Health Sciences, University of York, UK
  • 4 Centre for Biostatistics, Institute of Population Health, University of Manchester
  • Correspondence to: E Kontopantelis e.kontopantelis{at}manchester.ac.uk
  • Accepted 10 March 2015

Interrupted time series analysis is a quasi-experimental design that can evaluate an intervention effect, using longitudinal data. The advantages, disadvantages, and underlying assumptions of various modelling approaches are discussed using published examples.

Summary points

Interrupted time series analysis is arguably the “next best” approach for dealing with interventions when randomisation is not possible or clinical trial data are not available.

Although several assumptions need to be satisfied first, this quasi-experimental design can be useful in providing answers about population level interventions and effects.

However, implementing these analyses can be challenging, particularly for non-statisticians.


Randomised controlled trials (RCTs) are considered the ideal approach for assessing the effectiveness of interventions. However, not all interventions can be assessed with an RCT, and for many interventions trials can be prohibitively expensive. In addition, even well designed RCTs can be susceptible to systematic errors leading to biased estimates, particularly when generalising results to “real world” settings. For example, the external validity of clinical trials in diabetes seems to be poor; the proportion of the Scottish population that met eligibility criteria for seven major clinical trials ranged from 3.5% to 50.7%. 1 One of the greatest concerns is patients with multimorbidity, who are commonly excluded from RCTs. 2

Observational studies can address some of these shortcomings, but the lack of researcher control over confounding variables and the difficulty in establishing causation mean that conclusions from studies using observational approaches are generally considered to be weaker. However, with quasi-experimental study designs researchers are able to estimate causal effects using observational approaches. Interrupted time series (ITS) analysis is a useful quasi-experimental design with which to evaluate the longitudinal effects of interventions, through regression modelling. 3 The term quasi-experimental refers to an absence of randomisation, and ITS analysis is principally a tool for analysing observational data where full randomisation, or a case-control design, is not affordable or possible. Its main advantage over alternative approaches is that it can make full use of the longitudinal nature of the data and account for pre-intervention trends (fig 1). This design is particularly useful when “natural experiments” in real world settings occur, for example when a health policy change comes into effect. However, it is not appropriate when trends are not (or cannot be transformed to be) linear, the intervention is introduced gradually or at more than one time point, there are external time varying effects or autocorrelation (for example, seasonality), or the characteristics of the population change over time, although all these can be potentially dealt with through modelling if the relevant information is known.

Fig 1 Interrupted time series analysis components in relation to the Quality and Outcomes Framework intervention


Variations on this design are also known as segmented regression or regression discontinuity analysis and have been described elsewhere, 4 but we will focus on longitudinal data and practical modelling. ITS encompasses a wide range of modelling approaches and we describe the steps required to perform simple or more advanced analyses, using previously published analyses from our research group as examples.

The question

We demonstrate a range of ITS models using the “natural experiment” of the introduction of the Quality and Outcomes Framework (QOF) pay for performance scheme in UK primary care. The QOF was introduced in the 2004-05 financial year by the UK government to reward general practices for achieving clinical targets across a range of chronic conditions, as well as other more generic non-clinical targets. This large scale intervention was introduced nationally, without previous assessment in an experimental setting. Because of the great financial rewards it offered, it was adopted almost universally by general practitioners, despite its voluntary nature.

A fundamental research question concerned the effect of this national intervention on quality of care, as measured by the evidence based clinical indicators included in the incentivisation scheme. In operational form, did performance on the incentivised activities improve by the third year of the scheme (2006-07), compared with two years before its introduction (2002-03)? For our analyses we considered the year immediately before the scheme’s introduction (2003-04) to be a preparatory year, as information about the proposed targets was available to practices and this might have affected performance. A basic pre-post analysis would involve an unadjusted or adjusted comparison of mean levels of quality of care across the two comparator years, for example with a t test or a linear regression controlling for covariates. However, such analyses would fail to account for any trends in performance before the intervention, that is, changes in levels of care from 2000-01 to 2002-03. Importantly, in the context of the QOF, previous performance trends cannot be assumed to be negligible, since quality standards for certain chronic conditions included in the scheme (for example, diabetes) were published in 2001 or earlier. This is where the strength of the ITS approach lies: it evaluates the effect of the intervention while accounting for the all-important pre-intervention trends (table).

Introduction of the Quality and Outcomes Framework, summary of examples


We describe the processes, assumptions, and limitations across four ITS modelling approaches, starting with the simplest and concluding with the most complex. Code scripts in Stata are provided for all examples (web appendices 1-4).

In its simplest form, an ITS is modelled using a regression model (such as linear, logistic, or Poisson) that includes only three time based covariates, whose regression coefficients estimate the pre-intervention slope, the change in level at the intervention point, and the change in slope from pre-intervention to post-intervention. The pre-intervention slope quantifies the trend for the outcome before the intervention. The level change is an estimate of the change in level that can be attributed to the intervention, between the time points immediately before and immediately after the intervention, and accounting for the pre-intervention trend. The change in slope quantifies the difference between the pre-intervention and post-intervention slopes (fig 1). The key assumption we have to make is that without the intervention we set out to quantify, the pre-intervention trend would continue unchanged into the post-intervention period and there are no external factors systematically affecting the trends (that is, other “interventions”).
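The three time based covariates can be illustrated with a short Python sketch. It is not the Stata code from the web appendices: the yearly series is made up, the helper names `solve_ols` and `its_design_row` are hypothetical, and the coding of the covariates (time, a post-intervention indicator, and time elapsed since the intervention) is one common parameterisation assumed here for a linear model.

```python
def solve_ols(rows, ys):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    # Build the augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * y for r, y in zip(rows, ys))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def its_design_row(t, start):
    """Intercept plus the three time based ITS covariates for period t."""
    post = 1.0 if t >= start else 0.0
    return [1.0, float(t), post, post * (t - start)]


# Made-up yearly series: baseline 60, pre-intervention slope 2,
# level change of 5 at t = 5, and slope change of -1 afterwards.
start = 5
times = list(range(10))
ys = [60 + 2 * t + (5 - (t - start) if t >= start else 0) for t in times]

rows = [its_design_row(t, start) for t in times]
intercept, pre_slope, level_change, slope_change = solve_ols(rows, ys)
print(round(pre_slope, 6), round(level_change, 6), round(slope_change, 6))
```

Because the made-up series has no noise, the fitted coefficients simply recover the values used to generate it; with real data each estimate would come with a standard error.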

We collected performance data on asthma, diabetes, and coronary heart disease from 42 general practices for four time points: 1998 and 2003 (pre-intervention) and 2005 and 2007 (post-intervention). This was the setup for the 2009 analysis of the Quality in Practice (QuIP) study. 5 We generated the three ITS specific variables and used linear regression modelling. The analysis allowed us to quantify the effect of the intervention on recorded quality of care in the three conditions of interest, on top of what would be expected from the observed pre-intervention trend. We found that the intervention had an effect on quality of care for diabetes and asthma but not for heart disease (fig 2A). Since observations over time within each general practice can be treated as correlated, we used a multilevel regression model to account for clustering of observations within practices. 6 Bootstrap techniques can also be used to obtain more robust standard errors for the estimates. 7

Fig 2 Quality and Outcomes Framework (QOF) performance graphs for four presented examples. (A) Care for asthma, diabetes, and heart disease. Aggregate practice level performance across three clinical domains of interest. 5 (B) Diabetes care by number of comorbidities. Aggregate patient level performance for patients in the diabetes domain, by number of additional conditions. 8 (C) Incentivised and non-incentivised aspects of care. Aggregate practice level performance by incentivisation category and indicator type. 9 (D) Blood pressure measurement indicators. Aggregate practice level performance on blood pressure measurement indicator. 10 FI=fully incentivised, PI=partially incentivised, UI=unincentivised, PM/R=process measurement recording, PT=process treatment, I=intermediate outcome. The number of indicators in each group are in parentheses. CHD, DM, Stroke, and BP relate to the coronary heart disease, diabetes mellitus, stroke, and hypertension QOF clinical domains, respectively

Three important assumptions accompany this form of ITS analysis. Firstly, pre-intervention trends are assumed to be linear. Linearity of trends over time needs to be evaluated and confirmed firstly through visualisation and secondly with appropriate statistical tools for the ITS analysis results to have any credence. However, validating linearity can be a problem when there are only a few pre-intervention time points and is impossible with only two. Secondly, the ITS model estimates have not been controlled for covariates. The models assume that the characteristics of the populations remain unchanged throughout the study period and changes in the population base that might explain changes in the outcome are not accounted for. Thirdly, there is no comparator against which to adjust the results for changes that should not be attributed to the intervention itself.

With some modelling changes one can evaluate whether the intervention varies in relation to population characteristics (practices or patients, in the QOF context). For example, we can assess whether the impact of the QOF on performance of incentivised activities (HbA1c control ≤7.4% or HbA1c control ≤10% and retinal screening for patients with diabetes) varies by age group or other patient or practice characteristics. 8 To accomplish this we included “interaction terms” between the covariate (characteristics) of interest and the three ITS components relating to the pre-intervention slope, level change, and change in slope. A separate model needs to be executed for each covariate of interest.
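To make the interaction terms concrete, the sketch below builds one row of a design matrix for a single binary covariate. The column names, the 0/1 coding of the covariate, and the function `its_interaction_row` are assumptions made for illustration; the published analyses may have coded these terms differently.

```python
# Hypothetical column labels for the design row built below.
COLUMNS = ["intercept", "group", "time", "level", "trend_change",
           "group_x_time", "group_x_level", "group_x_trend_change"]


def its_interaction_row(t, start, group):
    """Design row with the ITS terms and their interactions with a 0/1 covariate."""
    post = 1.0 if t >= start else 0.0
    its_terms = [float(t), post, post * (t - start)]
    interactions = [group * term for term in its_terms]
    return [1.0, float(group)] + its_terms + interactions


# Period 7 with an intervention at period 5, for a unit with group = 1.
row = its_interaction_row(t=7, start=5, group=1)
print(dict(zip(COLUMNS, row)))
```

In a model fitted on rows like this, the coefficients on the three interaction columns would estimate how the pre-intervention slope, level change, and change in slope differ for units with `group = 1`.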

In addition, the estimated pre-intervention slope can be used to compute predictions of what the value of the outcome would have been at post-intervention time points if the intervention had not taken place. These estimates can then be compared against observations for a specific time point, and an overall difference, or “uplift” (fig 1), attributed to the intervention obtained. This comparison between predictions and observations not only applies to the advanced models, where both main and interaction effects estimates need to be considered, but to simple models as well. Using this approach we found that composite quality for patients with diabetes improved over and above the pre-incentive trend in the first post-intervention year, but by the third year improvement was smaller. The effect of the intervention did not vary by age, sex, or multimorbidity (fig 2B) but did for number of years living with the condition, with the smallest gains observed for newly diagnosed cases. 8 However, the linearity assumption, the lack of adjustment for changes in the population characteristics over time, and the absence of a comparator still apply.
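The comparison between projected and observed values can be sketched as follows. The series is made up, the function names are hypothetical, and the pre-intervention trend is fitted to the pre-intervention points alone, which is assumed here as a simple stand-in for the pre-intervention slope of a full ITS model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def uplift(times, observed, start):
    """Observed minus projected outcome for each post-intervention period."""
    pre = [(t, y) for t, y in zip(times, observed) if t < start]
    intercept, slope = fit_line(*zip(*pre))
    return {t: y - (intercept + slope * t)
            for t, y in zip(times, observed) if t >= start}


# Made-up series: pre-intervention trend 60 + 2t, then a jump of 5 at
# t = 5 followed by a flatter post-intervention slope.
start = 5
times = list(range(10))
observed = [60 + 2 * t + (5 - (t - start) if t >= start else 0) for t in times]

gains = uplift(times, observed, start)
print(gains)
```

In this invented series the uplift shrinks from 5 in the first post-intervention period to 1 in the last, the same qualitative pattern of a gain that narrows over time as the projected trend catches up.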

More flexible modelling options are possible in which we can overcome some of the limitations in the basic and advanced designs. Let us assume a patient level analysis of incentivised and non-incentivised aspects of quality of care across a range of clinical indicators, with our aim being to evaluate whether the effect of the QOF on performance varies across fully incentivised and non-incentivised indicators. 9 Using regression modelling we can evaluate the relations between the outcome and covariates of interest (for example, patient age and sex), to obtain estimates that are adjusted for population changes at specific time points, for example the adjusted increase in the outcome above the projected trend in the first post-intervention year. However, the modelling complexities are formidable and involve numerous steps. Using this approach we found that improvements attributed to financial incentives were achieved at the expense of small detrimental effects on non-incentivised aspects of care (fig 2C). 9

An alternative modelling approach can additionally incorporate “control” factors into the analyses. Let us assume we want to investigate the effect of withdrawing a financial incentive on practice performance. 10 In 2012-13, the QOF underwent a major revision and six clinical indicators were removed from the incentivisation scheme: blood pressure monitoring for coronary heart disease, diabetes, and stroke; cholesterol concentration monitoring for coronary heart disease and diabetes; blood glucose monitoring for diabetes. We used a regression based ITS to quantify the effect of the intervention, in this case the withdrawal of the incentive. We grouped the indicators by process and analysed these as separate groups, including indicators with similar characteristics that remained in the scheme and could act as “controls.” A multilevel mixed effects regression was used to model performance on all these indicators over time, controlled for covariates of interest and including an interaction term between time and indicators, but excluding post-intervention observations for the withdrawn indicators. Predictions and their standard errors were then obtained from the model, for the withdrawn indicators post-intervention and for each practice. These were compared with actual post-intervention observations using advanced meta-analysis methods, 11 to account for variability in the predictions, and obtain estimates of the differences. We found that the withdrawal of the incentive had little or no effect on quality of recorded care (fig 2D). 10

Although randomised controlled trials (RCTs) are considered the ideal approach for assessing the effectiveness of many interventions, we argue that observational data still need to be harnessed and utilised through robust alternative designs, even where trial evidence exists. Large scale population studies, using primary care databases, for example, can be valuable complements to well designed RCT evidence. 12 Sometimes evaluation through randomisation is not possible at all, as was the case with the UK’s primary care pay for performance scheme, which was implemented simultaneously across all UK practices. In either case, well designed observational studies can contribute greatly to the knowledge base, albeit with careful attention required to assess potential confounding and other threats to validity.

To better describe the methods, we drew on examples from our QOF research experiences. This approach allowed us to describe designs of increasing complexity, as well as present their technical details in the appendix code. However, we should also clarify that the ITS design is much more than a tool for QOF analyses, and it can investigate the effect of any policy change or intervention in a longitudinal dataset, provided the underlying assumptions are met. For example, it can investigate the decline in pneumonia admissions after routine childhood immunisation with pneumococcal conjugate vaccine in the United States, 13 the effect of 20 mph traffic zones on road injuries in London, 14 or the impact of infection control interventions and antibiotic use on hospital meticillin resistant Staphylococcus aureus (MRSA) in Scotland. 15

Quasi-experimental designs, and ITS analyses in particular, can help us unlock the potential of “real world” data, the volume and availability of which is increasing at an unprecedented rate. The limitations of quasi-experimental studies are generally well understood by the scientific community, whereas the same might not be true of the shortcomings of RCTs. Although the limitations can be daunting, including autocorrelation, time varying external effects, non-linearity, and unmeasured confounding, quasi-experimental designs are much cheaper and have the capacity, when carefully conducted, to complement trial evidence or even to map uncharted territory.

Sources and selection criteria

We chose to present examples that we ourselves have published in major clinical journals, including The BMJ.

EK and DR are experienced statisticians and health services researchers who have published numerous clinical papers using the described methods. TD is a professor of public health with considerable experience in these methods, who has co-authored most of these publications. DAS is a research fellow in health informatics, a more recent addition to the research group, who co-authored our latest interrupted time series analysis. IB is a professor of health informatics with wide experience in statistical methodology and its practical implementation.

Cite this as: BMJ 2015;350:h2750

Contributors: EK wrote the manuscript. DR, DAS, TD, and IB critically edited the manuscript. EK is the guarantor of this work and, as such, had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis.

Funding: MRC Health eResearch Centre grant MR/K006665/1 supported the time and facilities of EK and IB. DAS was funded by the National Institute for Health Research (NIHR) School for Primary Care Research (SPCR). The views expressed are those of the authors and not necessarily those of the NHS, the National Institute for Health Research or the Department of Health.

Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare: No relationships or activities not discussed in the funding statement that could appear to have influenced the submitted work.

Provenance and peer review: Not commissioned; externally peer reviewed.

This is an Open Access article distributed in accordance with the terms of the Creative Commons Attribution (CC BY 4.0) license, which permits others to distribute, remix, adapt and build upon this work, for commercial use, provided the original work is properly cited. See: http://creativecommons.org/licenses/by/4.0/ .

  1. Saunders C, Byrne CD, Guthrie B, et al. External validity of randomized controlled trials of glycaemic control and vascular disease: how representative are participants? Diabetic Med 2013;30:300-8.
  2. Guthrie B, Payne K, Alderson P, et al. Adapting clinical guidelines to take account of multimorbidity. BMJ 2012;345:e6341.
  3. Wagner AK, Soumerai SB, Zhang F, et al. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther 2002;27:299-309.
  4. O’Keeffe AG, Geneletti S, Baio G, et al. Regression discontinuity designs: an approach to the evaluation of treatment efficacy in primary care using observational data. BMJ 2014;349:g5293.
  5. Campbell SM, Reeves D, Kontopantelis E, et al. Effects of pay for performance on the quality of primary care in England. N Engl J Med 2009;361:368-78.
  6. Rabe-Hesketh S, Skrondal A. Multilevel and longitudinal modeling using Stata. 3rd ed. Stata Press, 2012.
  7. Efron B. The bootstrap and modern statistics. J Am Stat Assoc 2000;95:1293-6.
  8. Kontopantelis E, Reeves D, Valderas JM, et al. Recorded quality of primary care for patients with diabetes in England before and after the introduction of a financial incentive scheme: a longitudinal observational study. BMJ Qual Saf 2013;22:53-64.
  9. Doran T, Kontopantelis E, Valderas JM, et al. Effect of financial incentives on incentivised and non-incentivised clinical activities: longitudinal analysis of data from the UK Quality and Outcomes Framework. BMJ 2011;342:d3590.
  10. Kontopantelis E, Springate D, Reeves D, et al. Withdrawing performance indicators: retrospective analysis of general practice performance under UK Quality and Outcomes Framework. BMJ 2014;348:g330.
  11. Kontopantelis E, Springate DA, Reeves D. A re-analysis of the Cochrane Library data: the dangers of unobserved heterogeneity in meta-analyses. Plos One 2013;8:e69930.
  12. Silverman SL. From randomized controlled trials to observational studies. Am J Med 2009;122:114-20.
  13. Grijalva CG, Nuorti JP, Arbogast PG, et al. Decline in pneumonia admissions after routine childhood immunisation with pneumococcal conjugate vaccine in the USA: a time-series analysis. Lancet 2007;369:1179-86.
  14. Grundy C, Steinbach R, Edwards P, et al. Effect of 20 mph traffic speed zones on road injuries in London, 1986-2006: controlled interrupted time series analysis. BMJ 2009;339:b4469.
  15. Mahamat A, MacKenzie FM, Brooker K, Monnet DL, Daures JP, Gould IM. Impact of infection control interventions and antibiotic use on hospital MRSA: a multivariate interrupted time-series analysis. Int J Antimicrob Agents 2007;30:169-76.


14 - Quasi-Experimental Research

from Part III - Data Collection

Published online by Cambridge University Press:  25 May 2023

In this chapter, we discuss the logic and practice of quasi-experimentation. Specifically, we describe four quasi-experimental designs – one-group pretest–posttest designs, non-equivalent group designs, regression discontinuity designs, and interrupted time-series designs – and their statistical analyses in detail. Both simple quasi-experimental designs and embellishments of these simple designs are presented. Potential threats to internal validity are illustrated along with means of addressing their potentially biasing effects so that these effects can be minimized. In contrast to quasi-experiments, randomized experiments are often thought to be the gold standard when estimating the effects of treatment interventions. However, circumstances frequently arise where quasi-experiments can usefully supplement randomized experiments or when quasi-experiments can fruitfully be used in place of randomized experiments. Researchers need to appreciate the relative strengths and weaknesses of the various quasi-experiments so they can choose among pre-specified designs or craft their own unique quasi-experiments.


  • By Charles S. Reichardt, Daniel Storage, Damon Abraham
  • Edited by Austin Lee Nichols, Central European University, Vienna; John Edlund, Rochester Institute of Technology, New York
  • Book: The Cambridge Handbook of Research Methods and Statistics for the Social and Behavioral Sciences
  • Chapter DOI: https://doi.org/10.1017/9781009010054.015


Chapter 7: Nonexperimental Research

Quasi-Experimental Research

Learning Objectives

  • Explain what quasi-experimental research is and distinguish it clearly from both experimental and correlational research.
  • Describe three different types of quasi-experimental research designs (nonequivalent groups, pretest-posttest, and interrupted time series) and identify examples of each one.

The prefix  quasi  means “resembling.” Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). [1] Because the independent variable is manipulated before the dependent variable is measured, quasi-experimental research eliminates the directionality problem. But because participants are not randomly assigned—making it likely that there are other differences between conditions—quasi-experimental research does not eliminate the problem of confounding variables. In terms of internal validity, therefore, quasi-experiments are generally somewhere between correlational studies and true experiments.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. There are many different kinds of quasi-experiments, but we will discuss just a few of the most common ones here.

Nonequivalent Groups Design

Recall that when participants in a between-subjects experiment are randomly assigned to conditions, the resulting groups are likely to be quite similar. In fact, researchers consider them to be equivalent. When participants are not randomly assigned to conditions, however, the resulting groups are likely to be dissimilar in some ways. For this reason, researchers consider them to be nonequivalent. A  nonequivalent groups design , then, is a between-subjects design in which participants have not been randomly assigned to conditions.

Imagine, for example, a researcher who wants to evaluate a new method of teaching fractions to third graders. One way would be to conduct a study with a treatment group consisting of one class of third-grade students and a control group consisting of another class of third-grade students. This design would be a nonequivalent groups design because the students are not randomly assigned to classes by the researcher, which means there could be important differences between them. For example, the parents of higher achieving or more motivated students might have been more likely to request that their children be assigned to Ms. Williams’s class. Or the principal might have assigned the “troublemakers” to Mr. Jones’s class because he is a stronger disciplinarian. Of course, the teachers’ styles, and even the classroom environments, might be very different and might cause different levels of achievement or motivation among the students. If at the end of the study there was a difference in the two classes’ knowledge of fractions, it might have been caused by the difference between the teaching methods—but it might have been caused by any of these confounding variables.

Of course, researchers using a nonequivalent groups design can take steps to ensure that their groups are as similar as possible. In the present example, the researcher could try to select two classes at the same school, where the students in the two classes have similar scores on a standardized math test and the teachers are the same sex, are close in age, and have similar teaching styles. Taking such steps would increase the internal validity of the study because it would eliminate some of the most important confounding variables. But without true random assignment of the students to conditions, there remains the possibility of other important confounding variables that the researcher was not able to control.

Pretest-Posttest Design

In a  pretest-posttest design , the dependent variable is measured once before the treatment is implemented and once after it is implemented. Imagine, for example, a researcher who is interested in the effectiveness of an antidrug education program on elementary school students’ attitudes toward illegal drugs. The researcher could measure the attitudes of students at a particular elementary school during one week, implement the antidrug program during the next week, and finally, measure their attitudes again the following week. The pretest-posttest design is much like a within-subjects experiment in which each participant is tested first under the control condition and then under the treatment condition. It is unlike a within-subjects experiment, however, in that the order of conditions is not counterbalanced because it typically is not possible for a participant to be tested in the treatment condition first and then in an “untreated” control condition.

If the average posttest score is better than the average pretest score, then it makes sense to conclude that the treatment might be responsible for the improvement. Unfortunately, one often cannot conclude this with a high degree of certainty because there may be other explanations for why the posttest scores are better. One category of alternative explanations goes under the name of  history . Other things might have happened between the pretest and the posttest. Perhaps an antidrug program aired on television and many of the students watched it, or perhaps a celebrity died of a drug overdose and many of the students heard about it. Another category of alternative explanations goes under the name of  maturation . Participants might have changed between the pretest and the posttest in ways that they were going to anyway because they are growing and learning. If it were a yearlong program, participants might become less impulsive or better reasoners and this might be responsible for the change.

Another alternative explanation for a change in the dependent variable in a pretest-posttest design is regression to the mean. This refers to the statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion. For example, a bowler with a long-term average of 150 who suddenly bowls a 220 will almost certainly score lower in the next game. Her score will “regress” toward her mean score of 150. Regression to the mean can be a problem when participants are selected for further study because of their extreme scores. Imagine, for example, that only students who scored especially low on a test of fractions are given a special training program and then retested. Regression to the mean all but guarantees that their scores will be higher even if the training program has no effect.

A closely related concept, and an extremely important one in psychological research, is spontaneous remission. This is the tendency for many medical and psychological problems to improve over time without any form of treatment. The common cold is a good example. If one were to measure symptom severity in 100 common cold sufferers today, give them a bowl of chicken soup every day, and then measure their symptom severity again in a week, they would probably be much improved. This does not mean that the chicken soup was responsible for the improvement, however, because they would have been much improved without any treatment at all. The same is true of many psychological problems. A group of severely depressed people today is likely to be less depressed on average in 6 months. In reviewing the results of several studies of treatments for depression, researchers Michael Posternak and Ivan Miller found that participants in waitlist control conditions improved an average of 10 to 15% before they received any treatment at all (Posternak & Miller, 2001). [2] Thus one must generally be very cautious about inferring causality from pretest-posttest designs.
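Regression to the mean can be made concrete with a small simulation. The sketch below is illustrative only: it assumes each observed score is a stable true ability plus independent measurement noise, and every number in it (a mean of 50, standard deviations of 10, a selection cutoff of 40) is invented rather than taken from any study. No treatment is applied at any point.

```python
import random
import statistics


def simulate_regression_to_mean(n_students=10_000, cutoff=40.0, seed=0):
    """Simulate a pretest and a posttest with NO treatment effect.

    Observed score = stable true ability + independent measurement noise.
    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_ability = [rng.gauss(50, 10) for _ in range(n_students)]
    pretest = [a + rng.gauss(0, 10) for a in true_ability]
    posttest = [a + rng.gauss(0, 10) for a in true_ability]

    # Select only the students who scored especially low on the pretest.
    selected = [i for i, score in enumerate(pretest) if score < cutoff]
    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return pre_mean, post_mean


pre_mean, post_mean = simulate_regression_to_mean()
print(f"selected group, pretest mean:  {pre_mean:.1f}")
print(f"selected group, posttest mean: {post_mean:.1f}")
```

Under these assumptions the selected group's posttest mean comes out well above its pretest mean, even though nothing was done to the students between the two tests. The apparent gain is produced entirely by selecting on an extreme, noisy score.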

Does Psychotherapy Work?

Early studies on the effectiveness of psychotherapy tended to use pretest-posttest designs. In a classic 1952 article, researcher Hans Eysenck summarized the results of 24 such studies showing that about two thirds of patients improved between the pretest and the posttest (Eysenck, 1952). [3] But Eysenck also compared these results with archival data from state hospital and insurance company records showing that similar patients recovered at about the same rate without receiving psychotherapy. This parallel suggested to Eysenck that the improvement that patients showed in the pretest-posttest studies might be no more than spontaneous remission. Note that Eysenck did not conclude that psychotherapy was ineffective. He merely concluded that there was no evidence that it was effective, and he wrote of “the necessity of properly planned and executed experimental studies into this important field” (p. 323). You can read the entire article here: Classics in the History of Psychology.

Fortunately, many other researchers took up Eysenck’s challenge. By 1980, hundreds of experiments had been conducted in which participants were randomly assigned to treatment and control conditions, and the results were summarized in a classic book by Mary Lee Smith, Gene Glass, and Thomas Miller (Smith, Glass, & Miller, 1980). [4] They found that overall psychotherapy was quite effective, with about 80% of treatment participants improving more than the average control participant. Subsequent research has focused more on the conditions under which different types of psychotherapy are more or less effective.
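To get a feel for what “about 80% of treatment participants improving more than the average control participant” means as an effect size, the figure can be converted into a standardized mean difference. The sketch below rests on assumptions that are not stated in the text: improvement scores are normally distributed with equal variance in both groups, and “the average control participant” is read as the control-group mean.

```python
from statistics import NormalDist

# Share of treated participants who improve more than the control-group mean
# (the "about 80%" figure quoted above).
share_above_control_mean = 0.80

# Under the normality and equal-variance assumptions, this share equals
# Phi(d), where d is the standardized mean difference between the groups.
d = NormalDist().inv_cdf(share_above_control_mean)
print(f"implied standardized mean difference: {d:.2f}")
```

Under those assumptions the treatment-group mean sits a little more than four fifths of a standard deviation above the control-group mean. This is a back-of-the-envelope reading of the quoted percentage, not a figure reported in the passage above.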

Interrupted Time Series Design

A variant of the pretest-posttest design is the interrupted time-series design. A time series is a set of measurements taken at intervals over a period of time. For example, a manufacturing company might measure its workers’ productivity each week for a year. In an interrupted time-series design, a time series like this one is “interrupted” by a treatment. In one classic example, the treatment was the reduction of the work shifts in a factory from 10 hours to 8 hours (Cook & Campbell, 1979). [5] Because productivity increased rather quickly after the shortening of the work shifts, and because it remained elevated for many months afterward, the researcher concluded that the shortening of the shifts caused the increase in productivity. Notice that the interrupted time-series design is like a pretest-posttest design in that it includes measurements of the dependent variable both before and after the treatment. It is unlike the pretest-posttest design, however, in that it includes multiple pretest and posttest measurements.

Figure 7.3 shows data from a hypothetical interrupted time-series study. The dependent variable is the number of student absences per week in a research methods course. The treatment is that the instructor begins publicly taking attendance each day so that students know that the instructor is aware of who is present and who is absent. The top panel of Figure 7.3 shows how the data might look if this treatment worked. There is a consistently high number of absences before the treatment, and there is an immediate and sustained drop in absences after the treatment. The bottom panel of Figure 7.3 shows how the data might look if this treatment did not work. On average, the number of absences after the treatment is about the same as the number before. This figure also illustrates an advantage of the interrupted time-series design over a simpler pretest-posttest design. In the bottom panel, if there had been only one measurement of absences before the treatment at Week 7 and one afterward at Week 8, then it would have looked as though the treatment were responsible for the reduction. The multiple measurements both before and after the treatment suggest instead that the reduction between Weeks 7 and 8 is nothing more than normal week-to-week variation.
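The contrast between a single pretest-posttest comparison and a full time series can be sketched in a few lines. The weekly absence counts below are invented for illustration and are not read off Figure 7.3; they only mimic the “treatment did not work” pattern, with a high count in Week 7 and a low count in Week 8.

```python
from statistics import mean

# Hypothetical absences for Weeks 1-14 (invented, not the figure's data).
absences = [6, 4, 7, 5, 4, 6, 8, 3, 7, 5, 6, 8, 4, 7]
treatment_starts_week = 8  # attendance is taken publicly from Week 8 onward

before = absences[:treatment_starts_week - 1]  # Weeks 1-7
after = absences[treatment_starts_week - 1:]   # Weeks 8-14

# Simple pretest-posttest comparison: Week 8 versus Week 7 only.
single_diff = after[0] - before[-1]

# Interrupted time-series comparison: average level after versus before.
series_diff = mean(after) - mean(before)

print(f"Week 8 minus Week 7:          {single_diff}")
print(f"Mean after minus mean before: {series_diff:.2f}")
```

With these made-up numbers the two-point comparison shows a large drop, while the averages over the seven weeks on each side are essentially identical. A fuller analysis would also model trends and autocorrelation in the series, which this sketch deliberately ignores.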

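[Figure 7.3: two line graphs of hypothetical interrupted time-series data showing the number of student absences per week before and after the treatment. See Image Descriptions.]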

Combination Designs

A type of quasi-experimental design that is generally better than either the nonequivalent groups design or the pretest-posttest design is one that combines elements of both. There is a treatment group that is given a pretest, receives a treatment, and then is given a posttest. But at the same time there is a control group that is given a pretest, does  not  receive the treatment, and then is given a posttest. The question, then, is not simply whether participants who receive the treatment improve but whether they improve  more  than participants who do not receive the treatment.

Imagine, for example, that students in one school are given a pretest on their attitudes toward drugs, then are exposed to an antidrug program, and finally are given a posttest. Students in a similar school are given the pretest, not exposed to an antidrug program, and finally are given a posttest. Again, if students in the treatment condition become more negative toward drugs, this change in attitude could be an effect of the treatment, but it could also be a matter of history or maturation. If it really is an effect of the treatment, then students in the treatment condition should become more negative than students in the control condition. But if it is a matter of history (e.g., news of a celebrity drug overdose) or maturation (e.g., improved reasoning), then students in the two conditions would be likely to show similar amounts of change. This type of design does not completely eliminate the possibility of confounding variables, however. Something could occur at one of the schools but not the other (e.g., a student drug overdose), so students at the first school would be affected by it while students at the other school would not.

Finally, if participants in this kind of design are randomly assigned to conditions, it becomes a true experiment rather than a quasi experiment. In fact, it is the kind of experiment that Eysenck called for—and that has now been conducted many times—to demonstrate the effectiveness of psychotherapy.
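The logic of the combination design, asking whether the treatment group improves more than the control group, amounts to a difference of two pretest-to-posttest changes. The sketch below uses invented attitude scores (higher meaning more negative toward drugs) for two hypothetical schools; it is a minimal illustration, not an analysis from the text.

```python
from statistics import mean

# Hypothetical attitude scores (higher = more negative toward drugs).
treatment_pre = [3.1, 2.8, 3.4, 3.0, 2.9]
treatment_post = [4.0, 3.9, 4.3, 3.8, 4.1]
control_pre = [3.0, 3.2, 2.7, 3.1, 2.9]
control_post = [3.4, 3.5, 3.0, 3.5, 3.2]

# Change within each school from pretest to posttest.
treatment_change = mean(treatment_post) - mean(treatment_pre)
control_change = mean(control_post) - mean(control_pre)

# How much MORE the treatment school changed than the control school.
difference_in_changes = treatment_change - control_change

print(f"treatment change:      {treatment_change:.2f}")
print(f"control change:        {control_change:.2f}")
print(f"difference in changes: {difference_in_changes:.2f}")
```

In this made-up example both schools shift, which is what history or maturation would produce, but the treatment school shifts further. Only the extra change is a candidate treatment effect, and even that remains open to events that affected one school and not the other.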

Key Takeaways

  • Quasi-experimental research involves the manipulation of an independent variable without the random assignment of participants to conditions or orders of conditions. Among the important types are nonequivalent groups designs, pretest-posttest, and interrupted time-series designs.
  • Quasi-experimental research eliminates the directionality problem because it involves the manipulation of the independent variable. It does not eliminate the problem of confounding variables, however, because it does not involve random assignment to conditions. For these reasons, quasi-experimental research is generally higher in internal validity than correlational studies but lower than true experiments.
  • Practice: Imagine that two professors decide to test the effect of giving daily quizzes on student performance in a statistics course. They decide that Professor A will give quizzes but Professor B will not. They will then compare the performance of students in their two sections on a common final exam. List five other variables that might differ between the two sections that could affect the results.
  • regression to the mean
  • spontaneous remission

Image Descriptions

Figure 7.3 image description: Two line graphs charting the number of absences per week over 14 weeks. The first 7 weeks are without treatment and the last 7 weeks are with treatment. In the first line graph, there are between 4 and 8 absences each week before the treatment. After the treatment, the absences drop to between 0 and 3 each week, which suggests the treatment worked. In the second line graph, there is no noticeable change in the number of absences per week after the treatment, which suggests the treatment did not work.

  • Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues in field settings. Boston, MA: Houghton Mifflin.
  • Posternak, M. A., & Miller, I. (2001). Untreated short-term course of major depression: A meta-analysis of studies using outcomes from studies using wait-list control groups. Journal of Affective Disorders, 66, 139–146.
  • Eysenck, H. J. (1952). The effects of psychotherapy: An evaluation. Journal of Consulting Psychology, 16, 319–324.
  • Smith, M. L., Glass, G. V., & Miller, T. I. (1980). The benefits of psychotherapy. Baltimore, MD: Johns Hopkins University Press.

Nonequivalent groups design: A between-subjects design in which participants have not been randomly assigned to conditions.

Pretest-posttest design: A design in which the dependent variable is measured once before the treatment is implemented and once after it is implemented.

History: A category of alternative explanations for differences between scores such as events that happened between the pretest and posttest, unrelated to the study.

Maturation: An alternative explanation that refers to how the participants might have changed between the pretest and posttest in ways that they were going to anyway because they are growing and learning.

Regression to the mean: The statistical fact that an individual who scores extremely on a variable on one occasion will tend to score less extremely on the next occasion.

Spontaneous remission: The tendency for many medical and psychological problems to improve over time without any form of treatment.

Interrupted time-series design: A set of measurements taken at intervals over a period of time that are interrupted by a treatment.

Research Methods in Psychology - 2nd Canadian Edition Copyright © 2015 by Paul C. Price, Rajiv Jhangiani, & I-Chant A. Chiang is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


  • Technical advance
  • Open access
  • Published: 11 February 2021

Conceptualising natural and quasi experiments in public health

  • Frank de Vocht   ORCID: orcid.org/0000-0003-3631-627X 1 , 2 , 3 ,
  • Srinivasa Vittal Katikireddi 4 ,
  • Cheryl McQuire 1 , 2 ,
  • Kate Tilling 1 , 5 ,
  • Matthew Hickman 1 &
  • Peter Craig 4  

BMC Medical Research Methodology, volume 21, Article number: 32 (2021)


Natural or quasi experiments are appealing for public health research because they enable the evaluation of events or interventions that are difficult or impossible to manipulate experimentally, such as many policy and health system reforms. However, there remains ambiguity in the literature about their definition and how they differ from randomized controlled experiments and from other observational designs. We conceptualise natural experiments in the context of public health evaluations and align the study design to the Target Trial Framework.

A literature search was conducted, and key methodological papers were used to develop this work. Peer-reviewed papers were supplemented by grey literature.

Natural experiment studies (NES) combine features of experiments and non-experiments. They differ from planned experiments, such as randomized controlled trials, in that exposure allocation is not controlled by researchers. They differ from other observational designs in that they evaluate the impact of events or processes that lead to differences in exposure. As a result they are, in theory, less susceptible to bias than other observational study designs. Importantly, causal inference relies heavily on the assumption that exposure allocation can be considered ‘as-if randomized’. The target trial framework provides a systematic basis for evaluating this assumption and the other design elements that underpin the causal claims that can be made from NES.


NES should be considered a type of study design rather than a set of tools for analyses of non-randomized interventions. Alignment of NES to the Target Trial framework will clarify the strength of evidence underpinning claims about the effectiveness of public health interventions.


When designing a study to estimate the causal effect of an intervention, the experiment (particularly the randomised controlled trial (RCT)) is generally considered to be the design least susceptible to bias. A defining feature of the experiment is that the researcher controls the assignment of the treatment or exposure. If properly conducted, random assignment balances unmeasured confounders in expectation between the intervention and control groups. In many evaluations of public health interventions, however, it is not possible to conduct randomised experiments. Instead, standard observational epidemiological study designs have traditionally been used. These are known to be susceptible to unmeasured confounding.

Natural experimental studies (NES) have become popular as an alternative evaluation design in public health research, as they have distinct benefits over traditional designs [1]. In NES, although the allocation and dosage of treatment or exposure are not under the control of the researcher, they are expected to be unrelated to other factors that cause the outcome of interest [2, 3, 4, 5]. Such studies can provide strong causal information in complex real-world situations, and can generate effect sizes close to the causal estimates from RCTs [6, 7, 8]. The term natural experiment study is sometimes used synonymously with quasi-experiment, a much broader term that can also refer to researcher-led but non-randomised experiments. In this paper we argue for a clearer conceptualisation of natural experiment studies in public health research, and present a framework to improve their design and reporting and facilitate assessment of causal claims.

Natural and quasi-experiments have a long history of use for evaluations of public health interventions. One of the earliest and best-known examples is the case of ‘Dr John Snow and the Broad Street pump’ [9]. In this study, cholera deaths were significantly lower among residents served by the Lambeth water company, which had moved its intake pipe to an upstream location of the Thames following an earlier outbreak, compared to those served by the Southwark and Vauxhall water company, which had not moved its intake pipe. Since houses in the study area were serviced by either company in an essentially random manner, this natural experiment provided strong evidence that cholera was transmitted through water [10].

Natural and quasi experiments

Natural and quasi experiments are appealing because they enable the evaluation of changes to a system that are difficult or impossible to manipulate experimentally. These include, for example, large events, pandemics and policy changes [7, 11]. They also allow for retrospective evaluation when the opportunity for a trial has passed [12]. They offer benefits over standard observational studies because they exploit variation in exposure that arises from an exogenous (i.e. not caused by other factors in the analytic model [1]) event or intervention. This aligns them to the ‘do-operator’ in the work of Pearl [13]. Quasi experiments (QES) and NES thus combine features of experiments (exogenous exposure) and non-experiments (observations without a researcher-controlled intervention). As a result, they are generally less susceptible to confounding than many other observational study designs [14]. However, a common critique of QES and NES is that because the processes producing variation in exposure are outside the control of the research team, there is uncertainty as to whether confounding has been sufficiently minimized or avoided [7]. Consider, for example, a QES of the impact of a fast food chain's voluntary decision to label its menus with calorie information on the calories subsequently purchased [15]. Unmeasured differences between the populations that visit that particular chain and those that visit other fast-food outlets could lead to residual confounding.

A distinction is sometimes made between QES and NES. The term ‘natural experiment’ has traditionally referred to the occurrence of an event with a natural cause, a ‘force of nature’ (Fig. 1a) [1]. These make for some of the most compelling studies of causation from non-randomised experiments. For example, the Canterbury earthquakes in 2010–2011 have been used to study the causal impact of such disasters because about half of an established birth cohort lived in the affected area with the remainder of the cohort living elsewhere [16]. More recently, the use of the term ‘natural’ has been understood more broadly as an event which did not involve the deliberate manipulation of exposure for research purposes (for example a policy change), even if human agency was involved [17]. In QES, by contrast with natural experiments, the research team may be able to influence exposure allocation, even if the event or exposure itself is not under their full control; for example in a phased roll out of a policy [18].

A well-known example of a natural experiment is the “Dutch Hunger Winter” summarised by Lumey et al. [19]. During this period in the Second World War the German authorities blocked all food supplies to the occupied West of the Netherlands, which resulted in widespread starvation. Food supplies were restored immediately after the country was liberated, so the exposure was sharply defined by time as well as place. Because there was sufficient food in the occupied and liberated areas of the Netherlands before and after the Hunger Winter, exposure to famine occurred based on an individual’s time and place (of birth) only. Similar examples of such ‘political’ natural experiment studies are the study of the impact of China’s Great Famine [20] and the ‘special period’ in Cuba’s history following the collapse of the Soviet Union and the imposition of a US blockade [21].

NES that describe the evaluation of an event which did not involve the deliberate manipulation of an exposure but involved human agency, such as the impact of a new policy, are the mainstay of ‘natural experimental research’ in public health, and the term NES has become increasingly popular to indicate any quasi-experimental design (although it has not completely replaced the term QES).

Fig. 1. Different conceptualisations of natural and quasi experiments within wider evaluation frameworks

Dunning takes the distinction of a NES further. He defines a NES as a QES where knowledge about the exposure allocation process provides a strong argument that allocation, although not deliberately manipulated by the researcher, is essentially random. This concept is referred to as ‘as-if randomization’ (Fig. 1b) [4, 8, 10]. Under this definition, NES differ from QES in which the allocation of exposure, whether partly controlled by the researcher or not, does not clearly resemble a random process.

A third distinction between QES and NES holds that NES describe the study of unplanned events whereas QES describe evaluations of events that are planned (but not controlled by the researcher), such as policies or programmes specifically aimed at influencing an outcome (Fig. 1c) [17]. In practice, however, the distinction between these can be ambiguous.

When the assignment of exposure is not controlled by the researcher, with rare exceptions (for example lottery-system [22] or military draft [23] allocations), it is typically very difficult to prove that true (as-if) randomization occurred. Because of the ambiguity of ‘as-if randomization’ and the fact that the tools to assess this are the same as those used for assessment of internal validity in any observational study [12], the UK Medical Research Council (MRC) guidance advocates a broader conceptualisation of a NES. Under the MRC guidance, a NES is defined as any study that investigates an event that is not under the control of the research team, and which divides a population into exposed and unexposed groups, or into groups with different levels of exposure (Fig. 1d).

Here, while acknowledging the remaining ambiguity regarding the precise definition of a NES, in consideration of the definitions above [ 24 ], we argue that:

what distinguishes NES from RCTs is that allocation is not controlled by the researchers; and

what distinguishes NES from other observational designs is that they specifically evaluate the impact of a clearly defined event or process which results in differences in exposure between groups.

A detailed assessment of the allocation mechanism (which determines exposure status) is essential. If we can demonstrate that the allocation process approximates a randomization process, any causal claims from NES will be substantially strengthened. The plausibility of the ‘as-if random’ assumption strongly depends on detailed knowledge of why and how individuals or groups of individuals were assigned to conditions and how the assignment process was implemented [ 10 ]. This plausibility can be assessed quantitatively for observed factors using standard tools for assessment of internal validity of a study [ 12 ], and should ideally be supplemented by a qualitative description of the assignment process. In common with contemporary public health practice, we will use the term ‘natural experiment study’, or NES, to refer to both NES and QES from here on.

Medline, Embase and Google Scholar were searched using search terms including quasi-experiment, natural experiment, policy evaluation and public health evaluation and key methodological papers were used to develop this work. Peer-reviewed papers were supplemented by grey literature.

Part 1. Conceptualisations of natural experiments

An analytic approach.

Some conceptualisations of NES place their emphasis on the analytic tools that are used to evaluate natural experiments [ 25 , 26 ]. In this conceptualisation NES are understood as being defined by the way in which they are analysed, rather than by their design. An array of different statistical methods is available to analyse natural experiments, including regression adjustments, propensity scores, difference-in-differences, interrupted time series, regression discontinuity, synthetic controls, and instrumental variables. Overviews including strengths and limitations of the different methods are provided in [ 12 , 27 ]. However, an important drawback of this conceptualisation is that it suggests that there is a distinct set of methods for the analysis of NES.
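As a small illustration of one of the methods listed above, the following sketch computes a difference-in-differences estimate from group means. The outcome values, variable names and function name are hypothetical and chosen only for illustration. The estimate is interpretable as a causal effect only under the parallel-trends assumption, which this code does not test.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences: change in the exposed group
    minus change in the unexposed group."""
    change_treated = mean(treated_post) - mean(treated_pre)
    change_control = mean(control_post) - mean(control_pre)
    return change_treated - change_control


# Hypothetical outcome scores before and after an event
exposed_before = [10.0, 12.0, 11.0]
exposed_after = [14.0, 16.0, 15.0]
unexposed_before = [9.0, 10.0, 11.0]
unexposed_after = [10.0, 11.0, 12.0]

effect = did_estimate(exposed_before, exposed_after,
                      unexposed_before, unexposed_after)
print(f"DiD estimate: {effect:.2f}")
```

Subtracting the change in the unexposed group removes any shift over time that is shared by both groups, which is why the comparison group matters more than the choice of software.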

A study design

The popularity of NES has resulted in some conceptual stretching, where the label is applied to a research design that only implausibly meets the definitional features of a NES [ 10 ]. For example, observational studies exploring variation in exposures (rather than the study of an event or change in exposure) have sometimes also been badged as NES. A more stringent classification of NES as a type of study design, rather than a collection of analytic tools, is important because it prevents attempts to incorrectly cover observational studies with a ‘glow of experimental legitimacy’ [ 10 ]. If the design rather than the statistical methodology defines a NES, this allows an open-ended array of statistical tools. These tools are not necessarily constrained by those mentioned above, but could also, for example, include new methods such as synthetic controls that can be utilised to analyse the natural experiments. The choice of appropriate evaluation method should be based on what is most suitable for each particular study, and then depends on the knowledge about the event, the availability of data, and design elements such as its allocation process.

Dunning argues that it is the overall research design, rather than just the statistical methods, that compels conviction when making causal claims. He proposes an evaluation framework for NES along the three dimensions of (1) the plausibility of as-if randomization of treatment, (2) the credibility of causal and statistical models, and (3) the substantive relevance of the treatment. Here, the first dimension is considered key for distinguishing NES from other QES [ 4 ]. NES can be divided into those where a plausible case for ‘as-if random’ assignment can be made (which he defines as NES), and those where confounding from observed factors is directly adjusted for through statistical means. The validity of the latter (which Dunning defines as ‘other quasi experiments’, and we define as ‘weaker NES’) relies on the assumption that unmeasured confounding is absent [ 8 ], and is considered less credible in theory for making causal claims [ 4 ]. In this framework, the ‘as-if-randomised’ NES can be viewed as offering stronger causal evidence than other quasi-experiments. In principle, they offer an opportunity for direct estimates of effects (akin to RCTs) where control for confounding factors would not necessarily be required [ 4 ], rather than relying on adjustment to derive conditional effect estimates [ 10 ]. Of course, the latter may well reach valid and compelling conclusions as well, but causal claims suffer to a higher degree from the familiar threats of bias and unmeasured confounding.

Part 2. A target trial framework for natural experiment studies

In this section, we provide recommendations for evaluation of the ‘as if random’ assumption and provide a unifying Target Trial Framework for NES, which brings together key sets of criteria that can be used to appraise the strength of causal claims from NES and assist with study design and reporting.

In public health, there is considerable overlap between analytic and design-based uses of the term NES. Nevertheless, we argue that if we consider NES a type of study design, causal inference can be strengthened by clear appraisal of the likelihood of ‘as-if’ random allocation of exposure. This should be demonstrated by both empirical evidence and by knowledge and reasoning about the causal question and substantive domain under question [ 8 , 10 ]. Because the concept of ‘as-if’ randomization is difficult, if not impossible to prove, it should be thought of along a ‘continuum of plausibility’ [ 10 ]. Specifically, for claims of ‘as-if’ randomization to be plausible, it must be demonstrated that the variables that determine treatment assignment are exogenous. This means that they are: i) strongly correlated with treatment status but are not caused by the outcome of interest (i.e. no reverse causality) and ii) independent of any other (measured or unmeasured) causes of the outcome of interest [ 8 ].
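One common way to provide empirical evidence on the ‘as-if random’ assumption for observed factors is to compare baseline covariates between exposed and unexposed groups. The sketch below computes a standardized mean difference (SMD); the covariate, the data and the function name are hypothetical. A frequently used but arbitrary rule of thumb treats absolute SMDs above roughly 0.1 as notable imbalance. Good balance on measured covariates does not demonstrate independence from unmeasured causes of the outcome.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(exposed, unexposed):
    """Difference in group means divided by the pooled standard deviation
    (square root of the average of the two sample variances)."""
    pooled_sd = sqrt((variance(exposed) + variance(unexposed)) / 2)
    return (mean(exposed) - mean(unexposed)) / pooled_sd


# Hypothetical baseline covariate: age in years
age_exposed = [30, 40, 50]
age_unexposed = [32, 42, 52]

smd_age = standardized_mean_difference(age_exposed, age_unexposed)
print(f"SMD for age: {smd_age:.2f}")
```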

Given this additional layer of justification, especially with respect to qualitative knowledge of the assignment process and domain knowledge from practitioners more broadly, we argue for the involvement of practitioners where feasible. This could, for example, be formalized through co-production in which members of the public and policy makers are involved in the development of the evaluation. If we appraise NES as a type of study design, which is distinguished from other designs because i) there is a particular change in exposure that is evaluated and ii) causal claims are supported by an argument of the plausibility of as-if randomization, then we guard against conflating NES with other observational designs [ 10 , 28 ].

There is a range of ways of dealing with the problems of selection on measured and unmeasured confounders in NES [ 8 , 10 ], which can be understood in terms of a ‘target trial’ we are trying to emulate, had randomization been possible [ 29 ]. The protocol of a target trial describes seven components common to RCTs (‘eligibility criteria’, ‘treatment strategies’, ‘assignment procedures’, ‘follow-up period’, ‘outcome’, ‘causal contrasts of interest’, and the ‘analysis plan’), and provides a systematic way of improving, reporting and appraising NES relative to a ‘gold standard’ trial that is often not feasible in practice. In the design phase of a NES, deviations from the target trial in each domain can be used to evaluate where improvements can be made and where concessions will have to be made. This same approach can be used to appraise existing NES. The target trial framework also provides a structured way of reporting NES, which will facilitate evaluation of the strength of NES, improve consistency and completeness of reporting, and benefit evidence syntheses.
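The seven protocol components can be written down as a simple record, so that a planned or existing NES can be compared with its notional target trial component by component. This is only a sketch: the class name, the helper function and all example entries are hypothetical and are not taken from Table 1 or from any cited study.

```python
from dataclasses import dataclass, fields, replace


@dataclass
class TargetTrialProtocol:
    """The seven components of a target trial protocol."""
    eligibility_criteria: str
    treatment_strategies: str
    assignment_procedures: str
    follow_up_period: str
    outcome: str
    causal_contrasts_of_interest: str
    analysis_plan: str


def deviations(target, emulation):
    """Map each component that differs to a (target, emulation) pair."""
    return {
        f.name: (getattr(target, f.name), getattr(emulation, f.name))
        for f in fields(target)
        if getattr(target, f.name) != getattr(emulation, f.name)
    }


# Hypothetical target trial and its emulation by a natural experiment study
target = TargetTrialProtocol(
    eligibility_criteria="adults in low-wage employment",
    treatment_strategies="wage increase versus no wage increase",
    assignment_procedures="random assignment",
    follow_up_period="12 months",
    outcome="mental health score",
    causal_contrasts_of_interest="intention-to-treat effect",
    analysis_plan="comparison of mean change between arms",
)
emulation = replace(
    target,
    assignment_procedures="assignment determined by baseline wage",
)

diff = deviations(target, emulation)
for component, (ideal, actual) in diff.items():
    print(f"{component}: target = {ideal!r}, NES = {actual!r}")
```

Listing deviations per component makes explicit where the emulation departs from the ideal trial, which is where sensitivity analyses and qualitative justification are most needed.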

In Table  1 , we bring together elements of the Target Trial framework and conceptualisations of NES to derive a framework to describe the Target Trial for NES [ 12 ]. By encouraging researchers to address the questions in Table 1 , the framework provides a structured approach to the design, reporting and evaluation of NES across the seven target trial domains. Table 1 also provides recommendations to improve the strength of causal claims from NES, focussing primarily on sensitivity analyses to improve internal validity.

An illustrative example of a well-developed NES based on the criteria outlined in Table 1 is by Reeves et al. [ 39 ]. The NES evaluates the impact of the introduction of a National Minimum Wage on mental health. The study compared a clearly defined intervention group of recipients of a wage increase up to 110% of pre-intervention wage with clearly defined control groups of (1) people ineligible for the intervention because their wage at baseline was just above (100–110%) minimum wage and (2) people who were eligible, but whose companies did not comply and did not increase minimum wage. This study also included several sensitivity tests to strengthen causal arguments. We have aligned this study to the Target Trial framework in Additional file 1.

The Target Trial Approach for NES (outlined in Table 1 ) provides a straightforward approach to improve, report, and appraise existing NES and to assist in the design of future studies. It focusses on structural design elements and goes beyond the use of quantitative tools alone to assess internal validity [ 12 ]. This work complements the ROBINS-I tool for assessing risk of bias in non-randomised studies of interventions, which similarly adopted the Target Trial framework [ 40 ]. Our approach focusses on the internal validity of a NES, with issues of construct and external validity being outside of the scope of this work (guidelines for these are provided in for example [ 41 ]). It should be acknowledged that less methodologically robust studies can still reach valid and compelling conclusions, even without resembling the notional target trial. However, we believe that drawing on the target trial framework helps highlight occasions when causal inference can be made more confidently.

Finally, the framework explicitly excludes observational studies that aim to investigate the effects of changes in behaviour without an externally forced driver to do so. For example, although a cohort study can be the basis for the evaluation of a NES in principle, the change of diet of some participants (compared to those who did not change their diet) is not an external cause (i.e. exogenous), and a study of its effects does not fall within the definition of an experiment [ 11 ]. However, such studies are likely to be more convincing than those which do not study within-person changes, and we note that the statistical methods used may be similar to NES.

Despite their advantages, NES remain based on observational data and thus biases in assignment of the intervention can never be completely excluded (although for plausibly ‘as if randomised’ natural experiments these should be minimal). It is therefore important that a robust assessment of different potential sources of bias is reported. It has additionally been argued that sensitivity analyses are required to assess whether a pattern of small biases could explain away any ostensible effect of the intervention, because confidence intervals and statistical tests do not do this [ 14 ]. Recommendations that would improve the confidence with which we can make causal claims from NES, derived from work by Rosenbaum [ 14 ], have been outlined in Table 1 . Although sensitivity analyses can place plausible limits on the size of the effects of hidden biases, because such analyses are susceptible to assumptions about the maximum size of omitted biases, they cannot completely rule out residual bias [ 34 ]. Of importance for the strength of causal claims therefore, is the triangulation of NES with other evaluations using different data or study designs susceptible to different sources of bias [ 5 , 42 ].
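To give a concrete sense of what a sensitivity analysis for hidden bias can report, the sketch below computes an E-value (VanderWeele and Ding). This is a different technique from the Rosenbaum-derived recommendations cited above and is not part of Table 1; it is shown only as one widely used example. It assumes the effect estimate is a risk ratio, and the input value is hypothetical. The E-value is the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need to have with both exposure and outcome to fully explain away the observed estimate.

```python
from math import sqrt


def e_value(risk_ratio):
    """E-value for an observed risk ratio.
    Ratios below 1 are inverted first so the formula applies symmetrically."""
    if risk_ratio <= 0:
        raise ValueError("risk ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + sqrt(rr * (rr - 1))


# Hypothetical observed risk ratio from a natural experiment study
observed_rr = 2.0
print(f"E-value: {e_value(observed_rr):.2f}")
```

As the surrounding text notes, a measure of this kind bounds what a hidden bias of a given size could do; it cannot rule out residual bias.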

None of the recommendations outlined in Table 1 will by themselves eliminate bias in a NES, but neither is it required to implement all of them to be able to make a causal claim with some confidence. Instead, a continuum of confidence in the causal claims based on the study design and the data is a more appropriate and practical approach [ 43 ]. Each sensitivity analysis aims to minimise ambiguity of a particular potential bias or biases, and as such a combination of selected sensitivity analyses can strengthen causal claims [ 14 ]. We would generally, but not strictly, consider a well conducted RCT as the design where we are most confident about such claims, followed by natural experiments, and then other observational studies; this would be an extension of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) framework [ 44 ]. GRADE provides a system for rating the quality (or certainty) of a body of evidence and grading the strength of recommendations for use in systematic reviews, health technology assessments (HTAs), and clinical practice guidelines. It typically only distinguishes between trials and observational studies when making these judgments (note however, that recent guidance does not make this explicit distinction when using ROBINS-I [ 45 ]). Given the increased contribution of NES in public health, especially those based on routine data [ 37 ], the specific inclusion of NES in this system might improve the rating of the evidence from these study designs.

Our recommendations are of particular importance for ensuring rigour in the context of (public) health research, where natural experiments have become increasingly popular for a variety of reasons, including the availability of large routinely collected datasets [ 37 ]. Such datasets invite the discovery of natural experiments, even where the data may not be particularly applicable to this design, but they also enable many of the sensitivity analyses to be conducted from within the same dataset or through linkage to other routine datasets.

Finally, alignment to the Target Trial Framework also links natural experiment studies directly to other measures of trial validity, including pre-registration, reporting checklists, and evaluation through risk-of-bias-tools [ 40 ]. This aligns with previous recommendations to use established reporting guidelines such as STROBE, TREND [ 12 ], and TIDieR-PHP [ 46 ] for the reporting of natural experiment studies. These reporting guidelines could be customized to specific research areas (for example, as developed for a systematic review of quasi-experimental studies of prenatal alcohol use and birthweight and neurodevelopment [ 47 ]).

We provide a conceptualisation of natural experiment studies as they apply to public health. We argue for the appreciation of natural experiments as a type of study design rather than a set of tools for the analyses of non-randomised interventions. Although there will always remain some ambiguity about the strength of causal claims, there are clear benefits to harnessing NES rather than relying purely on observational studies. This includes the fact that NES can be based on routinely available data and that timely evidence of real-world relevance can be generated. The inclusion of a discussion of the plausibility of as-if randomization of exposure allocation will provide further confidence in the strength of causal claims.

Aligning NES to the Target Trial framework will guard against conceptual stretching of these evaluations and ensure that the causal claims about whether public health interventions ‘work’ are based on evidence that is considered ‘good enough’ to inform public health action within a ‘practice-based evidence’ framework. This framework describes how evaluations can help reduce critical uncertainties and adjust the compass bearing of existing policy (in contrast to the ‘evidence-based practice’ framework in which RCTs are used to generate ‘definitive’ evidence for particular interventions) [ 48 ].

Availability of data and materials

Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.


Randomised Controlled Trial

Natural Experiment

Stable Unit Treatment Value Assumption


1. Shadish WR, Cook TD, Campbell DT. Experimental and Quasi-Experimental Designs. 2nd ed. Wadsworth, Cengage Learning: Belmont; 2002.

2. King G, Keohane RO, Verba S. The importance of research design in political science. Am Polit Sci Rev. 1995;89:475–81.

3. Meyer BD. Natural and quasi-experiments in economics. J Bus Econ Stat. 1995;13:151–61.

4. Dunning T. Natural experiments in the social sciences. A design-based approach. 6th edition. Cambridge: Cambridge University Press; 2012.

5. Craig P, Cooper C, Gunnell D, Haw S, Lawson K, Macintyre S, et al. Using natural experiments to evaluate population health interventions: new medical research council guidance. J Epidemiol Community Health. 2012;66:1182–6.

6. Cook TD, Shadish WR, Wong VC. Three conditions under which experiments and observational studies produce comparable causal estimates: new findings from within-study comparisons. J Policy Anal Manag. 2008;27:724–50.

7. Bärnighausen T, Røttingen JA, Rockers P, Shemilt I, Tugwell P. Quasi-experimental study designs series—paper 1: introduction: two historical lineages. J Clin Epidemiol. 2017;89:4–11.

8. Waddington H, Aloe AM, Becker BJ, Djimeu EW, Hombrados JG, Tugwell P, et al. Quasi-experimental study designs series—paper 6: risk of bias assessment. J Clin Epidemiol. 2017;89:43–52.

9. Saeed S, Moodie EEM, Strumpf EC, Klein MB. Evaluating the impact of health policies: using a difference-in-differences approach. Int J Public Health. 2019;64:637–42.

10. Dunning T. Improving causal inference: strengths and limitations of natural experiments. Polit Res Q. 2008;61:282–93.

11. Bärnighausen T, Tugwell P, Røttingen JA, Shemilt I, Rockers P, Geldsetzer P, et al. Quasi-experimental study designs series—paper 4: uses and value. J Clin Epidemiol. 2017;89:21–9.

12. Craig P, Katikireddi SV, Leyland A, Popham F. Natural experiments: an overview of methods, approaches, and contributions to public health intervention research. Annu Rev Public Health. 2017;38:39–56.

13. Pearl J, Mackenzie D. The book of why: the new science of cause and effect. London: Allen Lane; 2018.

14. Rosenbaum PR. How to see more in observational studies: some new quasi-experimental devices. Annu Rev Stat Its Appl. 2015;2:21–48.

15. Petimar J, Ramirez M, Rifas-Shiman SL, Linakis S, Mullen J, Roberto CA, et al. Evaluation of the impact of calorie labeling on McDonald’s restaurant menus: a natural experiment. Int J Behav Nutr Phys Act. 2019;16. Article no: 99.

16. Fergusson DM, Horwood LJ, Boden JM, Mulder RT. Impact of a major disaster on the mental health of a well-studied cohort. JAMA Psychiatry. 2014;71:1025–31.

17. Remler DK, Van Ryzin GG. Natural and quasi experiments. In: Research methods in practice: strategies for description and causation. 2nd ed. Thousand Oaks: SAGE Publication Inc.; 2014. p. 467–500.

18. Cook PA, Hargreaves SC, Burns EJ, De Vocht F, Parrott S, Coffey M, et al. Communities in charge of alcohol (CICA): a protocol for a stepped-wedge randomised control trial of an alcohol health champions programme. BMC Public Health. 2018;18. Article no: 522.

19. Lumey LH, Stein AD, Kahn HS, Van der Pal-de Bruin KM, Blauw GJ, Zybert PA, et al. Cohort profile: the Dutch hunger winter families study. Int J Epidemiol. 2007;36:1196–204.

20. Meng X, Qian N. The Long Term Consequences of Famine on Survivors: Evidence from a Unique Natural Experiment using China’s Great Famine. Natl Bur Econ Res Work Pap Ser. 2011;NBER Worki.

21. Franco M, Bilal U, Orduñez P, Benet M, Morejón A, Caballero B, et al. Population-wide weight loss and regain in relation to diabetes burden and cardiovascular mortality in Cuba 1980-2010: repeated cross sectional surveys and ecological comparison of secular trends. BMJ. 2013;346:f1515.

22. Angrist J, Bettinger E, Bloom E, King E, Kremer M. Vouchers for private schooling in Colombia: evidence from a randomized natural experiment. Am Econ Rev. 2002;92:1535–58.

23. Angrist JD. Lifetime earnings and the Vietnam era draft lottery: evidence from social security administrative records. Am Econ Rev. 1990;80:313–36.

24. Dawson A, Sim J. The nature and ethics of natural experiments. J Med Ethics. 2015;41:848–53.

25. Bärnighausen T, Oldenburg C, Tugwell P, Bommer C, Ebert C, Barreto M, et al. Quasi-experimental study designs series—paper 7: assessing the assumptions. J Clin Epidemiol. 2017;89:53–66.

26. Tugwell P, Knottnerus JA, McGowan J, Tricco A. Big-5 Quasi-Experimental designs. J Clin Epidemiol. 2017;89:1–3.

27. Reeves BC, Wells GA, Waddington H. Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels. J Clin Epidemiol. 2017;89:30–42.

28. Rubin DB. For objective causal inference, design trumps analysis. Ann Appl Stat. 2008;2:808–40.

29. Hernán MA, Robins JM. Using big data to emulate a target trial when a randomized trial is not available. Am J Epidemiol. 2016;183:758–64.

30. Benjamin-Chung J, Arnold BF, Berger D, Luby SP, Miguel E, Colford JM, et al. Spillover effects in epidemiology: parameters, study designs and methodological considerations. Int J Epidemiol. 2018;47:332–47.

31. Munafò MR, Tilling K, Taylor AE, Evans DM, Smith GD. Collider scope: when selection bias can substantially influence observed associations. Int J Epidemiol. 2018;47:226–35.

32. Schwartz S, Gatto NM, Campbell UB. Extending the sufficient component cause model to describe the stable unit treatment value assumption (SUTVA). Epidemiol Perspect Innov. 2012;9:3.

33. Cawley J, Thow AM, Wen K, Frisvold D. The economics of taxes on sugar-sweetened beverages: a review of the effects on prices, sales, cross-border shopping, and consumption. Annu Rev Nutr. 2019;39:317–38.

34. Reichardt CS. Nonequivalent Group Designs. In: Quasi-Experimentation. A Guide to Design and Analysis. 1st edition. New York: The Guilford Press; 2019. p. 112–162.

35. Denzin N. Sociological methods: a sourcebook. 5th ed. New York: Routledge; 2006.

36. Matthay EC, Hagan E, Gottlieb LM, Tan ML, Vlahov D, Adler NE, et al. Alternative causal inference methods in population health research: evaluating tradeoffs and triangulating evidence. SSM - Popul Heal. 2020;10:10052.

37. Leatherdale ST. Natural experiment methodology for research: a review of how different methods can support real-world research. Int J Soc Res Methodol. 2019;22:19–35.

38. Reichardt CS. Quasi-experimentation. A guide to design and analysis. 1st ed. New York: The Guilford Press; 2019.

39. Reeves A, McKee M, Mackenbach J, Whitehead M, Stuckler D. Introduction of a National Minimum Wage Reduced Depressive Symptoms in Low-Wage Workers: A Quasi-Natural Experiment in the UK. Heal Econ (United Kingdom). 2017;26:639–55.

40. Sterne JA, Hernán MA, Reeves BC, Savović J, Berkman ND, Viswanathan M, et al. ROBINS-I: a tool for assessing risk of bias in non-randomised studies of interventions. BMJ. 2016;355:i4919.

41. Shadish WR, Cook TD, Campbell DT. Generalized Causal Inference: A Grounded Theory. In: Experimental and Quasi-Experimental Designs for Generalized Causal Inference. 2nd ed. Belmont: Wadsworth, Cengage Learning; 2002. p. 341–73.

42. Lawlor DA, Tilling K, Smith GD. Triangulation in aetiological epidemiology. Int J Epidemiol. 2016;45:1866–86.

43. Hernán MA. The C-word: scientific euphemisms do not improve causal inference from observational data. Am J Public Health. 2018;108:616–9.

44. Guyatt G, Oxman AD, Akl EA, Kunz R, Vist G, Brozek J, et al. GRADE guidelines: 1. Introduction - GRADE evidence profiles and summary of findings tables. J Clin Epidemiol. 2011;64:383–94.

45. Schünemann HJ, Cuello C, Akl EA, Mustafa RA, Meerpohl JJ, Thayer K, et al. GRADE guidelines: 18. How ROBINS-I and other tools to assess risk of bias in nonrandomized studies should be used to rate the certainty of a body of evidence. J Clin Epidemiol. 2019;111:105–14.

46. Campbell M, Katikireddi SV, Hoffmann T, Armstrong R, Waters E, Craig P. TIDieR-PHP: a reporting guideline for population health and policy interventions. BMJ. 2018;361:k1079.

47. Mamluk L, Jones T, Ijaz S, Edwards HB, Savović J, Leach V, et al. Evidence of detrimental effects of prenatal alcohol exposure on offspring birthweight and neurodevelopment from a systematic review of quasi-experimental studies. Int J Epidemiol. 2021;49(6):1972–95.

48. Ogilvie D, Adams J, Bauman A, Gregg EW, Panter J, Siegel KR, et al. Using natural experimental studies to guide public health action: turning the evidence-based medicine paradigm on its head. J Epidemiol Community Health. 2019;74:203–8.


This study is funded by the National Institute for Health Research (NIHR) School for Public Health Research (Grant Reference Number PD-SPH-2015). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. The funder had no input in the writing of the manuscript or decision to submit for publication. The NIHR School for Public Health Research is a partnership between the Universities of Sheffield; Bristol; Cambridge; Imperial; and University College London; The London School for Hygiene and Tropical Medicine (LSHTM); LiLaC – a collaboration between the Universities of Liverpool and Lancaster; and Fuse - The Centre for Translational Research in Public Health a collaboration between Newcastle, Durham, Northumbria, Sunderland and Teesside Universities. FdV is partly funded by National Institute for Health Research Applied Research Collaboration West (NIHR ARC West) at University Hospitals Bristol NHS Foundation Trust. SVK and PC acknowledge funding from the Medical Research Council (MC_UU_12017/13) and Scottish Government Chief Scientist Office (SPHSU13). SVK acknowledges funding from a NRS Senior Clinical Fellowship (SCAF/15/02). KT works in the MRC Integrative Epidemiology Unit, which is supported by the Medical Research Council (MRC) and the University of Bristol [MC_UU_00011/3].

Author information

Authors and affiliations.

Population Health Sciences, Bristol Medical School, University of Bristol, Canynge Hall, 39 Whatley Road, Bristol, BS8 2PS, UK

Frank de Vocht, Cheryl McQuire, Kate Tilling & Matthew Hickman

NIHR School for Public Health Research, Newcastle, UK

Frank de Vocht & Cheryl McQuire

NIHR Applied Research Collaboration West, Bristol, UK

Frank de Vocht

MRC/CSO Social and Public Health Sciences Unit, University of Glasgow, Glasgow, UK

Srinivasa Vittal Katikireddi & Peter Craig

MRC IEU, University of Bristol, Bristol, UK

Kate Tilling



FdV conceived of the study. FdV, SVK, CMQ, KT, MH and PC interpreted the evidence and theory. FdV wrote the first version of the manuscript. SVK, CMQ, KT, MH and PC provided substantive revisions to subsequent versions. All authors have read and approved the manuscript. FdV, SVK, CMQ, KT, MH and PC agreed to be personally accountable for their own contributions and will ensure that questions related to the accuracy or integrity of any part of the work, even ones in which the author was not personally involved, are appropriately investigated, resolved, and the resolution documented in the literature.

Corresponding author

Correspondence to Frank de Vocht .

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1.

Online Supplementary Material. Table 1. The Target Trial for Natural Experiments and Reeves et al. [ 28 ]. Alignment of Reeves et al. (Introduction of a National Minimum Wage Reduced Depressive Symptoms in Low-Wage Workers: A Quasi-Natural Experiment in the UK. Heal Econ. 2017;26:639–55) to the Target Trial framework.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article.

de Vocht, F., Katikireddi, S.V., McQuire, C. et al. Conceptualising natural and quasi experiments in public health. BMC Med Res Methodol 21 , 32 (2021). https://doi.org/10.1186/s12874-021-01224-x


Received : 14 July 2020

Accepted : 28 January 2021

Published : 11 February 2021

DOI : https://doi.org/10.1186/s12874-021-01224-x


  • Public health
  • Public health policy
  • Natural experiments
  • Quasi experiments
  • Evaluations

BMC Medical Research Methodology

ISSN: 1471-2288

quasi experimental longitudinal study


Cultivating classroom curiosity: a quasi-experimental, longitudinal, study investigating the impact of the question formulation technique on adolescent intellectual curiosity



Experimental and Quasi-Experimental Designs in Implementation Research

Christopher J. Miller

a VA Boston Healthcare System, Center for Healthcare Organization and Implementation Research (CHOIR), United States Department of Veterans Affairs, Boston, MA, USA

b Department of Psychiatry, Harvard Medical School, Boston, MA, USA

Shawna N. Smith

c Department of Psychiatry, University of Michigan Medical School, Ann Arbor, MI, USA

d Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA

Marianne Pugatch

Implementation science is focused on maximizing the adoption, appropriate use, and sustainability of effective clinical practices in real world clinical settings. Many implementation science questions can be feasibly answered by fully experimental designs, typically in the form of randomized controlled trials (RCTs). Implementation-focused RCTs, however, usually differ from traditional efficacy- or effectiveness-oriented RCTs on key parameters. Other implementation science questions are more suited to quasi-experimental designs, which are intended to estimate the effect of an intervention in the absence of randomization. These designs include pre-post designs with a non-equivalent control group, interrupted time series (ITS), and stepped wedges, the last of which require all participants to receive the intervention, but in a staggered fashion. In this article we review the use of experimental designs in implementation science, including recent methodological advances for implementation studies. We also review the use of quasi-experimental designs in implementation science, and discuss the strengths and weaknesses of these approaches. This article is therefore meant to be a practical guide for researchers who are interested in selecting the most appropriate study design to answer relevant implementation science questions, and thereby increase the rate at which effective clinical practices are adopted, spread, and sustained.

1. Background

The first documented clinical trial was conducted in 1747 by James Lind, a Royal Navy physician, who tested the hypothesis that citrus fruit could cure scurvy. Since then, based on foundational work by Fisher and others (1935), the randomized controlled trial (RCT) has emerged as the gold standard for testing the efficacy of treatment versus a control condition for individual patients. Randomization of patients is seen as crucial to reducing the impact of measured or unmeasured confounding variables, in turn allowing researchers to draw conclusions regarding causality in clinical trials.

As described elsewhere in this special issue, implementation science is ultimately focused on maximizing the adoption, appropriate use, and sustainability of effective clinical practices in real world clinical settings. As such, some implementation science questions may be addressed by experimental designs. For our purposes here, we use the term “experimental” to refer to designs that feature two essential ingredients: first, manipulation of an independent variable; and second, random assignment of subjects. This corresponds to the definition of randomized experiments originally championed by Fisher (1925). From this perspective, experimental designs usually take the form of RCTs, but implementation-oriented RCTs typically differ in important ways from traditional efficacy- or effectiveness-oriented RCTs. Other implementation science questions require different methodologies entirely: specifically, several forms of quasi-experimental designs may be used for implementation research in situations where an RCT would be inappropriate. These designs are intended to estimate the effect of an intervention despite a lack of randomization. Quasi-experimental designs include pre-post designs with a non-equivalent control group, interrupted time series (ITS), and stepped wedge designs. Stepped wedges are studies in which all participants receive the intervention, but in a staggered fashion. It is important to note that quasi-experimental designs are not unique to implementation science. As we will discuss below, however, each of them has strengths that make it particularly useful in certain implementation science contexts.

Our goal for this manuscript is two-fold. First, we will summarize the use of experimental designs in implementation science. This will include discussion of ways that implementation-focused RCTs may differ from efficacy- or effectiveness-oriented RCTs. Second, we will summarize the use of quasi-experimental designs in implementation research. This will include discussion of the strengths and weaknesses of these types of approaches in answering implementation research questions. For both experimental and quasi-experimental designs, we will discuss a recent implementation study as an illustrative example of one approach.

1. Experimental Designs in Implementation Science

RCTs in implementation science share the same basic structure as efficacy- or effectiveness-oriented RCTs, but typically feature important distinctions. In this section we will start by reviewing key factors that separate implementation RCTs from more traditional efficacy- or effectiveness-oriented RCTs. We will then discuss optimization trials, which are a type of experimental design that is especially useful for certain implementation science questions. We will then briefly turn our attention to single subject experimental designs (SSEDs) and on-off-on (ABA) designs.

The first common difference that sets apart implementation RCTs from more traditional clinical trials is the primary research question they aim to address. For most implementation trials, the primary research question is not the extent to which a particular treatment or evidence-based practice is more effective than a comparison condition, but instead the extent to which a given implementation strategy is more effective than a comparison condition. For more detail on this pivotal issue, see Drs. Bauer and Kirchner in this special issue.

Second, as a corollary of this point, implementation RCTs typically feature different outcome measures than efficacy or effectiveness RCTs, with an emphasis on the extent to which a health intervention was successfully implemented rather than an evaluation of the health effects of that intervention ( Proctor et al., 2011 ). For example, typical implementation outcomes might include the number of patients who receive the intervention, or the number of providers who administer the intervention as intended. A variety of evaluation-oriented implementation frameworks may guide the choices of such measures (e.g. RE-AIM; Gaglio et al., 2013 ; Glasgow et al., 1999 ). Hybrid implementation-effectiveness studies attend to both effectiveness and implementation outcomes ( Curran et al., 2012 ); these designs are also covered in more detail elsewhere in this issue (Landes, this issue).

Third, given their focus, implementation RCTs are frequently cluster-randomized (i.e. with sites or clinics as the unit of randomization, and patients nested within those sites or clinics). For example, consider a hypothetical RCT that aims to evaluate the implementation of a training program for cognitive behavioral therapy (CBT) in community clinics. Randomizing at the patient level for such a trial would be inappropriate due to the risk of contamination, as providers trained in CBT might reasonably be expected to incorporate CBT principles into their treatment even for patients assigned to the control condition. Randomizing at the provider level would also risk contamination, as providers trained in CBT might discuss this treatment approach with their colleagues. Thus, many implementation trials are cluster-randomized at the site or clinic level. While such clustering minimizes the risk of contamination, it can unfortunately create commensurate problems with confounding, especially for trials with very few sites to randomize. Stratification may be used to at least partially address confounding issues in cluster-randomized and more traditional trials alike, by ensuring that intervention and control groups are broadly similar on certain key variables. Furthermore, such allocation schemes typically require analytic models that account for this clustering and the resulting correlations among error structures (e.g., generalized estimating equations [GEE] or mixed-effects models; Schildcrout et al., 2018).
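To make the idea of stratified cluster randomization concrete, the following is a minimal sketch in Python. The site identifiers, the urban/rural stratum, and the function name `stratified_cluster_randomize` are illustrative assumptions rather than details from any cited trial; whole sites are the unit of assignment, and arms are balanced within each stratum.

```python
import random
from collections import defaultdict


def stratified_cluster_randomize(sites, seed=0):
    """Assign whole sites (clusters) to arms, balancing arms within each stratum.

    `sites` is a list of (site_id, stratum) pairs. Both are hypothetical inputs.
    If a stratum has an odd number of sites, the extra site goes to control.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    by_stratum = defaultdict(list)
    for site_id, stratum in sites:
        by_stratum[stratum].append(site_id)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = sorted(by_stratum[stratum])
        rng.shuffle(members)  # random order within the stratum
        half = len(members) // 2
        for i, site_id in enumerate(members):
            assignment[site_id] = "intervention" if i < half else "control"
    return assignment


# Hypothetical clinics stratified by setting (an assumed key variable).
sites = [("A", "urban"), ("B", "urban"), ("C", "urban"), ("D", "urban"),
         ("E", "rural"), ("F", "rural"), ("G", "rural"), ("H", "rural")]
assignment = stratified_cluster_randomize(sites, seed=42)
for site_id, stratum in sites:
    print(site_id, stratum, assignment[site_id])
```

Because every patient at a site inherits that site's arm, any analysis of patient-level outcomes from such an allocation would still need a model that accounts for clustering, as noted above.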

1.1. Optimization trials

Key research questions in implementation science often involve determining which implementation strategies to provide, to whom, and when, to achieve optimal implementation success. As such, trials designed to evaluate comparative effectiveness, or to optimize provision of different types or intensities of implementation strategies, may be more appealing than traditional effectiveness trials. The methods described in this section are not unique to implementation science, but their application in the context of implementation trials may be particularly useful for informing implementation strategies.

While two-arm RCTs can be used to evaluate comparative effectiveness, trials focused on optimizing implementation support may use alternative experimental designs ( Collins et al., 2005 ; Collins et al., 2007 ). For example, in certain clinical contexts, multi-component “bundles” of implementation strategies may be warranted (e.g. a bundle consisting of clinician training, technical assistance, and audit/feedback to encourage clinicians to use a new evidence-based practice). In these situations, implementation researchers might consider using factorial or fractional-factorial designs. In the context of implementation science, these designs randomize participants (e.g. sites or providers) to different combinations of implementation strategies, and can be used to evaluate the effectiveness of each strategy individually to inform an optimal combination (e.g. Coulton et al., 2009 ; Pellegrini et al., 2014 ; Wyrick, et al., 2014 ). Such designs can be particularly useful in informing multi-component implementation strategies that are not redundant or overly burdensome ( Collins et al., 2014a ; Collins et al., 2009 ; Collins et al., 2007 ).
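As a rough illustration of a full factorial design, the sketch below enumerates every on/off combination of the three example strategies in the bundle and deals sites evenly across the resulting conditions. The strategy labels, the site names, the simulated outcome, and the helper names `assign_sites` and `main_effect` are all assumptions made for illustration.

```python
import random
from itertools import product

strategies = ["training", "technical_assistance", "audit_feedback"]

# Every on/off combination of the three strategies: 2**3 = 8 conditions.
conditions = [dict(zip(strategies, levels))
              for levels in product([False, True], repeat=len(strategies))]


def assign_sites(site_ids, conditions, seed=0):
    """Shuffle sites, then deal them round-robin so conditions are equally sized."""
    rng = random.Random(seed)
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    return {site: conditions[i % len(conditions)]
            for i, site in enumerate(shuffled)}


def main_effect(outcomes, assignment, strategy):
    """Mean outcome at sites with the strategy minus mean at sites without it."""
    on = [outcomes[s] for s, c in assignment.items() if c[strategy]]
    off = [outcomes[s] for s, c in assignment.items() if not c[strategy]]
    return sum(on) / len(on) - sum(off) / len(off)


# Sixteen hypothetical sites, two per condition.
site_ids = [f"site_{i:02d}" for i in range(16)]
assignment = assign_sites(site_ids, conditions, seed=1)

# Made-up outcome in which only training matters (adds 5 points).
outcomes = {s: 10 + (5 if c["training"] else 0) for s, c in assignment.items()}
for strategy in strategies:
    print(strategy, main_effect(outcomes, assignment, strategy))
```

The appeal of the design is visible in `main_effect`: every site contributes to the estimate for every strategy, rather than only to a single pairwise comparison.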

Researchers interested in optimizing sequences of implementation strategies that adapt to ongoing needs over time may be interested in a variant of factorial designs known as the sequential, multiple-assignment randomized trial (SMART; Almirall et al., 2012 ; Collins et al., 2014b ; Kilbourne et al., 2014b ; Lei et al., 2012 ; Nahum-Shani et al., 2012 ; NeCamp et al., 2017 ). SMARTs are multistage randomized trials in which some or all participants are randomized more than once, often based on ongoing information (e.g., treatment response). In implementation research, SMARTs can inform optimal sequences of implementation strategies to maximize downstream clinical outcomes. Thus, such designs are well-suited to answering questions about what implementation strategies should be used, in what order, to achieve the best outcomes in a given context.

One example of an implementation SMART is the Adaptive Implementation of Effective Program Trial (ADEPT; Kilbourne et al., 2014a ). ADEPT was a clustered SMART ( NeCamp et al., 2017 ) designed to inform an adaptive sequence of implementation strategies for implementing an evidence-based collaborative chronic care model, Life Goals ( Kilbourne et al., 2014c ; Kilbourne et al., 2012a ), into community-based practices. Life Goals, the clinical intervention being implemented, has proven effective at improving physical and mental health outcomes for patients with unipolar and bipolar depression by encouraging providers to instruct patients in self-management, and improving clinical information systems and care management across physical and mental health providers ( Bauer et al., 2006 ; Kilbourne et al., 2012a ; Kilbourne et al., 2008 ; Simon et al., 2006 ). However, in spite of its established clinical effectiveness, community-based clinics experienced a number of barriers in trying to implement the Life Goals model, and there were questions about how best to efficiently and effectively augment implementation strategies for clinics that struggled with implementation.

The ADEPT study was thus designed to determine the best sequence of implementation strategies to offer sites interested in implementing Life Goals. The ADEPT study involved use of three different implementation strategies. First, all sites received implementation support based on Replicating Effective Programs (REP), which offered an implementation manual, brief training, and low-level technical support (Kilbourne et al., 2007; Kilbourne et al., 2012b; Neumann and Sogolow, 2000). REP implementation support had been previously found to be low-cost and readily scalable, but also insufficient for uptake for many community-based settings (Kilbourne et al., 2015). For sites that failed to implement Life Goals under REP, two additional implementation strategies were considered as augmentations to REP: External Facilitation (EF; Kilbourne et al., 2014b; Stetler et al., 2006), consisting of phone-based mentoring in strategic skills from a study team member; and Internal Facilitation (IF; Kirchner et al., 2014), which supported protected time for a site employee to address barriers to program adoption.

The ADEPT study was designed to evaluate the best way to augment support for these sites that were not able to implement Life Goals under REP, specifically querying whether it was better to augment REP with EF only or the more intensive EF/IF, and whether augmentations should be provided all at once, or staged. Intervention assignments are mapped in Figure 1. Seventy-nine community-based clinics across Michigan and Colorado were provided with initial implementation support under REP. After six months, implementation of the clinical intervention, Life Goals, was evaluated at all sites. Sites that had failed to reach an adequate level of delivery (defined as those sites enrolling fewer than ten patients in Life Goals, or those at which fewer than 50% of enrolled patients had received at least three Life Goals sessions) were considered non-responsive to REP and randomized to receive additional support through either EF or combined EF/IF. After six further months, Life Goals implementation at these sites was again evaluated. Sites surpassing the implementation response benchmark had their EF or EF/IF support discontinued. EF/IF sites that remained non-responsive continued to receive EF/IF for an additional six months. EF sites that remained non-responsive were randomized a second time to either continue with EF or further augment with IF. This design thus allowed for comparison of three different adaptive implementation interventions, to determine the best adaptive sequence of implementation support for sites that were initially non-responsive under REP:

Figure 1. SMART design from ADEPT trial.

  • Provide EF for six months; continue EF for a further six months for sites that remain non-responsive; discontinue EF for sites that are responsive;
  • Provide EF/IF for six months; continue EF/IF for a further six months for sites that remain non-responsive; discontinue EF/IF for sites that are responsive; and
  • Provide EF for six months; step up to EF/IF for a further six months for sites that remain non-responsive; discontinue EF for sites that are responsive.
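The response benchmark and the re-randomization rules described above can be sketched as a pair of small Python functions. This is a simplified illustration, not the trial's actual allocation software: the function names, the string labels, and the handling of sites that respond under REP alone (which the text does not describe) are assumptions.

```python
import random


def is_responsive(n_enrolled, n_with_three_sessions):
    """Apply the benchmark from the text: at least ten patients enrolled, and
    at least 50% of enrolled patients with three or more Life Goals sessions."""
    if n_enrolled < 10:
        return False
    return n_with_three_sessions / n_enrolled >= 0.5


def next_step(current_support, responsive, rng):
    """Decide a site's support for the next six months.

    `current_support` is one of "REP", "EF", or "EF/IF" (labels assumed here).
    """
    if current_support == "REP":
        if responsive:
            return "REP only"  # assumption: the text does not cover this branch
        return rng.choice(["EF", "EF/IF"])  # first randomization
    if responsive:
        return "discontinue augmentation"
    if current_support == "EF":
        return rng.choice(["EF", "EF/IF"])  # second randomization
    return "EF/IF"  # non-responsive EF/IF sites continue EF/IF


rng = random.Random(2024)
# Hypothetical site: 14 patients enrolled, 4 with three or more sessions.
responsive = is_responsive(n_enrolled=14, n_with_three_sessions=4)
print(responsive, next_step("REP", responsive, rng))
```

Writing the rule out this way highlights that randomization in a SMART is conditional: only sites that miss the benchmark are randomized, and EF sites can be randomized twice.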

While analyses of this study are still ongoing, including the comparison of these three adaptive sequences of implementation strategies, results have shown that patients at sites that were randomized to receive EF as the initial augmentation to REP saw more improvement in clinical outcomes (SF-12 mental health quality of life and PHQ-9 depression scores) after 12 months than patients at sites that were randomized to receive the more intensive EF/IF augmentation.

1.2. Single Subject Experimental Designs and On-Off-On (ABA) Designs

We also note that there are a variety of Single Subject Experimental Designs (SSEDs; Byiers et al., 2012 ), including withdrawal designs and alternating treatment designs, that can be used in testing evidence-based practices. Similarly, an implementation strategy may be used to encourage the use of a specific treatment at a particular site, followed by that strategy’s withdrawal and subsequent reinstatement, with data collection throughout the process (on-off-on or ABA design). A weakness of these approaches in the context of implementation science, however, is that they usually require reversibility of the intervention (i.e. that the withdrawal of implementation support truly allows the healthcare system to revert to its pre-implementation state). When this is not the case—for example, if a hypothetical study is focused on training to encourage use of an evidence-based psychotherapy—then these designs may be less useful.

2. Quasi-Experimental Designs in Implementation Science

In some implementation science contexts, policy-makers or administrators may not be willing to have a subset of participating patients or sites randomized to a control condition, especially for high-profile or high-urgency clinical issues. Quasi-experimental designs allow implementation scientists to conduct rigorous studies in these contexts, albeit with certain limitations. We briefly review the characteristics of these designs here; other recent review articles are available for the interested reader (e.g. Handley et al., 2018 ).

2.1. Pre-Post with Non-Equivalent Control Group

The pre-post with non-equivalent control group uses a control group in the absence of randomization. Ideally, the control group is chosen to be as similar to the intervention group as possible (e.g. by matching on factors such as clinic type, patient population, geographic region, etc.). Theoretically, both groups are exposed to the same trends in the environment, making it plausible to decipher if the intervention had an effect. Measurement of both treatment and control conditions classically occurs pre- and post-intervention, with differential improvement between the groups attributed to the intervention. This design is popular due to its practicality, especially if data collection points can be kept to a minimum. It may be especially useful for capitalizing on naturally occurring experiments such as may occur in the context of certain policy initiatives or rollouts—specifically, rollouts in which it is plausible that a control group can be identified. For example, Kirchner and colleagues (2014) used this type of design to evaluate the integration of mental health services into primary care clinics at seven US Department of Veterans Affairs (VA) medical centers and seven matched controls.
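The "differential improvement between the groups" described above is often computed as a difference-in-differences estimate. The short Python sketch below shows the arithmetic; the outcome values are invented for illustration and the function names are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def difference_in_differences(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Change in the intervention group minus change in the control group."""
    treat_change = mean(treat_post) - mean(treat_pre)
    ctrl_change = mean(ctrl_post) - mean(ctrl_pre)
    return treat_change - ctrl_change


# Hypothetical site-level outcomes (e.g. percent of eligible patients reached).
estimate = difference_in_differences(
    treat_pre=[10, 12], treat_post=[18, 20],
    ctrl_pre=[9, 11], ctrl_post=[12, 14],
)
print(estimate)  # (19 - 11) - (13 - 10) = 5.0
```

The estimate is only as credible as the assumption that both groups would have followed the same trend without the intervention, which is exactly what the lack of randomization puts in doubt.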

One overarching drawback of this design is that it is especially vulnerable to threats to internal validity ( Shadish, 2002 ), because pre-existing differences between the treatment and control group could erroneously be attributed to the intervention. While unmeasured differences between treatment and control groups are always a possibility in healthcare research, such differences are especially likely to occur in the context of these designs due to the lack of randomization. Similarly, this design is particularly sensitive to secular trends that may differentially affect the treatment and control groups ( Cousins et al., 2014 ; Pape et al., 2013 ), as well as regression to the mean confounding study results ( Morton and Torgerson, 2003 ). For example, if a study site is selected for the experimental condition precisely because it is underperforming in some way, then regression to the mean would suggest that the site will show improvement regardless of any intervention; in the context of a pre-post with non-equivalent control group study, however, this improvement would erroneously be attributed to the intervention itself (Type I error).
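Regression to the mean can be demonstrated with a small simulation in which no intervention occurs at all. In the Python sketch below, every number (the number of sites, the score distribution, the amount of measurement noise) is an assumption chosen for illustration; sites are selected because they score worst at baseline and are then simply measured again.

```python
import random


def simulate_regression_to_mean(n_sites=1000, n_selected=100, seed=0):
    """Select the worst-scoring sites at baseline and re-measure them.

    Each site has a stable true performance plus independent measurement
    noise at each time point. No intervention is applied.
    """
    rng = random.Random(seed)
    true_perf = [rng.gauss(50, 5) for _ in range(n_sites)]
    baseline = [t + rng.gauss(0, 10) for t in true_perf]
    followup = [t + rng.gauss(0, 10) for t in true_perf]

    # Indices of the sites with the lowest baseline scores.
    worst = sorted(range(n_sites), key=lambda i: baseline[i])[:n_selected]
    mean_base = sum(baseline[i] for i in worst) / n_selected
    mean_follow = sum(followup[i] for i in worst) / n_selected
    return mean_base, mean_follow


mean_base, mean_follow = simulate_regression_to_mean()
print(f"baseline mean of selected sites:  {mean_base:.1f}")
print(f"follow-up mean of selected sites: {mean_follow:.1f}")
```

The selected sites appear to improve at follow-up even though nothing was done to them, which is the spurious effect that a pre-post comparison of underperforming sites risks attributing to the intervention.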

There are, however, various ways that implementation scientists can mitigate these weaknesses. First, as mentioned briefly above, it is important to select a control group that is as similar as possible to the intervention site(s), which can include matching at both the health care network and clinic level (e.g. Kirchner et al., 2014 ). Second, propensity score weighting (e.g. Morgan, 2018 ) can statistically mitigate internal validity concerns, although this approach may be of limited utility when comparing secular trends between different study cohorts ( Dimick and Ryan, 2014 ). More broadly, qualitative methods (e.g. periodic interviews with staff at intervention and control sites) can help uncover key contextual factors that may be affecting study results above and beyond the intervention itself.

2.2. Interrupted Time Series

Interrupted time series (ITS; Shadish, 2002 ; Taljaard et al., 2014 ; Wagner et al., 2002 ) designs represent one of the most robust categories of quasi-experimental designs. Rather than relying on a non-equivalent control group, ITS designs rely on repeated data collections from intervention sites to determine whether a particular intervention is associated with improvement on a given metric relative to the pre-intervention secular trend. They are particularly useful in cases where a comparable control group cannot be identified—for example, following widespread implementation of policy mandates, quality improvement initiatives, or dissemination campaigns ( Eccles et al., 2003 ). In ITS designs, data are collected at multiple time points both before and after an intervention (e.g., policy change, implementation effort), and analyses explore whether the intervention was associated with the outcome beyond any pre-existing secular trend. More formally, ITS evaluations focus on identifying whether there is discontinuity in the trend (change in slope or level) after the intervention relative to before the intervention, using segmented regression to model pre- and post-intervention trends ( Gebski et al., 2012 ; Penfold and Zhang, 2013 ; Taljaard et al., 2014 ; Wagner et al., 2002 ). A number of recent implementation studies have used ITS designs, including an evaluation of implementation of a comprehensive smoke-free policy in a large UK mental health organization to reduce physical assaults ( Robson et al., 2017 ); the impact of a national policy limiting alcohol availability on suicide mortality in Slovenia ( Pridemore and Snowden, 2009 ); and the effect of delivery of a tailored intervention for primary care providers to increase psychological referrals for women with mild to moderate postnatal depression ( Hanbury et al., 2013 ).
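A bare-bones version of the segmented regression described above can be written in plain Python. The sketch fits an ordinary least squares model with four terms: baseline level, pre-intervention slope, change in level at the interruption, and change in slope afterward. The data are noise-free and invented, the function names are assumptions, and the model deliberately omits adjustments such as autocorrelation or seasonality that a real ITS analysis would typically need.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def segmented_regression(y, first_post_index):
    """Fit y_t = b0 + b1*t + b2*post_t + b3*post_t*(t - first_post_index)."""
    rows = []
    for t in range(len(y)):
        post = 1.0 if t >= first_post_index else 0.0
        rows.append([1.0, float(t), post, post * (t - first_post_index)])
    k = 4
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    b0, b1, b2, b3 = solve_linear_system(xtx, xty)
    return {"baseline_level": b0, "pre_slope": b1,
            "level_change": b2, "slope_change": b3}


# Hypothetical series: 8 pre-intervention and 8 post-intervention points,
# with a jump of 5 and an extra 0.5 per period after the interruption.
y = [20 + t for t in range(8)]
y += [20 + t + 5 + 0.5 * (t - 8) for t in range(8, 16)]
fit = segmented_regression(y, first_post_index=8)
print({name: round(value, 3) for name, value in fit.items()})
```

Here `level_change` and `slope_change` are the two quantities of interest: the discontinuity at the interruption and the change in trend relative to the pre-intervention slope.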

ITS designs are appealing in implementation work for several reasons. Relative to uncontrolled pre-post analyses, ITS analyses reduce the chances that intervention effects are confounded by secular trends ( Bernal et al., 2017 ; Eccles et al., 2003 ). Time-varying confounders, such as seasonality, can also be adjusted for, provided adequate data ( Bernal et al., 2017 ). Indeed, recent work has confirmed that ITS designs can yield effect estimates similar to those derived from cluster-randomized RCTs ( Fretheim et al., 2013 ; Fretheim et al., 2015 ). Relative to an RCT, ITS designs can also allow for a more comprehensive assessment of the longitudinal effects of an intervention (positive or negative), as effects can be traced over all included time points ( Bernal et al., 2017 ; Penfold and Zhang, 2013 ).

ITS designs also present a number of challenges. First, the segmented regression approach requires clear delineation between pre- and post-intervention periods; interventions with indeterminate implementation periods are likely not good candidates for ITS. While ITS designs that include multiple ‘interruptions’ (e.g. introductions of new treatment components) are possible, they will require collection of enough time points between interruptions to ensure that each intervention’s effects can be ascertained individually ( Bernal et al., 2017 ). Second, collecting data from sufficient time points across all sites of interest, especially for the pre-intervention period, can be challenging ( Eccles et al., 2003 ): a common recommendation is at least eight time points both pre- and post-intervention ( Penfold and Zhang, 2013 ). This may be onerous, particularly if the data are not routinely collected by the health system(s) under study. Third, ITS cannot protect against confounding effects from other interventions that begin contemporaneously and may impact similar outcomes ( Eccles et al., 2003 ).

2.3. Stepped Wedge Designs

Stepped wedge trials are another type of quasi-experimental design. In a stepped wedge, all participants receive the intervention, but are assigned to the timing of the intervention in a staggered fashion ( Betran et al., 2018 ; Brown and Lilford, 2006 ; Hussey and Hughes, 2007 ), typically at the site or cluster level. Stepped wedge designs have their analytic roots in balanced incomplete block designs, in which all pairs of treatments occur an equal number of times within each block ( Hanani, 1961 ). Traditionally, all sites in stepped wedge trials have outcome measures assessed at all time points, thus allowing sites that receive the intervention later in the trial to essentially serve as controls for early intervention sites. A recent special issue of the journal Trials includes more detail on these designs ( Davey et al., 2015 ), which may be ideal for situations in which it is important for all participating patients or sites to receive the intervention during the trial. Stepped wedge trials may also be useful when resources are scarce enough that intervening at all sites at once (or even half of the sites as in a standard treatment-versus-control RCT) would not be feasible. If desired, the administration of the intervention to sites in waves allows for lessons learned in early sites to be applied to later sites (via formative evaluation; see Elwy et al., this issue).
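The staggered structure of a stepped wedge is easiest to see as a schedule matrix with one row per site and one column per period. The following Python sketch builds such a matrix for a hypothetical trial with three waves of two sites each; the site names, wave sizes, and function name are assumptions.

```python
def stepped_wedge_schedule(waves, n_periods=None):
    """Return {site: [0/1 per period]}, where 1 means intervention is in place.

    `waves` is a list of lists of site ids. All sites start in the control
    condition at period 0, and wave w crosses over at period w + 1.
    """
    if n_periods is None:
        n_periods = len(waves) + 1
    schedule = {}
    for w, sites in enumerate(waves):
        for site in sites:
            schedule[site] = [1 if period > w else 0
                              for period in range(n_periods)]
    return schedule


waves = [["A", "B"], ["C", "D"], ["E", "F"]]  # hypothetical sites
schedule = stepped_wedge_schedule(waves)
for site, row in schedule.items():
    print(site, row)
```

Reading down any column shows which sites are still unexposed in that period and can therefore serve as controls for sites that have already crossed over.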

The Behavioral Health Interdisciplinary Program (BHIP) Enhancement Project is a recent example of a stepped-wedge implementation trial ( Bauer et al., 2016 ; Bauer et al., 2019 ). This study involved using blended facilitation (including internal and external facilitators; Kirchner et al., 2014 ) to implement care practices consistent with the collaborative chronic care model (CCM; Bodenheimer et al., 2002a , b ; Wagner et al., 1996 ) in nine outpatient mental health teams in VA medical centers. Figure 2 illustrates the implementation and stepdown periods for that trial, with black dots representing primary data collection points.

Figure 2. BHIP Enhancement Project stepped wedge (adapted from Bauer et al., 2019).

The BHIP Enhancement Project was conducted as a stepped wedge for several reasons. First, the stepped wedge design allowed the trial to reach nine sites despite limited implementation resources (i.e. intervening at all nine sites simultaneously would not have been feasible given study funding). Second, the stepped wedge design aided in recruitment and retention, as all participating sites were certain to receive implementation support during the trial: at worst, sites that were randomized to later-phase implementation had to endure waiting periods totaling about eight months before implementation began. This was seen as a major strength of the design by its operational partner, the VA Office of Mental Health and Suicide Prevention. To keep sites engaged during the waiting period, the BHIP Enhancement Project offered a guiding workbook and monthly technical support conference calls.

Three additional features of the BHIP Enhancement Project deserve special attention. First, data collection for late-implementing sites did not begin until immediately before the onset of implementation support (see Figure 2). While this reduced statistical power, it also significantly reduced data collection burden on the study team. Second, onset of implementation support was staggered such that wave 2 began at the end of month 4 rather than month 6. This had two benefits: it compressed the overall amount of time required for implementation during the trial, and it meant that the study team only had to collect data from one site at a time, with data collection periods coming every 2–4 months. More traditional stepped wedge approaches typically have data collection across sites temporally aligned (e.g. Betran et al., 2018). Third, the BHIP Enhancement Project used a balancing algorithm (Lew et al., 2019) to assign sites to waves, retaining some of the benefits of randomization while ensuring balance on key site characteristics (e.g. size, geographic region).
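One generic way to combine balance with randomness when assigning sites to waves is to score every possible allocation for imbalance and then pick at random among the best-scoring ones. The Python sketch below illustrates that idea only; it is not a reproduction of the algorithm in Lew et al. (2019), and the site sizes, the imbalance score, and the `keep_fraction` parameter are all assumptions.

```python
import random
from itertools import permutations


def split_into_waves(order, wave_size):
    return [list(order[i:i + wave_size]) for i in range(0, len(order), wave_size)]


def imbalance(order, sizes, wave_size):
    """Range of mean site size across waves (0 means perfectly balanced)."""
    waves = split_into_waves(order, wave_size)
    means = [sum(sizes[s] for s in wave) / len(wave) for wave in waves]
    return max(means) - min(means)


def balanced_assignment(sizes, wave_size, keep_fraction=0.1, seed=0):
    """Randomly choose one allocation from the best-balanced fraction."""
    rng = random.Random(seed)
    sites = sorted(sizes)
    scored = sorted((imbalance(order, sizes, wave_size), order)
                    for order in permutations(sites))
    keep = max(1, int(len(scored) * keep_fraction))
    score, order = rng.choice(scored[:keep])
    return split_into_waves(order, wave_size), score


# Hypothetical site sizes (e.g. number of patients served).
sizes = {"A": 100, "B": 200, "C": 300, "D": 400, "E": 500, "F": 600}
waves, score = balanced_assignment(sizes, wave_size=2, seed=7)
print(waves, score)
```

Exhaustive enumeration is only practical for a handful of sites, as here; with more sites, a random sample of candidate allocations could be scored instead.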

Despite their utility, stepped wedges have some important limitations. First, because they feature delayed implementation at some sites, stepped wedges typically take longer than similarly-sized parallel group RCTs. This increases the chances that secular trends, policy changes, or other external forces impact study results. Second, as with RCTs, imbalanced site assignment can confound results. This may occur deliberately in some cases—for example, if sites that develop their implementation plans first are assigned to earlier waves. Even if sites are randomized, however, early and late wave sites may still differ on important characteristics such as size, rurality, and case mix. The resulting confounding between site assignment and time can threaten the internal validity of the study—although, as above, balancing algorithms can reduce this risk. Third, the use of formative evaluation (Elwy, this issue), while useful for maximizing the utility of implementation efforts in a stepped wedge, can mean that late-wave sites receive different implementation strategies than early-wave sites. Similarly, formative evaluation may inform midstream adaptations to the clinical innovation being implemented. In either case, these changes may again threaten internal validity. Overall, then, stepped wedges represent useful tools for evaluating the impact of health interventions that (as with all designs) are subject to certain weaknesses and limitations.

3. Conclusions and Future Directions

Implementation science is focused on maximizing the extent to which effective healthcare practices are adopted, used, and sustained by clinicians, hospitals, and systems. Answering questions in these domains frequently requires different research methods than those employed in traditional efficacy- or effectiveness-oriented randomized clinical trials (RCTs). Implementation-oriented RCTs typically feature cluster or site-level randomization, and emphasize implementation outcomes (e.g. the number of patients receiving the new treatment as intended) rather than traditional clinical outcomes. Hybrid implementation-effectiveness designs incorporate both types of outcomes; more details on these approaches can be found elsewhere in this special issue (Landes, this issue). Other methodological innovations, such as factorial designs or sequential, multiple-assignment randomized trials (SMARTs), can address questions about multi-component or adaptive interventions, still under the umbrella of experimental designs. These types of trials may be especially important for demystifying the “black box” of implementation, that is, determining what components of an implementation strategy are most strongly associated with implementation success. In contrast, pre-post designs with non-equivalent control groups, interrupted time series (ITS), and stepped wedge designs are all examples of quasi-experimental designs that may serve implementation researchers when experimental designs would be inappropriate. A major theme cutting across each of these designs is that there are relative strengths and weaknesses associated with any study design decision. Determining what design to use ultimately will need to be informed by the primary research question to be answered, while simultaneously balancing the need for internal validity, external validity, feasibility, and ethics.

Innovations in study design are constantly being developed and refined. Several such innovations are covered in other articles within this special issue (e.g. Kim et al., this issue). One future direction relevant to the study designs presented in this article is the potential for adaptive trial designs, which allow information gleaned during the trial to inform the adaptation of components like treatment allocation, sample size, or study recruitment in the later phases of the same trial (Pallmann et al., 2018). These designs are becoming increasingly popular in clinical treatment research (Bhatt and Mehta, 2016) but could also hold promise for implementation scientists, especially as interest grows in rapid-cycle testing of implementation strategies or efforts. Adaptive designs could potentially be incorporated into SMART designs and stepped wedge studies, as well as traditional RCTs, to further advance implementation science (Cheung et al., 2015). Ideally, these and other innovations will provide researchers with increasingly robust and useful methodologies for answering timely implementation science questions.

  • Many implementation science questions can be addressed by fully experimental designs (e.g. randomized controlled trials [RCTs]).
  • Implementation trials differ in important ways, however, from more traditional efficacy- or effectiveness-oriented RCTs.
  • Adaptive designs represent a recent innovation to determine optimal implementation strategies within a fully experimental framework.
  • Quasi-experimental designs can be used to answer implementation science questions in the absence of randomization.
  • The choice of study designs in implementation science requires careful consideration of scientific, pragmatic, and ethical issues.


This work was supported by Department of Veterans Affairs grants QUE 15–289 (PI: Bauer) and CIN 13403 and National Institutes of Health grant RO1 MH 099898 (PI: Kilbourne).

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

  • Almirall D, Compton SN, Gunlicks-Stoessel M, Duan N, Murphy SA, 2012. Designing a pilot sequential multiple assignment randomized trial for developing an adaptive treatment strategy. Stat Med 31(17), 1887–1902.
  • Bauer MS, McBride L, Williford WO, Glick H, Kinosian B, Altshuler L, Beresford T, Kilbourne AM, Sajatovic M, Cooperative Studies Program 430 Study Team, 2006. Collaborative care for bipolar disorder: Part II. Impact on clinical outcome, function, and costs. Psychiatr Serv 57(7), 937–945.
  • Bauer MS, Miller C, Kim B, Lew R, Weaver K, Coldwell C, Henderson K, Holmes S, Seibert MN, Stolzmann K, Elwy AR, Kirchner J, 2016. Partnering with health system operations leadership to develop a controlled implementation trial. Implement Sci 11, 22.
  • Bauer MS, Miller CJ, Kim B, Lew R, Stolzmann K, Sullivan J, Riendeau R, Pitcock J, Williamson A, Connolly S, Elwy AR, Weaver K, 2019. Effectiveness of Implementing a Collaborative Chronic Care Model for Clinician Teams on Patient Outcomes and Health Status in Mental Health: A Randomized Clinical Trial. JAMA Netw Open 2(3), e190230.
  • Bernal JL, Cummins S, Gasparrini A, 2017. Interrupted time series regression for the evaluation of public health interventions: a tutorial. Int J Epidemiol 46(1), 348–355.
  • Betran AP, Bergel E, Griffin S, Melo A, Nguyen MH, Carbonell A, Mondlane S, Merialdi M, Temmerman M, Gulmezoglu AM, 2018. Provision of medical supply kits to improve quality of antenatal care in Mozambique: a stepped-wedge cluster randomised trial. Lancet Glob Health 6(1), e57–e65.
  • Bhatt DL, Mehta C, 2016. Adaptive Designs for Clinical Trials. N Engl J Med 375(1), 65–74.
  • Bodenheimer T, Wagner EH, Grumbach K, 2002a. Improving primary care for patients with chronic illness. JAMA 288(14), 1775–1779.
  • Bodenheimer T, Wagner EH, Grumbach K, 2002b. Improving primary care for patients with chronic illness: the chronic care model, Part 2. JAMA 288(15), 1909–1914.
  • Brown CA, Lilford RJ, 2006. The stepped wedge trial design: a systematic review. BMC Medical Research Methodology 6(1), 54.
  • Byiers BJ, Reichle J, Symons FJ, 2012. Single-subject experimental design for evidence-based practice. Am J Speech Lang Pathol 21(4), 397–414.
  • Cheung YK, Chakraborty B, Davidson KW, 2015. Sequential multiple assignment randomized trial (SMART) with adaptive randomization for quality improvement in depression treatment program. Biometrics 71(2), 450–459.
  • Collins LM, Dziak JJ, Kugler KC, Trail JB, 2014a. Factorial experiments: efficient tools for evaluation of intervention components. Am J Prev Med 47(4), 498–504.
  • Collins LM, Dziak JJ, Li R, 2009. Design of experiments with multiple independent variables: a resource management perspective on complete and reduced factorial designs. Psychol Methods 14(3), 202–224.
  • Collins LM, Murphy SA, Bierman KL, 2004. A conceptual framework for adaptive preventive interventions. Prev Sci 5(3), 185–196.
  • Collins LM, Murphy SA, Nair VN, Strecher VJ, 2005. A strategy for optimizing and evaluating behavioral interventions. Ann Behav Med 30(1), 65–73.
  • Collins LM, Murphy SA, Strecher V, 2007. The multiphase optimization strategy (MOST) and the sequential multiple assignment randomized trial (SMART): new methods for more potent eHealth interventions. Am J Prev Med 32(5 Suppl), S112–118.
  • Collins LM, Nahum-Shani I, Almirall D, 2014b. Optimization of behavioral dynamic treatment regimens based on the sequential, multiple assignment, randomized trial (SMART). Clin Trials 11(4), 426–434.
  • Coulton S, Perryman K, Bland M, Cassidy P, Crawford M, Deluca P, Drummond C, Gilvarry E, Godfrey C, Heather N, Kaner E, Myles J, Newbury-Birch D, Oyefeso A, Parrott S, Phillips T, Shenker D, Shepherd J, 2009. Screening and brief interventions for hazardous alcohol use in accident and emergency departments: a randomised controlled trial protocol. BMC Health Serv Res 9, 114.
  • Cousins K, Connor JL, Kypri K, 2014. Effects of the Campus Watch intervention on alcohol consumption and related harm in a university population. Drug Alcohol Depend 143, 120–126.
  • Curran GM, Bauer M, Mittman B, Pyne JM, Stetler C, 2012. Effectiveness-implementation hybrid designs: combining elements of clinical effectiveness and implementation research to enhance public health impact. Med Care 50(3), 217–226.
  • Davey C, Hargreaves J, Thompson JA, Copas AJ, Beard E, Lewis JJ, Fielding KL, 2015. Analysis and reporting of stepped wedge randomised controlled trials: synthesis and critical appraisal of published studies, 2010 to 2014. Trials 16(1), 358.
  • Dimick JB, Ryan AM, 2014. Methods for evaluating changes in health care policy: the difference-in-differences approach. JAMA 312(22), 2401–2402.
  • Eccles M, Grimshaw J, Campbell M, Ramsay C, 2003. Research designs for studies evaluating the effectiveness of change and improvement strategies. Qual Saf Health Care 12(1), 47–52.
  • Fisher RA, 1925. Theory of statistical estimation. Mathematical Proceedings of the Cambridge Philosophical Society 22(5), 700–725.
  • Fisher RA, 1935. The design of experiments. Oliver and Boyd, Edinburgh.
  • Fretheim A, Soumerai SB, Zhang F, Oxman AD, Ross-Degnan D, 2013. Interrupted time-series analysis yielded an effect estimate concordant with the cluster-randomized controlled trial result. J Clin Epidemiol 66(8), 883–887.
  • Fretheim A, Zhang F, Ross-Degnan D, Oxman AD, Cheyne H, Foy R, Goodacre S, Herrin J, Kerse N, McKinlay RJ, Wright A, Soumerai SB, 2015. A reanalysis of cluster randomized trials showed interrupted time-series studies were valuable in health system evaluation. J Clin Epidemiol 68(3), 324–333.
  • Gaglio B, Shoup JA, Glasgow RE, 2013. The RE-AIM framework: a systematic review of use over time. Am J Public Health 103(6), e38–46.
  • Gebski V, Ellingson K, Edwards J, Jernigan J, Kleinbaum D, 2012. Modelling interrupted time series to evaluate prevention and control of infection in healthcare. Epidemiol Infect 140(12), 2131–2141.
  • Glasgow RE, Vogt TM, Boles SM, 1999. Evaluating the public health impact of health promotion interventions: the RE-AIM framework. Am J Public Health 89(9), 1322–1327.
  • Hanani H, 1961. The existence and construction of balanced incomplete block designs. The Annals of Mathematical Statistics 32(2), 361–386.
  • Hanbury A, Farley K, Thompson C, Wilson PM, Chambers D, Holmes H, 2013. Immediate versus sustained effects: interrupted time series analysis of a tailored intervention. Implement Sci 8, 130.
  • Handley MA, Lyles CR, McCulloch C, Cattamanchi A, 2018. Selecting and Improving Quasi-Experimental Designs in Effectiveness and Implementation Research. Annu Rev Public Health 39, 5–25.
  • Hussey MA, Hughes JP, 2007. Design and analysis of stepped wedge cluster randomized trials. Contemp Clin Trials 28(2), 182–191.
  • Kilbourne AM, Almirall D, Eisenberg D, Waxmonsky J, Goodrich DE, Fortney JC, Kirchner JE, Solberg LI, Main D, Bauer MS, Kyle J, Murphy SA, Nord KM, Thomas MR, 2014a. Protocol: Adaptive Implementation of Effective Programs Trial (ADEPT): cluster randomized SMART trial comparing a standard versus enhanced implementation strategy to improve outcomes of a mood disorders program. Implement Sci 9, 132.
  • Kilbourne AM, Almirall D, Goodrich DE, Lai Z, Abraham KM, Nord KM, Bowersox NW, 2014b. Enhancing outreach for persons with serious mental illness: 12-month results from a cluster randomized trial of an adaptive implementation strategy. Implement Sci 9, 163.
  • Kilbourne AM, Bramlet M, Barbaresso MM, Nord KM, Goodrich DE, Lai Z, Post EP, Almirall D, Verchinina L, Duffy SA, Bauer MS, 2014c. SMI life goals: description of a randomized trial of a collaborative care model to improve outcomes for persons with serious mental illness. Contemp Clin Trials 39(1), 74–85.
  • Kilbourne AM, Goodrich DE, Lai Z, Clogston J, Waxmonsky J, Bauer MS, 2012a. Life Goals Collaborative Care for patients with bipolar disorder and cardiovascular disease risk. Psychiatr Serv 63(12), 1234–1238.
  • Kilbourne AM, Goodrich DE, Nord KM, Van Poppelen C, Kyle J, Bauer MS, Waxmonsky JA, Lai Z, Kim HM, Eisenberg D, Thomas MR, 2015. Long-Term Clinical Outcomes from a Randomized Controlled Trial of Two Implementation Strategies to Promote Collaborative Care Attendance in Community Practices. Adm Policy Ment Health 42(5), 642–653.
  • Kilbourne AM, Neumann MS, Pincus HA, Bauer MS, Stall R, 2007. Implementing evidence-based interventions in health care: application of the replicating effective programs framework. Implement Sci 2, 42.
  • Kilbourne AM, Neumann MS, Waxmonsky J, Bauer MS, Kim HM, Pincus HA, Thomas M, 2012b. Public-academic partnerships: evidence-based implementation: the role of sustained community-based practice and research partnerships. Psychiatr Serv 63(3), 205–207.
  • Kilbourne AM, Post EP, Nossek A, Drill L, Cooley S, Bauer MS, 2008. Improving medical and psychiatric outcomes among individuals with bipolar disorder: a randomized controlled trial. Psychiatr Serv 59(7), 760–768.
  • Kirchner JE, Ritchie MJ, Pitcock JA, Parker LE, Curran GM, Fortney JC, 2014. Outcomes of a partnered facilitation strategy to implement primary care-mental health. J Gen Intern Med 29(Suppl 4), 904–912.
  • Lei H, Nahum-Shani I, Lynch K, Oslin D, Murphy SA, 2012. A “SMART” design for building individualized treatment sequences. Annu Rev Clin Psychol 8, 21–48.
  • Lew RA, Miller CJ, Kim B, Wu H, Stolzmann K, Bauer MS, 2019. A robust method to reduce imbalance for site-level randomized controlled implementation trial designs. Implement Sci 14, 46.
  • Morgan CJ, 2018. Reducing bias using propensity score matching. J Nucl Cardiol 25(2), 404–406.
  • Morton V, Torgerson DJ, 2003. Effect of regression to the mean on decision making in health care. BMJ 326(7398), 1083–1084.
  • Nahum-Shani I, Qian M, Almirall D, Pelham WE, Gnagy B, Fabiano GA, Waxmonsky JG, Yu J, Murphy SA, 2012. Experimental design and primary data analysis methods for comparing adaptive interventions. Psychol Methods 17(4), 457–477.
  • NeCamp T, Kilbourne A, Almirall D, 2017. Comparing cluster-level dynamic treatment regimens using sequential, multiple assignment, randomized trials: Regression estimation and sample size considerations. Stat Methods Med Res 26(4), 1572–1589.
  • Neumann MS, Sogolow ED, 2000. Replicating effective programs: HIV/AIDS prevention technology transfer. AIDS Educ Prev 12(5 Suppl), 35–48.
  • Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, Holmes J, Mander AP, Odondi L, Sydes MR, Villar SS, Wason JMS, Weir CJ, Wheeler GM, Yap C, Jaki T, 2018. Adaptive designs in clinical trials: why use them, and how to run and report them. BMC Medicine 16(1), 29.
  • Pape UJ, Millett C, Lee JT, Car J, Majeed A, 2013. Disentangling secular trends and policy impacts in health studies: use of interrupted time series analysis. J R Soc Med 106(4), 124–129.
  • Pellegrini CA, Hoffman SA, Collins LM, Spring B, 2014. Optimization of remotely delivered intensive lifestyle treatment for obesity using the Multiphase Optimization Strategy: Opt-IN study protocol. Contemp Clin Trials 38(2), 251–259.
  • Penfold RB, Zhang F, 2013. Use of Interrupted Time Series Analysis in Evaluating Health Care Quality Improvements. Academic Pediatrics 13(6 Suppl), S38–S44.
  • Pridemore WA, Snowden AJ, 2009. Reduction in suicide mortality following a new national alcohol policy in Slovenia: an interrupted time-series analysis. Am J Public Health 99(5), 915–920.
  • Proctor E, Silmere H, Raghavan R, Hovmand P, Aarons G, Bunger A, Griffey R, Hensley M, 2011. Outcomes for implementation research: conceptual distinctions, measurement challenges, and research agenda. Adm Policy Ment Health 38(2), 65–76.
  • Robson D, Spaducci G, McNeill A, Stewart D, Craig TJK, Yates M, Szatkowski L, 2017. Effect of implementation of a smoke-free policy on physical violence in a psychiatric inpatient setting: an interrupted time series analysis. Lancet Psychiatry 4(7), 540–546.
  • Schildcrout JS, Schisterman EF, Mercaldo ND, Rathouz PJ, Heagerty PJ, 2018. Extending the Case-Control Design to Longitudinal Data: Stratified Sampling Based on Repeated Binary Outcomes. Epidemiology 29(1), 67–75.
  • Shadish WR, Cook TD, Campbell DT, 2002. Experimental and quasi-experimental designs for generalized causal inference. Houghton Mifflin Company, Boston, MA.
  • Simon GE, Ludman EJ, Bauer MS, Unutzer J, Operskalski B, 2006. Long-term effectiveness and cost of a systematic care program for bipolar disorder. Arch Gen Psychiatry 63(5), 500–508.
  • Stetler CB, Legro MW, Rycroft-Malone J, Bowman C, Curran G, Guihan M, Hagedorn H, Pineros S, Wallace CM, 2006. Role of “external facilitation” in implementation of research findings: a qualitative evaluation of facilitation experiences in the Veterans Health Administration. Implement Sci 1, 23.
  • Taljaard M, McKenzie JE, Ramsay CR, Grimshaw JM, 2014. The use of segmented regression in analysing interrupted time series studies: an example in pre-hospital ambulance care. Implement Sci 9, 77.
  • Wagner AK, Soumerai SB, Zhang F, Ross-Degnan D, 2002. Segmented regression analysis of interrupted time series studies in medication use research. J Clin Pharm Ther 27(4), 299–309.
  • Wagner EH, Austin BT, Von Korff M, 1996. Organizing care for patients with chronic illness. Milbank Q 74(4), 511–544.
  • Wyrick DL, Rulison KL, Fearnow-Kenney M, Milroy JJ, Collins LM, 2014. Moving beyond the treatment package approach to developing behavioral interventions: addressing questions that arose during an application of the Multiphase Optimization Strategy (MOST). Transl Behav Med 4(3), 252–259.


Quasi-experimental longitudinal designs to evaluate drug benefit policy changes with low policy compliance


  • 1 Division of Pharmacoepidemiology and Pharmacoeconomics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, 221 Longwood Ave (BLI-341), Boston, MA 02115, USA.
  • PMID: 12384199
  • DOI: 10.1016/s0895-4356(02)00437-7

A causal relation between a drug benefit policy change and an increase in adverse outcomes can be tested by comparing the experience of a group of patients affected by the policy with the (counterfactual) experience of the same patients had the policy not been implemented. Because counterfactual experiences cannot be observed, it must be assumed that the counterfactual is correctly described by extrapolating from the same population's previous experience. The null hypothesis of no policy effect can be empirically tested using quasi-experimental longitudinal designs with repeated measures. If compliance with a policy is low, results may be biased towards the null, but a subgroup analysis of compliers may be biased by nonignorable treatment selection. Using the example of reference drug pricing in British Columbia, we discuss assumptions for causal interpretations of such analyses and provide supplementary analyses to assess and improve the validity of findings. Results from nonrandomized comparisons of subgroups defined by their compliance with a policy change should generally be interpreted cautiously, and several biases should be explored.
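The two ideas in this abstract, extrapolating a counterfactual from pre-policy experience and the bias towards the null under low compliance, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names and data are hypothetical, the pre-policy trend is assumed to be linear, and the dilution formula assumes that non-compliers are entirely unaffected by the policy.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def mean_policy_effect(pre, post):
    """Mean gap between observed post-policy outcomes and the
    counterfactual extrapolated from the pre-policy linear trend.

    `pre` and `post` hold outcomes for consecutive, equally spaced
    periods; the policy starts at the first period of `post`.
    """
    intercept, slope = fit_line(list(range(len(pre))), pre)
    gaps = []
    for offset, observed in enumerate(post):
        period = len(pre) + offset
        counterfactual = intercept + slope * period
        gaps.append(observed - counterfactual)
    return sum(gaps) / len(gaps)


def diluted_effect(effect_in_compliers, compliance_rate):
    """Population-average effect when only compliers are affected."""
    return effect_in_compliers * compliance_rate


if __name__ == "__main__":
    # Hypothetical monthly event rates, invented for illustration.
    print(mean_policy_effect(pre=[10, 11, 12, 13], post=[16, 17]))
    print(diluted_effect(effect_in_compliers=2.0, compliance_rate=0.25))
```

With these invented numbers the whole-population analysis would see only a quarter of the effect experienced by compliers, which is the sense in which low compliance pulls estimates towards the null. Restricting the analysis to compliers avoids the dilution but, as the abstract notes, introduces selection bias if compliers differ systematically from non-compliers.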


Grants and funding

  • R01 AG18833/AG/NIA NIH HHS/United States
  • R01 HS10881/HS/AHRQ HHS/United States
  • R03 HS09855/HS/AHRQ HHS/United States


Mechanical Properties of Auxetic Honeycombs Realized via Material Extrusion Additive Manufacturing: Experimental Testing and Numerical Studies

  • Published: 25 September 2024


  • B. Uspensky 1 ,
  • I. Derevianko 1 , 3 ,
  • Konstantin Avramov 1 , 2 , 4 ,
  • K. Maksymenko-Sheiko 1 &
  • M. Chernobryvko 1  

A combination of experimental testing and numerical analysis is proposed to determine the static mechanical properties of auxetic honeycombs produced by material extrusion. Special specimens, each consisting of two honeycomb plates and three steel plates, are used to measure the shear properties of the honeycombs experimentally. The shear tests are simulated using the finite element software ANSYS. Tension tests on the honeycombs are also carried out and simulated with finite element software. The honeycomb model accounts for plasticity of the honeycomb material and geometrically nonlinear deformation of the honeycomb walls. The experimental data and the calculated results are in close agreement.


Author information

Authors and affiliations.

Anatolii Pidhornyi Institute of Power Machines and Systems of the National academy of sciences of Ukraine, Kharkiv, Ukraine

B. Uspensky, I. Derevianko, Konstantin Avramov, K. Maksymenko-Sheiko & M. Chernobryvko

Department of Technical Systems, Kharkiv National University of Radio Electronics, Kharkov, Ukraine

Konstantin Avramov

Yangel Yuzhnoye State Design Office, Dnipro, Ukraine

I. Derevianko

Department of Aircraft Strength, National Aerospace University N.Ye. Zhukovsky “KhAI”, Kharkov, Ukraine



Uspensky performed the numerical simulations. Derevianko carried out all experimental work. Avramov proposed the research methodology and wrote the paper. Maksymenko-Sheiko printed the samples. Chernobryvko added the plasticity model.

Corresponding author

Correspondence to Konstantin Avramov .

Ethics declarations

Ethical approval.

This declaration is not applicable.

Competing Interests

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Uspensky, B., Derevianko, I., Avramov, K. et al. Mechanical Properties of Auxetic Honeycombs Realized via Material Extrusion Additive Manufacturing: Experimental Testing and Numerical Studies. Appl Compos Mater (2024). https://doi.org/10.1007/s10443-024-10269-2


Received : 08 May 2024

Accepted : 11 September 2024


  • Material Extrusion
  • Tension of Honeycomb
  • Geometrically Nonlinear Deformation




  1. Quasi-Experimental Design

    True experimental design Quasi-experimental design; Assignment to treatment: The researcher randomly assigns subjects to control and treatment groups.: Some other, non-random method is used to assign subjects to groups. Control over treatment: The researcher usually designs the treatment.: The researcher often does not have control over the treatment, but instead studies pre-existing groups ...

  2. The Use and Interpretation of Quasi-Experimental Studies in Medical

    In medical informatics, the quasi-experimental, sometimes called the pre-post intervention, design often is used to evaluate the benefits of specific interventions. The increasing capacity of health care institutions to collect routine clinical data has led to the growing use of quasi-experimental study designs in the field of medical ...

  3. Research Designs: Quasi-Experimental, Case Studies & Correlational

    Three types of quasi-experimental research are: cross-sectional, longitudinal, and cross-sequential. Cross-sectional research studies make a comparison of different groups at the same time.

  4. Selecting and Improving Quasi-Experimental Designs in Effectiveness and

    Collection of longitudinal data relevant to implementation processes that could impact interpretation of findings such as academic vs community affiliation, urban vs rural (bed size) ... "Experimental and Quasi-Experimental Designs for Research on Teaching." In Gage NL (ed.), Handbook of Research on Teaching. Chicago: Rand McNally, 1963 ...

  5. Quasi-experiment

    A quasi-experiment is an empirical interventional study used to estimate the causal impact of an intervention on target population without random assignment. Quasi-experimental research shares similarities with the traditional experimental design or randomized controlled trial, but it specifically lacks the element of random assignment to ...

  6. PDF Quantitative Research Designs: Experimental, Quasi-Experimental, and

    longitudinal study, participants are observed and measurements are taken over a long period of time. Longitudinal studies either go forward in time (prospec- ... Quasi-experimental research does not have randomization of participants to groups. 7. In a human intervention study, will participants, researchers, and staff be blinded from knowing ...

  7. Quasi-experimental study designs series—paper 5: a checklist for

    Quasi-experimental study designs series—paper 5: a checklist for classifying studies evaluating the effects on health interventions—a taxonomy without labels ... Analysis of a cohort with longitudinal "panel" data sets. In rare cases, the unit of analysis will be measured at the disaggregate level (i.e., the same people measured ...

  8. Regression based quasi-experimental approach when ...

    However, with quasi-experimental study designs researchers are able to estimate causal effects using observational approaches. Interrupted time series (ITS) analysis is a useful quasi-experimental design with which to evaluate the longitudinal effects of interventions, through regression modelling.3 The term quasi-experimental refers to an ...

  9. 14

    Specifically, we describe four quasi-experimental designs - one-group pretest-posttest designs, non-equivalent group designs, regression discontinuity designs, and interrupted time-series designs - and their statistical analyses in detail. Both simple quasi-experimental designs and embellishments of these simple designs are presented.

  10. PDF Quasi-experimental and Single-case Experimental Designs

    Quasi-Experimental Designs. In this major section, we introduce a common type of research design called the quasi-experimental research design. The quasi-experimental research design, also defined in ... A quasi-experimental research design is the use of methods and procedures to make observations in a study that is structured similar to an experiment, ...

  11. Quasi-Experimental Research

    The prefix quasi means "resembling." Thus quasi-experimental research is research that resembles experimental research but is not true experimental research. Although the independent variable is manipulated, participants are not randomly assigned to conditions or orders of conditions (Cook & Campbell, 1979). [1] Because the independent variable is manipulated before the dependent variable ...

  12. Conceptualising natural and quasi experiments in public health

    Natural or quasi experiments are appealing for public health research because they enable the evaluation of events or interventions that are difficult or impossible to manipulate experimentally, such as many policy and health system reforms. However, there remains ambiguity in the literature about their definition and how they differ from randomized controlled experiments and from other ...

  13. Neighborhood Design and Walking : A Quasi-Experimental Longitudinal Study

    Methods. Using cross-sectional (n=70) and longitudinal (n=32) data (collected 2003-2006), associations of neighborhood design and demographics with walking were examined. Participants were low-income, primarily African-American women in the southeastern U.S. Through a natural experiment, some women relocated to neo-traditional communities (experimental group) and others moved to conventional ...

  14. Quasi-Experimental Design

    In a longitudinal time-series experimental design, the same sample is kept throughout each observation over time. ... For example, in a quasi-experimental study, researchers may be interested in ...

  15. Cultivating classroom curiosity: a quasi-experimental, longitudinal

    Accordingly, the present study conducted a quasi-experimental, longitudinal investigation to examine the relationship between exposure to the Question Formulation Technique (QFT), a classroom-based intervention that seeks to teach students how to ask their own questions, and scores on curiosity and related strengths in a sample of Northeast ...

  16. Experimental and Quasi-Experimental Designs in Implementation Research

    Quasi-experimental designs allow implementation scientists to conduct rigorous studies in these contexts, albeit with certain limitations. We briefly review the characteristics of these designs here; other recent review articles are available for the interested reader (e.g. Handley et al., 2018). 2.1.

  17. PDF Checklist for Quasi-experimental Studies (Non-randomized Experimental

    Critical Appraisal Checklist for Quasi-Experimental Studies - 4 EXPLANATION FOR THE CRITICAL APPRAISAL TOOL FOR QUASI-EXPERIMENTAL STUDIES How to cite: Tufanaru C, Munn Z, Aromataris E, Campbell J, Hopp L. Chapter 3: Systematic reviews of effectiveness. In: Aromataris E, Munn Z (Editors). JBI Manual for Evidence Synthesis. JBI, 2020. Available

  18. A Longitudinal Quasi-Experimental Study of Violence and Disorder

    A Longitudinal Quasi-Experimental Study of Violence and Disorder Impacts of Urban CCTV Camera Clusters. Jerry H. Ratcliffe https ... (2011). Police-monitored CCTV cameras in Newark, NJ: A quasi-experimental test of crime deterrence. Journal of Experimental Criminology, 7, 255-274. Chatfield C. (1989). The ...

  19. Engagement With a Digital Platform for Multimodal Cognitive ...

    Methods: This feasibility investigation utilized a quasi-experimental, single-arm, nonrandomized, longitudinal design where participants engaged in the behavioral intervention on a smartphone. Of the 559 participants that initially enrolled (age: mean 51 years, SD 7.5 years; 51.7% female [289/559]), 242 completed the final testing trial.

  20. Longitudinal-Experimental Studies

    The longitudinal and experimental elements are also complementary in that the experiment can demonstrate (with high internal validity) the effect of only one or two independent variables, whereas the longitudinal study can demonstrate (with somewhat lower internal validity in quasi-experimental analyses), the relative effects of many ...

  21. longitudinal quasi-experimental design: Topics by Science.gov

    A naturally occurring quasi-experimental longitudinal field study of 87 municipal employees using pretest and posttest measures investigated the effects of an office workstation ergonomics intervention program on employees' perceptions of their workstation characteristics, levels of persistent pain, eyestrain, and workstation satisfaction. The ...

  22. Quasi-experimental longitudinal designs to evaluate drug ...

    The null hypothesis of no policy effect can be empirically tested using quasi-experimental longitudinal designs with repeated measures. If compliance to a policy is low, results may be biased towards the null, but a subgroup analysis of compliers may be biased by nonignorable treatment selection. ...

  23. (PDF) Does music training enhance working memory ...

    Findings from a quasi-experimental longitudinal study. January 2013; Psychology of Music 42(2):284-298; ... performed at three dependent time points in this study. The quasi-experimental design ...

  24. Assessment of the HIV Enhanced Access Testing in the Emergency

    This quasi-experimental prospective study evaluated implementation of the HEATED program using the Reach, Effectiveness, Adoption, Implementation, and Maintenance ... CPD-Reaction scores from post-implementation period 2, were used in assessment of maintenance, along with the longitudinal systems-level trends in HTS delivery.

  25. Mechanical Properties of Auxetic Honeycombs Realized via Material

    2.1 Tension Tests of PLA Samples. Material extrusion additive technology is used to manufacture auxetic honeycombs from PLA material. As follows from the experimental studies of the parts manufactured by material extrusion technology [3, 37,38,39,40], their materials are orthotropic. Mechanical properties of PLA materials are necessary for experimental testing and numerical studies of the ...