Post-Test Surveys: Definition, Elements & How to Create One

Companies use post-test surveys to learn how well their products or services are performing and to figure out what's working and what needs improvement.

Pre- and post-test surveys are valuable tools for gathering information and measuring change over time. The pre-test survey gathers baseline information before any changes or interventions occur, providing a starting point. Then, the post-test survey comes into play, collecting follow-up data after the changes or treatments have been applied. 

This two-step approach effectively measures the impact, progress, or effectiveness of a program, course, or intervention, making it a valuable tool in education, research, and various other fields. 

These pre-tests and post-tests serve as feedback mechanisms that lead to better outcomes and informed decision-making in various fields.

In this blog, we’ll dive into the world of post-test surveys, understanding what they are, the essential elements that make them effective, and how to create one that delivers meaningful results.

What are post-test surveys?

Post-test surveys are like feedback forms you receive after you’ve finished a class, training program, or any kind of event. These surveys ask you questions about your experience, what you’ve learned, and how you felt about the whole thing.

Imagine you’ve just completed an online course. After taking all the lessons and quizzes, the course may ask you to complete a survey. In that survey, they might want to know if you found the content easy to understand, if the course met your expectations, and if you have any suggestions for improvement.

The organizers or instructors use these surveys to get a sense of how well they did and where they can do better next time. It’s like getting a report card for the course or event, but you’re the one giving the grades and comments. 

The feedback from these surveys helps them make improvements for future participants to have a better experience. So, these surveys are an essential tool for learning from your experiences and making things even better in the future.

Benefits of post-test surveys

These surveys play a crucial role in various fields, from education to business, by offering a range of valuable advantages:

Collecting valuable feedback

When you finish a course or training, taking a post-test survey is like sharing your thoughts and opinions. It’s a way for you to tell the organizers what you liked or didn’t like. Your feedback is really important because it helps them understand how you felt about the whole experience. 

This way, they can make changes and improvements based on what you say. Your opinions matter and can shape how things are done in the future.

Identifying knowledge gaps

Sometimes, you might not realize what you didn’t understand until you see the post-test survey questions. These surveys can help you spot the areas where you might have missed something in the course or training. 

By highlighting these gaps in your knowledge, you get a chance to go back and review those parts or seek additional help. It’s like a map showing where to focus on learning more.

Improving course content

Your feedback in these surveys can be like a treasure map for the course creators. They can see what you enjoyed and what you didn’t. If many people say they loved a specific part of the course, it tells them to keep doing more of that. 

And if you and others suggest improvements, it gives them ideas on how to make the course even better for future learners. Your feedback guides them in creating content that suits your needs and preferences.

Enhancing user experience

Your experience matters, and these surveys help make it better. When you share your thoughts, you’re helping to make sure the next person who takes the course or attends the event has an even more enjoyable time. 

Organizers use your feedback to tweak things, fix any issues, and create a smoother and more satisfying experience for everyone. So, by participating in a post-test questionnaire, you make things more user-friendly and enjoyable for others.

Key elements of an effective post-test survey

When you’re creating a post-test survey to gather valuable feedback, there are some important things to keep in mind. These key elements will help you make sure your survey is effective:

  • Well-crafted questions: Your questions should be clear and easy to understand, so people can answer them quickly and accurately.
  • Relevant and focused topics: Make sure each question is directly related to the event or experience you’re assessing, and avoid straying off-topic. Discussing your favorite movie in a survey about a cooking class simply doesn’t fit.
  • Balanced question types: Use a mix of different question types. Multiple-choice questions are like choosing from a menu with options, while open-ended questions are like writing a short paragraph. This balance helps you gather both quick, quantitative data and detailed, qualitative insights.
  • Adequate survey length: Think about how long it takes to complete your survey. If it’s too long, people might get tired and not finish it. If it’s too short, you might not get enough useful information.
  • Proper timing: Timing is crucial. You should give the survey right after the event while things are fresh in people’s minds. It’s like taking a picture of something beautiful while it’s right in front of you, and it captures the moment accurately.
  • Pre-testing: Before sending out your survey to a larger audience, test it on a small group first. This is like trying out a new recipe with a few friends before serving it at a big dinner party. Testing helps you catch any issues and make improvements.

Creating engaging post-test survey questions

Creating engaging post-test survey questions is essential for collecting meaningful feedback from participants. Engaging questions encourage respondents to provide detailed and honest survey responses, leading to valuable insights. Here are some tips on how to write engaging post-test survey questions:

  • Start with a clear purpose: Before you begin, define the main objectives of your survey. Understand what specific information you want to gather and what decisions or improvements the survey will support.
  • Keep questions clear and simple: Use plain language and straightforward wording. Avoid jargon or complex terminology that may confuse respondents. Your questions should be easy to understand at a glance.
  • Ask one question at a time: Avoid double-barreled questions that ask about multiple things simultaneously. Each question should focus on a single topic or aspect to ensure clear and accurate responses. 
  • Use varied question types:
      • Multiple-choice questions: Provide options for respondents to choose from. These questions are great for capturing quantitative data.
      • Likert scale questions: Use a scale to measure agreement or satisfaction, for example from “Strongly Disagree” to “Strongly Agree.”
      • Open-ended questions: Allow respondents to provide free-text responses. These questions yield qualitative data and encourage detailed feedback.
  • Balance positives and negatives: Include questions that ask about both positive and negative aspects of the event or experience. Encourage respondents to share what they liked and what they believe could be improved.
  • Consider question order: Arrange your questions in a logical and user-friendly sequence. Start with easy and non-invasive questions to build respondent confidence before delving into more complex or personal topics.
  • Test your questions: Before distributing the questionnaire, test it with a small group of individuals to identify any confusing or problematic questions. Their feedback can help you refine your survey.
  • Provide clear instructions: If a question requires specific information or context, provide clear instructions or examples to ensure respondents understand what’s being asked.
  • Ensure mobile-friendly design: Many people take questionnaires on mobile devices. Ensure that your questionnaire is responsive and easy to complete on smartphones and tablets.

Creating engaging post-test survey questions is all about making it easy and appealing for participants to provide their thoughts and insights. When you make questions that are clear, relevant, and diverse in format, you’re more likely to receive valuable feedback. It can guide improvements and decision-making based on the genuine perspectives of your respondents.

How does QuestionPro help in conducting post-test surveys?

QuestionPro provides valuable assistance in conducting post-test surveys through a range of features and capabilities:

  • Ease of survey creation: QuestionPro offers an intuitive platform for creating post-test questionnaires. You can design questionnaires from scratch or use templates, simplifying the survey creation process.
  • Diverse question types: The platform supports various question types, such as multiple-choice, open-ended, and rating scales, enabling you to conduct surveys that gather comprehensive feedback.
  • Customization: You can personalize your questionnaires by adding your organization’s branding elements, like logos, colors, and fonts, ensuring a consistent and professional appearance.
  • Mobile responsiveness: Questionnaires designed with QuestionPro are mobile-responsive, ensuring that participants can complete surveys on their preferred devices and enhancing accessibility.
  • Distribution flexibility: The platform provides multiple distribution methods, allowing you to share questionnaires via email, social media, or website embedding, making it easy to reach your target audience.
  • Data analysis tools: QuestionPro offers robust data analysis tools that help you interpret survey results with charts, graphs, and reports, facilitating data-driven decision-making.
  • Data security and privacy: QuestionPro prioritizes data security, ensuring that participant data remains confidential and protected and building trust among survey takers.
  • Integrations: If you use other tools or platforms, QuestionPro offers integrations that allow you to connect survey data with other systems, streamlining your data management.
  • Flexible pricing: QuestionPro offers a range of pricing plans, from free options with basic features to paid plans with advanced functionalities, making it adaptable to different budget requirements.

QuestionPro assists in conducting post-test surveys by providing a comprehensive platform that simplifies survey creation, distribution, and data analysis. Its features help you collect, analyze, and leverage survey data effectively, whether you’re evaluating training programs, products, or other post-test survey objectives.

Post-test surveys are invaluable for evaluating the effectiveness of educational programs, training initiatives, product developments, and marketing efforts. Understanding the key elements and using a versatile survey platform like QuestionPro can greatly assist in the creation and execution of these questionnaires. 

Using QuestionPro survey software, you can gain actionable insights and continuously enhance your initiatives, driving positive change and success. Contact QuestionPro today to get the best value for your post-test surveys.


Pretest-Posttest Designs

For many true experimental designs, pretest-posttest designs are the preferred method to compare participant groups and measure the degree of change occurring as a result of treatments or interventions.

Pretest-posttest designs grew from the simpler posttest only designs, and address some of the issues arising with assignment bias and the allocation of participants to groups.

One example is education, where researchers want to monitor the effect of a new teaching method upon groups of children. Other areas include evaluating the effects of counseling, testing medical treatments, and measuring psychological constructs. The only stipulation is that the subjects must be randomly assigned to groups, in a true experimental design, to properly isolate and nullify any nuisance or confounding variables.

post experimental questionnaire

The Posttest Only Design With Non-Equivalent Control Groups

Pretest-posttest designs are an expansion of the posttest only design with nonequivalent groups, one of the simplest methods of testing the effectiveness of an intervention.

In this design, which uses two groups, one group is given the treatment and the results are gathered at the end. The control group receives no treatment, over the same period of time, but undergoes exactly the same tests.

Statistical analysis can then determine if the intervention had a significant effect. One common example of this is in medicine; one group is given a medicine, whereas the control group is given none, and this allows the researchers to determine if the drug really works. This type of design, whilst commonly using two groups, can be slightly more complex. For example, if different dosages of a medicine are tested, the design can be based around multiple groups.
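
To make this concrete, here is a minimal sketch of such an analysis in Python, using an independent-samples t-test on made-up posttest scores; the group sizes, score scale, and 0.05 threshold are illustrative assumptions, not part of the design described here.

```python
# Minimal sketch: posttest-only comparison of a treatment and a control group (made-up data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
treatment = rng.normal(loc=72, scale=10, size=50)  # posttest scores, treated group
control = rng.normal(loc=65, scale=10, size=50)    # posttest scores, untreated control group

# Independent-samples t-test: did the intervention produce a significant difference at posttest?
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
print("Significant at alpha = 0.05" if p_value < 0.05 else "Not significant at alpha = 0.05")
```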

Whilst this posttest only design does find many uses, it is limited in scope and contains many threats to validity. It is very poor at guarding against assignment bias, because the researcher knows nothing about the individual differences within the control group and how they may have affected the outcome. Even with randomization of the initial groups, this failure to address assignment bias means that the statistical power is weak.

The results of such a study will always be limited in scope and, resources permitting, most researchers use a more robust design, of which pretest-posttest designs are one. The posttest only design with non-equivalent groups is usually reserved for experiments performed after the fact, such as a medical researcher wishing to observe the effect of a medicine that has already been administered.


The Two Group Control Group Design

This is, by far, the simplest and most common of the pretest-posttest designs, and is a useful way of ensuring that an experiment has a strong level of internal validity. The principle behind this design is relatively simple, and involves randomly assigning subjects between two groups, a test group and a control. Both groups are pre-tested, and both are post-tested, the ultimate difference being that one group was administered the treatment.


This test allows a number of distinct analyses, giving researchers the tools to filter out experimental noise and confounding variables. The internal validity of this design is strong, because the pretest ensures that the groups are equivalent. The various analyses that can be performed upon a two-group control-group pretest-posttest design are (Fig 1):

Fig 1: Pretest-posttest design with control group.

  • This design allows researchers to compare the final posttest results between the two groups, giving them an idea of the overall effectiveness of the intervention or treatment. (C)
  • The researcher can see how both groups changed from pretest to posttest, whether one, both or neither improved over time. If the control group also showed a significant improvement, then the researcher must attempt to uncover the reasons behind this. (A and A1)
  • The researchers can compare the scores in the two pretest groups, to ensure that the randomization process was effective. (B)

These checks evaluate the efficiency of the randomization process and also determine whether the group given the treatment showed a significant difference.
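
As an illustration, the sketch below runs the comparisons labelled B, A/A1, and C above on made-up pretest and posttest scores; the sample sizes and effect sizes are invented for the example.

```python
# Minimal sketch of the pretest-posttest control-group comparisons (made-up data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
pre_treat = rng.normal(50, 8, 40)
post_treat = pre_treat + rng.normal(6, 5, 40)   # treatment group improves
pre_ctrl = rng.normal(50, 8, 40)
post_ctrl = pre_ctrl + rng.normal(1, 5, 40)     # control group changes little

# (B) Randomization check: are the two groups equivalent at pretest?
print("B :", stats.ttest_ind(pre_treat, pre_ctrl))

# (A, A1) Did each group change from pretest to posttest?
print("A :", stats.ttest_rel(post_treat, pre_treat))
print("A1:", stats.ttest_rel(post_ctrl, pre_ctrl))

# (C) Overall effectiveness: do the groups differ at posttest?
print("C :", stats.ttest_ind(post_treat, post_ctrl))
```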

Problems With Pretest-Posttest Designs

The main problem with this design is that it improves internal validity but sacrifices external validity to do so. There is no way of judging whether the process of pre-testing actually influenced the results because there is no baseline measurement against groups that remained completely untreated. For example, children given an educational pretest may be inspired to try a little harder in their lessons, and both groups would outperform children not given a pretest, so it becomes difficult to generalize the results to encompass all children.

The other major problem, which afflicts many sociological and educational research programs, is that it is impossible and unethical to isolate all of the participants completely. If two groups of children attend the same school, it is reasonable to assume that they mix outside of lessons and share ideas, potentially contaminating the results. On the other hand, if the children are drawn from different schools to prevent this, the chance of selection bias arises, because randomization is not possible.

The two-group control group design is an exceptionally useful research method, as long as its limitations are fully understood. For extensive and particularly important research, many researchers use the Solomon four group method, a design that is more costly, but avoids many weaknesses of the simple pretest-posttest designs.


Martyn Shuttleworth (Nov 3, 2009). Pretest-Posttest Designs. Retrieved Sep 26, 2024 from Explorable.com: https://explorable.com/pretest-posttest-designs

The text of the Pretest-Posttest Designs article above is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) license and may be copied, shared, and adapted with appropriate credit and a link to the source.


Example post-test questionnaire (part 1).

Please answer the following questions.

Circle the dot that best describes your answer. Please do not leave any question blank.

For each word below, please indicate how well it describes the site.

Please circle the appropriate dot for each of the following questions:

How easy is it to find specific information in this web site?

How satisfied are you with the site's quality of language?

How frustrated did you feel while working in this site?

Compared to what you expected, how quickly did the tasks go?

How clear were the naming and labeling of the links?

How relevant were the images on the site to the content?

How pleasing was the overall look and feel of the site?

Please let the facilitator know that you are finished.

Adapted From: Nielsen, Jakob. Usability Engineering, Morgan Kaufmann Publishers, 1994.



Pre/Post Testing

Pre- and post-questionnaire testing.

  • For illiterate participants
  • ODK Template
  • Testing analysis template


One major part of monitoring and evaluation (M&E) is to assess the impact and sustainability of any project. Nowadays, many projects focus on capacity-building by providing training. While in the past project managers counted the number of people trained, the number of hours of training provided, and similar metrics, more is needed today.

As part of the current, widely used, results-based monitoring approach, we want to understand how much people have learnt and if the results are attributable to the project in question.

One way to do this is by using pre- and post-testing. Testing is not a new way of assessing the level of acquired knowledge – everyone knows it from school or university.

One major difference from school testing is that we also want to assess the level of knowledge before the training started. Training should be based on needs assessments to understand where gaps exist. Thus, a pre-assessment helps us understand the current level of knowledge as our baseline. We therefore ensure that the training was indeed necessary. It is not helpful for us to pat ourselves on the back that everyone aced the post-test if, in actuality, they all knew the topics even before we started. To be sure we can attribute their knowledge to the training provided, we need a pre-test.

In general, the major questions we get asked are about how to administer this testing with as little effort as possible, since most NGOs have never planned a sufficient budget for testing their capacity-building activities or do not have many staff members available to administer tests. This means that a proper test, graded by a human being, is out of the question. Additionally, the length and depth of training often do not justify such labour- and time-intensive testing.

The pre-testing should always be done right at the beginning of the training (before any content is discussed in detail) to understand the pre-test situation – referred to as the ‘baseline’. The post-test should be done after the training is finished. And often, because we obviously want to make sure we can track down our training participants and have them all in the same room, we do the post-testing right at the end of the training. This is okay, but if we have the opportunity, we should consider doing a post-test later – some weeks, or even months, after the training is completed. This way, we can understand the participants’ retention rate – meaning how much they have kept in their long-term memory, and not just in the short-term.

Testing can be done in many different ways. We will present three different options here, and you can pick whichever works best for the context, your experience, and your budget.

But before we dive in, we are going to discuss some more basics that we consider for testing (all for the sake of finding a good and efficient way of testing). This should be seen as the minimum, and you can always do more.

Testing Fundamentals

Testing should always cover all of the topics discussed, so the questions developed should ideally be put together by the trainer. To differentiate between levels of knowledge, there should be both easier and more complicated questions. Because the grading of text-based written answers can take time, we suggest using multiple-choice questions. Again, for the ease of grading, we advise always having only one correct answer out of a list of possible answers. You can, of course, make multiple-choice questions more advanced, but then the grading also becomes more advanced. To really understand the improvement, we suggest that you use the same questions in your pre- and post-test, but you can of course change the order in which the questions appear, etc. It is also important that you not discuss the correct answers to the pre-test.

Basic Testing Using Paper-based Questionnaires

You can develop a paper-based questionnaire that you can give to the participants to fill out at the beginning and at the end. These can be simple Microsoft Word documents you print and give to the participants. Make sure you provide these questions in a language understood by all the participants. In case of a mixed group, have more than one language available.

Once you have done your testing, you can use the Excel template file we have provided to help you. The file features a note that guides you in detail on how to use it, along with five spreadsheets where you can enter information. You only have to fill in four of the spreadsheets, because, when done correctly, the fifth will automatically analyse your test results for you.

The first sheet for you to fill in is Questionnaire Pre, which holds your pre-test questions. We have given you space for 15 questions, with up to four answer options for each. You can have more, of course, but then you will need to make sure that all the formulas in the analysis sheet include your additions (reach out to us if you need help).

[Screenshot: the Questionnaire Pre sheet]

Here you can see what the file looks like. We tried to make it quite easy and intuitive. In column C, you write your questions (while theoretically you do not need this, it’s good to have all the questions and answers at hand, as you might not remember a year or two down the line what the test was about, and this way, you have all the relevant information). Then, in column E, you write the answer options. We provide you space for four and would advise you to use all of them. Using fewer than four makes it easier for people to ‘guess’ correctly, and as Hans Rosling showed in his famous tests, you want to beat the chimpanzees (meaning the respondents just guess and get the answer right by chance). Then, in column F, you mark which of your given options is the correct one.

[Screenshot: example question]

When you have finished, it looks like this. We created a little colour code – it turns green – which helps you quickly make sure that the right answer is marked ‘correct’. Now you proceed and do this for all your questions. Then you enter the results from the pre-test in the next sheet, called Pre-test.

[Screenshot: example categories]

Columns B, C, and D do not have to be filled in, but if you do fill them in (especially C, gender, and D, age), additional analysis will be done. You can administer the test without knowing who provided which answers, or with all the details. Then, in columns E to S, you get a drop-down list where you select which answer each person gave.

[Screenshot: example gender answers]

You can directly see here how well people did, as column T shows you the number of correct answers given, and column U shows you the percentage of correct answers. You can, of course, use fewer than 15 questions here, in which case you just enter the responses for the actual number of questions participants answered. You do exactly the same for the post-test. In the Questionnaire Post sheet, we offer you the ability to pose new questions (just in case you decide not to use the same ones used in the pre-test).
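
If you ever want to reproduce the logic of columns T and U outside the template, the following Python sketch does the same calculation with pandas; the question columns, answer key, and respondents are hypothetical and not taken from the Excel file.

```python
# Minimal sketch: number and percentage of correct answers per respondent (hypothetical data).
import pandas as pd

answer_key = {"q1": "B", "q2": "D", "q3": "A"}   # hypothetical correct options

responses = pd.DataFrame({
    "respondent": [1, 2, 3],
    "gender": ["f", "m", "f"],
    "q1": ["B", "B", "C"],
    "q2": ["D", "A", "D"],
    "q3": ["A", "A", "A"],
})

correct = responses[list(answer_key)].eq(pd.Series(answer_key))  # True where an answer matches the key
responses["n_correct"] = correct.sum(axis=1)                      # equivalent of column T
responses["pct_correct"] = 100 * correct.mean(axis=1)             # equivalent of column U
print(responses[["respondent", "n_correct", "pct_correct"]])
```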

[Screenshot: example answers]

Our three sample students have done well; everyone seems to have learnt something. You can now go over to the Analysis sheet to see how things compare between the pre- and the post-test. First, we look at the descriptive summary. Here, we are just checking the gender (yes, we know this may seem odd to check, but bear with us) and the age of the respondents to see if there were any changes.

[Screenshot: descriptive summary]

For our three samples, all is well: the number of pre- and post-test respondents by gender matches, so we get green results in row 13. If you were to use different numbers, row 13 would look like this.

[Screenshot: descriptive summary with mismatched respondent numbers]

We added two extra people, so the post-test has five respondents, while the pre-test only has three. Row 13 now turns red, showing us that something does not match. This could be okay, as you may know you had some people missing from the pre-test, but it can also help you spot any errors you may have made in data entry.


The results summary shows us the average percentage of correct answers by gender and in total. It also shows the difference between the pre- and post-test. If you have a target, you can put it in row 29 (the grey area), and then row 33 will show you if you reached it.


With a target of 30% – that means 30% more correct answers – only the “Female” demographic achieved it. Although the others showed increases, they were below the target.
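
The same summary can be reproduced with a short pandas groupby; the data below are invented, and the 30-point target mirrors the example above.

```python
# Minimal sketch: average percentage of correct answers by gender, pre vs post, checked against a target
# (hypothetical data; each value is one respondent's percentage of correct answers).
import pandas as pd

pre = pd.DataFrame({"gender": ["f", "m", "f", "m"], "pct_correct": [40.0, 33.3, 46.7, 26.7]})
post = pd.DataFrame({"gender": ["f", "m", "f", "m"], "pct_correct": [80.0, 53.3, 73.3, 46.7]})

target = 30.0  # required improvement in percentage points

summary = pd.DataFrame({
    "pre_mean": pre.groupby("gender")["pct_correct"].mean(),
    "post_mean": post.groupby("gender")["pct_correct"].mean(),
})
summary["difference"] = summary["post_mean"] - summary["pre_mean"]
summary["target_met"] = summary["difference"] >= target
print(summary)
```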

Lastly, the document provides you with an analysis by question, so you can understand where people struggled (if they struggled).


Digital Tools

Instead of using a paper-based version, you can also administer the test by means of digital data collection. For this, we provide you with a simple template (which again can be altered and adjusted as you need). The template uses Open Data Kit (ODK), which works with many different platforms for digital data collection, such as ODK Aggregate, ODK Central, KoboToolbox, ONA, CommCare, SurveyCTO, and others.

The Excel file is already in the correct format for ODK, so if you do not already know how to use XLSForm well, you should stick to making changes only in the areas marked in green or orange. In the green areas, you write your questions and answers. In the sheet titled survey, we have again added 15 questions (you can change this as you need). Each question needs one row.


We have taken some measures here to reduce bias, ensuring that the order of the answer options will be randomised. This means that you should not write answers like ‘None of the above’, as this might be listed as the first answer option. Instead, you can write ‘None of the answers is correct’, for example.

You need to write the answer options into the second sheet, choices. In column C, you write the answer options in the green areas, and each option gets its own row. In column B (orange), you indicate which answer is correct by writing a ‘1’, and all other options should get a ‘0’. We have inserted some lines to make it easier for you to see which answers belong to which question, but you can also see that by referring to column A.


If you need the survey in more than one language, you can check xlsform to see how to add further languages.

At the beginning of the survey, we have added two questions regarding gender and age, and at the end, we added a question summing up all the answers (that’s why the correct answer is a 1 and the wrong answers are a 0). This system only works when all questions are answered, so we have made them ‘required’, by writing ‘yes’ in the relevant column F.

There are three ways to use the ODK tool. You can upload it and use the web-link, which you send to participants who then fill the survey out themselves. This often works for educated people, such as in business incubators or for training government or internal staff. The second option is to load the questions onto tablets and pass these around during the training (of course, at the beginning or the end). Lastly, you can have interviewers ask the questions to people, and the interviewers record the answers – this works well for illiterate or semi-literate participants. But the last option takes a bit of time, especially for bigger groups.
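
However you collect the answers, once the submissions are downloaded from your server you can score them with a few lines of pandas. The sketch below is only an illustration under assumptions: the export file names, the respondent_id and gender columns, and the answer key are hypothetical and will differ from the actual template.

```python
# Minimal sketch: score pre- and post-test exports and compute each person's improvement (hypothetical layout).
import pandas as pd

key = {f"q{i}": "a" for i in range(1, 16)}        # hypothetical answer key: option 'a' is correct everywhere

def score(path):
    df = pd.read_csv(path)                         # hypothetical CSV export from your data-collection server
    answers = df[list(key)].eq(pd.Series(key))     # True where the stored answer matches the key
    df["pct_correct"] = 100 * answers.mean(axis=1)
    return df[["respondent_id", "gender", "pct_correct"]]

pre = score("pretest_export.csv")
post = score("posttest_export.csv")

merged = pre.merge(post, on=["respondent_id", "gender"], suffixes=("_pre", "_post"))
merged["improvement"] = merged["pct_correct_post"] - merged["pct_correct_pre"]
print(merged.head())
```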

Illiterate/Semi-literate Participants

In many countries, we have participants who are semi-literate or illiterate. While the interviewer-administered approach described above is one way to test participants who cannot read the questions themselves, it takes time. Here, we present a simpler/quicker version (at least regarding the time to get the results).

From our package, you need to print each of the question files (Q1 – Q15) four times (one for each of the answer options). Then, you need to print the back side of each question page with one of the four icons (apple, mug, pencil, sheep). So in the end, you have four double-sided pages for each question: always with the question (Q1, Q2, etc.) on one side, and one of the icons on the other side (see below for an example using the ‘apple’ icon).

[Example answer card: the question number on one side (Side 1) and the icon on the other (Side 2)]

Now comes a bit of do-it-yourself arts and crafts: Cut out each of these 4cm-by-4cm squares (the black boxes give you a bit of guidance), which will result in 28 “apple” responses for question one (Q1). You will need to have a set of four answer options – each with its specific icon – for each question for each participant. If you want to be able to know which of your respondents answered what, you can now write the numbers 1-28 on the bottom of each set of the cards, where it says “Respondents: _____”. Otherwise, your test will be anonymous.

Continue in this manner until you have a set of answer cards for each of the questions for each of your 28 respondents. Yes, this will total 28 sets of 60 small, square answer cards. If you have more than 28 participants, you need to just repeat the above steps. When you do the printing, make sure that the double-sided printing works well and lines up right – this might need a few trial and error attempts to get it right. After you cut out the answer cards, you can laminate them to make them more durable. Note: You need to laminate them AFTER you cut them out, as otherwise, the lamination will not hold when you cut them. Also, a cutting machine will be your friend here.

Now you need some organised way to store everything. We advise that you keep all questions and answer options separate (if you do not need to keep track of respondents). This means all apple answers of Q1 are kept together, but separate from the sheep answers for that same question, and so on with the other icons.

In the training (here it helps to have an assistant), you will give out the four answer options for each question (let’s start with Q1) – with the icons face up – to each respondent. If you want to track who said what, have a document that tells you which respondent gets which number and keep track of it as you pass each question out (be sure each person gets answers with the same respondent number on them each time).

Next, the trainer reads out the question, and the answer options. Most likely, they will have to do this more than once. Each answer option is associated with one symbol. You ask people to choose the symbol of their chosen answer option. For greater ease and less stress, have one container ready to collect the answer options selected by participants and a separate container for all ‘other’ options, and collect these after each question. It is very helpful to be able to organise them in peace after the training.

Another option is to give each person two containers: one for their chosen answers and one for the answers they did not choose. Then, you collect all answers by each person (without having to write ‘respondent’ numbers on the back). We would advise against this option, because this system works only when you are very organised and don’t mix things up, and you’re able to keep track of which container came from which person.

Once you are back in the office, you can use the spreadsheet document introduced above to keep track of the answer options and analyse the training test.



Evaluating Intervention Programs with a Pretest-Posttest Design: A Structural Equation Modeling Approach

Guido Alessandri

1 Department of Psychology, Sapienza University of Rome, Rome, Italy

Antonio Zuffianò

2 Department of Psychology, Liverpool Hope University, Liverpool, UK

Enrico Perinelli

Abstract

A common situation in the evaluation of intervention programs is that the researcher can rely on only two waves of data (i.e., pretest and posttest), which profoundly impacts his/her choice of the statistical analyses to be conducted. Indeed, the evaluation of intervention programs based on a pretest-posttest design has usually been carried out by using classic statistical tests, such as family-wise ANOVA analyses, which are strongly limited by exclusively analyzing the intervention effects at the group level. In this article, we showed how second order multiple group latent curve modeling (SO-MG-LCM) could represent a useful methodological tool to have a more realistic and informative assessment of intervention programs with two waves of data. We offered a practical step-by-step guide to properly implement this methodology, and we outlined the advantages of the LCM approach over classic ANOVA analyses. Furthermore, we also provided a real-data example by re-analyzing the implementation of the Young Prosocial Animation, a universal intervention program aimed at promoting prosociality among youth. In conclusion, although previous studies pointed to the usefulness of MG-LCM to evaluate intervention programs (Muthén and Curran, 1997; Curran and Muthén, 1999), no previous study showed that it is possible to use this approach even in pretest-posttest (i.e., with only two time points) designs. Given the advantages of latent variable analyses in examining differences in interindividual and intraindividual changes (McArdle, 2009), the methodological and substantive implications of our proposed approach are discussed.

Introduction

Evaluating intervention programs is at the core of many educational and clinical psychologists' research agendas (Malti et al., 2016; Achenbach, 2017). From a methodological perspective, collecting data at several points in time (usually T ≥ 3) is important to test the long-term strength of intervention effects once the treatment is completed, such as in classic designs including pretest, posttest, and follow-up assessments (Roberts and Ilardi, 2003). However, several factors can hinder the researcher's capacity to collect data at follow-up assessments, in particular lack of funds, participants' poor monitoring compliance, participants' relocation to different areas, etc. Accordingly, the less advantageous pretest-posttest design (i.e., before and after the intervention) remains a widely used methodological choice in the psychological intervention field. Indeed, from a literature search in the PsycINFO database using the string “intervention AND pretest AND posttest AND follow-up,” limited to the abstract section and with a publication date from January 2006 to December 2016, we obtained 260 documents. When we changed “AND follow-up” to “NOT follow-up,” the results were 1,544 (see Appendix A to replicate these literature search strategies).

A further matter of concern arises from the statistical approaches commonly used for evaluating intervention programs in pretest-posttest designs, mostly ANOVA-family analyses, which rely heavily on statistical assumptions (e.g., normality, homogeneity of variance, independence of observations, absence of measurement error, and so on) that are rarely met in psychological research (Schmider et al., 2010; Nimon, 2012).

However, all is not lost, and some analytical tools are available to help researchers better assess the efficacy of programs based on a pretest-posttest design (see McArdle, 2009). The goal of this article is to offer a formal presentation of a latent curve model approach (LCM; Muthén and Curran, 1997) to analyze intervention effects with only two waves of data. After a brief overview of the advantages of the LCM framework over classic ANOVA analyses, a step-by-step application of the LCM to real pretest-posttest intervention data is provided.

Evaluation approaches: observed variables vs. latent variables

Broadly speaking, approaches to intervention evaluation can be divided into two categories: (1) approaches using observed variables and (2) approaches using latent variables. The first category includes widely used parametric tests such as Student's t, repeated measures analysis of variance (RM-ANOVA), analysis of covariance (ANCOVA), and ordinary least-squares regression (see Tabachnick and Fidell, 2013). However, despite their broad use, observed variable approaches suffer from several limitations, many of them generated by the strong underlying statistical assumptions that must be satisfied. A first set of assumptions underlying classic parametric tests is that the data being analyzed are normally distributed and have equal population variances (also called the homogeneity of variance or homoscedasticity assumption). The normality assumption is not always met in real data, especially when the variables targeted by the treatment program are infrequent behaviors (i.e., externalizing conducts) or clinical syndromes (Micceri, 1989). Likewise, the homoscedasticity assumption is rarely met in randomized controlled trials as a result of the experimental variable causing differences in variability between groups (Grissom and Kim, 2012). Violation of the normality and homoscedasticity assumptions can compromise the results of classic parametric tests, in particular the rates of Type-I (Tabachnick and Fidell, 2013) and Type-II error (Wilcox, 1998). Furthermore, the inability to deal with measurement error can also lower the accuracy of inferences based on regression and ANOVA-family techniques, which assume that the variables are measured without error. However, the presence of some degree of measurement error is a common situation in psychological research, where the focus is often on constructs that are not directly observable, such as depression, self-esteem, or intelligence. Finally, observed variable approaches assume (without testing it) that the measurement structure of the construct under investigation is invariant across groups and/or time (Meredith and Teresi, 2006; Millsap, 2011). Thus, unsatisfied statistical assumptions and/or uncontrolled unreliability can lead to the under- or overestimation of the true relations among the constructs analyzed (for a detailed discussion of these issues, see Cole and Preacher, 2014).

On the other side, latent variable approaches refer to the class of techniques grouped under the label structural equation modeling (SEM; Bollen, 1989), such as confirmatory factor analysis (CFA; Brown, 2015) and mean and covariance structures analysis (MACS; Little, 1997). Although a complete overview of the benefits of SEM is beyond the scope of the present work (for a thorough discussion, see Little, 2013; Kline, 2016), it is worthwhile mentioning here those advantages that directly relate to the evaluation of intervention programs. First, SEM can easily accommodate the lack of normality in the data. Indeed, several estimation methods with standard errors robust to non-normal data are available and easy to use in many popular statistical programs (e.g., MLM, MLR, WLSMV, etc. in Mplus; Muthén and Muthén, 1998–2012). Second, SEM explicitly accounts for measurement error by separating the common variance among the indicators of a given construct (i.e., the latent variable) from their residual variances (which include both measurement error and unique sources of variability). Third, if multiple items from a scale are used to assess a construct, SEM allows the researcher to evaluate to what extent the measurement structure (i.e., factor loadings, item intercepts, residual variances, etc.) of the scale is equivalent across groups (e.g., intervention group vs. control group) and/or over time (i.e., pretest and posttest); this issue is known as measurement invariance (MI) and, despite its crucial importance for properly interpreting psychological findings, is rarely tested in psychological research (for an overview see Millsap, 2011; Brown, 2015). Finally, different competing SEMs can be evaluated and compared according to their goodness of fit (Kline, 2016). Many SEM programs, indeed, print in their output a series of fit indexes that help the researcher assess whether the hypothesized model is consistent with the data or not. In sum, when multiple indicators of the constructs of interest are available (e.g., multiple items from one scale, different informants, multiple methods, etc.), latent variable approaches offer many advantages and, therefore, should be preferred over manifest variable approaches (Little et al., 2009). Moreover, when a construct is measured using a single psychometric measure, there are still ways to incorporate individuals' scores in the analyses as latent variables, and thus reduce the impact of measurement unreliability (Cole and Preacher, 2014).

Latent curve models

Among latent variable models of change, latent curve models (LCMs; Meredith and Tisak, 1990) represent a useful and versatile tool to model stability and change in the outcomes targeted by an intervention program (Muthén and Curran, 1997; Curran and Muthén, 1999). Specifically, in LCM, individual differences in the rate of change can be flexibly modeled through the use of two continuous random latent variables: the intercept (which usually represents the level of the outcome of interest at the pretest) and the slope (i.e., the mean-level change over time from the pretest to the posttest). Both the intercept and the slope have a mean (i.e., the average initial level and the average rate of change, respectively) and a variance (i.e., the amount of inter-individual variability around the average initial level and the average rate of change). Importantly, if both the mean and the variance of the latent slope of the outcome y are statistically significant in the intervention group (whereas they are not significant in the control group), this means that there was not only an average effect of the intervention but also that some participants were differently affected by the program (Muthén and Curran, 1997). Hence, the assumption that participants respond to the treatment in the same way (as in ANOVA-family analyses) can be easily relaxed in LCM. Indeed, although individual differences may also be present in the ANOVA design, change is modeled at the group level and, therefore, everyone is treated as if impacted in the same fashion after exposure to the treatment condition.
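
In scalar form, and for the two-wave case considered in this article, the intercept-and-slope structure just described can be sketched as follows (the symbols anticipate the notation of Figure 1 below; the ε terms stand for whatever residual structure is specified):

```latex
% Sketch of the two-wave latent curve model described above
% (xi_1 = intercept, xi_2 = change, kappa = factor means, phi = factor (co)variances).
\begin{aligned}
y_{1} &= \xi_{1} + \varepsilon_{1} && \text{(pretest)}\\
y_{2} &= \xi_{1} + \xi_{2} + \varepsilon_{2} && \text{(posttest)}\\
\operatorname{E}(\xi_{1}) &= \kappa_{1}, \quad \operatorname{Var}(\xi_{1}) = \phi_{1}\\
\operatorname{E}(\xi_{2}) &= \kappa_{2}, \quad \operatorname{Var}(\xi_{2}) = \phi_{2}, \quad \operatorname{Cov}(\xi_{1},\xi_{2}) = \phi_{12}
\end{aligned}
```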

As discussed by Muthén and Curran (1997), the LCM approach is particularly useful for evaluating intervention effects when it is conducted within a multiple-group framework (i.e., MG-LCM), namely when the intercept and the slope of the outcome of interest are simultaneously estimated in the intervention and control groups. Indeed, as illustrated in our example, the MG-LCM allows the researcher to test whether both the mean and the variability of the outcome y at the pretest are similar across intervention and control groups, as well as whether the mean rate of change and its inter-individual variability are similar between the two groups. Therefore, the MG-LCM provides information about the efficacy of an intervention program in terms of both (1) its average (i.e., group-level) effect and (2) participants' sensitivity to respond differently to the treatment condition.

However, a standard MG-LCM cannot be empirically identified with two waves of data (Bollen and Curran, 2006). Yet, the use of multiple indicators (at least 2) for each construct of interest could represent a possible solution to this problem by allowing the estimation of the intercept and slope as second-order latent variables (McArdle, 2009; Geiser et al., 2013; Bishop et al., 2015). Interestingly, although second-order LCMs are becoming increasingly common in psychological research due to their higher statistical power to detect changes over time in the variables of interest (Geiser et al., 2013), their use in the evaluation of intervention programs is still infrequent. In the next section, we present a formal overview of a second-order MG-LCM approach, describe the possible models of change that can be tested to assess intervention effects in pretest-posttest designs, and show an application of the model to real data.

Identification of a two-time point latent curve model using parallel indicators

When only two points in time are available, it is possible to estimate two LCMs: a No-Change Model (see Figure 1, Panel A) and a Latent Change Model (see Figure 1, Panel B). In the following, we describe in detail the statistical underpinnings of both models.

Figure 1. Second-order latent curve models with parallel indicators (i.e., residual variances of observed indicators are equal within the same latent variable: ε1 within η1 and ε2 within η2). All the intercepts of the observed indicators (Y) and endogenous latent variables (η) are fixed to 0 (not reported in the figure). In Model A, the residual variances of η1 and η2 (ζ1 and ζ2, respectively) are freely estimated, whereas in Model B they are fixed to 0. ξ1, intercept; ξ2, slope; κ1, mean of intercept; κ2, mean of slope; ϕ1, variance of intercept; ϕ2, variance of slope; ϕ12, covariance between intercept and slope; η1, latent variable at T1; η2, latent variable at T2; Y, observed indicator of η; ε, residual variance/covariance of observed indicators.

Latent change model

A two-time-point latent change model implies two latent means (κk), two latent factor variances (ζk), plus the covariance between the intercept and slope factor (Φk). This results in a total of 5 + T model parameters, where the additional T parameters are the error variances of yk when VAR(∈k) is allowed to change over time. In the case of two waves of data (i.e., T = 2), this latent change model has 7 parameters to estimate from a total of (2)(3)/2 + 2 = 5 identified means, variances, and covariances of the observed variables. Hence, two waves of data are insufficient to estimate this model. However, this latent change model can be just-identified (i.e., zero degrees of freedom [df]) by constraining the residual variances of the observed variables to be 0. This last constraint should be considered structural and thus included in all two-time-point latent change models. In this latter case, the variances of the latent variables (i.e., the latent intercept representing the starting level, and the latent change score) are equivalent to those of the observed variables. Thus, when fallible variables are used, it is impossible to separate true scores from their error/residual terms.

A possible way to allow this latent change model to be over-identified (i.e., df ≥ 1) is by assuming the availability of at least two observed indicators of the construct of interest at each time point (i.e., T1 and T2). Possible examples include the presence of two informants rating the same behavior (e.g., caregivers and teachers), two scales assessing the same construct, etc. However, even if the construct of interest is assessed by only one single scale, it should be noted that psychological instruments are often composed of several items. Hence, as noted by Steyer et al. (1997), it is possible to randomly partition the items composing the scale into two (or more) parcels that can be treated as parallel forms. By imposing appropriate constraints on the loadings (i.e., λk = 1), the intercepts (τk = 0), and the within-factor residuals (εk = ε), and by fixing to 0 the residual variances of the first-order latent variables ηk (ζk = 0), the model can be specified as a first-order measurement model plus a second-order latent change model (see Figure 1, Panel B). Given the previous constraints on loadings, intercepts, and first-order factor residual variances, this model is over-identified because we have (4)(5)/2 + 4 = 14 observed variances, covariances, and means. Of course, when three or more indicators are available, identification issues cease to be a problem. In this paper, we restricted our attention to the two parallel indicators case to address the most basic situation that a researcher can encounter in the evaluation of a two-time-point intervention. Yet, our procedure can be easily extended to cases in which three or more indicators are available at each time point.
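
The counting argument above can be restated compactly. With p observed variables there are p(p + 1)/2 identified variances and covariances plus p means available for estimation:

```latex
% Identified moments available with p observed variables:
\frac{p(p+1)}{2} + p =
\begin{cases}
\frac{2 \cdot 3}{2} + 2 = 5, & p = 2 \quad \text{(one indicator per wave: fewer moments than the 7 free parameters)}\\
\frac{4 \cdot 5}{2} + 4 = 14, & p = 4 \quad \text{(two parallel indicators per wave: enough moments for over-identification)}
\end{cases}
```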

Specification

More formally, and under usual assumptions (Meredith and Tisak, 1990 ), the measurement model for the above two times latent change model in group k becomes:

where y k is a mp x 1 random vector that contains the observed scores, { y i t k } , for the ith variable at time t , i ∈ {1,2,., p}, and t ∈ {1,2,., m}. The intercepts are contained in the mp x 1 vector τ y k , Λ y k is a mp x mq matrix of factor loadings, η k is a mq x 1 vector of factor scores, and the unobserved error random vectors ∈ k is a mp x 1 vector. The population vector mean, μ y k , and covariance matrix, ∑ y k , or Means and Covariance Structure (MACS) are:

where μ η k is a vector of latent factor means, ∑ η k is the modeled covariance matrix, and θ ε k is a mp × mp matrix of observed-variable residual covariances. For each column, fixing an element of Λ y k to 1 and an element of τ y k to 0 identifies the model. By imposing increasingly restrictive constraints on the elements of the matrices Λ y and τ y , the above two-indicator, two-time-point model can be identified.

The general equations for the structural part of a second order (SO) multiple group (MG) model are:
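In the standard second-order notation (a rendering consistent with the definitions below, not reproduced from the original), the structural equation can be written as:

$$\eta_k = \Gamma_k\,\xi_k + \zeta_k, \qquad \mathrm{E}(\xi_k) = \alpha_k$$

where α_k denotes the vector of second-order factor means (written α y k later in the text).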

where Γ k is a mq × qr matrix containing second-order factor coefficients, ξ k is a qr × 1 vector of second-order latent variables, and ζ k is a mq × 1 vector containing latent variable disturbance scores. Note that q is the number of latent factors and that r is the number of latent curves for each latent factor.

The population mean vector, μ η k , and covariance matrix, ∑ η k , based on (3) are
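Again writing α_k for the vector of second-order factor means, these presumably read:

$$\mu_{\eta_k} = \Gamma_k\,\alpha_k, \qquad \Sigma_{\eta_k} = \Gamma_k\,\Phi_k\,\Gamma_k' + \Psi_k$$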

where Φ k is a qr × qr covariance matrix of the second-order latent variables, and Ψ k is a mq × mq latent variable residual covariance matrix. In the current application, what differentiates the models is the way in which the matrices Γ k and Φ k are specified.

Application of the SO-MG-LCM to intervention studies using a pretest-posttest design

The application of the above two-wave LCM to the evaluation of an intervention is straightforward. Usually, in intervention studies, individuals are randomly assigned to two different groups. The first group ( G 1 ) is exposed to an intervention that takes place after the initial time point. The second group ( G 2 ), also called the control group, does not receive any direct experimental manipulation. In light of the random assignment, G 1 and G 2 can be viewed as two equivalent groups drawn from the same population, and the effect of the intervention may be ascertained by comparing individuals' changes from T1 to T2 across these two groups.

Following Muthén and Curran ( 1997 ), an intercept factor should be modeled in both groups. However, an additional latent change factor should be added only in the intervention group. This factor is aimed at capturing the degree of change that is specific to the treatment group. Whereas the latent mean of this factor can be interpreted as the change produced by the intervention in the intervention group, a significant variance indicates meaningful heterogeneity in responding to the treatment. In this model, α y k is a vector containing freely estimated mean values for the intercept (i.e., ξ 1 ) and the slope (i.e., ξ 2 ). Γ y k is thus a 2 × 2 matrix containing basis coefficients, fixed to [ 1 1 ] for the intercept (i.e., ξ 1 ) and [ 0 1 ] for the slope (i.e., ξ 2 ). Φ k is a 2 × 2 matrix containing the variances and covariance of the two latent factors representing the intercept and the slope.
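Written out for the two-wave case (our rendering of the basis coefficients given above, with rows referring to η 1 and η 2 and columns to the intercept ξ 1 and the slope ξ 2 ):

$$\Gamma_k = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \qquad \Phi_k = \begin{bmatrix} \phi_{11} & \phi_{21} \\ \phi_{21} & \phi_{22} \end{bmatrix}$$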

Given randomization, restricting the parameters of the intercept to be equal across the control and treatment populations is warranted in a randomized intervention study. Yet, baseline differences can be introduced in field studies where randomization is not possible or where randomization failed during the course of the study (Cook and Campbell, 1979 ). In such cases, the equality constraints on the mean or the variance of the intercept can be relaxed.

The influence of participants' initial status on the effect of the treatment in the intervention group can also be incorporated in the model (Cronbach and Snow, 1977 ; Muthén and Curran, 1997 ; Curran and Muthén, 1999 ) by regressing the latent change factor onto the intercept factor, so that the mean and variance of the latent change factor in the intervention group are expressed as a function of the initial status. Accordingly, this analysis captures the extent to which inter-individual initial differences on the targeted outcome predispose participants to respond differently to the treatment delivered.
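In our notation (an illustrative rendering, not taken from the original), this amounts to replacing the intercept-slope covariance in the intervention group with a regression of the change factor on the intercept factor:

$$\xi_2 = \alpha_2 + \beta\,\xi_1 + \delta, \qquad \delta \sim N(0, \psi_\delta)$$

where β captures the treatment-by-initial-status effect.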

Sequence of models

We suggest a four-step approach to intervention evaluation. By comparing the relative fit of each model, researchers can obtain important information for assessing the efficacy of their intervention.

Model 1: no-change model

A no-change model is specified for both the intervention group (henceforth G1) and the control group (henceforth G2). As a first step, indeed, a researcher may assume that the intervention has not produced any meaningful effect; therefore, a no-change model (or strict stability model) is simultaneously estimated in both the intervention and control groups. In its more general version, the no-change model includes only a second-order intercept factor, which represents the participants' initial level. Importantly, both the mean and the variance of the second-order intercept factor are freely estimated across groups (see Figure 1, Panel A). More formally, in this model, Φ k is a qr × qr covariance matrix of the latent variables, and Γ k is a mq × qr matrix containing, for each latent variable, a set of basis coefficients for the latent curves.

Model 2: latent change model in the intervention group

In this model, a slope growth factor is estimated in the intervention group only. As previously detailed, this additional latent factor is aimed at capturing any possible change in the intervention group. According to our premises, this model represents the "target" model, attesting to a significant intervention effect in G1 but not in G2. Model 1 is then compared with Model 2, and changes in fit indexes between the two models are used to evaluate the need for this further latent factor (see section Statistical Analysis).

Model 3: latent change model in both the intervention and control group

In Model 3, a latent change model is estimated simultaneously in both G1 and G2. The fit of Model 2 is compared with the fit of Model 3, and changes in fit indexes between the two models are used to evaluate the need for this further latent factor in the control group. From a conceptual point of view, the goal of Model 3 is twofold: it allows the researcher (a) to rule out the possibility of "contamination effects" between the intervention and control groups (Cook and Campbell, 1979 ), and (b) to assess a possible normative mean-level change in the control group (i.e., a change that cannot be attributed to the treatment delivered). With reference to (b), it should be noted that some variables may show a normative developmental increase during the period of the intervention. For instance, a consistent part of the literature has identified an overall increase in empathic capacities during early childhood (for an overview, see Eisenberg et al., 2015 ). Hence, researchers aiming to increase empathy-related responding in young children may find that both the intervention and control groups actually improved in their empathic response. In this situation, both the mean and the variance of the latent slope should be constrained to equality across groups to mitigate the risk of confounding intervention effects with the normative development of the construct (for an alternative approach when more than two time points are available, see Muthén and Curran, 1997 ; Curran and Muthén, 1999 ). Importantly, the tenability of these constraints can be easily tested through a delta chi-square (Δχ 2 ) test comparing the chi-square values of the constrained and unconstrained models. A significant Δχ 2 (usually p < 0.05) indicates that the two models are not statistically equivalent, and the unconstrained model should be preferred. On the contrary, a non-significant Δχ 2 (usually p > 0.05) indicates that the two models are statistically equivalent, and the constrained model (i.e., the more parsimonious model) should be preferred.
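The Δχ 2 comparison described here is a routine likelihood-ratio test; as a minimal sketch (the values below are placeholders, not the reported results), it can be computed as:

    from scipy.stats import chi2

    def delta_chi2_test(chi2_constrained, df_constrained,
                        chi2_unconstrained, df_unconstrained):
        """Delta chi-square (likelihood-ratio) test between two nested models."""
        d_chi2 = chi2_constrained - chi2_unconstrained
        d_df = df_constrained - df_unconstrained
        p_value = chi2.sf(d_chi2, d_df)
        return d_chi2, d_df, p_value

    # Placeholder values: constrained model chi2 = 13.3 (df = 13),
    # unconstrained model chi2 = 10.4 (df = 12).
    d_chi2, d_df, p = delta_chi2_test(13.3, 13, 10.4, 12)
    # A non-significant p favors the more parsimonious (constrained) model.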

Model 4: sensitivity model

After having identified the best-fitting model, the parameters of the intercept (i.e., mean and variance) should be constrained to equality across groups. This sensitivity analysis is crucial to ensure that both groups started with an equivalent initial status on the targeted behavior, which is an important assumption in intervention programs. In line with the previous analyses, the plausibility of equal initial status can be easily tested through the Δχ 2 test. Indeed, given randomization, it is reasonable to assume that participants in both groups are characterized by similar or identical starting levels and that the groups have the same variability. These assumptions lead to a constrained no-change, no-group-difference model. This model is the same as the previous one, except that κ k = κ, or in our situation κ 1 = κ 2 . Moreover, in our situation, r = 1, q = 1, m = 2, and hence Φ k = Φ is a scalar, Γ k = 1 2 , and Ψ k = ΨI 2 for each of the k populations.

In the next section, the above sequence of models is applied to the evaluation of a universal intervention program aimed at improving students' prosociality. We present results from every step of the methodology, and we offer a set of Mplus syntaxes to allow researchers to estimate the above models on their own data.

The young prosocial animation program

The Young Prosocial Animation (YPA; Zuffianò et al., 2012 ) is a universal intervention program (Greenberg et al., 2001 ) designed to sensitize adolescents to prosocial and empathic values.

In detail, the YPA seeks to emphasize the value of: (a) the status of people who behave prosocially, (b) the similarity between the "model" and the participants, and (c) the outcomes related to prosocial actions. Following Bandura's ( 1977 ) concept of modeling , people are in fact more likely to engage in behaviors they value and when the model is perceived as similar and as having an admired status. The main idea is that emphasizing these three aspects could foster a prosocial sensitization among the participants (Zuffianò et al., 2012 ). In other words, the goal is to promote the cognitive and emotional aspects of prosociality in order to strengthen attitudes toward acting and thinking in a "prosocial way." The expected change, therefore, is at the level of personal dispositions, in terms of an increased receptiveness and propensity for prosocial thinking (i.e., the ability to take others' point of view and to be empathic, as well as the ability to produce ideas and solutions that can help other people, rather than a direct effect on the behaviors enacted by individuals; Zuffianò et al., 2012 ). Due to its characteristics, the YPA can be conceived as a first phase of prosocial sensitization on which programs more directly aimed at increasing prosocial behavior can be built (e.g., the CEPIDEA program; Caprara et al., 2014 ). The YPA pursues this goal through a guided discussion following the viewing of prosocial scenes selected from the film "Pay It Forward" 1 . After viewing each scene, a trained researcher, using a standard protocol, guides a discussion among the participants highlighting: (i) the type of prosocial action (e.g., consoling, helping, etc.); (ii) the benefits for the actor and the target of the prosocial action; (iii) possible benefits of the prosocial action extended to the context (e.g., other persons, the broader community, etc.); (iv) the qualities required of the actor to behave prosocially (e.g., empathy, bravery, etc.); (v) the similarity between the participant and the actor of the prosocial behavior; and (vi) the thoughts and feelings experienced while viewing the scene. The researcher has to complete the intervention within 12 sessions (1 h per session, once a week).

For didactic purposes, in the present study we re-analyzed data from an implementation of the YPA in three schools located in a small city in the South of Italy (see Zuffianò et al., 2012 for details).

We expected Model 2 (a latent change model in the intervention group and a no-change model in the control group) to be the best-fitting model. Indeed, from a developmental point of view, we had no reason to expect adolescents to show a normative change in prosociality after such a short period of time (Eisenberg et al., 2015 ). In line with the goal of the YPA, we hypothesized a small-to-medium increase in prosociality in the intervention group. We also expected the two groups not to differ at T1 in their absolute level of prosocial behavior, ensuring that the intervention and control groups were equivalent. Finally, we explored the influence of participants' initial status on the treatment effect, that is, a scenario in which participants with lower initial levels of prosociality benefit more from attending the YPA sessions.

The study followed a quasi-experimental design , with both the intervention and control groups assessed at two time points: before the YPA intervention (Time 1) and 6 months after (Time 2). Twelve classrooms from three schools (one middle school and two high schools) participated in the study during the 2008–2009 school year. Each school ensured the participation of four classes, which were randomly assigned to the intervention and control conditions (two classes each). 2 In total, six classes were part of the intervention group and six of the control group. The students from the middle school were in the eighth grade (third year of secondary school in Italy), whereas the students from the two high schools were in the ninth (first year of high school in Italy) and tenth grades (second year of high school in Italy).

Participants

The YPA program was implemented in a city in the South of Italy. A total of 250 students participated in the study: 137 (51.8% male) were assigned to the intervention group and 113 (54% male) to the control group. At T2, 113 students remained in the intervention group (retention rate = 82.5%) and 91 in the control group (retention rate = 80.5%). Little's missing completely at random (MCAR) test showed a non-significant chi-square value [χ 2 (2) = 4.698, p = 0.10], indicating that missingness at posttest was not related to levels of prosociality at pretest. The mean age was 14.2 ( SD = 1.09) in the intervention group and 15.2 ( SD = 1.76) in the control group. Regarding socioeconomic status, 56.8% of families in the intervention group and 60.0% in the control group were one-income families. The most represented professions in the two groups were "worker" among fathers (36.4% in the intervention group and 27.9% in the control group) and "housewife" among mothers (56.0% and 55.2%, respectively). Parents' educational level was approximately the same in the two groups: most parents in the intervention group (43.5%) and in the control group (44.7%) had a middle school degree.

Prosociality

Participants rated their prosociality on a 16-item scale (5-point Likert scale: 1 = never/almost never true ; 5 = almost always/always true ) that assesses the degree of engagement in actions aimed at sharing, helping, taking care of others' needs, and empathizing with their feelings (e.g., “ I try to help others ” and “ I try to console people who are sad ”). The alpha reliability coefficient was 0.88 at T1 and 0.87 at T2. The scale has been validated on a large sample of respondents (Caprara et al., 2005 ) and has been found to moderately correlate ( r > 0.50) with other-ratings of prosociality (Caprara et al., 2012 ).

Statistical analysis

All the preceding models were estimated with maximum likelihood (ML) using the Mplus 7 program (Muthén and Muthén, 1998–2012 ). Missing data were handled using full information maximum likelihood (FIML) estimation, which draws on all available data to estimate model parameters without imputing missing values (Enders, 2010 ). To evaluate goodness of fit, we relied on different criteria. First, we evaluated the χ 2 likelihood ratio statistic for the overall model. Given that we were interested in the relative fit of the different models of change within G1 and G2, we also investigated the contribution of each group to the overall χ 2 value, in order to obtain a more precise indication of the impact of including the latent change factor in a specific group. We also examined the Comparative Fit Index (CFI), the Tucker-Lewis Index (TLI), the Root Mean Square Error of Approximation (RMSEA) with its 90% confidence interval, and the Standardized Root Mean Square Residual (SRMR). We accepted CFI and TLI values >0.90, RMSEA values <0.08, and SRMR values <0.08 (see Kline, 2016 ). Last, we used the Akaike Information Criterion (AIC; Burnham and Anderson, 2004 ). The AIC rewards goodness of fit and includes a penalty that is an increasing function of the number of estimated parameters. Burnham and Anderson ( 2004 ) recommend rescaling the observed AIC values before selecting the best-fitting model, according to the formula Δi = AICi − AICmin, where AICmin is the minimum of the observed AIC values among the competing models. Practical guidelines suggest that a model that differs by less than Δi = 2 from the best-fitting model (which has Δi = 0) in a specific dataset is "strongly supported by the evidence"; if the difference lies between 4 and 7 there is considerably less support, whereas models with Δi > 10 have essentially no support.
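As an illustration of the AIC rescaling rule just described (the AIC values below are hypothetical, not those of the reported models):

    def aic_deltas(aic_by_model):
        """Rescale AICs as delta_i = AIC_i - AIC_min (Burnham & Anderson, 2004)."""
        aic_min = min(aic_by_model.values())
        return {name: round(aic - aic_min, 2) for name, aic in aic_by_model.items()}

    # Hypothetical AIC values for four competing models.
    print(aic_deltas({"Model 1": 1325.0, "Model 2": 1312.0,
                      "Model 3": 1314.5, "Model 4": 1313.0}))
    # Models with delta_i < 2 are strongly supported; delta_i > 10, essentially none.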

We created two parallel forms of the prosociality scale by following the procedure described in Little et al. ( 2002 , p. 166). In Table 1 we report zero-order correlations, means, standard deviations, reliabilities, skewness, and kurtosis for each parallel form. Cronbach's alphas were good (≥0.74), and all correlations were significant at p < 0.001. Indices of skewness and kurtosis for each parallel form in both groups did not exceed |0.61|; therefore, the univariate distributions of the eight variables (4 variables × 2 groups) did not show substantial deviations from normality (Curran et al., 1996 ). To check the multivariate normality assumption, we computed Mardia's two-sided multivariate tests of fit for skewness and kurtosis. Given the well-known tendency of this test to reject H 0 too easily, we set the alpha level at 0.001 (in this regard, see Mecklin and Mundfrom, 2005 ; Villasenor Alva and Estrada, 2009 ). Mardia's tests for skewness and kurtosis yielded p -values of 0.010 and 0.030, respectively. Therefore, the study variables showed acceptable, even if not perfect, multivariate normality. Given this modest deviation from the normality assumption, we decided to use maximum likelihood as the estimation method.
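The random item-to-parcel split used to build the two parallel forms can be sketched as follows (illustrative only; item names and seed are hypothetical, and the actual assignment followed Little et al., 2002):

    import random

    def split_into_parcels(items, n_parcels=2, seed=2008):
        """Randomly partition scale items into parcels treated as parallel forms."""
        rng = random.Random(seed)
        shuffled = list(items)
        rng.shuffle(shuffled)
        return [sorted(shuffled[i::n_parcels]) for i in range(n_parcels)]

    # Sixteen prosociality items split into two 8-item parallel forms;
    # each parcel score is then the mean of its items.
    items = [f"pros_{i:02d}" for i in range(1, 17)]
    parcel_1, parcel_2 = split_into_parcels(items)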

Table 1. Descriptive statistics and zero-order correlations for each group separately ( N = 250).

Intervention group (G1)
Variable       (1)    (2)    (3)    (4)     M      SD     Sk      Ku      n
(1) Pr1_T1                                  3.44   0.75   −0.51   −0.06   137
(2) Pr2_T1     0.81                         3.49   0.72   −0.60    0.43   137
(3) Pr1_T2     0.51   0.52                  3.62   0.60   −0.34   −0.13   113
(4) Pr2_T2     0.48   0.59   0.78           3.71   0.62   −0.61    0.02   113

Control group (G2)
Variable       (1)    (2)    (3)    (4)     M      SD     Sk      Ku      n
(1) Pr1_T1                                  3.42   0.70   −0.39   −0.12   113
(2) Pr2_T1     0.76                         3.49   0.71   −0.55   −0.01   113
(3) Pr1_T2     0.74   0.67                  3.49   0.65   −0.27   −0.44   91
(4) Pr2_T2     0.65   0.73   0.78           3.55   0.64   −0.41   −0.49   91

Pr1_T1, Parallel form 1 of the Prosociality scale at Time 1; Pr2_T1, Parallel form 2 of the Prosociality scale at Time 1; Pr1_T2, Parallel form 1 of the Prosociality scale at Time 2; Pr2_T2, Parallel form 2 of the Prosociality scale at Time 2; M, mean; SD, standard deviation; Sk, skewness; Ku, kurtosis; n, number of subjects for each parallel form in each group .

Italicized numbers in diagonal are reliability coefficients (Cronbach's α) .

All correlations were significant at p ≤ 0.001 .

Evaluating the impact of the intervention

In Table 2 we report the fit indexes for the alternative models (see Appendices B1 – B4 for annotated Mplus syntaxes for each of them). As hypothesized, Model 2 (see also Figure 2) was the best-fitting model. Trajectories of prosociality for the intervention and control groups are plotted separately in Figure 3. The contribution of each group to the overall chi-square value highlighted how the lack of the slope factor in the intervention group results in substantial misfit. On the contrary, adding a slope factor to the control group did not significantly change the overall fit of the model [Δχ 2 (1) = 0.765, p = 0.381]. Of interest, the intercept mean and variance were equal across groups (see Table 2, Model 4), suggesting the equivalence of G1 and G2 at T1.

Goodness-of-fit indices for the tested models .

Model                      NFP   χ2 (df)       χ2 (df) G1   χ2 (df) G2   CFI     TLI     RMSEA [90% CI]         SRMR    AIC (ΔAIC)
Model 1 (G1 = A; G2 = A)   16    22.826 (12)   18.779 (6)   4.047 (6)    0.981   0.981   0.085 [0.026, 0.138]   0.081   1318.690 (9.68)
Model 3 (G1 = B; G2 = B)   18    10.378 (10)   7.096 (5)    3.282 (5)    0.999   0.999   0.017 [0.000, 0.099]   0.045   1310.242 (1.24)

Model                      NFP   χ2 (df)       χ2 (df) G1   χ2 (df) G2   CFI     TLI     RMSEA [90% CI]         SRMR    Δχ2 (Δdf) M4 vs. M2
Model 4                    15    13.279 (13)   7.920 (6)    5.359 (7)    1.00    1.00    0.013 [0.000, 0.090]   0.160   2.136 (2)

G1, intervention group; G2, control group; A, no-change model; B, latent change model; NFP, Number of Free Parameters; df, degrees of freedom; χ 2 G1, contribution of G1 to the overall chi-square value; χ 2 G2, contribution of G2 to the overall chi-square value; CFI, Comparative Fit Index; TLI, Tucker-Lewis Index; RMSEA, Root Mean Square Error of Approximation; CI, confidence intervals; SRMR, Standardized Root Mean Square Residual; AIC, Akaike's Information Criterion .

ΔAIC = Difference in AIC between the best fitting model (i.e., Model 2; highlighted in bold) and each model .

Model 4 = Model 2 with mean and variance of intercepts constrained to be equal across groups .

The full Mplus syntaxes for these models were reported in Appendices .

Figure 2. Best-fitting second-order multiple-group latent curve model with parameter estimates for both groups. Parameters in bold were fixed. This model has parallel indicators (i.e., residual variances of the observed indicators are equal within the same latent variable, in each group). All intercepts of the observed indicators (Y) and endogenous latent variables (η) are fixed to 0 (not reported in the figure). G1, intervention group; G2, control group; ξ 1 , intercept of prosociality; ξ 2 , slope of prosociality; η 1 , prosociality at T1; η 2 , prosociality at T2; Y, observed indicator of prosociality; ε, residual variance of observed indicator. n.s. p > 0.05; * p < 0.05; ** p < 0.01; *** p < 0.001.

Figure 3. Trajectories of prosocial behavior for the intervention group (G1) and control group (G2) in the best-fitting model (Model 2 in Table 2).

In Figure 2 we report all the parameters of the best-fitting model for both groups. The slope factor in the intervention group had a significant variance (φ 2 = 0.28, p < 0.001) and a positive and significant mean (κ 2 = 0.19, p < 0.01). Accordingly, we investigated the influence of initial status on the treatment effect by regressing the slope onto the intercept in the intervention group. Note that this latter model has the same fit as Model 2; however, specifying a regression instead of a covariance allows the effect of individuals' initial status on their subsequent change to be controlled. The significant effect of the intercept on the slope (β = −0.62, p < 0.001; R 2 = 0.38) indicated that participants who were less prosocial at the beginning showed a steeper increase in their prosociality after the intervention.

Data collected in intervention programs are often limited to two points in time, namely before and after the delivery of the treatment (i.e., pretest and posttest). When analyzing intervention programs with two waves of data, researchers have so far mostly relied on ANOVA-family techniques, which require strong statistical assumptions and assume that all participants are affected in the same fashion by the intervention. Although a general, average effect of the program is often plausible and theoretically sound, neglecting individual variability in responding to the treatment delivered can lead to partial or incorrect conclusions. In this article, we illustrated how latent variable models can help overcome these issues and provide the researcher with a clear model-building strategy for evaluating intervention programs based on a pretest-posttest design. To this aim, we outlined a sequence of four steps that correspond to substantive research questions (e.g., efficacy of the intervention, normative development, etc.). In particular, Model 1, Model 2, and Model 3 included different combinations of no-change and latent change models in the intervention and control groups (see Table 2). These first three models are crucial to identify the best-fitting trajectory of the targeted behavior across the two groups. Next, Model 4 was aimed at ascertaining whether the intervention and control groups were equivalent in their initial status (both in terms of average starting level and of inter-individual differences) or whether, vice versa, this similarity assumption should be relaxed.

Importantly, even if the intervention and control groups differ in their initial level, this should not prevent the researcher from investigating moderation effects, such as a treatment-by-initial-status interaction, if this is in line with the researcher's hypotheses. Indeed, one of the major advantages of the proposed approach is the possibility of modeling the intervention effect as a random latent variable (i.e., the second-order latent slope) characterized by both a mean (i.e., the average change) and a variance (i.e., the degree of variability around the average effect). As already emphasized by Muthén and Curran ( 1997 ), a statistically significant variance indicates the presence of systematic individual differences in responding to the intervention program. Accordingly, the latent slope identified in the intervention group can be regressed onto the latent intercept in order to examine whether participants with different initial values on the targeted behavior were differently affected by the program. Importantly, the analysis of interaction effects need not be limited to the treatment-by-initial-status interaction but can also include other external variables as moderators (e.g., sex, SES, IQ, behavioral problems, etc.; see Caprara et al., 2014 ).

To complement our formal presentation of the LCM procedure, we provided a real-data example by re-analyzing the efficacy of the YPA, a universal intervention program aimed at promoting prosociality in youths (Zuffianò et al., 2012 ). Our four-step analysis indicated that participants in the intervention group showed a small yet significant increase in their prosociality after 6 months, whereas students in the control group did not show any significant change (see Model 1, Model 2, and Model 3 in Table 2). Furthermore, participants in the intervention and control groups did not differ in their initial levels of prosociality (Model 4), thereby ensuring the comparability of the two groups. These results replicated those reported by Zuffianò et al. ( 2012 ) and further attested to the effectiveness of the YPA in promoting prosociality among adolescents. Importantly, our results also indicated that there was significant variability among participants in responding to the YPA program, as indicated by the significant variance of the latent slope. Accordingly, we explored the possibility of a treatment-by-initial-status interaction. The significant prediction of the slope by the intercept indicated that, after 6 months, participants with lower initial levels of prosociality were more responsive to the intervention delivered. On the contrary, participants who were already prosocial at the pretest remained overall stable in their high level of prosociality. Although this effect was not hypothesized a priori , we can speculate that less prosocial participants were more receptive to the content of the program because they appreciated, more than their (prosocial) counterparts, the discussion of the importance and benefits of prosociality, topics that were very likely relatively new to them. However, it is important to remark that the goal of the YPA was merely to sensitize youth to prosocial and empathic values, not to change their actual behaviors. Accordingly, our findings cannot be interpreted as an increase in prosocial conduct among less prosocial participants. Future studies are needed to examine to what extent embedding the YPA in more intensive school-based intervention programs (see Caprara et al., 2014 ) could further strengthen the promotion of concrete prosocial behaviors.

Limitations and conclusions

Despite the advantages of the proposed LCM approach, several limitations should be acknowledged. First of all, the use of a second-order LCM with two available time points requires that the construct be measured by more than one observed indicator. As such, this technique cannot be used with single-item measures (e.g., Lucas and Donnellan, 2012 ). Second, as with any structural equation model, our SO-MG-LCM makes the strong assumption that the specified model is true in the population, an assumption that is likely to be violated in empirical studies. Moreover, the model must be empirically identified, which requires a set of constraints that leave aside substantive considerations. Third, in this paper we restricted our attention to the two-parallel-indicator case in order to address the most basic situation a researcher can encounter in the evaluation of a two-time-point intervention. Our aim was indeed to confront researchers with the more restrictive case in terms of model identification; the case in which only two observed indicators are available is, in our opinion, one of the more intimidating for researchers. Moreover, when a scale is composed of a long set of items, or the target construct is a second-order construct loaded by two indicators (e.g., as in the case of psychological resilience; see Alessandri et al., 2012 ), and the sample size is not optimal (in terms of the ratio of estimated parameters to available subjects), it makes sense to conduct measurement invariance tests as a preliminary step, before testing the intervention effect, and then use the approach described above to be parsimonious and maximize statistical power. In these circumstances, the interest lies in estimating the LCM, and the invariance of the indicators likely represents a prerequisite. Measurement invariance issues should never be undervalued by researchers; they should be routinely evaluated in preliminary research phases and, when possible, incorporated in the measurement model specification phase. Fourth, although intervention programs with two time points can still offer useful indications, the use of three (and possibly more) points in time provides the researcher with stronger evidence to assess the actual efficacy of the program at different follow-ups. Hence, the methodology described in this paper should be conceived as a support for taking the best of pretest-posttest studies, not as an encouragement to collect only two waves of data. Finally, SEM techniques usually require relatively larger samples than classic ANOVA analyses. Therefore, our procedure may not be suited for the evaluation of intervention programs based on small samples. Although several rules of thumb have been proposed in the past for conducting SEM (e.g., N > 100), we encourage the use of Monte Carlo simulation studies to accurately plan the minimum sample size before starting data collection (Bandalos and Leite, 2013 ; Wolf et al., 2013 ).

Despite these limitations, we believe that our LCM approach represents a useful and easy-to-use methodology that should be in the toolbox of psychologists and prevention scientists. Several factors, often uncontrollable, can oblige the researcher to collect data at only two points in time. Faced with this less-than-optimal scenario, all is not lost: researchers should be aware that analytical techniques more accurate and informative than ANOVA are available to assess intervention programs based on a pretest-posttest design.

Author contributions

GA proposed the research question for the study and the methodological approach, and the focus and style of the manuscript; he contributed substantially to the conception and revision of the manuscript, and wrote the first drafts of all manuscript sections and incorporated revisions based on the suggestions and feedback from AZ and EP. AZ contributed the empirical data set, described the intervention and part of the discussion section, and critically revised the content of the study. EP conducted analyses and revised the style and structure of the manuscript.

The authors thank the students who participated in this study. This research was supported in part by a Research Grant (named: “Progetto di Ateneo”, No. 1081/2016) awarded by Sapienza University of Rome to GA, and by a Mobility Research Grant (No. 4389/2016) awarded by Sapienza University of Rome to EP.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

1 Directed by Leder ( 2000 ).

2 Importantly, although classrooms were randomized across the two conditions (i.e., intervention group and control group), the selection of the four classrooms in each school was not random (i.e., each classroom in school X did not have the same probability of participating in the YPA). In detail, participating classrooms were chosen according to the interest in the project shown by the head teachers.

Supplementary material

The Supplementary Material for this article can be found online at: http://journal.frontiersin.org/article/10.3389/fpsyg.2017.00223/full#supplementary-material

References

  • Achenbach T. M. (2017). Future directions for clinical research, services, and training: evidence-based assessment across informants, cultures, and dimensional hierarchies. J. Clin. Child Adolesc. Psychol. 46, 159–169. 10.1080/15374416.2016.1220315
  • Alessandri G., Vecchione M., Caprara G. V., Letzring T. D. (2012). The ego resiliency scale revised: a crosscultural study in Italy, Spain, and the United States. Eur. J. Psychol. Assess. 28, 139–146. 10.1027/1015-5759/a000102
  • Bandalos D. L., Leite W. (2013). Use of Monte Carlo studies in structural equation modeling research, in Structural Equation Modeling: A Second Course, 2nd Edn., eds Hancock G. R., Mueller R. O. (Charlotte, NC: Information Age Publishing), 625–666.
  • Bandura A. (1977). Self-efficacy: toward a unifying theory of behavioral change. Psychol. Rev. 84, 191–215. 10.1037/0033-295X.84.2.191
  • Bishop J., Geiser C., Cole D. A. (2015). Modeling latent growth with multiple indicators: a comparison of three approaches. Psychol. Methods 20, 43–62. 10.1037/met0000018
  • Bollen K. A. (1989). Structural Equations with Latent Variables. New York, NY: Wiley.
  • Bollen K. A., Curran P. J. (2006). Latent Curve Models: A Structural Equation Perspective. Hoboken, NJ: Wiley.
  • Brown T. A. (2015). Confirmatory Factor Analysis for Applied Research. New York, NY: The Guilford Press.
  • Burnham K. P., Anderson D. R. (2004). Multimodel inference: understanding AIC and BIC in model selection. Sociol. Methods Res. 33, 261–304. 10.1177/0049124104268644
  • Caprara G. V., Alessandri G., Eisenberg N. (2012). Prosociality: the contribution of traits, values, and self-efficacy beliefs. J. Pers. Soc. Psychol. 102, 1289–1303. 10.1037/a0025626
  • Caprara G. V., Luengo Kanacri B. P., Gerbino M., Zuffianò A., Alessandri G., Vecchio G., et al. (2014). Positive effects of promoting prosocial behavior in early adolescents: evidence from a school-based intervention. Int. J. Behav. Dev. 4, 386–396. 10.1177/0165025414531464
  • Caprara G. V., Steca P., Zelli A., Capanna C. (2005). A new scale for measuring adults' prosocialness. Eur. J. Psychol. Assess. 21, 77–89. 10.1027/1015-5759.21.2.77
  • Cole D. A., Preacher K. J. (2014). Manifest variable path analysis: potentially serious and misleading consequences due to uncorrected measurement error. Psychol. Methods 19, 300–315. 10.1037/a0033805
  • Cook T. D., Campbell D. T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin.
  • Cronbach L. J., Snow R. E. (1977). Aptitudes and Instructional Methods: A Handbook for Research on Interactions. New York, NY: Irvington.
  • Curran P. J., Muthén B. O. (1999). The application of latent curve analysis to testing developmental theories in intervention research. Am. J. Commun. Psychol. 27, 567–595. 10.1023/A:1022137429115
  • Curran P. J., West S. G., Finch J. F. (1996). The robustness of test statistics to nonnormality and specification error in confirmatory factor analysis. Psychol. Methods 1, 16–29. 10.1037/1082-989X.1.1.16
  • Eisenberg N., Spinrad T. L., Knafo-Noam A. (2015). Prosocial development, in Handbook of Child Psychology and Developmental Science, Vol. 3, 7th Edn., eds Lamb M. E., Lerner R. M. (Hoboken, NJ: Wiley), 610–656.
  • Enders C. K. (2010). Applied Missing Data Analysis. New York, NY: Guilford Press.
  • Geiser C., Keller B. T., Lockhart G. (2013). First- versus second-order latent growth curve models: some insights from latent state-trait theory. Struct. Equ. Modeling 20, 479–503. 10.1080/10705511.2013.797832
  • Greenberg M. T., Domitrovich C., Bumbarger B. (2001). The prevention of mental disorders in school-aged children: current state of the field. Prevent. Treat. 4:1a. 10.1037/1522-3736.4.1.41a
  • Grissom R. J., Kim J. J. (2012). Effect Sizes for Research: Univariate and Multivariate Applications, 2nd Edn. New York, NY: Routledge.
  • Kline R. B. (2016). Principles and Practice of Structural Equation Modeling, 4th Edn. New York, NY: The Guilford Press.
  • Leder M. (Director). (2000). Pay It Forward [Motion Picture]. Burbank, CA: Warner Bros.
  • Little T. D. (1997). Mean and covariance structures (MACS) analyses of cross-cultural data: practical and theoretical issues. Multivariate Behav. Res. 32, 53–76. 10.1207/s15327906mbr3201_3
  • Little T. D. (2013). Longitudinal Structural Equation Modeling. New York, NY: The Guilford Press.
  • Little T. D., Card N. A., Preacher K. J., McConnell E. (2009). Modeling longitudinal data from research on adolescence, in Handbook of Adolescent Psychology, Vol. 2, 3rd Edn., eds Lerner R. M., Steinberg L. (Hoboken, NJ: Wiley), 15–54.
  • Little T. D., Cunningham W. A., Shahar G., Widaman K. F. (2002). To parcel or not to parcel: exploring the question, weighing the merits. Struct. Equ. Modeling 9, 151–173. 10.1207/S15328007SEM0902_1
  • Lucas R. E., Donnellan M. B. (2012). Estimating the reliability of single-item life satisfaction measures: results from four national panel studies. Soc. Indic. Res. 105, 323–331. 10.1007/s11205-011-9783-z
  • Malti T., Noam G. G., Beelmann A., Sommer S. (2016). Good enough? Interventions for child mental health: from adoption to adaptation—from programs to systems. J. Clin. Child Adolesc. Psychol. 45, 707–709. 10.1080/15374416.2016.1157759
  • McArdle J. J. (2009). Latent variable modeling of differences and changes with longitudinal data. Annu. Rev. Psychol. 60, 577–605. 10.1146/annurev.psych.60.110707.163612
  • Mecklin C. J., Mundfrom D. J. (2005). A Monte Carlo comparison of the Type I and Type II error rates of tests of multivariate normality. J. Stat. Comput. Simul. 75, 93–107. 10.1080/0094965042000193233
  • Meredith W., Teresi J. A. (2006). An essay on measurement and factorial invariance. Med. Care 44, S69–S77. 10.1097/01.mlr.0000245438.73837.89
  • Meredith W., Tisak J. (1990). Latent curve analysis. Psychometrika 55, 107–122. 10.1007/BF02294746
  • Micceri T. (1989). The unicorn, the normal curve, and other improbable creatures. Psychol. Bull. 105, 156–166. 10.1037/0033-2909.105.1.156
  • Millsap R. E. (2011). Statistical Approaches to Measurement Invariance. New York, NY: Routledge.
  • Muthén B. O., Curran P. J. (1997). General longitudinal modeling of individual differences in experimental designs: a latent variable framework for analysis and power estimation. Psychol. Methods 2, 371–402. 10.1037/1082-989X.2.4.371
  • Muthén L. K., Muthén B. O. (1998–2012). Mplus User's Guide, 7th Edn. Los Angeles, CA: Muthén & Muthén.
  • Nimon K. F. (2012). Statistical assumptions of substantive analyses across the general linear model: a mini-review. Front. Psychol. 3:322. 10.3389/fpsyg.2012.00322
  • Roberts M. C., Ilardi S. S. (2003). Handbook of Research Methods in Clinical Psychology. Oxford: Blackwell Publishing.
  • Schmider E., Ziegler M., Danay E., Beyer L., Bühner M. (2010). Is it really robust? Reinvestigating the robustness of ANOVA against violations of the normal distribution assumption. Methodology 6, 147–151. 10.1027/1614-2241/a000016
  • Steyer R., Eid M., Schwenkmezger P. (1997). Modeling true intraindividual change: true change as a latent variable. Methods Psychol. Res. Online 2, 21–33.
  • Tabachnick B. G., Fidell L. S. (2013). Using Multivariate Statistics, 6th Edn. New Jersey, NJ: Pearson.
  • Villasenor Alva J. A., Estrada E. G. (2009). A generalization of Shapiro–Wilk's test for multivariate normality. Commun. Stat. Theor. Methods 38, 1870–1883. 10.1080/03610920802474465
  • Wilcox R. R. (1998). The goals and strategies of robust methods. Br. J. Math. Stat. Psychol. 51, 1–39. 10.1111/j.2044-8317.1998.tb00659.x
  • Wolf E. J., Harrington K. M., Clark S. L., Miller M. W. (2013). Sample size requirements for structural equation models: an evaluation of power, bias, and solution propriety. Educ. Psychol. Meas. 76, 913–934. 10.1177/0013164413495237
  • Zuffianò A., Alessandri G., Roche-Olivar R. (2012). Valutazione di un programma di sensibilizzazione prosociale: Young Prosocial Animation [Evaluation of a prosocial sensitization program: the Young Prosocial Animation]. Psicol. Educ. 2, 203–219.


12 - Debriefing and Post-Experimental Procedures

from Part II - The Building Blocks of a Study

Published online by Cambridge University Press:  25 May 2023

The steps social and behavioral scientists take after the end of a study are just as important as the steps taken before and during it. The goal of this chapter is to discuss the practical and ethical considerations that should be addressed before participants leave the physical or virtual study space. We review several post-experimental techniques, including the debriefing, manipulation checks, attention checks, mitigating participant crosstalk, and probing for participant suspicion regarding the purpose of the study. Within this review, we address issues with the implementation of each post-experimental technique as well as best practices for their use, with an emphasis placed on prevention of validity threats and the importance of accurate reporting of the steps taken after the experiment ends. Finally, we emphasize the importance of continuing to develop and empirically test post-experimental practices, with suggestions for future research.

Further reading

For a detailed example of a funnel debriefing procedure and the empirical test of various post-experimental practices including suspicion probing, we recommend the following article:

For further discussion of the history and progression of manipulation checks as well as specific recommendations for their use, we recommend Table 4 in the following article:

We are proponents of manipulation checks (with the proper precautions), but criticisms of manipulation checks should be seriously considered. For further reading on critiques of manipulation check practices we recommend the following article:


  • Debriefing and Post-Experimental Procedures
  • By Travis D. Clark, Ginette Blackhart
  • Edited by Austin Lee Nichols, Central European University, Vienna, and John Edlund, Rochester Institute of Technology, New York
  • Book: The Cambridge Handbook of Research Methods and Statistics for the Social and Behavioral Sciences
  • Online publication: 25 May 2023
  • Chapter DOI: https://doi.org/10.1017/9781009010054.013



Journal of Economic Psychology

The reliability of questionnaires in laboratory experiments: What can we do?

Highlights:
  • Replication: standard measures of careless answering are well-correlated.
  • Combined answer-quality index well-correlated with additional measures of diligence.
  • Paying participants as soon as possible reduces answer quality.
  • Waiting for all increases time costs. Intermediate procedure minimizes both problems.
  • Higher payment increases answer quality; framing of payment irrelevant.

Cited by:

  • Risk, time pressure, and selection effects
  • Examining completion rates in web surveys via over 25,000 real-world surveys (Social Science Computer Review)
  • App-based experiments
  • Online belief elicitation methods: "Furthermore, if a study's sample is likely to be composed of a majority of low-numeracy individuals, then various techniques such as greater reliance on visual aids can be used to enhance comprehension (Delavande & Rohwedder, 2008). Equally relevant is the question of how best to measure such covariates online, given the results of Wolff (2019) suggesting that different methods (e.g., incentives' size, timing, and framing) significantly influence the quality of this data in laboratory experiments. It would also be worthwhile for future research to investigate the marginal costs and benefits of further adapting instructions and procedures for online environments."
  • Self-reported & revealed trust: Experimental evidence: "This is introduced to provide incentives for truthful responses to the usually non-incentivised measures of personality. Recently, Wolff (2019) has found that the reliability of questionnaires varies under various experimental procedures, thus highlighting the importance of better understanding how to increase response quality to surveys in experimental settings. The participants earned on average 12.51 GBP; the show-up fee was 5 GBP."
  • At the eve of the 40th anniversary of the Journal of Economic Psychology: Standards, practices, and challenges
  • Enhanced anonymity in tax experiments does not affect compliance: "The absence of a significant interaction between the anonymity manipulation and any of the tax-related parameters is reassuring, as it does not challenge published findings from experiments that do not employ such pronounced anonymity measures, such as the complete absence of experimenter-subject interactions. Our findings, in general and especially with regard to the payment procedure, are in line with a recent study showing that having participants enter their names during the experiment for receipt preparation does not lead to more socially desirable answers (Wolff, 2019). One important conclusion of our study is that past experiments investigating deterrence effects are thus not likely to be considerably influenced or biased by social desirability."
  • The Global Brain Health Survey: Development of a Multi-Language Survey of Public Views on Brain Health

Lab-like findings from online experiments

  • Original Paper
  • Published: 18 December 2021
  • Volume 7, pages 184–193 (2021)


  • Irene Maria Buso 1 ,
  • Daniela Di Cagno 2 ,
  • Lorenzo Ferrari 2 ,
  • Vittorio Larocca 3 ,
  • Luisa Lorè 4 ,
  • Francesca Marazzi 5 ,
  • Luca Panaccione 6 &
  • Lorenzo Spadoni   ORCID: orcid.org/0000-0002-1208-2897 2  


Laboratory experiments have often been replaced by online experiments in the last decade. This trend was reinforced when academic and research work based on physical interaction had to be suspended due to restrictions imposed to limit the spread of Covid-19. Therefore, the quality of data and results from web experiments has become an issue that is currently being investigated. Are there significant differences between lab and online findings? We contribute to this debate via an experiment aimed at comparing results from a novel online protocol with a traditional laboratory setting, using the same pool of participants. We find that participants in our experiment behave in a similar way across settings and that there are at best weakly significant and quantitatively small differences between behavior observed using our online protocol and in the physical laboratory setting.


1 Introduction

The spread of Covid-19 temporarily prevented experimental subjects from physically entering labs. Still, the experimental approach remains a crucial tool for understanding individual and group behavior. To overcome the problems raised by physical distancing, researchers have turned to online experiments and surveys employing different platforms. The validity of these protocols has been demonstrated by successfully replicating a series of classic experiments (Crump et al. 2013 ; Amir et al. 2012 ; Horton et al. 2013 ). Moreover, a recent strand of research focuses on comparing the quality of data and the reliability of results across different platforms and subject pools (see, e.g., Gupta et al. 2021 ; Peer et al. 2021 ; Litman et al. 2021 ).

Online experiments differ from physical ones in ways that limit the benefits of fundamental aspects of traditional experimental methods. A first issue concerns subjects dropping out during the experiment: dropouts are problematic both because they may result in (expensive) losses due to discarded observations and because they might be endogenous (Arechar et al. 2018 ). A second issue concerns participants' limited attention, which could hinder their understanding of the instructions, due to limited control: Chandler et al. ( 2014 ) show that subjects may engage in other activities while participating in an online experiment (e.g., watching TV, listening to music, chatting, etc.). A third issue concerns the difficulty of controlling the recruiting process.

During the pandemic, we developed a novel online protocol which replicates the main features of physical experiments and therefore addresses the most relevant problems mentioned above (see Buso et al. 2020 ). In particular, it ensures: (i) isolated and monitored subjects, (ii) interactions mediated by computers, (iii) anonymity of participants, (iv) immediate monetary reward, and (v) the same recruiting process as in the physical lab, which allows for a better control and ensures that participants are drawn from the standard sample.

To contribute to the current debate comparing web experimental datasets with those collected in the traditional physical lab, in October 2021 we collected data on three standard games (Ultimatum, Dictator, and Public Good games) in traditional physical lab sessions and in two types of online sessions, with and without video monitoring of participants. The different data-collection settings identify our three treatments, hereinafter referred to as Physical Lab; Online, monitoring; and Online, no monitoring. We find that participants in our experiment behave in a similar way across settings and that there are at best weakly significant and quantitatively small differences in choice data between online and physical lab sessions. Therefore, we confirm the validity of our protocol for online experiments and its ability to overcome the aforementioned issues.

The paper is organized as follows: in Sect. 2, we present our online protocol; in Sect. 3, we describe the experimental design; in Sect. 4, we present the results, comparing online and physical lab evidence; we conclude in Sect. 5. In the supplementary online materials, we report the translated instructions (online Appendix A), the instructions in the original language (online Appendix B), and the post-experimental questionnaire (online Appendix C), together with additional material regarding our protocol (online Appendix D and online Appendix E).

2 Experimental protocol

The online visually monitored sessions are organized as follows: we adopt an architecture of connected platforms, specifically ORSEE for recruitment (Greiner 2015), Cisco WebEx for (visual) monitoring, oTree (Chen et al. 2016) for running the experiment, and PayPal for payments. In the invitation (see online Appendix D), we remind participants that a PayPal account is necessary to participate and receive the final payment. For privacy reasons, participants are informed that during the experiment they will be connected, but not recorded, via audio/video with the experimenter for the whole session and, therefore, that they need a suitable device. Participants are also informed that supernumerary subjects will be paid only the show-up fee. Before the beginning of the session, the experimenter randomly allocates registered participants to individual virtual cubicles created using Cisco WebEx, sending them the corresponding link. During the experiment, participants are monitored via webcam and can communicate privately with the experimenter via chat and microphone. They can neither see nor talk to each other, while the experimenter can talk publicly to all participants. A picture of the experimenter's screen is provided in online Appendix E. As participants log in to Cisco WebEx, the experimenter checks that their webcam and microphone work properly, as well as the overall quality of their internet connection. After completing these checks, the experimenter communicates the access procedure to participants and sends them individual and anonymous oTree links. After log-in, participants input their PayPal account in oTree, which will be used for payments. Footnote 1 As soon as all participants are ready, the experimenter plays a prerecorded audio file with the instructions read aloud to all participants, which preserves common awareness and reduces session effects. Written instructions are also displayed on participants' screens while the audio recording is playing and remain available, via a dedicated button, throughout the experiment. At the end of the session, participants answer a final questionnaire. Once participants complete the questionnaire, they are shown a receipt with their payment data and leave their virtual cubicle.
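For concreteness, the allocation of registered participants to individual cubicles and the generation of anonymous experiment links can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' code; the e-mail addresses, URLs, and file name below are hypothetical placeholders.

```python
import csv
import random
import secrets

# Hypothetical list of registered participants (e.g., exported from ORSEE).
participants = ["alice@example.edu", "bob@example.edu", "carol@example.edu"]

random.shuffle(participants)  # random allocation to virtual cubicles

assignments = []
for cubicle, email in enumerate(participants, start=1):
    token = secrets.token_urlsafe(8)  # individual, anonymous experiment-link token
    assignments.append({
        "email": email,
        "cubicle_link": f"https://webex.example/cubicle/{cubicle}",
        "otree_link": f"https://otree.example/join/{token}",
    })

# The e-mail-to-link mapping is used only to send invitations; decisions are
# stored under the anonymous token, and PayPal accounts are kept in a separate
# file (cf. Footnote 1) to preserve anonymity.
with open("invitations.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["email", "cubicle_link", "otree_link"])
    writer.writeheader()
    writer.writerows(assignments)
```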

The non-monitored sessions follow the same protocol, excluding the video connection but preserving the possibility for participants and experimenters to communicate via audio or chat. The physical lab sessions follow the traditional protocol for experiments, for example as described by Weimann and Brosig-Koch (2019).

As mentioned in the introduction, we believe our protocol addresses the most common issues of online experiments: (i) it reduces involuntary dropouts (since oTree links allow participants to re-join the session and continue the experiment) and voluntary ones (by constantly monitoring participants and communicating with them through webcam and microphone); (ii) it mitigates limited attention by having the instructions read aloud to all participants simultaneously before the experiment begins and by reducing participants' engagement in other activities; (iii) it controls for participants' characteristics via recruitment through ORSEE. Footnote 2

3 Experimental design

The experiment features a one-shot dictator game (DG), ultimatum game (UG), and public good game (PGG), all played without feedback and in an order that varies between sessions (three sequences in total). In the DG and UG, the proposer's endowment is 20 tokens. Subjects play using the strategy method and role reversal, indicating their offer as proposer in the DG and UG, and the minimum amount they accept as receiver (i.e., the rejection threshold) in the UG. In the PGG, each participant's endowment is 10 tokens and the marginal per capita return (MPCR) is 0.5; participants contribute in groups of four, each reporting their own contribution to the public good.
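To make the game parameters concrete, here is a plain-Python sketch of the payoff rules as we read them from the description above (a sketch of the standard rules, not the authors' oTree implementation):

```python
ENDOWMENT_DG_UG = 20   # proposer's endowment (tokens) in DG and UG
ENDOWMENT_PGG = 10     # each participant's endowment (tokens) in PGG
MPCR = 0.5             # marginal per capita return in PGG
GROUP_SIZE = 4         # PGG group size

def dictator_payoffs(offer):
    """The dictator keeps the endowment minus the offer; the receiver gets the offer."""
    return ENDOWMENT_DG_UG - offer, offer

def ultimatum_payoffs(offer, rejection_threshold):
    """Strategy method: the proposal is implemented only if the offer meets the
    responder's minimum acceptable amount; otherwise both earn zero."""
    if offer >= rejection_threshold:
        return ENDOWMENT_DG_UG - offer, offer
    return 0, 0

def pgg_payoffs(contributions):
    """Each group member earns the uncontributed endowment plus MPCR times the
    total group contribution."""
    public_return = MPCR * sum(contributions)
    return [ENDOWMENT_PGG - c + public_return for c in contributions]

# Example: full free-riding vs. full contribution in a four-person group.
print(pgg_payoffs([0, 0, 0, 0]))      # [10.0, 10.0, 10.0, 10.0]
print(pgg_payoffs([10, 10, 10, 10]))  # [20.0, 20.0, 20.0, 20.0]
```

With an MPCR of 0.5 and groups of four, every contributed token returns 0.5 × 4 = 2 tokens to the group but only 0.5 to the contributor, so full contribution is socially efficient while free-riding remains individually tempting.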

Subjects are informed that, at the end of the experiment, one game is randomly selected for payment (1 token = 1 euro). In the DG and UG, each participant is randomly matched with another subject and randomly assigned a role (either proposer or receiver). In the PGG, each participant is randomly assigned to a group with three other subjects. This matching procedure, performed only after participants have faced all three games, together with the absence of feedback, guarantees the independence of individual choices. The experiment was programmed in oTree (Chen et al. 2016), and subjects were paid in cash after the experiment in the physical lab and via PayPal after the online sessions.
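A minimal sketch of this random-selection and matching step, assuming a hypothetical list of subject identifiers (illustrative only, not the authors' code):

```python
import random

def pay_session(subject_ids, rng=random):
    """Randomly pick one game for payment; pair subjects for DG/UG (the first
    member of each pair acts as proposer) or form groups of four for the PGG."""
    paid_game = rng.choice(["DG", "UG", "PGG"])
    ids = list(subject_ids)
    rng.shuffle(ids)
    if paid_game in ("DG", "UG"):
        matching = [(ids[i], ids[i + 1]) for i in range(0, len(ids), 2)]
    else:
        matching = [ids[i:i + 4] for i in range(0, len(ids), 4)]
    return paid_game, matching

game, matching = pay_session(range(20))
print(game, matching)  # earnings in the selected game are converted at 1 token = 1 euro
```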

The experiment comprises 9 sessions run between October 15 and October 22, 2021, with a total of 183 participants, all students from LUISS Guido Carli University recruited via ORSEE (Greiner 2015). In particular, we ran three sessions in the physical lab, three online sessions with visually monitored subjects, and three online sessions without visual monitoring. Moreover, for each setting we varied the order of the three games across sessions, so that in each treatment we have (i) one PGG-DG-UG session, (ii) one DG-UG-PGG session, and (iii) one UG-PGG-DG session. Sessions in the physical lab were run at CESARE Lab with 60 participants; the online sessions involved 63 visually monitored participants Footnote 3 and 60 non-monitored participants. Footnote 4

Table 1 shows that the composition of the sample across treatments is balanced in terms of demographic characteristics. The dummy Economics equals 1 when the participant is a student of Economics. Self-reported RA is the self-reported willingness to take risks in general on a {0, 1, ..., 10} scale, where 0 identifies a risk-averse subject and 10 a risk-loving one. Footnote 5 The dummy Resident equals 1 when the participant comes from the Italian region where LUISS Guido Carli University is located. Footnote 6 The dummy Center equals 1 when the participant comes from Central Italy rather than other areas. The dummy Easy equals 1 when the participant declared that (s)he found the experiment easy. Footnote 7
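As an illustration, the Table 1 balance variables could be coded from the post-experimental questionnaire roughly as follows. The file and column names are hypothetical, and "Lazio" is our assumption for the region where LUISS Guido Carli University is located; this is not the authors' code.

```python
import pandas as pd

q = pd.read_csv("questionnaire.csv")  # one row per participant (hypothetical columns)

q["economics"] = (q["field_of_study"] == "Economics").astype(int)
q["resident"] = (q["region_of_origin"] == "Lazio").astype(int)
q["center"] = (q["macro_area"] == "Center").astype(int)
q["easy"] = (q["found_experiment_easy"] == "yes").astype(int)

# Mean of each balance variable by treatment, as in Table 1.
balance = q.groupby("treatment")[
    ["economics", "resident", "center", "easy", "self_reported_ra"]
].mean()
print(balance)
```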

4 Results

In this section, we discuss the experimental results from the three games. We first present some descriptive results and then the econometric analysis.

Figure 1 reports average choices for each of the three games by treatment, with confidence intervals, and shows that, overall, between-treatment differences are negligible. In the DG, the average demand amounts to 73.2% (14.64 tokens) of the pie (20 tokens): 71.3% (14.267 tokens) in the physical lab, 77.5% (15.508 tokens) in the online treatment with visual monitoring, and 70.5% (14.117 tokens) in the online treatment without visual monitoring. In the UG, the average proposer demand amounts to 61.35% (12.27 tokens) of the pie: 59.8% in the physical lab, 60.6% online with visual monitoring, and 63.75% online without visual monitoring. The average responder rejection threshold amounts to 37.5% (7.5 tokens) of the pie: 35.8% in the physical lab, 38.25% online with visual monitoring, and 38.4% online without visual monitoring. In the PGG, the average contribution amounts to 32% (3.18 tokens) of the endowment (10 tokens): 36% in the physical lab, 27.5% online with visual monitoring, and 32.3% online without visual monitoring.

Figure 1. Average choices by games and treatments.
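The treatment averages above can be reproduced from a long-format choice dataset along these lines (hypothetical file and column names, not the authors' code):

```python
import pandas as pd

df = pd.read_csv("choices_long.csv")  # columns: subject, treatment, game, choice, endowment

summary = (
    df.assign(share=df["choice"] / df["endowment"])
      .groupby(["game", "treatment"])["share"]
      .agg(["mean", "sem"])
)
print(summary)  # mean share of the pie/endowment and its standard error, by game and treatment
```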

To verify whether there are significant differences in choice data between online and physical lab sessions, we run two sets of regressions. In both analyses, Physical Lab and Online, no monitoring are compared with the baseline, i.e. Online, monitoring.

The first set of regressions checks whether the between-treatment differences observed in Fig. 1 are statistically significant. Individual choices in each game are analysed separately via OLS regressions with treatment dummies as covariates. The results, reported in Table 2, confirm the absence of treatment effects for both UG choices. For DG demand and PGG contributions, we find weakly significant effects between the two online treatments and between the physical lab and the baseline, respectively. Footnote 8
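A sketch of this first set of regressions using the statsmodels formula interface, with hypothetical column names and "Online, monitoring" (coded here as Online_monitoring) as the omitted baseline; this is our reconstruction, not the authors' code:

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("choices_wide.csv")  # one row per subject, hypothetical columns

for outcome in ["dg_demand", "ug_demand", "ug_threshold", "pgg_contribution"]:
    model = smf.ols(
        f"{outcome} ~ C(treatment, Treatment(reference='Online_monitoring'))",
        data=df,
    ).fit()
    print(outcome)
    print(model.summary().tables[1])  # coefficients on the Physical Lab and Online, no monitoring dummies
```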

In the second set of OLS regressions, we expand the set of covariates with dummies indicating the sequence in which the games were played (with the sequence DG-UG-PGG as baseline) and individual characteristics. The latter include participants' gender and age, whether they reside in Central Italy, Footnote 9 and self-reported risk attitude. Results are reported in Table 3 and confirm the same treatment effects observed in Table 2. Furthermore, we find a significantly higher contribution when the PGG is played first and a significantly lower contribution among students of Economics.
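The extended specification can be expressed in the same way, again with hypothetical column names (a sketch, not the authors' code):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("choices_wide.csv")  # one row per subject, hypothetical columns

# Treatment dummies plus sequence dummies (baseline DG-UG-PGG) and individual
# characteristics, shown here for the PGG contribution.
extended = smf.ols(
    "pgg_contribution ~ C(treatment, Treatment(reference='Online_monitoring'))"
    " + C(sequence, Treatment(reference='DG_UG_PGG'))"
    " + female + age + center + self_reported_ra + economics",
    data=df,
).fit()
print(extended.summary())
```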

5 Conclusion

We compare data collected in the physical lab and online from the same pool of subjects in order to validate our lab-like methodology, which ensures (visual) monitoring, common reading of instructions, and isolation of participants.

Results from the UG, DG, and PGG show only one weakly significant difference in one of the choices between the physical lab and the online setting with visual monitoring, and no significant differences between the physical lab and the online setting without visual monitoring. Therefore, data generated on the web with our protocol are comparable with data collected in the physical lab. Furthermore, we find only one weakly significant difference in one of the choices between the online settings with and without visual monitoring. Footnote 10 Overall, these results validate our protocol, which mitigates the debated side effects of online experiments.

Moreover, since our protocol is based on recruitment from a pool of registered students, for example via ORSEE, it allows us to control for their characteristics, such as experience, gender, or field of study. More importantly, it makes it possible to run a new strand of experiments based on the interaction of participants located in different geographical areas, and therefore embedded in their own cultural environments, as if they were simultaneously in the same physical lab.

Footnotes

1. The software stores PayPal accounts in a file separate from that of the experimental decisions, so as to preserve anonymity.

2. Similar platforms can easily be adapted to the characteristics of the lab (e.g., IT resources and administrative constraints). For instance, Prolific could be used for payments and subject recruitment, while alternative software such as LIONESS Lab (see Giamattei et al. 2020), Veconlab (see http://veconlab.econ.virginia.edu/), or zTree Unleashed (see Duch et al. 2020) could be used for the experiment.

3. One participant was excluded because of technical issues.

4. Subjects received the same invitation for both monitored and non-monitored online sessions. In both cases they were informed that a webcam would be needed for the initial identification phase and during the experiment, and that registration implied acceptance of being monitored. Although in non-monitored sessions subjects were asked to turn off the webcam after identification, the same invitation with the monitoring acceptance statement was sent for both session types to avoid self-selection into monitored and non-monitored sessions.

5. Specifically, subjects answered the following question (see Dohmen et al. 2011): "Are you a person generally willing to face risks or do you prefer to avoid facing them? Please express one preference on a 0–10 scale, where 0 means 'I do not want to take any risk' and 10 means 'I am very willing to take risks'."

6. With this variable we aim to distinguish the students who likely live with their families from those living with other students or workers.

7. The translated post-experimental questionnaire is reported in the supplementary online materials (online Appendix C).

8. Two-sample t tests reveal no treatment effect for any choice when comparing the physical lab setting and the online setting without monitoring.

9. Results are not affected if we substitute this variable with Resident.

10. This result is not directly comparable to those of Gupta et al. (2021), since they compare different populations commonly employed in economic experiments.

References

Amir, O., Rand, D., & Kobi, Y. (2012). Economic games on the internet: The effect of $1 stakes. PLoS One. https://doi.org/10.1371/journal.pone.0031461

Arechar, A., Gächter, S., & Molleman, L. (2018). Conducting interactive experiments online. Experimental Economics, 21(1), 99–131. https://doi.org/10.1007/s10683-017-9527-2

Buso, I. M., De Caprariis, S., Di Cagno, D., Ferrari, L., Larocca, V., Marazzi, F., Panaccione, L., & Spadoni, L. (2020). The effects of Covid-19 lockdown on fairness and cooperation: Evidence from a lab-like experiment. Economics Letters, 196, Article S0165176520303487. https://EconPapers.repec.org/RePEc:eee:ecolet:v:196:y:2020:i:c:s0165176520303487

Chandler, J., Mueller, P., & Paolacci, G. (2014). Nonnaiveté among Amazon Mechanical Turk workers: Consequences and solutions for behavioral researchers. Behavior Research Methods, 46, 112–130. https://doi.org/10.3758/s13428-013-0365-7

Chen, D., Schonger, M., & Wickens, C. (2016). oTree—An open-source platform for laboratory, online, and field experiments. Journal of Behavioral and Experimental Finance, 9, 88–97. https://EconPapers.repec.org/RePEc:eee:beexfi:v:9:y:2016:i:c:p:88-97

Crump, M., McDonnell, J., & Gureckis, T. (2013). Evaluating Amazon's Mechanical Turk as a tool for experimental behavioral research. PLoS One, 8(3), e57410. https://doi.org/10.1371/journal.pone.0057410

Dohmen, T., Falk, A., Huffman, D., Sunde, U., Schupp, J., & Wagner, G. G. (2011). Individual risk attitudes: Measurement, determinants, and behavioral consequences. Journal of the European Economic Association, 9(3), 522–550. https://doi.org/10.1111/j.1542-4774.2011.01015.x

Duch, M. L., Grossmann, M. R. P., & Lauer, T. (2020). z-Tree unleashed: A novel client-integrating architecture for conducting z-Tree experiments over the Internet. Journal of Behavioral and Experimental Finance, 28(3), 100400.

Giamattei, M., Yahosseini, K. S., Gächter, S., & Molleman, L. (2020). LIONESS Lab: A free web-based platform for conducting interactive experiments online. Journal of the Economic Science Association, 6(1), 95–111. https://doi.org/10.1007/s40881-020-00087-0

Greiner, B. (2015). Subject pool recruitment procedures: Organizing experiments with ORSEE. Journal of the Economic Science Association, 1(1), 114–125. https://doi.org/10.1007/s40881-015-0004-4

Gupta, N., Rigotti, L., & Wilson, A. (2021). The experimenters' dilemma: Inferential preferences over populations. arXiv:2107.05064

Horton, J., Rand, D., & Zeckhauser, R. (2013). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14, 399–425. https://doi.org/10.1007/s10683-011-9273-9

Litman, L., Moss, A., Rosenzweig, C., & Robinson, J. (2021). Reply to MTurk, Prolific or panels? Choosing the right audience for online research. SSRN. https://doi.org/10.2139/ssrn.3775075

Peer, E., Rothschild, D., Gordon, A., Evernden, Z., & Damer, E. (2021). Data quality of platforms and panels for online behavioral research. Behavior Research Methods. https://doi.org/10.3758/s13428-021-01694-3

Weimann, J., & Brosig-Koch, J. (2019). Methods in Experimental Economics. Springer.


Author information

Authors and Affiliations

Department of Economics, Ca’ Foscari University of Venice, Venice, Italy

Irene Maria Buso

Department of Economics and Finance, Luiss University, Rome, Italy

Daniela Di Cagno, Lorenzo Ferrari & Lorenzo Spadoni

Center for Experimental Studies of Internet, Entertainment and Gambling (CESIEG), Luiss University, Rome, Italy

Vittorio Larocca

Department of Economics, University of Innsbruck, Innsbruck, Austria

Centre for Economic and International Studies (CEIS), University of Rome Tor Vergata, Rome, Italy

Francesca Marazzi

Department of Economics and Law, Sapienza University of Rome, Rome, Italy

Luca Panaccione


Corresponding author

Correspondence to Lorenzo Spadoni .

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We thank the Editors, Maria Bigoni and Dirk Engelmann, and two anonymous referees for useful comments. We also thank Sofia De Caprariis for her assistance during this project.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (PDF 491 KB)


About this article

Buso, I.M., Di Cagno, D., Ferrari, L. et al. Lab-like findings from online experiments. J Econ Sci Assoc 7 , 184–193 (2021). https://doi.org/10.1007/s40881-021-00114-8


Received : 23 December 2020

Revised : 29 November 2021

Accepted : 29 November 2021

Published : 18 December 2021

Issue Date : December 2021

DOI : https://doi.org/10.1007/s40881-021-00114-8


Keywords

  • Methodology
  • Experiments
  • Lab-like data

JEL Classification

