Using Experimental Design in Evaluation

A recent issue of New Directions for Evaluation (No. 152, Winter 2016), “Social Experiments in Practice: The What, Why, When, Where, and How of Experimental Design and Analyses,” is devoted to the use of randomized experiments in program evaluation. The eight articles in this thematic volume discuss different aspects of experimental design, in particular the practical and theoretical benefits and challenges of applying randomized controlled trials (RCTs) to the evaluation of programs. Although it’s beyond the scope of this blog post to discuss each of the articles in detail, I’d like to mention a few insights offered by the authors and review the advantages and challenges of experimental design.

Random assignment helps rule out alternative explanations for outcomes
In the social sciences, experimental designs are studies that randomly assign subjects (i.e., program participants) to treatment and control groups, then measure changes (i.e., average changes) in both groups to determine whether a program, or “treatment,” has had the desired effect on those who receive it. As the issue’s editor, Laura Peck, observes, “…when it comes to the question of cause and effect—the question of a program’s or policy’s impact, we assert that a randomized experiment should be the evaluation design of choice” (p. 11). Indeed, experimental design studies, whose origins are in the natural sciences and whose benefits are perhaps most frequently demonstrated in FDA testing of pharmaceuticals, are thought to be the “gold standard” for scientifically establishing causation. Random assignment of individuals to two groups, one that receives treatment and one that does not, is the best way to establish whether desired changes are the result of what happens in the treatment (i.e., the program). As the editor observes, “This ‘coin toss’ (i.e., random assignment) to allocate access to treatment carries substantial power. It allows us to rule out alternative explanations for differences in outcomes between people who have access to a service and people who do not” (p. 11).
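To make the “coin toss” concrete, here is a minimal sketch in Python of random assignment followed by a comparison of average changes in the two groups. The function names, participant labels, and change scores are all hypothetical and chosen only for illustration. They are not taken from the journal issue, and a real evaluation would also test whether the difference is statistically significant.

```python
import random
from statistics import mean


def randomly_assign(participants, seed=None):
    """Shuffle participants and split them into treatment and control groups."""
    rng = random.Random(seed)      # a seed makes the assignment reproducible
    shuffled = list(participants)  # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def estimated_effect(treatment_changes, control_changes):
    """Average change in the treatment group minus average change in the control group."""
    return mean(treatment_changes) - mean(control_changes)


# Hypothetical example: eight participants and made-up change scores.
participants = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8"]
treatment, control = randomly_assign(participants, seed=2016)

treatment_changes = [5, 7, 6]  # made-up outcome changes for treated participants
control_changes = [1, 2, 3]    # made-up outcome changes for untreated participants
print(estimated_effect(treatment_changes, control_changes))
```

Because chance alone decides who lands in each group, the two groups should be similar on average before the program starts, which is why a difference in average change can be attributed to the program rather than to pre-existing differences.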

There are still concerns surrounding the use of experimental design
Although experimental design is viewed by many as the premier indicator of causation, its use in evaluations can pose practical challenges. There are potential legal and ethical concerns about withholding treatment from control groups (especially in the fields of medicine and education). Additionally, some argue that experimental design, especially in complex social interventions, is unable to identify which specific component of a treatment is responsible for the observed differences in the treatment group (the “black box” phenomenon). Michael Scriven observes that it is nearly impossible to create a truly “double blind” experiment in the social world (i.e., an experiment in which neither the experimental subject nor the evaluator knows who is in the treatment group and who is in the control group). Moreover, some argue that experimental design can be more labor- and time-intensive than other study designs and, therefore, more costly.

Quasi-experimental design is useful for showing before and after changes
While experimental design is the most prestigious method for determining the causal effects of a program, initiative, or policy, it is far from a universally appropriate design for evaluations. Quasi-experimental design, for example, is often used to show pre- and post-program changes in those who participate in a program or treatment, although it cannot unequivocally confirm whether such changes are attributable to the program. One form of quasi-experimental design is the “non-equivalent (pre-test, post-test) control group design.” In this design, participants are assigned to two groups, although not randomly. Both groups take a pre-test and a post-test, but only one group, the experimental group, receives the treatment/program. (The key textbook resource on both experimental and quasi-experimental designs is Experimental and Quasi-Experimental Designs, by Shadish, Cook, and Campbell, Houghton Mifflin.)
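As a rough illustration of the non-equivalent control group design, the following Python sketch compares the average pre-test to post-test gain of a program group with that of a comparison group. The function names and all scores are hypothetical and used only for illustration. Because the groups were not formed by random assignment, the resulting difference describes how the groups changed but does not by itself show that the program caused the change.

```python
from statistics import mean


def mean_gain(pre_scores, post_scores):
    """Average post-test minus pre-test change for one group (scores paired by participant)."""
    return mean([post - pre for pre, post in zip(pre_scores, post_scores)])


def difference_in_gains(program_pre, program_post, comparison_pre, comparison_post):
    """Program group's average gain minus the comparison group's average gain."""
    return mean_gain(program_pre, program_post) - mean_gain(comparison_pre, comparison_post)


# Hypothetical, made-up test scores for three participants per group.
program_pre, program_post = [50, 60, 55], [60, 70, 65]
comparison_pre, comparison_post = [52, 58, 54], [55, 61, 57]

print(mean_gain(program_pre, program_post))        # average gain in the program group
print(mean_gain(comparison_pre, comparison_post))  # average gain in the comparison group
print(difference_in_gains(program_pre, program_post, comparison_pre, comparison_post))
```

Comparing gains rather than post-test scores alone adjusts for differences in where the two groups started, although it cannot rule out other differences between groups that were not randomly formed.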

There is, of course, a range of non-experimental designs that are used productively in evaluation. These range from case studies to observational studies and rely on a variety of methods, largely qualitative, including phone and in-person interviews, focus groups, surveys, and document reviews. (See this page for a brief table comparing the characteristics of qualitative and quantitative methods of research. See also the National Science Foundation’s very helpful “Overview of Qualitative Methods and Analytic Techniques.”) Qualitative evaluation studies can be very effective and are often used in a mixed-methods approach to evaluation work.

Resources:

“Identifying and Implementing Educational Practices Supported by Rigorous Evidence: A User Friendly Guide”

“Designing Quasi-Experiments: Meeting What Works Clearinghouse Standards Without Random Assignment”

Good web-based resources on the subject of determining cause. Examples of research designs

“Overview of Qualitative Methods and Analytic Techniques”

“A Summative Evaluation of RCT Methodology: & An Alternative Approach to Causal Research,” Michael Scriven

“Using Small-Scale Randomized Controlled Trials to Evaluate the Efficacy of New Curricular Materials”

“Example Evaluation Plan for a Quasi-Experimental Design”

Experimental and Quasi-Experimental Designs, by Shadish, Cook, and Campbell

“What is Evaluation?” – Gene Shackman