Difference-in-differences
What method would you use to see whether watching English porn (just an example 🙃 it could be any program or policy) helps students learn English? A t-test between the treatment group and the control group? In this short post, I will elaborate on a method called difference-in-differences (DID) and discuss why it is particularly useful.
❓What is DID
Briefly, the difference-in-differences approach helps us understand causality when randomisation is not possible. Without randomisation, we cannot draw causal conclusions just by looking at how outcomes changed before and after the treatment, as other factors might have influenced the outcome during that time. Nor can we simply compare the treatment and control groups, because there might be selection bias: the groups might differ on characteristics we cannot observe. Consider: are there any individual differences between those who watch English porn and those who do not?
Then, how does the DID method help us deal with these difficulties? DID calculates the difference between the change in outcomes experienced by the treatment group and the change in outcomes experienced by the control group. I know it's confusing! Let's put it simply. We will now recruit some students and let them volunteer to enrol in either a treatment or control group based on their acceptance of English porn (starting to think whether I shouldn't have this unethical example 😂). The students' English proficiency will be tested one month before the intervention, right before the intervention, and after the intervention (well, basically watching a lot of English porn). We can see from the table below that these two groups are not randomised as expected.
| | Treatment | Control |
|---|---|---|
| Before | 5 | 4.5 |
| Right before intervention | 5.5 | 5 |
| After intervention | 6.5 | 5.5 |
To use DID, we will mainly go through the following three steps.
- We will first calculate how the treatment group develops over the experiment, as our first difference.
- Then, we will do the same thing for the control group, to have our second difference.
- Now we would like to compare whether the intervention is effective, or whether the pattern of development between the two groups has changed after the intervention. In other words, we will calculate the difference between the differences in step 1 and step 2.
Then we will be able to tell whether the intervention (watching a lot of English porn) changes the way the treatment group develops compared to the control group! For the fake data I provided here, we can see that, after the intervention, the treatment group grew faster (5.5→6.5, a gain of 1.0) than the control group (5→5.5, a gain of 0.5), so the DID estimate is 1.0 − 0.5 = 0.5. It confirms that...🙃
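The three steps above can be sketched in a few lines of Python, using the fake scores from the table (variable names are my own):

```python
# Mean English scores from the (fake) table above.
treat_before, treat_after = 5.5, 6.5  # treatment group: right before vs after intervention
ctrl_before, ctrl_after = 5.0, 5.5    # control group: right before vs after intervention

# Step 1: change in the treatment group (first difference)
diff_treat = treat_after - treat_before  # 1.0

# Step 2: change in the control group (second difference)
diff_ctrl = ctrl_after - ctrl_before     # 0.5

# Step 3: difference between the differences = the DID estimate
did = diff_treat - diff_ctrl
print(did)  # 0.5
```

In practice, the same estimate usually comes from a regression with an interaction term between a treatment dummy and a post-period dummy, which also gives you standard errors, but the arithmetic is exactly this.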
Can I use it in all situations?
DID seems to be a 'perfect' method, right? BUT there are some restrictions or even limitations to it.
First, it can only be used when the two groups would have experienced the same trend of development in the absence of the intervention. This is sometimes called the Equal Trends (or parallel trends) Assumption. The assumption cannot be proved directly, but its plausibility can be assessed in the following ways:
- Compare changes in the outcomes for the treatment and control groups repeatedly before the program is implemented (i.e. at t-3, t-2, t-1). If the outcome trends move in parallel before the program began, they likely would have continued moving in tandem in the absence of the program.
- Perform a placebo test using a fake treatment group. The fake treatment group should be a group that was not affected by the program. A placebo test that reveals zero impact supports the equal-trend assumption.
- Perform a placebo test using a fake outcome. A placebo test that reveals zero impact supports the equal-trend assumption.
- Perform the difference-in-differences estimation using different comparison groups. Similar estimates of the program's impact across comparison groups support the equal-trends assumption.
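As a minimal sketch of the first check, we can run a 'placebo DID' on the two pre-intervention periods from the table above: pretend the intervention happened between 'Before' and 'Right before intervention' and compute DID as usual. If the pre-trends are parallel, the estimate should be close to zero (which it is for my fake data):

```python
# Pre-intervention scores from the (fake) table: "Before" and "Right before intervention".
treat_t2, treat_t1 = 5.0, 5.5  # treatment group at the two pre-periods
ctrl_t2, ctrl_t1 = 4.5, 5.0    # control group at the two pre-periods

# Placebo DID: treat the gap between the two pre-periods as if it were the intervention.
placebo_did = (treat_t1 - treat_t2) - (ctrl_t1 - ctrl_t2)
print(placebo_did)  # 0.0 -> consistent with equal pre-trends
```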
However, if you do want to use DID when the equal trends assumption may not hold, read Abadie (2005), 'Semiparametric Difference-in-Differences Estimators', Review of Economic Studies, which proposes a weighting method for DID when the parallel trends assumption is in doubt.
Apart from the links in the main text, some info was from:
World Bank, Impact Evaluation in Practice, Chapter 6.
https://openknowledge.worldbank.org/handle/10986/25030
Some technical things you might want to know
DID belongs to the family of quasi-experimental methods, which aim to estimate program effects free from reverse causality, confounding, or simultaneity (just like real experiments!). The main difference between a quasi-experiment and a real experiment is that the former does not randomise the sample into treatment/control groups. Instead, quasi-experimental methods rely on a sufficient element of randomness in existing situations, as in regression discontinuity designs or event studies. They can also simulate an experimental setting by 'making' a control group to match the treatment group, as in propensity score matching. Other examples include instrumental variables and DID.