What is propensity score matching?

Core Function

When a treatment, policy, or other intervention is assigned through a mix of deliberate and random factors, propensity score matching (PSM) gives researchers a way to estimate its effect from observational data. First presented in an article by Paul R. Rosenbaum and Donald Rubin in 1983, the method estimates each unit's propensity score, the probability of receiving the treatment given its observed covariates, and compares treated and untreated units with similar scores. The aim is to reduce the bias produced by covariates that influence whether or not the treatment is received, allowing a fairer comparison between those who received the treatment and those who, for whatever reason, did not.
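In symbols, the quantity being matched on can be written as follows. The notation is an assumption introduced here for illustration: $T$ is the treatment indicator (1 if treated, 0 otherwise) and $X$ is the vector of observed covariates.

$$
e(x) = \Pr(T = 1 \mid X = x), \qquad T \perp\!\!\!\perp X \mid e(X)
$$

The second relation is the balancing property: among units with the same propensity score, the observed covariates have the same distribution in the treated and untreated groups.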

Randomisation is usually the best way to ensure that a difference in outcomes between treated and untreated groups reflects the treatment itself. In observational studies, however, randomisation is often difficult or impossible, so treatment assignment bias is more likely to arise. Matching counters this by constructing samples of treated and untreated units that are comparable on observed covariates, as if they had been randomised. In doing so, it mimics random treatment assignment, enabling a less biased estimate of the treatment effect.
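The matching step can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: it assumes the propensity scores have already been estimated, uses greedy 1:1 nearest-neighbour matching without replacement (one of several possible matching schemes), and all function names, unit ids, scores, and outcomes are hypothetical.

```python
from typing import Dict, Optional


def match_nearest(
    treated: Dict[str, float],
    control: Dict[str, float],
    caliper: Optional[float] = None,
) -> Dict[str, str]:
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    `treated` and `control` map unit ids to estimated propensity scores.
    Controls are used at most once (matching without replacement).
    Returns a dict mapping each matched treated id to its control id.
    """
    available = dict(control)  # controls that have not been used yet
    pairs: Dict[str, str] = {}
    # Visit treated units from highest to lowest score.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        # Closest remaining control by absolute score distance.
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if caliper is not None and abs(available[c_id] - t_score) > caliper:
            continue  # no control within the caliper: leave unmatched
        pairs[t_id] = c_id
        del available[c_id]
    return pairs


def att_from_pairs(pairs: Dict[str, str], outcomes: Dict[str, float]) -> float:
    """Mean treated-minus-control outcome difference over matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    diffs = [outcomes[t] - outcomes[c] for t, c in pairs.items()]
    return sum(diffs) / len(diffs)


# Hypothetical scores and outcomes, for illustration only.
treated = {"t1": 0.80, "t2": 0.55, "t3": 0.30}
control = {"c1": 0.78, "c2": 0.50, "c3": 0.35, "c4": 0.10}
outcomes = {"t1": 10.0, "t2": 8.0, "t3": 6.0,
            "c1": 7.0, "c2": 6.0, "c3": 5.0, "c4": 1.0}

pairs = match_nearest(treated, control)
print(pairs)                            # {'t1': 'c1', 't2': 'c2', 't3': 'c3'}
print(att_from_pairs(pairs, outcomes))  # 2.0
```

Passing a `caliper` discards treated units with no sufficiently close control, which trades sample size for closer matches.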

For example, a large body of research has examined the health consequences of smoking. Observational studies are essential here, because randomly assigning people to the treatment “smoking” would be unethical. PSM offers a way to control for observed factors that could bias the comparison between those who smoke and those who do not, by making the two groups comparable on the control variables. The remaining difference in outcomes then gives a better estimate of the effect of smoking itself.
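Whether two groups really are comparable on a control variable is often checked with the standardized mean difference (SMD). The sketch below assumes a single numeric covariate; the function name and the ages are made up for illustration and are not data from any study.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def standardized_mean_difference(
    treated: Sequence[float], control: Sequence[float]
) -> float:
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2).

    Each group needs at least two values, and the pooled standard
    deviation must be non-zero.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("covariate has no variation in either group")
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical ages of smokers and non-smokers, before and after matching.
smokers = [30, 40, 50]
nonsmokers_before = [20, 30, 40]
nonsmokers_after = [31, 39, 50]

print(standardized_mean_difference(smokers, nonsmokers_before))
print(standardized_mean_difference(smokers, nonsmokers_after))
```

An absolute SMD below roughly 0.1 is a commonly used rule of thumb for acceptable balance, though the threshold is a convention rather than a guarantee.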

Problems

PSM is widely used to match treated and untreated units in observational data, but it has serious drawbacks. Research has shown that matching on the propensity score can increase imbalance, inefficiency, model dependence, and bias. The fundamental insights behind matching remain valid, but researchers in the field have suggested applying them through other matching methods when greater accuracy is needed. Propensity scores themselves also have other productive uses, such as in weighting and "doubly robust estimation" techniques.
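As an illustration of the weighting use, the sketch below computes an inverse-probability-weighted estimate of the average treatment effect. It assumes the propensity scores are already estimated and lie strictly between 0 and 1, and it uses the normalised form of the estimator, which is one common variant; the function name and inputs are hypothetical.

```python
from typing import Sequence


def ipw_ate(
    outcomes: Sequence[float],
    treatments: Sequence[int],
    scores: Sequence[float],
) -> float:
    """Normalised inverse-probability-weighted estimate of the ATE.

    Treated units are weighted by 1 / e, untreated units by 1 / (1 - e),
    where e is the unit's estimated propensity score.
    """
    if any(not 0.0 < e < 1.0 for e in scores):
        raise ValueError("propensity scores must lie strictly in (0, 1)")
    w1 = [t / e for t, e in zip(treatments, scores)]
    w0 = [(1 - t) / (1 - e) for t, e in zip(treatments, scores)]
    if sum(w1) == 0 or sum(w0) == 0:
        raise ValueError("need at least one treated and one untreated unit")
    # Weighted mean outcome in each group.
    mu1 = sum(w * y for w, y in zip(w1, outcomes)) / sum(w1)
    mu0 = sum(w * y for w, y in zip(w0, outcomes)) / sum(w0)
    return mu1 - mu0


# Hypothetical data: with equal scores the estimate reduces to a
# plain difference in group means.
print(ipw_ate([10.0, 8.0, 4.0, 2.0], [1, 1, 0, 0], [0.5, 0.5, 0.5, 0.5]))  # 6.0
```

Unlike matching, weighting keeps every unit in the analysis, but scores close to 0 or 1 produce very large weights and unstable estimates.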

Although PSM controls for observed covariates, it cannot account for latent factors that affect both treatment assignment and outcome, so any hidden bias from unobserved variables may remain after matching. PSM also requires large samples with substantial overlap between the treated and control groups, which small observational datasets often cannot provide. PSM can therefore give only an incomplete picture of the results.
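One simple way to inspect overlap is to compute the range of propensity scores that both groups share and to drop units outside it. The sketch below uses the min/max rule, which is only one of several possible definitions of common support; the function names and scores are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def common_support(
    treated_scores: Sequence[float], control_scores: Sequence[float]
) -> Tuple[float, float]:
    """Return (low, high) bounds of the score range covered by both groups."""
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    return low, high


def trim_to_support(
    scores: Dict[str, float], low: float, high: float
) -> Dict[str, float]:
    """Keep only the units whose score lies inside [low, high]."""
    return {uid: s for uid, s in scores.items() if low <= s <= high}


# Hypothetical scores, for illustration only.
treated = {"t1": 0.3, "t2": 0.6, "t3": 0.9}
control = {"c1": 0.1, "c2": 0.4, "c3": 0.7}

low, high = common_support(list(treated.values()), list(control.values()))
print(low, high)                            # 0.3 0.7
print(trim_to_support(treated, low, high))  # {'t1': 0.3, 't2': 0.6}
```

If `low` is greater than `high`, the two groups do not overlap at all and trimming returns no units, which signals that the comparison is not supported by the data.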

Concerns about unintended consequences of matching methods have been raised by theorists and practitioners alike. Judea Pearl, a leading figure in causal inference, has argued that pairing participants on observed variables alone may actually increase hidden bias, because it can unleash the effect of dormant, unobserved confounders. Put simply, confounding arises when the researcher cannot identify and account for other variables that influence both the treatment and the outcome.

To mitigate this risk, Pearl argues that bias reduction can be assured only by modelling the qualitative causal relationships between treatment, outcome, and observed and unobserved covariates, so that the set of covariates controlled for satisfies the “backdoor criterion”. His warning is a reminder that the choice of covariates matters as much as the matching procedure itself.
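When a set of covariates does satisfy the backdoor criterion, the causal effect can be written in terms of observable quantities. The notation is an assumption introduced here for illustration: $T$ is the treatment, $Y$ the outcome, and $Z$ a covariate set satisfying the criterion relative to $(T, Y)$.

$$
P(y \mid do(t)) = \sum_{z} P(y \mid t, z)\, P(z)
$$

Matching or adjusting on $Z$ is justified only under this condition; whether a given $Z$ qualifies depends on the assumed causal graph, not on the data alone.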

How should I do it?

PSM has become increasingly popular among data scientists and researchers and is available in many statistical packages. SAS provides the PSMATCH procedure and the OneToManyMTCH macro, both of which match observations on a propensity score; Stata offers the built-in teffects psmatch command as well as the user-written psmatch2; SPSS offers a dedicated dialog box; and Python has the PsmPy library, which provides functions for estimating scores and matching observations. This broad availability has made PSM a common part of the data science toolkit.
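To show what such tools do in their first step, the sketch below estimates propensity scores with a logistic regression fitted by plain gradient descent, using only the Python standard library. It is an illustration under stated assumptions, not the algorithm of any package named above: the function names, learning rate, epoch count, and toy data are all hypothetical, and the covariates are assumed to be roughly standardised.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _linear(w: Sequence[float], x: Sequence[float]) -> float:
    """Intercept w[0] plus the dot product of w[1:] with x."""
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_propensity_model(
    X: Sequence[Sequence[float]],
    t: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit a logistic regression of treatment on covariates.

    Uses batch gradient descent on the mean log-loss.
    Returns [intercept, coef_1, ..., coef_p].
    """
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(epochs):
        grad = [0.0] * (p + 1)
        for xi, ti in zip(X, t):
            err = sigmoid(_linear(w, xi)) - ti  # predicted minus observed
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def propensity_scores(
    X: Sequence[Sequence[float]], w: Sequence[float]
) -> List[float]:
    """Predicted probability of treatment for each row of X."""
    return [sigmoid(_linear(w, xi)) for xi in X]


# Hypothetical data: one standardised covariate and a treatment indicator.
X = [[-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5]]
t = [0, 0, 1, 0, 1, 1]

weights = fit_propensity_model(X, t)
print([round(s, 3) for s in propensity_scores(X, weights)])
```

In practice an established package is preferable to hand-rolled code like this; the point is only that the propensity score is an ordinary predicted probability, which can then feed a matching or weighting step.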