## Introduction to treatment effects in Stata: Part 2

This post was written jointly with David Drukker, Director of Econometrics, StataCorp.

In our last post, we introduced the concept of treatment effects and demonstrated four of the treatment-effects estimators that were introduced in Stata 13. Today, we will talk about two more treatment-effects estimators that use matching.

**Introduction**

Last time, we introduced four estimators for estimating the average treatment effect (ATE) from observational data. Each of these estimators has a different way of solving the missing-data problem that arises because we observe only the potential outcome for the treatment level received. Today, we introduce estimators for the ATE that solve the missing-data problem by matching.

Matching pairs the observed outcome of a person in one treatment group with the outcome of the “closest” person in the other treatment group. The outcome of the closest person is used as a prediction for the missing potential outcome. The average difference between the observed outcome and the predicted outcome estimates the ATE.
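The description above can be written compactly. The notation is ours and is introduced only for illustration: $y_i$ is the observed outcome, $t_i \in \{0,1\}$ is the treatment indicator, and $\Omega(i)$ is the set of closest observations to $i$ in the opposite treatment group.

```latex
\[
\hat{y}^{\,m}_i = \frac{1}{|\Omega(i)|} \sum_{j \in \Omega(i)} y_j ,
\qquad
\widehat{\mathrm{ATE}} = \frac{1}{N} \sum_{i=1}^{N} (2t_i - 1)\,\bigl(y_i - \hat{y}^{\,m}_i\bigr)
\]
```

For a treated observation, the summand is the observed outcome minus the predicted untreated outcome; for an untreated observation, the sign flips so that the summand is the predicted treated outcome minus the observed outcome.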

What we mean by “closest” depends on our data. Matching subjects based on a single binary variable, such as sex, is simple: males are paired with males and females are paired with females. Matching on two categorical variables, such as sex and race, isn’t much more difficult. Matching on continuous variables, such as age or weight, can be trickier because of the sparsity of the data. It is unlikely that there are two 45-year-old white males who weigh 193 pounds in a sample. It is even less likely that one of those men self-selected into the treated group and the other self-selected into the untreated group. So, in such cases, we match subjects who have approximately the same weight and approximately the same age.

This example illustrates two points. First, there is a cost to matching on continuous covariates; the inability to find good matches with more than one continuous covariate causes large-sample bias in our estimator because our matches become increasingly poor.

Second, we must specify a measure of similarity. When matching directly on the covariates, distance measures are used and the nearest neighbor selected. An alternative is to match on an estimated probability of treatment, known as the propensity score.

Before we discuss estimators for observational data, we note that matching is sometimes used in experimental data to define pairs, with the treatment subsequently randomly assigned within each pair. This use of matching is related but distinct.

**Nearest-neighbor matching**

Nearest-neighbor matching (NNM) uses distance between covariate patterns to define “closest”. There are many ways to define the distance between two covariate patterns. We could use squared differences as a distance measure, but this measure ignores problems with scale and covariance. Weighting the differences by the inverse of the sample covariance matrix handles these issues. Other measures are also used, but these details are less important than the costs and benefits of NNM dropping the functional-form assumptions (linear, logit, probit, etc.) used in the estimators discussed last time.
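As a sketch of the covariance-weighted distance described above (symbols are ours, for illustration), let $x_i$ and $x_j$ be the covariate vectors of two observations and $S$ the sample covariance matrix of the covariates:

```latex
\[
d(x_i, x_j) = \bigl\{ (x_i - x_j)'\, S^{-1}\, (x_i - x_j) \bigr\}^{1/2}
\]
```

This is the Mahalanobis distance. Replacing $S$ with the identity matrix gives the distance based on unweighted squared differences, which ignores scale and covariance.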

Dropping the functional-form assumptions makes the NNM estimator much more flexible; it estimates the ATE for a much wider class of models. The cost of this flexibility is that the NNM estimator requires much more data and the amount of data it needs grows with each additional continuous covariate.

In the previous blog entry, we used the example of the effect of a mother’s smoking status on her infant’s birthweight. Let’s reconsider that example.

    . webuse cattaneo2.dta, clear

Now, we use **teffects nnmatch** to estimate the ATE by NNM.

    . teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke)

    Treatment-effects estimation                    Number of obs      =      4642
    Estimator      : nearest-neighbor matching      Matches: requested =         1
    Outcome model  : matching                                      min =         1
    Distance metric: Mahalanobis                                   max =        16
    ------------------------------------------------------------------------------
                 |              AI Robust
         bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
         mbsmoke |
        (smoker  |
             vs  |
     nonsmoker)  |  -210.5435   29.32969    -7.18   0.000    -268.0286   -153.0584
    ------------------------------------------------------------------------------

The estimated ATE is -211, meaning that the average birthweight would be 211 grams lower if all mothers smoked than if no mothers smoked.

The output also indicates that ties in distance caused at least one observation to be matched with 16 other observations, even though we requested only one match. NNM averages the outcomes of all the tied-in-distance observations, as it should. (They are all equally good and using all of them will reduce bias.)
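The number of requested matches is set by the **nneighbor()** option of **teffects nnmatch**; its default of 1 is what the header reports as "requested = 1". As a minimal sketch, assuming **cattaneo2.dta** is already in memory, the following asks for three matches per observation. The choice of three is arbitrary and only for illustration, and the output is omitted.

```stata
* Sketch: request three nearest neighbors per observation instead of the default one.
* Assumes the data were loaded with: webuse cattaneo2.dta, clear
teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), nneighbor(3)
```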

NNM on discrete covariates does not guarantee exact matching. For example, some married women could be matched with single women. We probably prefer exact matching on discrete covariates, which we do now.

    . teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
            ematch(mmarried prenatal1)

    Treatment-effects estimation                    Number of obs      =      4642
    Estimator      : nearest-neighbor matching      Matches: requested =         1
    Outcome model  : matching                                      min =         1
    Distance metric: Mahalanobis                                   max =        16
    ------------------------------------------------------------------------------
                 |              AI Robust
         bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
         mbsmoke |
        (smoker  |
             vs  |
     nonsmoker)  |  -209.5726   29.32603    -7.15   0.000    -267.0506   -152.0946
    ------------------------------------------------------------------------------

Exact matching on **mmarried** and **prenatal1** changed the results only slightly: the estimated ATE moved from -210.5 to -209.6.

Using more than one continuous covariate introduces large-sample bias, and we have three. The option **biasadj()** uses a linear model to remove the large-sample bias, as suggested by Abadie and Imbens (2006, 2011).

    . teffects nnmatch (bweight mmarried mage fage medu prenatal1) (mbsmoke), ///
            ematch(mmarried prenatal1) biasadj(mage fage medu)

    Treatment-effects estimation                    Number of obs      =      4642
    Estimator      : nearest-neighbor matching      Matches: requested =         1
    Outcome model  : matching                                      min =         1
    Distance metric: Mahalanobis                                   max =        16
    ------------------------------------------------------------------------------
                 |              AI Robust
         bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
         mbsmoke |
        (smoker  |
             vs  |
     nonsmoker)  |  -210.0558   29.32803    -7.16   0.000    -267.5377   -152.5739
    ------------------------------------------------------------------------------

In this case, the results changed by a small amount. In general, they can change a lot, and the amount increases with the number of continuous covariates.

**Propensity-score matching**

NNM uses bias adjustment to remove the bias caused by matching on more than one continuous covariate. The generality of this approach makes it very appealing, but it can be difficult to think about issues of fit and model specification. Propensity-score matching (PSM) matches on an estimated probability of treatment known as the propensity score. There is no need for bias adjustment because we match on only one continuous covariate. PSM has the added benefit that we can use all the standard methods for checking the fit of binary regression models prior to matching.
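As a minimal sketch of checking the fit of the binary treatment model before matching, one could fit the logit model on its own and apply standard postestimation diagnostics. The particular diagnostics chosen here are our assumption about what "standard methods" might include; they are not prescribed by **teffects psmatch**.

```stata
* Fit the treatment model by itself, using the same covariates as the matching step.
webuse cattaneo2.dta, clear
logit mbsmoke mmarried mage fage medu prenatal1

* Hosmer-Lemeshow goodness-of-fit test using 10 groups.
estat gof, group(10)

* Classification table at the default 0.5 cutoff.
estat classification

* Area under the ROC curve, without drawing the graph.
lroc, nograph
```

If these checks suggest a poor fit, the treatment model can be respecified, for example by adding interactions or polynomial terms, before the propensity scores are used for matching.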

We estimate the ATE by PSM using **teffects psmatch**.

    . teffects psmatch (bweight) (mbsmoke mmarried mage fage medu prenatal1)

    Treatment-effects estimation                    Number of obs      =      4642
    Estimator      : propensity-score matching      Matches: requested =         1
    Outcome model  : matching                                      min =         1
    Treatment model: logit                                         max =        16
    ------------------------------------------------------------------------------
                 |              AI Robust
         bweight |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    ATE          |
         mbsmoke |
        (smoker  |
             vs  |
     nonsmoker)  |  -229.4492   25.88746    -8.86   0.000    -280.1877   -178.7107
    ------------------------------------------------------------------------------

The estimated ATE is now -229, larger in magnitude than the NNM estimates but not significantly so.
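The header of the PSM output reports a logit treatment model, which is the default. As a sketch of how the treatment model can be changed, the **probit** option inside the treatment-model parentheses requests a probit model instead. The output is omitted, and nothing here suggests that probit is preferable for these data.

```stata
* Sketch: same PSM specification, but with a probit treatment model.
* Assumes the data were loaded with: webuse cattaneo2.dta, clear
teffects psmatch (bweight) (mbsmoke mmarried mage fage medu prenatal1, probit)
```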

**How to choose among the six estimators**

We now have six estimators:

- RA: Regression adjustment
- IPW: Inverse probability weighting
- IPWRA: Inverse probability weighting with regression adjustment
- AIPW: Augmented inverse probability weighting
- NNM: Nearest-neighbor matching
- PSM: Propensity-score matching

The ATEs we estimated are

- RA: -277.06
- IPW: -275.56
- IPWRA: -229.97
- AIPW: -230.99
- NNM: -210.06
- PSM: -229.45

Which estimator should we use?

We would never suggest searching the list above for the result that most closely fits your wishes and biases. The choice of estimator needs to be made beforehand.

So, how do we choose?

Here are some rules of thumb:

- Under correct specification, all the estimators should produce similar results. (Similar estimates do not guarantee correct specification because all the specifications could be wrong.)
- When you know the determinants of treatment status, IPW is a natural base-case estimator.
- When you instead know the determinants of the outcome, RA is a natural base-case estimator.
- The doubly robust estimators, AIPW and IPWRA, give us an extra shot at correct specification.
- When you have lots of continuous covariates, NNM will crucially hinge on the bias adjustment, and the computation gets to be extremely difficult.
- When you know the determinants of treatment status, PSM is another base-case estimator.
- The IPW estimators are not reliable when the estimated treatment probabilities get too close to 0 or 1.
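One way to look for estimated treatment probabilities that are too close to 0 or 1 is to plot their estimated densities with **teffects overlap** after fitting a model that has a treatment equation. The following is a minimal sketch; the IPW specification simply reuses the covariates from this post and is an assumption, not necessarily the model fit in the previous entry.

```stata
* Fit an IPW model with a treatment equation (specification assumed for illustration).
webuse cattaneo2.dta, clear
teffects ipw (bweight) (mbsmoke mmarried mage fage medu prenatal1)

* Plot the estimated densities of the predicted treatment probabilities
* for each treatment group; mass piled up near 0 or 1 is a warning sign.
teffects overlap
```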

**Final thoughts**

Before we go, we reiterate the cautionary note from our last entry. Nothing about the mathematics of treatment-effects estimators magically extracts causal relationships from observational data. We cannot thoughtlessly analyze our data using Stata’s **teffects** commands and infer a causal relationship. The models must be supported by scientific theory.

If you would like to learn more about treatment effects in Stata, there is an entire manual devoted to the treatment-effects features in Stata 14; it includes a basic introduction, an advanced introduction, and many worked examples. In Stata, type **help teffects**:

. help teffects

**Title**

[TE] teffects—Treatment-effects estimation for observational data

**Syntax**

… <output omitted> …

The title **[TE] teffects** will be in blue, which means it’s clickable. Click on it to go to the *Treatment-Effects Reference Manual*.

Or download the manual from our website; visit

http://www.stata.com/manuals14/te/

**References**

Abadie, A., and Imbens, G. W. 2006. Large sample properties of matching estimators for average treatment effects. *Econometrica* 74: 235–267.

Abadie, A., and Imbens, G. W. 2011. Bias-corrected matching estimators for average treatment effects. *Journal of Business and Economic Statistics* 29: 1–11.

Cattaneo, M. D. 2010. Efficient semiparametric estimation of multi-valued treatment effects under ignorability. *Journal of Econometrics* 155: 138–154.