Ermistatas is our most popular t-shirt these days. See it and you will understand why.

We call the character Ermistatas and he is thinking—Ermistatas cogitatu. Notice the electricity bolts being emitted and received by his three antennae.

The shirt is popular even among those who do not use Stata, and it’s leading them to ask questions: “Who or what is Ermistatas and why is he, she, or it deserving of a t-shirt?” Then they add, “And why three and not the usual two antennae?”

Ermistatas is the creation of our arts-and-graphics department to represent Stata 15’s new commands for fitting Extended Regression Models, a term we coined. We call them ERMs for short. The new commands are Read more…

Categories: Statistics Tags: binary outcomes, confounders, continuous outcomes, endogeneity, endogenous selection, ERMs, extended regression models, Heckman, interval outcomes, intreg, linear regression, oprobit, ordered outcomes, probit, sample selection, statistics, treatment, treatment assignment, treatment effects, unobserved confounding
**Initial thoughts**

Estimating causal relationships from data is one of the fundamental endeavors of researchers, but causality is elusive. In the presence of omitted confounders, endogeneity, omitted variables, or a misspecified model, estimates of predicted values and effects of interest are inconsistent; causality is obscured.

A controlled experiment to estimate causal relations is an alternative. Yet conducting a controlled experiment may be infeasible. Policy makers cannot randomize taxation, for example. In the absence of experimental data, an option is to use instrumental variables or a control function approach.

Stata has many built-in estimators to implement these potential solutions and tools to construct estimators for situations that are not covered by built-in estimators. Below I illustrate both possibilities for a linear model and, in a later post, will talk about nonlinear models. Read more…
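As a rough illustration of the instrumental-variables idea for a linear model, here is a minimal simulation sketch. It is written in plain Python rather than Stata and is not the code from the post; the data-generating process, the coefficient value of 2, the seed, and every function name are assumptions made for this example. An unobserved confounder `u` drives both the regressor `x` and the outcome `y`, while the instrument `z` affects `y` only through `x`.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=2.0, seed=12345):
    """Draw (z, x, y) where x is endogenous because u enters both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)            # instrument: independent of u
        ui = rng.gauss(0, 1)            # unobserved confounder
        xi = zi + ui + rng.gauss(0, 1)  # regressor depends on z and u
        yi = beta * xi + ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

def ols_slope(x, y):
    """Slope from regressing y on x (with a constant)."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

z, x, y = simulate()
b_ols = ols_slope(x, y)   # inconsistent: absorbs the effect of u
b_iv = iv_slope(z, x, y)  # consistent for beta under these assumptions
print(f"OLS slope: {b_ols:.3f}")
print(f"IV slope:  {b_iv:.3f}")
```

Under this assumed setup, the OLS slope should drift above the true value of 2 because `x` and the error share `u`, whereas the IV slope should land close to 2. In Stata, the analogous built-in estimator is **ivregress**.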

**Initial thoughts**

Estimating causal relationships from data is one of the fundamental endeavors of researchers. Ideally, we could conduct a controlled experiment to estimate causal relations. However, conducting a controlled experiment may be infeasible. For example, education researchers cannot randomize educational attainment, so they must learn from observational data.

In the absence of experimental data, we construct models to capture the relevant features of the causal relationship we have an interest in, using observational data. Models are successful if the features we did not include can be ignored without affecting our ability to ascertain the causal relationship we are interested in. Sometimes, however, ignoring some features of reality results in models that yield relationships that cannot be interpreted causally. In a regression framework, depending on our discipline or our research question, we give a different name to this phenomenon: endogeneity, omitted confounders, omitted variable bias, simultaneity bias, selection bias, etc.

Below I show how we can understand many of these problems in a unified regression framework and use simulated data to illustrate how they affect estimation and inference. Read more…
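To make the omitted-variable case concrete, the following is a small simulated-data sketch in plain Python. It is not the code from the post; the model, the coefficients, the seed, and the variable names are all assumptions chosen for illustration. The outcome depends on `x` and on a second variable `w` that is correlated with `x`. Leaving `w` out of the regression shifts the slope on `x` by the coefficient on `w` times the slope from regressing `w` on `x`.

```python
import random

def demean(v):
    """Subtract the sample mean from every element."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))

rng = random.Random(2024)
n = 20000
b1, b2 = 1.0, 0.5  # assumed true coefficients on x and w

w = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * wi + rng.gauss(0, 1) for wi in w]  # x is correlated with w
y = [b1 * xi + b2 * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

xd, wd, yd = demean(x), demean(w), demean(y)
sxx, sww, sxw = dot(xd, xd), dot(wd, wd), dot(xd, wd)
sxy, swy = dot(xd, yd), dot(wd, yd)

# Short regression: y on x only, omitting w.
short_b1 = sxy / sxx

# Long regression: y on x and w, solving the 2x2 normal equations.
det = sxx * sww - sxw ** 2
long_b1 = (sww * sxy - sxw * swy) / det
long_b2 = (sxx * swy - sxw * sxy) / det

# Omitted-variable bias: coefficient on w times the slope of w on x.
implied_bias = long_b2 * sxw / sxx

print(f"short regression slope on x: {short_b1:.3f}")
print(f"long regression slope on x:  {long_b1:.3f}")
print(f"long slope + implied bias:   {long_b1 + implied_bias:.3f}")
```

The last two quantities show the decomposition: the short-regression slope equals the long-regression slope plus the implied bias, so the estimate from the model that omits `w` cannot be read as the causal effect of `x` in this assumed setup.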

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The command **gmm** is used to estimate the parameters of a model using the generalized method of moments (GMM). GMM can be used to estimate the parameters of models that have more identification conditions than parameters, that is, overidentified models. The specification of these models can be evaluated using Hansen’s *J* statistic (Hansen, 1982).

We use **gmm** to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use **estat overid** to calculate Hansen’s *J* statistic and test the validity of the overidentification restrictions.

In previous posts Read more…

The new command **gsem** allows us to fit a wide variety of models; among the many possibilities, we can account for endogeneity in different models. As an example, I will fit an ordinal model with endogenous covariates. Read more…