Archive for the ‘Statistics’ Category

Vector autoregressions in Stata

Introduction

In a univariate autoregression, a stationary time-series variable \(y_t\) can often be modeled as depending on its own lagged values:

\begin{align}
y_t = \alpha_0 + \alpha_1 y_{t-1} + \alpha_2 y_{t-2} + \dots
+ \alpha_k y_{t-k} + \varepsilon_t
\end{align}

When one analyzes multiple time series, the natural extension to the autoregressive model is the vector autoregression, or VAR, in which a vector of variables is modeled as depending on their own lags and on the lags of every other variable in the vector. A two-variable VAR with one lag looks like

\begin{align}
y_t &= \alpha_{0} + \alpha_{1} y_{t-1} + \alpha_{2} x_{t-1}
+ \varepsilon_{1t} \\
x_t &= \beta_0 + \beta_{1} y_{t-1} + \beta_{2} x_{t-1}
+ \varepsilon_{2t}
\end{align}

Applied macroeconomists use models of this form both to describe macroeconomic data and to perform causal inference and provide policy advice.
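To make the two-variable system concrete, here is a minimal sketch in Python (used purely for illustration; it is not the Stata workflow of the post). It simulates the VAR(1) above and recovers the coefficients by equation-by-equation least squares. The coefficient values, the sample size, the seed, and the helper names `simulate_var1`, `solve`, and `ols` are all assumptions made up for this example.

```python
import random

def simulate_var1(n, a, b, seed=12345):
    """Simulate y_t = a0 + a1*y_{t-1} + a2*x_{t-1} + e1t and
    x_t = b0 + b1*y_{t-1} + b2*x_{t-1} + e2t with standard-normal shocks."""
    rng = random.Random(seed)
    y, x = [0.0], [0.0]
    for _ in range(n):
        y_lag, x_lag = y[-1], x[-1]
        y.append(a[0] + a[1] * y_lag + a[2] * x_lag + rng.gauss(0, 1))
        x.append(b[0] + b[1] * y_lag + b[2] * x_lag + rng.gauss(0, 1))
    return y, x

def solve(A, c):
    """Solve A z = c by Gauss-Jordan elimination with partial pivoting."""
    k = len(c)
    M = [row[:] + [c[i]] for i, row in enumerate(A)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [mr - f * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][k] / M[i][i] for i in range(k)]

def ols(dep, regressors):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(regressors[0])
    XtX = [[sum(r[i] * r[j] for r in regressors) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * d for r, d in zip(regressors, dep)) for i in range(k)]
    return solve(XtX, Xty)

# Hypothetical true coefficients (alpha_0, alpha_1, alpha_2) and (beta_0, beta_1, beta_2),
# chosen so that the simulated system is stationary.
y, x = simulate_var1(20000, a=(0.5, 0.4, 0.2), b=(1.0, -0.1, 0.5))

# Both equations share the same regressors: a constant and one lag of each variable.
lags = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
alpha_hat = ols(y[1:], lags)
beta_hat = ols(x[1:], lags)

print("y equation:", [round(v, 2) for v in alpha_hat])
print("x equation:", [round(v, 2) for v in beta_hat])
```

Because every equation has the same right-hand-side variables, each one can be fit on its own by least squares, which is what the sketch does.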

In this post, I will estimate a three-variable VAR using the U.S. unemployment rate, the inflation rate, and the nominal interest rate. This VAR is similar to those used in macroeconomics for monetary policy analysis. I focus on basic issues in estimation and postestimation. Data and do-files are provided at the end. Additional background and theoretical details can be found in Ashish Rajbhandari’s earlier post, which explored VAR estimation using simulated data.

Multiple-equation models: Estimation and marginal effects using gmm

We estimate the average treatment effect (ATE) for an exponential mean model with an endogenous treatment. We have a two-step estimation problem where the first step corresponds to the treatment model and the second to the outcome model. As shown in Using gmm to solve two-step estimation problems, this can be solved with the generalized method of moments using gmm.

This continues the series of posts where we illustrate how to obtain correct standard errors and marginal effects for models with multiple steps. In the previous posts, we used gsem and mlexp to estimate the parameters of models with separable likelihoods. In the current model, because the treatment is endogenous, the likelihood for the model is no longer separable. We demonstrate how we can use gmm to estimate the parameters in these situations.

Probability differences and odds ratios measure conditional-on-covariate effects and population-parameter effects

Differences in conditional probabilities and ratios of odds are two common measures of the effect of a covariate in binary-outcome models. I show how these measures differ in terms of conditional-on-covariate effects versus population-parameter effects.
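As a small illustration of the two measures, the following Python sketch assumes a logit model with made-up coefficients (the values of `B0`, `B1`, `B2` and the covariates `x` and `d` are hypothetical, not from the post). It computes the difference in conditional probabilities and the ratio of odds for a binary covariate `d` at several values of a continuous covariate `x`.

```python
import math

def logistic(z):
    """Logistic CDF, the link used by a logit model."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients: constant, continuous covariate x, binary covariate d.
B0, B1, B2 = -1.0, 0.8, 0.5

def prob(x, d):
    """P(y = 1 | x, d) under the assumed logit model."""
    return logistic(B0 + B1 * x + B2 * d)

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

# For each x, compare d = 1 with d = 0 while holding x fixed.
results = []
for x in (-1.0, 0.0, 2.0):
    p0, p1 = prob(x, 0), prob(x, 1)
    results.append((x, p1 - p0, odds(p1) / odds(p0)))

for x, difference, ratio in results:
    print(f"x = {x:+.1f}  probability difference = {difference:.4f}  odds ratio = {ratio:.4f}")
```

Under this assumed logit, the odds ratio for `d` equals `exp(B2)` at every value of `x`, whereas the probability difference changes with `x`.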

Doctors versus policy analysts: Estimating the effect of interest

The change in a regression function that results from an everything-else-held-equal change in a covariate defines an effect of a covariate. I am interested in estimating and interpreting effects that are conditional on the covariates and averages of effects that vary over the individuals. I illustrate that these two types of effects answer different questions. Doctors, parents, and consultants frequently ask individuals for their covariate values to make individual-specific recommendations. Policy analysts use a population-averaged effect that accounts for the variation of the effects over the individuals.
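The distinction can be sketched numerically. The Python example below assumes a probit model with made-up coefficients and a simulated covariate (all names and values are hypothetical, not from the post). It compares the effect of a binary covariate `d` for one individual with a known `x` against the average of that effect over a sample.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF, the link used by a probit model."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical coefficients: constant, continuous covariate x, binary covariate d.
B0, B1, B2 = -0.5, 0.7, 0.6

def effect_of_d(x):
    """Change in P(y = 1 | x, d) when d goes from 0 to 1, holding x fixed."""
    return normal_cdf(B0 + B1 * x + B2) - normal_cdf(B0 + B1 * x)

# Simulated covariate values standing in for a sample from the population.
rng = random.Random(2016)
sample_x = [rng.gauss(0.0, 1.0) for _ in range(5000)]
individual_effects = [effect_of_d(x) for x in sample_x]

# The "doctor" question: the effect for one individual whose x is known.
effect_at_x = effect_of_d(2.0)

# The "policy analyst" question: the effect averaged over the individuals.
average_effect = sum(individual_effects) / len(individual_effects)

print(f"effect at x = 2.0:        {effect_at_x:.4f}")
print(f"average effect in sample: {average_effect:.4f}")
```

The two numbers differ because the effect varies with `x`; the average summarizes that variation over the sample, while the conditional effect applies to a specific covariate value.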

Effects of nonlinear models with interactions of discrete and continuous variables: Estimating, graphing, and interpreting

I want to estimate, graph, and interpret the effects of nonlinear models with interactions of continuous and discrete variables. The results I am after are not trivial, but obtaining what I want using margins, marginsplot, and factor-variable notation is straightforward.

Flexible discrete choice modeling using a multinomial probit model, part 2

Overview

In the first part of this post, I discussed the multinomial probit model from a random utility model perspective. In this part, we will have a closer look at how to interpret our estimation results.

How do we interpret our estimation results?

We created a fictitious dataset of individuals who were presented a set of three health insurance plans (Sickmaster, Allgood, and Cowboy Health). We pretended to have a random sample of 20- to 60-year-old persons who were asked …

Flexible discrete choice modeling using a multinomial probit model, part 1

We have no choice but to choose

We make choices every day, and often these choices are made among a finite number of potential alternatives. For example, do we take the car or ride a bike to get to work? Will we have dinner at home or eat out, and if we eat out, where do we go? Scientists, marketing analysts, or political consultants, to name a few, wish to find out why people choose what they choose.

In this post, …

Unit-root tests in Stata

Determining the stationarity of a time series is a key step before embarking on any analysis. The statistical properties of most estimators in time series rely on the data being (weakly) stationary. Loosely speaking, a weakly stationary process is characterized by a time-invariant mean, variance, and autocovariance.

In most observed series, however, the presence of a trend component makes the series nonstationary. Furthermore, the trend can be either deterministic or stochastic, and the type of trend determines which transformation must be applied to obtain a stationary series. For example, a stochastic trend, commonly known as a unit root, is eliminated by differencing the series. However, differencing a series that in fact contains a deterministic trend results in a unit root in the moving-average process. Similarly, subtracting a deterministic trend from a series that in fact contains a stochastic trend does not yield a stationary series. Hence, it is important to identify whether nonstationarity is due to a deterministic or a stochastic trend before applying the proper transformations.
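A worked version of the two cases may help. The notation here is assumed for illustration and is not taken from the post: \(\delta\) is a trend or drift coefficient, and \(e_t\) is white noise with variance \(\sigma^2\). If the trend is deterministic, differencing removes the trend but leaves a moving-average term with a coefficient of \(-1\), which is the unit root in the moving-average process:

\begin{align}
y_t &= \mu + \delta t + e_t \\
\Delta y_t &= y_t - y_{t-1} = \delta + e_t - e_{t-1}
\end{align}

If the trend is stochastic (a random walk with drift), differencing gives a stationary series, but subtracting a linear trend does not, because the accumulated shocks remain and their variance grows with \(t\):

\begin{align}
y_t &= \delta + y_{t-1} + e_t = y_0 + \delta t + \sum_{s=1}^{t} e_s \\
\Delta y_t &= \delta + e_t \\
y_t - y_0 - \delta t &= \sum_{s=1}^{t} e_s, \qquad
\mathrm{Var}\left(\sum_{s=1}^{t} e_s\right) = t\sigma^2
\end{align}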

In this post, …

Multiple equation models: Estimation and marginal effects using mlexp

We continue with the series of posts where we illustrate how to obtain correct standard errors and marginal effects for models with multiple steps. In this post, we estimate the marginal effects and standard errors for a hurdle model with two hurdles and a lognormal outcome using mlexp. mlexp allows us to estimate parameters for multiequation models using maximum likelihood. In the last post (Multiple equation models: Estimation and marginal effects using gsem), we used gsem to estimate marginal effects and standard errors for a hurdle model with two hurdles and an exponential mean outcome.

We exploit the fact that the hurdle-model likelihood is separable and the joint log likelihood is the sum of the individual hurdle and outcome log likelihoods. We estimate the parameters of each hurdle and the outcome separately to get initial values. Then, we use mlexp to estimate the parameters of the model and margins to obtain marginal effects.
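The separability described above can be written out as follows. The notation is assumed for illustration rather than taken from the post: \(\theta_1\) and \(\theta_2\) are the parameters of the two hurdles, and \(\theta_o\) are the parameters of the outcome.

\begin{align}
\ln L(\theta_1, \theta_2, \theta_o) = \ln L_1(\theta_1) + \ln L_2(\theta_2) + \ln L_o(\theta_o)
\end{align}

Because each term involves only its own parameters, the values that maximize the three pieces separately also maximize the sum, so the separate fits are natural initial values for the joint estimation.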

Multiple equation models: Estimation and marginal effects using gsem

Starting point: A hurdle model with multiple hurdles

In a sequence of posts, we are going to illustrate how to obtain correct standard errors and marginal effects for models with multiple steps.

Our inspiration for this post is an old Statalist inquiry about how to obtain marginal effects for a hurdle model with more than one hurdle (http://www.statalist.org/forums/forum/general-stata-discussion/general/1337504-estimating-marginal-effect-for-triple-hurdle-model). Hurdle models have the appealing property that their likelihood is separable. Each hurdle has its own likelihood and regressors. You can estimate each one of these hurdles separately to obtain point estimates. However, you cannot get standard errors or marginal effects this way.

In this post, …