Archive

Posts Tagged ‘simulation’

Introduction to Bayesian statistics, part 2: MCMC and the Metropolis–Hastings algorithm

In this blog post, I’d like to give you a relatively nontechnical introduction to Markov chain Monte Carlo, often shortened to “MCMC”. MCMC is frequently used for fitting Bayesian statistical models. There are different variations of MCMC, and I’m going to focus on the Metropolis–Hastings (M–H) algorithm. In the interest of brevity, I’m going to omit some details, and I strongly encourage you to read the [BAYES] manual before using MCMC in practice.

Let’s continue with the coin toss example from my previous post Introduction to Bayesian statistics, part 1: The basic concepts. We are interested in the posterior distribution of the parameter \(\theta\), which is the probability that a coin toss results in “heads”. Our prior distribution is a flat, uninformative beta distribution with parameters 1 and 1. And we will use a binomial likelihood function to quantify the data from our experiment, which resulted in 4 heads out of 10 tosses. Read more…

Flexible discrete choice modeling using a multinomial probit model, part 2

Overview

In the first part of this post, I discussed the multinomial probit model from a random utility model perspective. In this part, we will have a closer look at how to interpret our estimation results.

How do we interpret our estimation results?

We created a fictitious dataset of individuals who were presented a set of three health insurance plans (Sickmaster, Allgood, and Cowboy Health). We pretended to have a random sample of 20- to 60-year-old persons who were asked Read more…

Flexible discrete choice modeling using a multinomial probit model, part 1

\(\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\zb}{{\bf z}}
\newcommand{\gammab}{\boldsymbol{\gamma}}\)We have no choice but to choose

We make choices every day, and often these choices are made among a finite number of potential alternatives. For example, do we take the car or ride a bike to get to work? Will we have dinner at home or eat out, and if we eat out, where do we go? Scientists, marketing analysts, or political consultants, to name a few, wish to find out why people choose what they choose.

In this post, Read more…

A simulation-based explanation of consistency and asymptotic normality

Overview

In the frequentist approach to statistics, estimators are random variables because they are functions of random data. The finite-sample distributions of most of the estimators used in applied work are not known, because the estimators are complicated nonlinear functions of random data. These estimators have large-sample convergence properties that we use to approximate their behavior in finite samples.

Two key convergence properties are consistency and asymptotic normality. A consistent estimator gets arbitrarily close in probability to the true value. The distribution of an asymptotically normal estimator gets arbitrarily close to a normal distribution as the sample size increases. We use a recentered and rescaled version of this normal distribution to approximate the finite-sample distribution of our estimators.

I illustrate the meaning of consistency and asymptotic normality by Monte Carlo simulation (MCS). I use some of the Stata mechanics I discussed in Monte Carlo simulations using Stata.

Consistent estimator

A consistent estimator gets arbitrarily close in Read more…

Vector autoregression—simulation, estimation, and inference in Stata

\(\newcommand{\epsb}{{\boldsymbol{\epsilon}}}
\newcommand{\mub}{{\boldsymbol{\mu}}}
\newcommand{\thetab}{{\boldsymbol{\theta}}}
\newcommand{\Thetab}{{\boldsymbol{\Theta}}}
\newcommand{\etab}{{\boldsymbol{\eta}}}
\newcommand{\Sigmab}{{\boldsymbol{\Sigma}}}
\newcommand{\Phib}{{\boldsymbol{\Phi}}}
\newcommand{\Phat}{\hat{{\bf P}}}\)Vector autoregression (VAR) is a useful tool for analyzing the dynamics of multiple time series. VAR expresses a vector of observed variables as a function of its own lags.

Simulation

Let’s begin by simulating a bivariate VAR(2) process using the following specification,

\[
\begin{bmatrix} y_{1,t}\\ y_{2,t}
\end{bmatrix}
= \mub + {\bf A}_1 \begin{bmatrix} y_{1,t-1}\\ y_{2,t-1}
\end{bmatrix} + {\bf A}_2 \begin{bmatrix} y_{1,t-2}\\ y_{2,t-2}
\end{bmatrix} + \epsb_t
\]

where \(y_{1,t}\) and \(y_{2,t}\) are the observed series at time \(t\), \(\mub\) is a \(2 \times 1\) vector of intercepts, \({\bf A}_1\) and \({\bf A}_2\) are \(2\times 2\) parameter matrices, and \(\epsb_t\) is a \(2\times 1\) vector of innovations that is uncorrelated over time. I assume a \(N({\bf 0},\Sigmab)\) distribution for the innovations \(\epsb_t\), where \(\Sigmab\) is a \(2\times 2\) covariance matrix.

I set my sample size to 1,100 and Read more…

regress, probit, or logit?


In a previous post I illustrated that the probit model and the logit model produce statistically equivalent estimates of marginal effects. In this post, I compare the marginal effect estimates from a linear probability model (linear regression) with marginal effect estimates from probit and logit models.

My simulations show that when the true model is a probit or a logit, using a linear probability model can produce inconsistent estimates of the marginal effects of interest to researchers. The conclusions hinge on the probit or logit model being the true model.

Simulation results

For all simulations below, I use a sample size of 10,000 and 5,000 replications. The true data-generating processes (DGPs) are constructed using Read more…

probit or logit: ladies and gentlemen, pick your weapon

We often use probit and logit models to analyze binary outcomes. A case can be made that the logit model is easier to interpret than the probit model, but Stata’s margins command makes any estimator easy to interpret. Ultimately, estimates from both models produce similar results, and using one or the other is a matter of habit or preference.

I show that the estimates from a probit and logit model are similar for the computation of a set of effects that are of interest to researchers. I focus on the effects of changes in the covariates on the probability of a positive outcome for continuous and discrete covariates. I evaluate these effects on average and at the mean value of the covariates. In other words, I study the average marginal effects (AME), the average treatment effects (ATE), the marginal effects at the mean values of the covariates (MEM), and the treatment effects at the mean values of the covariates (TEM).

First, I present the results. Second, I discuss the code used for the simulations.

Results

In Table 1, I present the results of a simulation with 4,000 replications when the true data generating process (DGP) satisfies the assumptions of a probit model. I show the Read more…

Understanding the generalized method of moments (GMM): A simple example

\(\newcommand{\Eb}{{\bf E}}\)This post was written jointly with Enrique Pinzon, Senior Econometrician, StataCorp.

The generalized method of moments (GMM) is a method for constructing estimators, analogous to maximum likelihood (ML). GMM uses assumptions about specific moments of the random variables instead of assumptions about the entire distribution, which makes GMM more robust than ML, at the cost of some efficiency. The assumptions are called moment conditions.

GMM generalizes the method of moments (MM) by allowing the number of moment conditions to be greater than the number of parameters. Using these extra moment conditions makes GMM more efficient than MM. When there are more moment conditions than parameters, the estimator is said to be overidentified. GMM can efficiently combine the moment conditions when the estimator is overidentified.

We illustrate these points by estimating the mean of a \(\chi^2(1)\) by MM, ML, a simple GMM estimator, and an efficient GMM estimator. This example builds on Efficiency comparisons by Monte Carlo simulation and is similar in spirit to the example in Wooldridge (2001). Read more…

Efficiency comparisons by Monte Carlo simulation

Overview

In this post, I show how to use Monte Carlo simulations to compare the efficiency of different estimators. I also illustrate what we mean by efficiency when discussing statistical estimators.

I wrote this post to continue a dialog with my friend who doubted the usefulness of the sample average as an estimator for the mean when the data-generating process (DGP) is a \(\chi^2\) distribution with \(1\) degree of freedom, denoted by a \(\chi^2(1)\) distribution. The sample average is a fine estimator, even though it is not the most efficient estimator for the mean. (Some researchers prefer to estimate the median instead of the mean for DGPs that generate outliers. I will address the trade-offs between these parameters in a future post. For now, I want to stick to estimating the mean.)

In this post, I also want to illustrate that Monte Carlo simulations can help explain abstract statistical concepts. I show how to use a Monte Carlo simulation to illustrate the meaning of an abstract statistical concept. (If you are new to Monte Carlo simulations in Stata, you might want to see Monte Carlo simulations using Stata.) Read more…

Monte Carlo simulations using Stata

Overview

A Monte Carlo simulation (MCS) of an estimator approximates the sampling distribution of an estimator by simulation methods for a particular data-generating process (DGP) and sample size. I use an MCS to learn how well estimation techniques perform for specific DGPs. In this post, I show how to perform an MCS study of an estimator in Stata and how to interpret the results.

Large-sample theory tells us that the sample average is a good estimator for the mean when the true DGP is a random sample from a \(\chi^2\) distribution with 1 degree of freedom, denoted by \(\chi^2(1)\). But a friend of mine claims this estimator will not work well for this DGP because the \(\chi^2(1)\) distribution will produce outliers. In this post, I use an MCS to see if the large-sample theory works well for this DGP in a sample of 500 observations. Read more…