*The original blog posted May 26, 2016, omitted option ***initrandom** from the **bayesmh** command. The code and the text of the blog entry were updated on August 9, 2018, to reflect this.

**Overview**

MCMC algorithms used for simulating posterior distributions are indispensable tools in Bayesian analysis. A major consideration in MCMC simulations is that of convergence. Has the simulated Markov chain fully explored the target posterior distribution so far, or do we need longer simulations? A common approach in assessing MCMC convergence is based on running and analyzing the difference between multiple chains.

For a given Bayesian model, **bayesmh** is capable of producing multiple Markov chains with randomly dispersed initial values by using the **initrandom** option, available as of the update on 19 May 2016. In this post, I demonstrate the Gelman–Rubin diagnostic as a more formal test for convergence using multiple chains. For graphical diagnostics, see *Graphical diagnostics using multiple chains* in **[BAYES] bayesmh** for more details. To compute the Gelman–Rubin diagnostic, I use an unofficial command, **grubin**, which can be installed by typing the following in Stata: Read more…

**Initial thoughts**

Estimating causal relationships from data is one of the fundamental endeavors of researchers. Ideally, we could conduct a controlled experiment to estimate causal relations. However, conducting a controlled experiment may be infeasible. For example, education researchers cannot randomize education attainment and they must learn from observational data.

In the absence of experimental data, we construct models to capture the relevant features of the causal relationship we have an interest in, using observational data. Models are successful if the features we did not include can be ignored without affecting our ability to ascertain the causal relationship we are interested in. Sometimes, however, ignoring some features of reality results in models that yield relationships that cannot be interpreted causally. In a regression framework, depending on our discipline or our research question, we give a different name to this phenomenon: endogeneity, omitted confounders, omitted variable bias, simultaneity bias, selection bias, etc.

Below I show how we can understand many of these problems in a unified regression framework and use simulated data to illustrate how they affect estimation and inference. Read more…

**Overview**

In the frequentist approach to statistics, estimators are random variables because they are functions of random data. The finite-sample distributions of most of the estimators used in applied work are not known, because the estimators are complicated nonlinear functions of random data. These estimators have large-sample convergence properties that we use to approximate their behavior in finite samples.

Two key convergence properties are consistency and asymptotic normality. A consistent estimator gets arbitrarily close in probability to the true value. The distribution of an asymptotically normal estimator gets arbitrarily close to a normal distribution as the sample size increases. We use a recentered and rescaled version of this normal distribution to approximate the finite-sample distribution of our estimators.

I illustrate the meaning of consistency and asymptotic normality by Monte Carlo simulation (MCS). I use some of the Stata mechanics I discussed in Monte Carlo simulations using Stata.

**Consistent estimator**

A consistent estimator gets arbitrarily close in Read more…

This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

As of update 03 Mar 2016, **bayesmh** provides a more convenient way of fitting distributions to the outcome variable. By design, **bayesmh** is a regression command, which models the mean of the outcome distribution as a function of predictors. There are cases when we do not have any predictors and want to model the outcome distribution directly. For example, we may want to fit a Poisson distribution or a binomial distribution to our outcome. This can now be done by specifying one of the four new distributions supported by **bayesmh** in the **likelihood()** option: **dexponential()**, **dbernoulli()**, **dbinomial()**, or **dpoisson()**. Previously, the suboption **noglmtransform** of **bayesmh**‘s option **likelihood()** was used to fit the exponential, binomial, and Poisson distributions to the outcome variable. This suboption continues to work but is now undocumented.

For examples, see *Beta-binomial model*, *Bayesian analysis of change-point problem*, and *Item response theory* under *Remarks and examples* in **[BAYES] bayesmh**.

We have also updated our earlier “Bayesian binary item response theory models using bayesmh” blog entry to use the new **dbernoulli()** specification when fitting 3PL, 4PL, and 5PL IRT models.

I use features new to Stata 14.1 to estimate an average treatment effect (ATE) for a heteroskedastic probit model with an endogenous treatment. In 14.1, we added new prediction statistics after **mlexp** that **margins** can use to estimate an ATE.

I am building on a previous post in which I demonstrated how to use **mlexp** to estimate the parameters of a probit model with an endogenous treatment and used **margins** to estimate the ATE for the model Using mlexp to estimate endogenous treatment effects in a probit model. Currently, no official commands estimate the heteroskedastic probit model with an endogenous treatment, so in this post I show how **mlexp** can be used to extend the models estimated by Stata. Read more…

\(\newcommand{\epsilonb}{\boldsymbol{\epsilon}}

\newcommand{\ebi}{\boldsymbol{\epsilon}_i}

\newcommand{\Sigmab}{\boldsymbol{\Sigma}}

\newcommand{\Omegab}{\boldsymbol{\Omega}}

\newcommand{\Lambdab}{\boldsymbol{\Lambda}}

\newcommand{\betab}{\boldsymbol{\beta}}

\newcommand{\gammab}{\boldsymbol{\gamma}}

\newcommand{\Gammab}{\boldsymbol{\Gamma}}

\newcommand{\deltab}{\boldsymbol{\delta}}

\newcommand{\xib}{\boldsymbol{\xi}}

\newcommand{\iotab}{\boldsymbol{\iota}}

\newcommand{\xb}{{\bf x}}

\newcommand{\xbit}{{\bf x}_{it}}

\newcommand{\xbi}{{\bf x}_{i}}

\newcommand{\zb}{{\bf z}}

\newcommand{\zbi}{{\bf z}_i}

\newcommand{\wb}{{\bf w}}

\newcommand{\yb}{{\bf y}}

\newcommand{\ub}{{\bf u}}

\newcommand{\Gb}{{\bf G}}

\newcommand{\Hb}{{\bf H}}

\newcommand{\thetab}{\boldsymbol{\theta}}

\newcommand{\XBI}{{\bf x}_{i1},\ldots,{\bf x}_{iT}}

\newcommand{\Sb}{{\bf S}} \newcommand{\Xb}{{\bf X}}

\newcommand{\Xtb}{\tilde{\bf X}}

\newcommand{\Wb}{{\bf W}}

\newcommand{\Ab}{{\bf A}}

\newcommand{\Bb}{{\bf B}}

\newcommand{\Zb}{{\bf Z}}

\newcommand{\Eb}{{\bf E}}\) This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

**Overview**

We provide an introduction to parameter estimation by maximum likelihood and method of moments using **mlexp** and **gmm**, respectively (see **[R] mlexp** and **[R] gmm**). We include some background about these estimation techniques; see Pawitan (2001, Casella and Berger (2002), Cameron and Trivedi (2005), and Wooldridge (2010) for more details.

Maximum likelihood (ML) estimation finds the parameter values that make the observed data most probable. The parameters maximize the log of the likelihood function that specifies the probability of observing a particular set of data given a model.

Method of moments (MM) estimators specify population moment conditions and find the parameters that solve the equivalent sample moment conditions. MM estimators usually place fewer restrictions on the model than ML estimators, which implies that MM estimators are less efficient but more robust than ML estimators. Read more…

**Overview**

In this post, I show how to use Monte Carlo simulations to compare the efficiency of different estimators. I also illustrate what we mean by efficiency when discussing statistical estimators.

I wrote this post to continue a dialog with my friend who doubted the usefulness of the sample average as an estimator for the mean when the data-generating process (DGP) is a \(\chi^2\) distribution with \(1\) degree of freedom, denoted by a \(\chi^2(1)\) distribution. The sample average is a fine estimator, even though it is not the most efficient estimator for the mean. (Some researchers prefer to estimate the median instead of the mean for DGPs that generate outliers. I will address the trade-offs between these parameters in a future post. For now, I want to stick to estimating the mean.)

In this post, I also want to illustrate that Monte Carlo simulations can help explain abstract statistical concepts. I show how to use a Monte Carlo simulation to illustrate the meaning of an abstract statistical concept. (If you are new to Monte Carlo simulations in Stata, you might want to see Monte Carlo simulations using Stata.) Read more…

**Overview**

In this post, I show how to use **mlexp** to estimate the degree of freedom parameter of a chi-squared distribution by maximum likelihood (ML). One example is unconditional, and another example models the parameter as a function of covariates. I also show how to generate data from chi-squared distributions and I illustrate how to use simulation methods to understand an estimation technique. Read more…

**Overview**

A Monte Carlo simulation (MCS) of an estimator approximates the sampling distribution of an estimator by simulation methods for a particular data-generating process (DGP) and sample size. I use an MCS to learn how well estimation techniques perform for specific DGPs. In this post, I show how to perform an MCS study of an estimator in Stata and how to interpret the results.

Large-sample theory tells us that the sample average is a good estimator for the mean when the true DGP is a random sample from a \(\chi^2\) distribution with 1 degree of freedom, denoted by \(\chi^2(1)\). But a friend of mine claims this estimator will not work well for this DGP because the \(\chi^2(1)\) distribution will produce outliers. In this post, I use an MCS to see if the large-sample theory works well for this DGP in a sample of 500 observations. Read more…