This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

As of update 03 Mar 2016, **bayesmh** provides a more convenient way of fitting distributions to the outcome variable. By design, **bayesmh** is a regression command, which models the mean of the outcome distribution as a function of predictors. There are cases when we do not have any predictors and want to model the outcome distribution directly. For example, we may want to fit a Poisson distribution or a binomial distribution to our outcome. This can now be done by specifying one of the four new distributions supported by **bayesmh** in the **likelihood()** option: **dexponential()**, **dbernoulli()**, **dbinomial()**, or **dpoisson()**. Previously, the suboption **noglmtransform** of **bayesmh**’s option **likelihood()** was used to fit the exponential, binomial, and Poisson distributions to the outcome variable. This suboption continues to work but is now undocumented.
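To illustrate what fitting a distribution directly to an outcome means, here is a minimal sketch in Python rather than Stata. It fits a Poisson rate to a predictor-free outcome with a plain random-walk Metropolis–Hastings sampler, which is simpler than the adaptive algorithm **bayesmh** implements. The data, the Gamma(1, 1) prior, the step size, and all function names are illustrative assumptions, not taken from the post or from Stata.

```python
import math
import random

# Hypothetical count data (an assumption, not from the post): 20 observations.
y = [2, 3, 1, 4, 2, 0, 3, 5, 2, 1, 3, 2, 4, 1, 2, 3, 0, 2, 6, 1]
n, total = len(y), sum(y)

# Assumed prior: lambda ~ Gamma(shape=a, rate=b).
a, b = 1.0, 1.0

def log_post(theta):
    """Log posterior of theta = ln(lambda), up to an additive constant.

    Poisson likelihood times the Gamma(a, b) prior, plus the log-Jacobian
    term (theta itself) from the change of variable lambda = exp(theta).
    """
    return (a + total) * theta - (b + n) * math.exp(theta)

def mh_poisson_rate(iters=20000, burnin=2000, step=0.3, seed=12345):
    """Return post-burn-in random-walk Metropolis-Hastings draws of lambda."""
    rng = random.Random(seed)
    theta = math.log(total / n)  # start at the log of the sample mean
    current = log_post(theta)
    draws = []
    for i in range(iters):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_post(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burnin:
            draws.append(math.exp(theta))
    return draws

draws = mh_poisson_rate()
post_mean_mcmc = sum(draws) / len(draws)
post_mean_exact = (a + total) / (b + n)  # mean of the conjugate Gamma posterior
print(f"MCMC posterior mean:  {post_mean_mcmc:.3f}")
print(f"Exact posterior mean: {post_mean_exact:.3f}")
```

Because the Gamma prior is conjugate to the Poisson likelihood, the exact posterior mean is available in closed form, which gives a simple check on the sampler.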

For examples, see *Beta-binomial model*, *Bayesian analysis of change-point problem*, and *Item response theory* under *Remarks and examples* in **[BAYES] bayesmh**.

We have also updated our earlier “Bayesian binary item response theory models using bayesmh” blog entry to use the new **dbernoulli()** specification when fitting 3PL, 4PL, and 5PL IRT models.

I use features new to Stata 14.1 to estimate an average treatment effect (ATE) for a heteroskedastic probit model with an endogenous treatment. In 14.1, we added new prediction statistics after **mlexp** that **margins** can use to estimate an ATE.

I am building on a previous post, “Using mlexp to estimate endogenous treatment effects in a probit model”, in which I demonstrated how to use **mlexp** to estimate the parameters of a probit model with an endogenous treatment and used **margins** to estimate the ATE for the model. Currently, no official commands estimate the heteroskedastic probit model with an endogenous treatment, so in this post I show how **mlexp** can be used to extend the models estimated by Stata.

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

**Overview**

We provide an introduction to parameter estimation by maximum likelihood and method of moments using **mlexp** and **gmm**, respectively (see **[R] mlexp** and **[R] gmm**). We include some background about these estimation techniques; see Pawitan (2001), Casella and Berger (2002), Cameron and Trivedi (2005), and Wooldridge (2010) for more details.

Maximum likelihood (ML) estimation finds the parameter values that make the observed data most probable. The ML estimates are the parameter values that maximize the log of the likelihood function, which specifies the probability of observing a particular set of data given a model.
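As a sketch in symbols, assuming independent observations \(y_1,\ldots,y_N\) whose density \(f(y_i;\theta)\) is known up to the parameter \(\theta\) (this notation is illustrative), the ML estimator is

\[
\hat{\theta}_{ML} = \arg\max_{\theta} \sum_{i=1}^{N} \ln f(y_i;\theta)
\]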

Method of moments (MM) estimators specify population moment conditions and find the parameters that solve the equivalent sample moment conditions. MM estimators usually place fewer restrictions on the model than ML estimators, which implies that MM estimators are less efficient but more robust than ML estimators.
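As a sketch in symbols, suppose the model implies the population moment condition \(E[m(y_i;\theta)] = 0\) for some known function \(m\) (this notation is illustrative). The MM estimator \(\hat{\theta}_{MM}\) solves the sample analog

\[
\frac{1}{N}\sum_{i=1}^{N} m(y_i;\hat{\theta}_{MM}) = 0
\]

For example, if the only assumption is \(E[y_i] = \mu\), then \(m(y_i;\mu) = y_i - \mu\) and the MM estimator of \(\mu\) is the sample average. No distributional assumption is needed, which is the sense in which MM places fewer restrictions on the model.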

**Overview**

In this post, I show how to use Monte Carlo simulations to compare the efficiency of different estimators. I also illustrate what we mean by efficiency when discussing statistical estimators.

I wrote this post to continue a dialog with my friend who doubted the usefulness of the sample average as an estimator for the mean when the data-generating process (DGP) is a \(\chi^2\) distribution with \(1\) degree of freedom, denoted by a \(\chi^2(1)\) distribution. The sample average is a fine estimator, even though it is not the most efficient estimator for the mean. (Some researchers prefer to estimate the median instead of the mean for DGPs that generate outliers. I will address the trade-offs between these parameters in a future post. For now, I want to stick to estimating the mean.)

In this post, I also want to illustrate that Monte Carlo simulations can help explain abstract statistical concepts; here, the concept is efficiency. (If you are new to Monte Carlo simulations in Stata, you might want to see “Monte Carlo simulations using Stata”.)
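An efficiency comparison of this kind can be sketched outside Stata. The minimal Python example below draws repeated samples from a \(\chi^2(1)\) distribution and compares two estimators of the mean: the sample average and the sample variance divided by 2, which also estimates the mean because a \(\chi^2(d)\) variable has mean \(d\) and variance \(2d\). The choice of the second estimator, the sample size, the number of replications, and the seed are assumptions made for illustration. The ML estimator is not included.

```python
import random

def simulate(reps=1000, n=500, seed=12345):
    """Return two lists of estimates of the chi-squared(1) mean, one per replication."""
    rng = random.Random(seed)
    avg_est, var_est = [], []
    for _ in range(reps):
        # A chi-squared(1) draw is the square of a standard normal draw.
        sample = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((v - m) ** 2 for v in sample) / (n - 1)
        avg_est.append(m)         # estimator 1: sample average
        var_est.append(s2 / 2.0)  # estimator 2: sample variance / 2
    return avg_est, var_est

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

avg_est, var_est = simulate()
print(f"sample average:      mean {mean(avg_est):.4f}, variance {variance(avg_est):.6f}")
print(f"sample variance / 2: mean {mean(var_est):.4f}, variance {variance(var_est):.6f}")
```

Both estimators center near the true mean of 1, but the estimates from the sample average vary less across replications. That smaller sampling variance is what it means for one estimator to be more efficient than another.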

**Overview**

In this post, I show how to use **mlexp** to estimate the degree-of-freedom parameter of a chi-squared distribution by maximum likelihood (ML). One example is unconditional, and another example models the parameter as a function of covariates. I also show how to generate data from chi-squared distributions, and I illustrate how to use simulation methods to understand an estimation technique.
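The unconditional case can be sketched in Python to show what the estimator does; this is not the **mlexp** syntax. The log density of a \(\chi^2(d)\) observation is \(-(d/2)\ln 2 - \ln\Gamma(d/2) + (d/2-1)\ln y - y/2\), which is concave in \(d\), so a one-dimensional search finds the maximum. The true value \(d=3\), the sample size, the search bounds, and the use of golden-section search are assumptions made for illustration.

```python
import math
import random

def chi2_loglik(d, n, sum_log_y, sum_y):
    """Log likelihood of n chi-squared(d) observations, from the data sums."""
    half = d / 2.0
    return (-n * half * math.log(2.0) - n * math.lgamma(half)
            + (half - 1.0) * sum_log_y - sum_y / 2.0)

def ml_degrees_of_freedom(y, lo=0.01, hi=100.0, tol=1e-6):
    """Maximize the concave log likelihood over d by golden-section search."""
    n = len(y)
    sum_log_y = sum(math.log(v) for v in y)
    sum_y = sum(y)

    def f(d):
        return chi2_loglik(d, n, sum_log_y, sum_y)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, e = b - ratio * (b - a), a + ratio * (b - a)
    fc, fe = f(c), f(e)
    while b - a > tol:
        if fc < fe:   # the maximum lies in [c, b]
            a, c, fc = c, e, fe
            e = a + ratio * (b - a)
            fe = f(e)
        else:         # the maximum lies in [a, e]
            b, e, fe = e, c, fc
            c = b - ratio * (b - a)
            fc = f(c)
    return (a + b) / 2.0

# Generate data: a chi-squared(d) variable is Gamma(shape=d/2, scale=2).
rng = random.Random(12345)
d_true = 3.0
y = [rng.gammavariate(d_true / 2.0, 2.0) for _ in range(2000)]

d_hat = ml_degrees_of_freedom(y)
print(f"ML estimate of d: {d_hat:.3f} (true value {d_true})")
```

Modeling \(d\) as a function of covariates would replace the single parameter with a function of each observation's covariates, which this sketch does not attempt.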

**Overview**

A Monte Carlo simulation (MCS) of an estimator approximates the sampling distribution of an estimator by simulation methods for a particular data-generating process (DGP) and sample size. I use an MCS to learn how well estimation techniques perform for specific DGPs. In this post, I show how to perform an MCS study of an estimator in Stata and how to interpret the results.

Large-sample theory tells us that the sample average is a good estimator for the mean when the true DGP is a random sample from a \(\chi^2\) distribution with 1 degree of freedom, denoted by \(\chi^2(1)\). But a friend of mine claims this estimator will not work well for this DGP because the \(\chi^2(1)\) distribution will produce outliers. In this post, I use an MCS to see if the large-sample theory works well for this DGP in a sample of 500 observations.
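The structure of such an MCS can be sketched in Python; the post itself works in Stata. Each replication draws 500 observations from a \(\chi^2(1)\) distribution, computes the sample average and its estimated standard error, and records whether a nominal 5% Wald test rejects the true null hypothesis that the mean is 1. The number of replications, the seed, and the function name are assumptions made for illustration.

```python
import math
import random

def mcs_sample_average(reps=1000, n=500, true_mean=1.0, seed=12345):
    """Monte Carlo study of the sample average under a chi-squared(1) DGP."""
    rng = random.Random(seed)
    estimates, rejections = [], 0
    for _ in range(reps):
        # A chi-squared(1) draw is the square of a standard normal draw.
        sample = [rng.gauss(0.0, 1.0) ** 2 for _ in range(n)]
        m = sum(sample) / n
        s2 = sum((v - m) ** 2 for v in sample) / (n - 1)
        se = math.sqrt(s2 / n)  # estimated standard error of the average
        estimates.append(m)
        # Wald test of the true null at the nominal 5% level.
        if abs((m - true_mean) / se) > 1.96:
            rejections += 1
    mean_est = sum(estimates) / reps
    sd_est = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / (reps - 1))
    return mean_est, sd_est, rejections / reps

mean_est, sd_est, reject_rate = mcs_sample_average()
print(f"mean of the estimates:       {mean_est:.4f}")
print(f"std. dev. of the estimates:  {sd_est:.4f}")
print(f"theoretical std. error:      {math.sqrt(2.0 / 500):.4f}")
print(f"rejection rate at 5% level:  {reject_rate:.3f}")
```

If the large-sample theory works well at this sample size, the estimates should center on 1, their standard deviation should be close to \(\sqrt{2/500}\), and the rejection rate should be close to 0.05.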