## Bayesian modeling: Beyond Stata’s built-in models

This post was written jointly with Nikolay Balov, Senior Statistician and Software Developer, StataCorp.

A question on Statalist motivated us to write this blog entry.

A user asked whether the **churdle** command (http://www.stata.com/stata14/hurdle-models/) for fitting hurdle models, new in Stata 14, can be combined with the **bayesmh** command (http://www.stata.com/stata14/bayesian-analysis/) for fitting Bayesian models, also new in Stata 14.

Our initial reaction to this question was ‘No’ or, more precisely, ‘Not easily’: hurdle models are not among the likelihood models supported by **bayesmh**. One can write a program to compute the log likelihood of the double-hurdle model and use this program with **bayesmh** (in the spirit of http://www.stata.com/stata14/bayesian-evaluators/), but this may seem like a daunting task if you are not familiar with Stata programming.

And then we realized: why not simply call **churdle** from the evaluator to compute the log likelihood? All we need is for **churdle** to evaluate the log likelihood at specific values of the model parameters without performing any optimization iterations. This can be achieved by specifying **churdle**'s options **from()** and **iterate(0)**.

Let’s look at an example. We consider a simple hurdle model using a subset of the fitness dataset from **[R] churdle**:

```
. webuse fitness

. set seed 17653

. sample 10

. churdle linear hours age, select(commute) ll(0)

Iteration 0:   log likelihood = -2783.3352
Iteration 1:   log likelihood =  -2759.029
Iteration 2:   log likelihood = -2758.9992
Iteration 3:   log likelihood = -2758.9992

Cragg hurdle regression                         Number of obs     =      1,983
                                                LR chi2(1)        =       3.42
                                                Prob > chi2       =     0.0645
Log likelihood = -2758.9992                     Pseudo R2         =     0.0006

------------------------------------------------------------------------------
       hours |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
hours        |
         age |   .0051263   .0028423     1.80   0.071    -.0004446    .0106971
       _cons |   1.170932   .1238682     9.45   0.000     .9281548    1.413709
-------------+----------------------------------------------------------------
selection_ll |
     commute |  -.0655171   .1561046    -0.42   0.675    -.3714765    .2404423
       _cons |   .1421166   .0882658     1.61   0.107    -.0308813    .3151144
-------------+----------------------------------------------------------------
lnsigma      |
       _cons |   .1280215     .03453     3.71   0.000      .060344     .195699
-------------+----------------------------------------------------------------
      /sigma |   1.136577    .039246                      1.062202    1.216161
------------------------------------------------------------------------------
```

Let’s assume for a moment that we already have an evaluator, **mychurdle1**, that returns the corresponding log-likelihood value. We can fit a Bayesian hurdle model using **bayesmh** as follows:

```
. gen byte hours0 = (hours==0)   // dependent variable for the selection equation

. set seed 123

. bayesmh (hours age) (hours0 commute), llevaluator(mychurdle1, parameters({lnsig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots
  (output omitted)
```

We use a two-equation specification to fit this model. The main regression is specified first, and the selection regression is specified next. The additional parameter, the log of the standard deviation associated with the main regression, is specified in **llevaluator()**'s suboption **parameters()**. All parameters are assigned flat priors to obtain results similar to **churdle**. MCMC results are saved in **sim.dta**.

Here is the actual output from **bayesmh**:

```
. bayesmh (hours age) (hours0 commute), llevaluator(mychurdle1, parameters({lns
> ig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots

Burn-in 2500 aaaaaaaaa1000aaaaaa...2000..... done
Simulation 10000 .........1000.........2000.........3000.........4000.........5
> 000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  hours hours0 ~ mychurdle1(xb_hours,xb_hours0,{lnsig})

Priors:
       {hours:age _cons} ~ 1 (flat)                                        (1)
  {hours0:commute _cons} ~ 1 (flat)                                        (2)
                 {lnsig} ~ 1 (flat)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_hours.
(2) Parameters are elements of the linear form xb_hours0.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      1,983
                                                 Acceptance rate  =      .2889
                                                 Efficiency:  min =     .05538
                                                              avg =     .06266
Log marginal likelihood = -2772.3953                          max =     .06945

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
hours        |
         age |  .0050916   .0027972   .000106   .0049923  -.0000372   .0107231
       _cons |  1.167265    .124755   .004889   1.175293   .9125105   1.392021
-------------+----------------------------------------------------------------
hours0       |
     commute | -.0621282   .1549908   .006585  -.0613511  -.3623891   .2379805
       _cons |  .1425693   .0863626   .003313   .1430396  -.0254507   .3097677
-------------+----------------------------------------------------------------
       lnsig |  .1321532   .0346446   .001472   .1326704   .0663646   .2015249
------------------------------------------------------------------------------

file sim.dta saved
```

The results are similar to those produced by **churdle**, as one would expect with noninformative priors.
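As the header above notes, **bayesmh** uses adaptive random-walk Metropolis-Hastings sampling. To make the mechanics concrete, here is a minimal, non-adaptive random-walk Metropolis sketch in Python, applied to a toy one-dimensional standard-normal log posterior; this is an illustration of the sampler family only, not of **bayesmh**'s actual adaptive implementation, and all names in it are our own.

```python
import math
import random

def rw_metropolis(logpost, init, scale, ndraws, burnin, seed=1):
    """Minimal random-walk Metropolis sampler for a 1-D posterior."""
    random.seed(seed)
    x, lp = init, logpost(init)
    draws, accepted = [], 0
    for i in range(burnin + ndraws):
        prop = x + random.gauss(0.0, scale)      # symmetric proposal
        lp_prop = logpost(prop)
        # accept with probability min(1, posterior ratio)
        if math.log(random.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepted += 1
        if i >= burnin:                          # keep post-burn-in draws only
            draws.append(x)
    return draws, accepted / (burnin + ndraws)

# toy target: standard normal log density (up to an additive constant)
draws, rate = rw_metropolis(lambda x: -0.5 * x * x,
                            init=0.0, scale=2.4, ndraws=10000, burnin=2500)
```

With flat priors, the log posterior passed to such a sampler is just the log likelihood, which is exactly what the evaluators below supply to **bayesmh**.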

If desired, we can use **bayesstats summary** to obtain the estimate of the standard deviation:

```
. bayesstats summary (sigma: exp({lnsig}))

Posterior summary statistics                      MCMC sample size =    10,000

       sigma : exp({lnsig})

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
       sigma |  1.141969   .0396264   .001685   1.141874   1.068616   1.223267
------------------------------------------------------------------------------
```
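The idea behind this summary is simple: apply the transformation to each saved MCMC draw of **{lnsig}** and summarize the transformed sample. A Python sketch of that logic, using a hypothetical stand-in for the draws (the real ones live in **sim.dta**):

```python
import math

def equal_tailed_summary(draws, level=0.95):
    """Posterior mean, median, and equal-tailed credible interval
    from a list of MCMC draws."""
    s = sorted(draws)
    n = len(s)
    mean = sum(s) / n
    median = 0.5 * (s[(n - 1) // 2] + s[n // 2])
    lo = s[int(round((1 - level) / 2 * (n - 1)))]   # lower percentile draw
    hi = s[int(round((1 + level) / 2 * (n - 1)))]   # upper percentile draw
    return mean, median, (lo, hi)

# stand-in draws of {lnsig}; in practice these come from the saved dataset
lnsig_draws = [0.08 + 0.001 * i for i in range(101)]
# transform each draw, then summarize the transformed sample
sigma_draws = [math.exp(x) for x in lnsig_draws]
mean, median, (lo, hi) = equal_tailed_summary(sigma_draws)
```

Because the transformation is applied draw by draw, the posterior mean of sigma is not simply exp of the posterior mean of **{lnsig}**; for a convex transformation like exp it is larger.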

Let’s now talk in more detail about the log-likelihood evaluator. We consider two evaluators: one that calls **churdle** and one that implements the log likelihood of this hurdle model directly.

### Log-likelihood evaluator using churdle

Here we demonstrate how to write a log-likelihood evaluator that calls an existing Stata estimation command, **churdle** in our example, to compute the log likelihood.

```
program mychurdle1
        version 14.0
        args llf
        tempname b
        mat `b' = ($MH_b, $MH_p)
        capture churdle linear $MH_y1 $MH_y1x1 if $MH_touse, ///
                select($MH_y2x1) ll(0) from(`b') iterate(0)
        if _rc {
                if (_rc==1) {   // handle break key
                        exit _rc
                }
                scalar `llf' = .
        }
        else {
                scalar `llf' = e(ll)
        }
end
```

The **mychurdle1** program returns the log-likelihood value computed by **churdle** at the current values of the model parameters. The program accepts one argument, a temporary scalar **llf** in which to return the log-likelihood value. We store the current values of the model parameters (regression coefficients from the two equations, stored in vector **$MH_b**, and the extra log-standard-deviation parameter, stored in vector **$MH_p**) in a temporary matrix **b**. We specify **churdle**'s options **from()** and **iterate(0)** to evaluate the log likelihood at these parameter values without updating them. Finally, we store the resulting log-likelihood value, **e(ll)**, in **llf**, or a missing value if **churdle** fails to evaluate the log likelihood.
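The pattern here is not specific to Stata: any fitter that accepts starting values and an iteration cap can be turned into a log-likelihood evaluator by starting it at the parameters of interest and allowing zero iterations. A Python sketch of the same idea, using a toy "fitter" for the mean of a normal model (all names here are hypothetical, invented for illustration):

```python
import math

def fit_normal_mean(y, sigma, start, iterate=50):
    """Toy 'estimation command': ML for the mean of N(mu, sigma^2).
    As with churdle's from() and iterate(0), iterate=0 just scores `start`."""
    mu = start
    for _ in range(iterate):          # the MLE has a closed form, so any
        mu = sum(y) / len(y)          # positive iteration count lands on it
    ll = sum(-0.5 * math.log(2 * math.pi * sigma ** 2)
             - (yi - mu) ** 2 / (2 * sigma ** 2) for yi in y)
    return {"mu": mu, "ll": ll}

def loglik_evaluator(mu, y, sigma=1.0):
    """Approach-1 evaluator: call the fitter at fixed parameters, 0 iterations."""
    return fit_normal_mean(y, sigma, start=mu, iterate=0)["ll"]

y = [0.5, -0.2, 1.1, 0.3]
```

The evaluator never lets the fitter move the parameters; it only harvests the log likelihood computed at the supplied values, exactly as **mychurdle1** harvests **e(ll)**.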

### Log-likelihood evaluator directly computing log likelihood

Here we demonstrate how to write a log-likelihood evaluator that computes the likelihood of the fitted hurdle model directly rather than calling **churdle**.

```
program mychurdle2
        version 14.0
        args lnf xb xg lnsig
        tempname sig
        scalar `sig' = exp(`lnsig')
        tempvar lnfj
        qui gen double `lnfj' = normal(`xg') if $MH_touse
        qui replace `lnfj' = log(1 - `lnfj') if $MH_y1 <= 0 & $MH_touse
        qui replace `lnfj' = log(`lnfj') - log(normal(`xb'/`sig'))  ///
                        + log(normalden($MH_y1,`xb',`sig'))         ///
                        if $MH_y1 > 0 & $MH_touse
        summarize `lnfj' if $MH_touse, meanonly
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
        scalar `lnf' = r(sum)
end
```

The **mychurdle2** program accepts four arguments: a temporary scalar **lnf** to contain the log-likelihood value; temporary variables **xb** and **xg** containing the linear predictors from the main and selection equations, respectively, evaluated at the current values of the model parameters; and a temporary scalar **lnsig** containing the current value of the log-standard-deviation parameter. We compute and store the observation-level log likelihoods in a temporary variable **lnfj**. Global **MH_y1** contains the name of the dependent variable from the first (main) equation, and global **MH_touse** marks the estimation sample. If all observation-specific log-likelihood contributions are nonmissing, we store the overall log-likelihood value in **lnf**; otherwise, we store missing.
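To make the likelihood itself explicit: for y <= 0 the contribution is ln(1 − Φ(xg)), and for y > 0 it is ln Φ(xg) − ln Φ(xb/σ) + ln φ(y; xb, σ), the hurdle probability times a lower-truncated normal density. A self-contained Python rendering of this computation, mirroring **mychurdle2** (function names are ours; `normalden` follows Stata's three-argument convention):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normalden(y, mu, sigma):
    """Density of N(mu, sigma^2) at y, as in Stata's normalden(y, mu, sigma)."""
    z = (y - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def hurdle_loglik(y, xb, xg, lnsig):
    """Log likelihood of the Cragg hurdle model with lower limit 0,
    mirroring mychurdle2's observation-by-observation computation."""
    sig = math.exp(lnsig)
    ll = 0.0
    for yi, xbi, xgi in zip(y, xb, xg):
        if yi <= 0:                               # hurdle not cleared
            ll += math.log(1.0 - Phi(xgi))
        else:                                     # truncated normal outcome
            ll += (math.log(Phi(xgi)) - math.log(Phi(xbi / sig))
                   + math.log(normalden(yi, xbi, sig)))
    return ll
```

The −ln Φ(xb/σ) term is the truncation normalizer: it rescales the normal density so that it integrates to one over the positive outcomes.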

We fit our model using the same syntax as earlier, except we use **mychurdle2** as the program evaluator.

```
. set seed 123

. bayesmh (hours age) (hours0 commute), llevaluator(mychurdle2, parameters({lns
> ig})) prior({hours:} {hours0:} {lnsig}, flat) saving(sim, replace) dots

Burn-in 2500 aaaaaaaaa1000aaaaaa...2000..... done
Simulation 10000 .........1000.........2000.........3000.........4000.........5
> 000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  hours hours0 ~ mychurdle2(xb_hours,xb_hours0,{lnsig})

Priors:
       {hours:age _cons} ~ 1 (flat)                                        (1)
  {hours0:commute _cons} ~ 1 (flat)                                        (2)
                 {lnsig} ~ 1 (flat)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_hours.
(2) Parameters are elements of the linear form xb_hours0.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      1,983
                                                 Acceptance rate  =      .2889
                                                 Efficiency:  min =     .05538
                                                              avg =     .06266
Log marginal likelihood = -2772.3953                          max =     .06945

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
hours        |
         age |  .0050916   .0027972   .000106   .0049923  -.0000372   .0107231
       _cons |  1.167265    .124755   .004889   1.175293   .9125105   1.392021
-------------+----------------------------------------------------------------
hours0       |
     commute | -.0621282   .1549908   .006585  -.0613511  -.3623891   .2379805
       _cons |  .1425693   .0863626   .003313   .1430396  -.0254507   .3097677
-------------+----------------------------------------------------------------
       lnsig |  .1321532   .0346446   .001472   .1326704   .0663646   .2015249
------------------------------------------------------------------------------

file sim.dta not found; file saved
```

We obtain the same results as with approach 1, and we obtain them much faster.

### Final remarks

Approach 1 is very straightforward. It can be applied to any Stata command that returns the log likelihood and allows you to specify parameter values at which this log likelihood must be evaluated. Without too much programming effort, you can use almost any existing Stata maximum likelihood estimation command with **bayesmh**. A disadvantage of approach 1 is slower execution compared with programming the likelihood directly, as in approach 2. For example, the command using the **mychurdle1** evaluator from approach 1 took about 25 minutes to run, whereas the command using the **mychurdle2** evaluator from approach 2 took only 20 seconds.