### Archive

Posts Tagged ‘Bayesian’

## Stata 17 released

We just announced Stata 17. Visit stata.com/new-in-stata to read all about its 29 major new features.

They are

Looking over this list of features, someone suggested that a potential marketing tagline for Stata 17 could be “Better. Faster. Stronger.” I thought Daft Punk might not like it if we used that, so we aren’t, but really, it is a great overall description of the new version.

I’ll share my thoughts on some of the new features below.

### Customizable tables

There has been a long tradition in the Stata user community of commands that build various tables. These are among the most-used community-contributed commands! There has been an equally long tradition of the Stata user community asking us to provide more official features to assist with flexible table creation and export. Stata’s table command has been completely revamped, and a new collect command lets you gather and manage results from multiple commands and then show them in tabular form. Excel, HTML, LaTeX, Markdown, PDF, Stata SMCL, Word, and plain text are supported as export formats. I suspect almost all users will be adding this to their Stata repertoire.
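Here is a minimal sketch of that workflow. It assumes Stata 17 or later and the auto dataset that ships with Stata. The variables and the output filename mpg_by_origin.docx are arbitrary choices for illustration and do not come from the announcement:

```stata
* Sketch only: assumes Stata 17+ and the shipped auto dataset
sysuse auto, clear

* Rows are the levels of foreign; the cells hold the mean and SD of mpg
table foreign, statistic(mean mpg) statistic(sd mpg)

* table stores what it builds in the current collection, which collect can export
collect export mpg_by_origin.docx, replace
```

Changing the file extension (for example, to .xlsx, .html, or .tex) should select a different export format.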

### Bayesian econometrics

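In Stata 17, we have added many features for Bayesian econometrics, including

• Bayesian VAR models
• Bayesian IRF analysis and dynamic forecasting
• Bayesian panel-data models
• Bayesian DSGE models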

Many of you have been asking us for Bayesian VAR models. That’s not surprising. VAR models have many parameters but often not enough data to estimate them reliably. The Bayesian approach provides a solution by incorporating specialized priors to allow you to obtain more stable parameter estimates.

As with classical VAR models, you can perform IRF analysis and obtain dynamic forecasts but now within the Bayesian paradigm.
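Here is a minimal sketch. It assumes Stata 17 or later and the lutkepohl2 example dataset, which holds quarterly growth rates of investment, income, and consumption. The commands follow the classical var and fcast workflow. The lag length, seed, and forecast horizon are illustrative choices, and the names of the forecast variables are assumed to follow the fcast convention:

```stata
* Sketch only: assumes Stata 17+ and internet access for webuse
webuse lutkepohl2, clear     // quarterly data, already tsset

* Bayesian VAR with two lags; the bayes prefix supplies its default priors
bayes, rseed(17): var dln_inv dln_inc dln_consump, lags(1/2)

* Bayesian dynamic forecasts eight quarters ahead, saved in new variables prefixed f_
bayesfcast compute f_, step(8)
```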

Bayesian panel-data models are appealing when you have few panels or when you would like to study and compare panel-specific effects.

With Bayesian DSGE models, prior distributions give you a natural way to incorporate knowledge about model parameters that is motivated by economic theory.

As with Stata’s other Bayesian features, our aim is to make specification of these models as intuitive as possible and as similar to the specifications of the frequentist counterparts as possible.

All the above is in addition to other existing Bayesian features of interest to econometricians such as Bayesian generalized linear models and Bayesian sample-selection models.

### Bayesian multilevel models

Stata users span many disciplines. In addition to the new Bayesian features above that will be of most interest to econometricians, Stata 17 also adds Bayesian multilevel modeling with support for nonlinear, joint, SEM-like, and many other models. One notable feature is the ability to fit multivariate nonlinear models containing random effects, such as multivariate nonlinear growth models.

### Give me more speed

As computer capabilities have grown, so have dataset sizes. With larger datasets and more computationally intensive methods comes the above request, “Give me more speed.” Sometimes, our developers say, “I’m giving her all she’s got, Captain!”, but for Stata 17, they gave us all more.

We achieved speed gains in part through careful algorithm selection and implementation and in part through integration of the Intel Math Kernel Library (MKL) to underpin many of Mata’s linear algebra functions and operators.

Read the details, including tables of specific speed gains.

### Difference-in-differences (DID) models

DID and difference-in-difference-in-differences (DDD) models are appealing to many disciplines, including econometrics, epidemiology, political science, public policy, and many more. If you are studying the effect of a treatment (such as a drug regimen or policy) in observational data and are concerned that the effect may be influenced by time or by some other group effects, DID and DDD models provide intuitive methods to control for such unobserved effects.
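Here is a minimal sketch. It assumes Stata 17 or later and an example dataset, hospdd. In this dataset, hospitals (hospital) are observed over months (month), some of them adopt a new admissions procedure (procedure), and patient satisfaction (satis) is the outcome. Treat the dataset and variable names as assumptions:

```stata
* Sketch only: assumes Stata 17+ and the hospdd example dataset
webuse hospdd, clear

* (outcome) (treatment indicator); group() and time() define the DID structure
didregress (satis) (procedure), group(hospital) time(month)

* The estimate is credible only if treated and control groups shared trends
* before treatment; estat ptrends tests that assumption
estat ptrends
```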

### New meta-analysis features

Stata 17 adds the following to the excellent (IMHO) meta-analysis suite introduced in Stata 16:

If you recognize the above, you know you want them!

And we have made these new features just as easy to use as the rest of the meta suite.

### Interval-censored Cox model

The Cox proportional hazards model is used routinely by researchers from many disciplines to analyze right-censored event-time data. In such data, the time to the event of interest is either observed exactly or known only to exceed the last follow-up time. In Stata 17, you can use it with interval-censored event-time data too!

With interval-censored event-time data, we only know that the time to an event of interest lies in an interval. For instance, think of time to cancer recurrence or to an infection or, for that matter, to any asymptomatic disease that can be detected only through periodic examinations. These are all interval-censored data, and you now have a powerful new tool to analyze them.
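Here is a minimal sketch on simulated data. It assumes Stata 17 or later. Event times come from an exponential model with a treatment hazard ratio of exp(0.5). We then pretend that subjects are examined every 0.25 time units, so we know only the interval that contains each event. The variable names and the examination schedule are hypothetical:

```stata
* Sketch only: assumes Stata 17+; data are simulated for illustration
clear
set seed 17
set obs 500
generate byte treat = runiform() < 0.5

* Exponential event times: rexponential() takes the mean, 1/hazard,
* so treated subjects have hazard exp(0.5) times that of controls
generate double t = rexponential(exp(-0.5*treat))

* Examinations every 0.25 time units: we only learn the interval containing t
generate double ltime = floor(t/0.25)*0.25
generate double rtime = ltime + 0.25

* Semiparametric Cox model for the interval-censored event times
stintcox treat, interval(ltime rtime)
```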

### New lasso features

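Stata 17 adds the following to the popular lasso features first introduced in Stata 16:

• Treatment-effects estimation using lasso
• Lasso for clustered data
• BIC for selecting the lasso penalty parameter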

In fact, treatment-effects lasso combines two popular features: treatment effects and lasso. You can now incorporate many (hundreds, thousands, and more) covariates in your treatment-effects models.

You can account for clustered observations in your lasso analysis. And you can use the Bayesian information criterion (BIC) to select the lasso penalty parameter.
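Here is a minimal sketch of BIC-based selection. It assumes Stata 17 or later and the shipped auto dataset. The outcome and the candidate covariates are arbitrary illustrative choices:

```stata
* Sketch only: assumes Stata 17+ and the shipped auto dataset
sysuse auto, clear

* Choose the lasso penalty by minimizing BIC instead of cross-validation
lasso linear mpg weight length displacement turn headroom trunk gear_ratio, selection(bic)

* List the covariates the lasso kept at the selected penalty
lassocoef
```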

### Integration with other software and languages

Two of Stata’s great features are its extensibility and reproducibility. Stata 17 builds on that tradition by greatly enhancing its interoperability with Python and Java, adding support for Jupyter Notebook, adding JDBC support, and giving you experimental access to the H2O platform.

You could already call Python code from Stata code. Now you can call Stata from any stand-alone Python environment. Do you write Python code in Jupyter Notebook, Spyder IDE, or PyCharm IDE? Now you can call Stata directly from those environments, passing data, metadata, and results between Stata and Python seamlessly. Even your Stata graphs will show up directly in Jupyter Notebook. Our nickname for all the ways you can connect Python and Stata is ‘PyStata’.

For Java, you could already compile Java code into .jar files and call those as plugins from Stata. Now you can embed Java code directly in your Stata do-files and ado-files, just like Mata code and Python code. Stata will compile and execute your Java code on the fly. You can interchange data, metadata, and results at will.

One of the first steps of any analysis is importing your data. Stata 17 supports JDBC for importing data from and writing data to databases that provide JDBC drivers. An important advantage JDBC has over ODBC is that JDBC drivers are platform independent, so if a database vendor provides a JDBC driver, it will work seamlessly on Windows, Mac, and Linux.

Finally, some of our developers have been experimenting with connecting Stata to H2O, a scalable and distributed open-source machine-learning and predictive analytics platform. We decided to release our experiment to you. Is this something you’d like us to do more with? We look forward to your feedback.

### New date and time functions

Stata 17 adds a plethora of date and time convenience functions in three main areas:

• Datetime durations, such as ages
• Relative dates, such as the next birthday relative to a reference date
• Datetime components, functions that extract various components from datetime values

You’ll undoubtedly find these make your life easier when working with date and time values.
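Here is a minimal sketch that touches each of the three areas. It assumes Stata 17 or later. The two dates are arbitrary examples. The function names age(), nextbirthday(), and datediff() reflect our reading of the Stata 17 date-function additions, and year() is an older component function:

```stata
* Sketch only: assumes Stata 17+; the dates are arbitrary examples
clear
set obs 1
generate dob = td(15mar1990)      // date of birth
generate ref = td(01jun2021)      // reference date
format dob ref %td

* Duration: completed years of age at the reference date
generate age_years = age(dob, ref)

* Relative date: the first birthday after the reference date
generate next_bday = nextbirthday(dob, ref)
format next_bday %td

* Duration in days between the two dates
generate days = datediff(dob, ref, "day")

* Component: the calendar year of that next birthday
generate yr = year(next_bday)
list
```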

### New Do-file Editor features

Don’t miss the enhancements in the Do-file Editor, including persistent bookmarks that are saved with your code, a Navigation Control providing quick access to those bookmarks and defined programs, syntax highlighting support for Java and XML (in addition to the existing support for Stata ado, Python, and Markdown), and autocompletion of quotes, parentheses, and brackets around a selection.

### And there’s more

Health scientists who deal with ordinal outcomes with an overabundance of values in the lowest category will want to try the new ziologit command.

Those of you interested in panel data and categorical outcomes will be pleased to know that you can now analyze both together easily with the new xtmlogit command.

And for those of you interested in nonparametric tests of trend, we have added three new tests to the existing nptrend command.

Finally, Stata 17 runs fully natively on Apple’s new M1 Macs, known as Apple Silicon. Stata ships as a universal application that has everything necessary to run natively on both M1 Macs and Intel-based Macs for the best performance no matter your choice of hardware platform.

It has been a lot of fun to see this release come together at StataCorp, and it is a tremendous pleasure to be able to release it to you.

Categories: New Products Tags:

## Stata 16 Released

We just announced the release of Stata 16. It is now available. Click to visit stata.com/new-in-stata.

Stata 16 is a big release, which our releases usually are. This one is broader than usual. It ranges from lasso to Python and from multiple datasets in memory to multiple chains in Bayesian analysis.

The highlights are listed below. If you click on a highlight, we will spirit you away to our website, where we will describe the feature in a dry but information-dense way. Or you can scroll down and read my comments, which I hope are more entertaining even if they are less informative.

The big features of Stata 16 are

Number 22 is not a link because it’s not a highlight. I added it because I suspect it will affect the most Stata users. It may not be enough to make you buy the release, but it will half tempt you. Buy the update, and you will never again have to type

. set matsize 600


And if you do type it, you will be ignored. Stata just works, and it uses less memory.

Oh, and in Stata/MP, Stata matrices can now be up to 65,534 x 65,534, meaning you can fit models with over 65,000 right-hand-side variables. Meanwhile, Mata matrices remain limited only by memory.

Here are my comments on the highlights.

1. Lasso, both for prediction and for inference

There are two parts to our implementation of lasso: prediction and inference. I suspect inference will be of more interest to our users, but we needed prediction to implement inference. By the way, when I say lasso, I mean lasso, elastic net, and square-root lasso, but if you want a features list, click the title.

. lasso linear y x1 x2 x3 ... x999


lasso will select covariates from the x’s specified and fit the model on them. lasso is unlikely to choose exactly the covariates that belong in the true model, but it will choose covariates that are correlated with them, and that works a treat for prediction. If English is not your first language, by “works a treat”, I mean great. Anyway, the lasso command is for prediction, and standard errors for the covariates it selects are not reported because they would be misleading.
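Here is a minimal, runnable version of the prediction workflow. It assumes Stata 16 or later and the shipped auto dataset in place of the hypothetical y and x1 to x999. The seed is arbitrary:

```stata
* Sketch only: assumes Stata 16+ and the shipped auto dataset
sysuse auto, clear

* Lasso for prediction; the penalty is chosen by cross-validation by default,
* so a seed makes the random folds reproducible
lasso linear mpg weight length displacement turn headroom trunk gear_ratio, rseed(12345)

* Which covariates were selected (no standard errors are reported)
lassocoef

* Predictions from the fitted lasso
predict double mpg_hat
```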

Concerning inference, we provide four lasso-based methods: double selection, cross-fit partialing out, and two more. If you type

. dsregress y x1, controls(x2-x999)


then, conceptually but not actually, y will be fit on x1 and the variables lasso selects from x2-x999. That’s not how the calculation is made because the variables lasso selects are not identical to the true variables that belong in the model. I said earlier that they are correlated with the true variables, and they are. Another way to think about selection is that lasso estimates the variables to be selected and, as with all estimation, that is subject to error. Anyway, the inference calculations are robust to those errors. Reported will be the coefficient and its standard error for x1. I specified one variable of special interest in the example, but you can specify however many you wish.
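Here is a minimal, runnable analogue of the dsregress example. It assumes Stata 16 or later and the shipped auto dataset. weight plays the role of x1, the variable of interest, and the remaining covariates play the role of the controls:

```stata
* Sketch only: assumes Stata 16+ and the shipped auto dataset
sysuse auto, clear

* Double-selection inference for weight; lasso chooses among the controls
dsregress mpg weight, controls(length displacement turn headroom trunk gear_ratio)

* Summarize the lassos that dsregress ran behind the scenes
lassoinfo
```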

2. Reproducible and automatically updating reports

The inelegant title above is trying to say (1) reports that reproduce themselves just as they were originally and (2) reports that, when run again, update themselves by running the analysis on the latest data. Stata has always been strong on both, and we have added more features. I don’t want to downplay the additions, but neither do I want to discuss them. Click the title to learn about them.

I think what’s important is another aspect of what we did. The real problem was that we never told you how to use the reporting features. Now we do in an all-new manual. We tell you and we show you, with examples and workflows. Here’s a link to the manual so you can judge for yourself.

3. New meta-analysis suite

Stata is known for its community-contributed meta-analysis. Now there is an official StataCorp suite as well. It’s complete and easy to use. And yes, it has funnel plots and forest plots, and bubble plots and L’Abbé plots.

4. Revamped and expanded choice modeling (margins works everywhere)

Choice modeling is jargon for conditional logit, mixed logit, multinomial probit, and other procedures that model the probability of individuals making a particular choice from the alternatives available to each of them.

We added a new command to fit mixed logit models, and we rewrote all the rest. The new commands are easier to use and have new features. Old commands continue to work under version control.

margins can now be used after fitting any choice model. margins answers questions about counterfactuals and can even answer them for any one of the alternatives. You can finally obtain answers to questions like, “How would a $10,000 increase in income affect the probability people take public transportation to work?”

The new commands are easier to use because you must first cmset your data. That may not sound like a simplification, but it simplifies the syntax of the remaining commands because it gets details out of the way. And it has another advantage. It tells Stata what your data should look like so Stata can run consistency checks and flag potential problems.

Finally, we created a new [CM] Choice Modeling Manual. Everything you need to know about choice modeling can now be found in one place.

5. Integration of Python with Stata

If you don’t know what Python is, put down your quill pen, dig out your acoustic modem and plug it in, push your telephone handset firmly into the coupler, and visit Wikipedia. Python has become an exceedingly popular programming language with extensive libraries for writing numerical, machine learning, and web scraping routines.

Stata’s new relationship with Python is the same as its relationship with Mata. You can use it interactively from the Stata prompt, in do-files, and in ado-files. You can even put Python subroutines at the bottom of ado-files, just as you do Mata subroutines. Or put both. Stata’s flexible.

Python can access Stata results and post results back to Stata using the Stata Function Interface (sfi), the Python module that we provide.
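Here is a minimal sketch of a Python block inside a do-file. It assumes Stata 16 or later with a Python installation that Stata can find. It reads a variable through the sfi Data class and posts a result back through the Scalar class. The scalar name mean_mpg is our own choice:

```stata
* Sketch only: assumes Stata 16+ with Python available to Stata
sysuse auto, clear

python:
from sfi import Data, Scalar

# Read mpg observation by observation through the Stata Function Interface
n = Data.getObsTotal()
values = [Data.getAt("mpg", i) for i in range(n)]

# Post the mean back to Stata as a scalar
Scalar.setValue("mean_mpg", sum(values) / n)
end

display scalar(mean_mpg)
```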

6. Bayesian predictions, multiple chains, and more

We have lots of new Bayesian features.

We now have multiple chains. Has the MCMC converged? Estimate models using multiple chains, and Stata will report the maximum of the Gelman-Rubin convergence diagnostics across parameters. If it has not yet converged, do more simulations. Still hasn’t converged? Now you can obtain the Gelman-Rubin convergence diagnostic for each parameter. If the same parameter turns up again and again as the culprit, you know where the problem lies.
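Here is a minimal sketch. It assumes Stata 16 or later and the shipped auto dataset. The model, the number of chains, and the seed are arbitrary illustrative choices:

```stata
* Sketch only: assumes Stata 16+ and the shipped auto dataset
sysuse auto, clear

* Three MCMC chains for a Bayesian linear regression with default priors
bayes, nchains(3) rseed(16): regress mpg weight

* Gelman-Rubin diagnostic for each parameter; values near 1 suggest convergence
bayesstats grubin
```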

We now provide Bayesian predictions for outcomes and functions of them. Bayesian predictions are calculated from the simulations that were run to fit your model, so there are a lot of them. The predictions will be saved in a separate dataset. Once you have the predictions, we provide commands so that you can graph summaries of them and perform hypothesis testing. And you can use them to obtain posterior predictive p-values to check the fit of your model.

There’s more. Click the title.

7. Extended regression models (ERMs) for panel data

ERMs fit models with problems. These problems can be any combination of (1) endogenous and exogenous sample selection, (2) endogenous covariates, also known as unobserved confounders, and (3) nonrandom treatment assignment.

What’s new is that ERMs can now be used to fit models with panel (2-level) data. Random effects are added to each equation. Correlations between the random effects are reported. You can test them, jointly or singly. And you can suppress them, jointly or singly.

Ermistatas got a fourth antenna.

8. Importing of SAS and SPSS datasets

New command import sas imports .sas7bdat data files and .sas7bcat value-label files.

New command import spss imports IBM SPSS version 16 or higher .sav and .zsav files.

I recommend using them from their dialog boxes. You can preview the data and select the variables and observations you want to import.

9. Flexible nonparametric series regression

New command npregress series fits models like

y = g(x1, x2, x3) + ε

No functional-form restrictions are placed on g(), but you can impose separability restrictions. The new command can fit

y = g1(x1) + g2(x2, x3) + ε

y = g1(x1, x2) + g3(x3) + ε

y = g1(x1, x3) + g2(x2) + ε

and even fit

y = b1x1 + g2(x2, x3) + ε

y = b1x1 + b2x2 + g3(x3) + ε

I mentioned that lasso can perform inference in models like

. dsregress y x1, controls(x2-x999)


If you know that variables x12, x19, and x122 appear in the model, but do not know the functional form, you could use npregress series to obtain inference. The command

. npregress series y x12 x19 x122, asis(x1)


fits

y = b1x1 + g2(x12, x19, x122) + ε

and, among other statistics, reports the coefficient and standard error of b1.

10. Multiple datasets in memory, meaning frames

I’m a sucker for data management commands. Even so, I do not think I’m exaggerating when I say that frames will change the way you work. If you are not interested, bear with me. I think I can change your mind.

You can have multiple datasets in memory. Each is stored in a named frame. At any instant, one of the frames is the current frame. Most Stata commands operate on the data in the current frame. It’s the commands that work across frames that will change the way you work, but before you can use them, you have to learn how to use frames. So here’s a bit of me using frames:

. use persons

. frame create counties

. frame counties: use counties

. tabulate cntyid

. frame counties: tabulate cntyid


Well, I’m thinking at this point, it appears I could merge persons.dta with counties.dta, except I’m not thinking about merging them. I’m thinking about linking them.

. frlink m:1 cntyid, frame(counties)


Linking is the frames equivalent of merge. It does not change either dataset except to add one variable to the data in the current frame. In this case, new variable counties is created. If I were to drop the variable, I would eliminate the link, but I’m not going to do that. I’m curious whether the counties in which people reside in persons.dta were all found in counties.dta. I can find out by typing

. count if counties==.


If 1,000 were reported, I would now drop counties, and it would be as if I had never linked the two frames.

Let’s assume count reported 0. Or 4, which is a small enough number that I don’t care for this demonstration. Now watch this:

. generate relinc = income / frval(counties, medinc)


I just calculated each person’s income relative to the median income in the county in which he or she resides, and median income was in the counties dataset, not the persons dataset!

Next, I will copy to the current frame all the variables in counties that start with pop. The command that does this, frget, will use the link and copy the appropriate observations.

. frget pop*, from(counties)

. describe pop*

. generate ln_pop18plus = ln(pop18plus)

. generate ln_income = ln(income)

. correlate ln_income ln_pop18plus


I hope I have convinced you that frames are of interest. If not, this is only one of the five ways frames will change how you work with Stata. Maybe one of the other four ways will convince you. Visit the overview of frames page at stata.com.
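If you want to try linking without persons.dta and counties.dta, here is a self-contained sketch. It assumes Stata 16 or later. It builds a tiny, hypothetical lookup frame named origin from the auto data’s foreign variable and then applies frlink, frval(), and frget the same way:

```stata
* Sketch only: assumes Stata 16+; the origin frame is made up for illustration
sysuse auto, clear
capture frame drop origin
frame create origin

* Build a two-row lookup table in the new frame
frame origin: set obs 2
frame origin: generate byte foreign = _n - 1
frame origin: generate str8 region = cond(foreign, "Foreign", "Domestic")

* Link each car to its row in the lookup frame; this adds link variable origin
frlink m:1 foreign, frame(origin)
count if missing(origin)

* Pull values across frames: frval() in an expression, or frget to copy variables
generate str8 region_copy = frval(origin, region)
frget region, from(origin)
```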

11. Sample-size analysis for confidence intervals

The goal is to optimally allocate study resources when CIs are to be used for inference or, said differently, to estimate the sample size required to achieve the desired precision of a CI in a planned study. One mean, two independent means, or two paired means. Or one variance.

12. Nonlinear DSGE models

DSGE stands for Dynamic Stochastic General Equilibrium. Stata previously fit linear DSGEs. Now it can fit nonlinear ones too.

I know this either interests you or does not, and if it does not, there will be no changing your mind. It interests me, and what makes the new feature spectacular is how easy models are to specify and how readable the code is afterwards. You could almost teach from it. If this interests you, click through.

13. Multiple-group IRT

IRT (Item Response Theory) is about the relationship between latent traits and the instruments designed to measure them. An IRT analysis might be about scholastic ability (the latent trait) and a college admission test (the instrument).

Stata 16’s new IRT features produce results for data containing different groups of people. Do instruments measure latent traits in the same way for different populations?

Here is an example. Do students in urban and rural schools perform differently on a test intended to measure mathematical ability? Using Stata 16, you can fit a 2-parameter logistic model comparing the groups by typing

. irt 2pl item1-item10, group(urbanrural)


What’s new is the group() option.

Does an instrument measuring depression perform the same today as it did five years ago? You can fit a graded-response model that compares the groups by typing

. irt grm item1-item10, group(timecategory)


And IRT’s postestimation graphs have been updated to reveal the differences among groups when a group() model has been fit.

The examples I mentioned both concerned two groups, but IRT can handle any number of them.

14. Panel-data Heckman-selection models

Heckman selection adjusts for bias when some outcomes are missing not at random.

The classic example is economists’ modeling of wages. Wages are observed only for those who work, and whether you work is unlikely to be random. Think about it. Should I work or go to school? Should I work or live off my meager savings? Should I work or retire? Few people would be willing to make those decisions by flipping a coin.

If you worry about such problems and are using panel data, the new xtheckman command is the solution.

15-21. Seven more new features

I will summarize the last seven features briefly. My briefness makes them no less important, especially if they interest you.

15. NLMEs with lags: Stata’s menl command for fitting nonlinear mixed-effects regression can now fit models with lags, including multiple-dose pharmacokinetic models.

16. Heteroskedastic ordered probit joins the ordered probit models that Stata already could fit.

17. Graph sizes in inches, centimeters, and printer points can now be specified. Specify 1in, 1.4cm, or 12pt.

18. Programmers: Mata’s new Quadrature class numerically integrates y = f(x) over the interval a to b, where a may be -∞ or finite and b may be finite or +∞.

19. Programmers: Mata’s new Linear programming class solves linear programs using an interior-point method. It minimizes or maximizes a linear objective function subject to linear constraints (equality and inequality) and boundary conditions.

20. Do-file Editor: Autocompletion and more. The editor now provides syntax highlighting for Python and Markdown. And it autocompletes Stata commands, quotes, parentheses, braces, and brackets. Last but not least, spaces as well as tabs can be used for indentation.

21. Stata for Mac: Dark Mode and tabbed windows. Dark mode is a color scheme that darkens background windows and controls so that they do not cause eye strain or distract from what you are working on. Stata now supports it. Meanwhile, tabbed windows conserve screen real estate. Stata has lots of windows. With the exception of the Results window, they come and go as they are needed. Now you can combine all or some into one. Click the tab, change the window.
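Item 18 can be illustrated with a minimal Mata sketch. It assumes Stata 16 or later. The integrand f() is a hypothetical example with a known answer of 1/3. The method names setEvaluator(), setLimits(), and integrate() reflect our reading of the Quadrature class, so verify them against the Mata reference before relying on them. We use finite limits here to keep the check simple, although item 18 says infinite limits are also allowed:

```stata
* Sketch only: assumes Stata 16+; f() is a hypothetical integrand
mata:
// Integrand f(x) = x^2; its integral over [0, 1] is 1/3
real scalar f(real scalar x)
{
    return(x^2)
}

q = Quadrature()
q.setEvaluator(&f())     // pointer to the integrand
q.setLimits((0, 1))      // lower and upper limits of integration
q.integrate()
end
```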

That’s it

The highlights are 58% of what’s new in Stata 16, measured by the number of text lines required to describe them. Here is a sampling of what else is new.

• ranksum has new option exact to specify that exact p-values be computed for the Wilcoxon rank-sum test.
• New setting set iterlog controls whether estimation commands display iteration logs.
• menl has new option lrtest that reports a likelihood-ratio test comparing the nonlinear mixed-effects model with the model fit by ordinary nonlinear regression.
• The bayes: prefix command now supports the new hetoprobit command so that you can fit Bayesian heteroskedastic ordered probits.
• The svy: prefix works with more estimation commands, namely, existing command hetoprobit and new commands cmmixlogit and cmxtmixlogit.
• New command export sasxport8 exports datasets to SAS XPORT Version 8 Transport format.
• New command splitsample splits data into random samples. It can create simple random samples, clustered samples, and balanced random samples. Balance splitting can be used for matched-treatment assignment.
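Here is a minimal sketch of two of the items above. It assumes Stata 16 or later and the shipped auto dataset. The 70/30 split and the seed are arbitrary choices:

```stata
* Sketch only: assumes Stata 16+ and the shipped auto dataset
sysuse auto, clear

* Wilcoxon rank-sum test of mpg by origin, with an exact p-value
ranksum mpg, by(foreign) exact

* Reproducibly split the data into roughly 70% and 30% random samples
splitsample, generate(sample) split(0.7 0.3) rseed(12345)
tabulate sample
```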

I could go on. Type help whatsnew15to16 when you get your copy of Stata 16 to find out all that’s new.

I hope you enjoy Stata 16.

Categories: New Products Tags:

## Bayesian logistic regression with Cauchy priors using the bayes prefix

Introduction

Stata 15 provides a convenient and elegant way of fitting Bayesian regression models by simply prefixing the estimation command with bayes. You can choose from 45 supported estimation commands. All of Stata’s existing Bayesian features are supported by the new bayes prefix. You can use default priors for model parameters or select from many prior distributions. I will demonstrate the use of the bayes prefix for fitting a Bayesian logistic regression model and explore the use of Cauchy priors (available as of the update on July 20, 2017) for regression coefficients.
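Here is a minimal sketch for a do-file. It assumes an updated Stata 15 or later and the shipped auto dataset. This is not the post’s own example. The predictors are standardized so that a common Cauchy scale is sensible. The scales 2.5 for slopes and 10 for the intercept are a commonly used weakly informative default and are assumptions here:

```stata
* Sketch only: assumes Stata 15 (updated) or later and the shipped auto dataset
sysuse auto, clear

* Standardize the predictors so a common prior scale is sensible
egen double zmpg = std(mpg)
egen double zweight = std(weight)

* Bayesian logistic regression with Cauchy(0, 2.5) priors on the slopes
* and a wider Cauchy(0, 10) prior on the intercept
bayes, prior({foreign:zmpg zweight}, cauchy(0, 2.5)) ///
       prior({foreign:_cons}, cauchy(0, 10)) rseed(17): ///
       logit foreign zmpg zweight
```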

Categories: Statistics Tags:

## Gelman–Rubin convergence diagnostic using multiple chains

As of Stata 16, see [BAYES] bayesstats grubin and Bayesian analysis: Gelman-Rubin convergence diagnostic.

The original blog post, published May 26, 2016, omitted option initrandom from the bayesmh command. The code and the text of the blog entry were updated on August 9, 2018, to reflect this.

Overview

MCMC algorithms used for simulating posterior distributions are indispensable tools in Bayesian analysis. A major consideration in MCMC simulations is that of convergence. Has the simulated Markov chain fully explored the target posterior distribution so far, or do we need longer simulations? A common approach in assessing MCMC convergence is based on running and analyzing the difference between multiple chains.
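For reference, here is a statement of the classic Gelman–Rubin potential scale reduction factor. It is written from the standard textbook definition rather than from Stata’s documentation. Suppose we run $M$ chains of $N$ post-burn-in draws each for a parameter $\theta$, with chain means $\bar\theta_m$, within-chain sample variances $s_m^2$, and overall mean $\bar\theta$:

$$B = \frac{N}{M-1}\sum_{m=1}^{M}\left(\bar\theta_m-\bar\theta\right)^2, \qquad W = \frac{1}{M}\sum_{m=1}^{M} s_m^2$$

$$\hat V = \frac{N-1}{N}\,W + \frac{M+1}{MN}\,B, \qquad \hat R = \sqrt{\hat V / W}$$

Values of $\hat R$ close to 1 suggest that the chains have mixed. A common rule of thumb treats values above about 1.1 as a sign that more simulation is needed. Implementations differ in small-sample corrections, so reported diagnostics need not match this simplified formula exactly.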

For a given Bayesian model, bayesmh is capable of producing multiple Markov chains with randomly dispersed initial values by using the initrandom option, available as of the update on 19 May 2016. In this post, I demonstrate the Gelman–Rubin diagnostic as a more formal test for convergence using multiple chains. For graphical diagnostics, see Graphical diagnostics using multiple chains in [BAYES] bayesmh for more details. To compute the Gelman–Rubin diagnostic, I use an unofficial command, grubin, which can be installed by typing the following in Stata: …

Categories: Statistics Tags:

## Fitting distributions using bayesmh

This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

As of update 03 Mar 2016, bayesmh provides a more convenient way of fitting distributions to the outcome variable. By design, bayesmh is a regression command, which models the mean of the outcome distribution as a function of predictors. There are cases when we do not have any predictors and want to model the outcome distribution directly. For example, we may want to fit a Poisson distribution or a binomial distribution to our outcome. This can now be done by specifying one of the four new distributions supported by bayesmh in the likelihood() option: dexponential(), dbernoulli(), dbinomial(), or dpoisson(). Previously, the suboption noglmtransform of bayesmh’s option likelihood() was used to fit the exponential, binomial, and Poisson distributions to the outcome variable. This suboption continues to work but is now undocumented.
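Here is a minimal sketch of fitting a Poisson distribution with no predictors. It assumes Stata 14 with the 03 Mar 2016 update or later. The data are simulated. The gamma(1, 10) prior, the initial value, and the parameter name {lambda} are illustrative assumptions:

```stata
* Sketch only: assumes Stata 14 (updated) or later; data are simulated
clear
set seed 16
set obs 100
generate y = rpoisson(3)

* No predictors: model the outcome distribution directly with dpoisson()
bayesmh y, likelihood(dpoisson({lambda})) ///
        prior({lambda}, gamma(1, 10))     ///
        initial({lambda} 1) rseed(16)
```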

For examples, see Beta-binomial model, Bayesian analysis of change-point problem, and Item response theory under Remarks and examples in [BAYES] bayesmh.

We have also updated our earlier “Bayesian binary item response theory models using bayesmh” blog entry to use the new dbernoulli() specification when fitting 3PL, 4PL, and 5PL IRT models.

Categories: Statistics Tags:

## Bayesian binary item response theory models using bayesmh

This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

Overview

Item response theory (IRT) is used for modeling the relationship between the latent abilities of a group of subjects and the examination items used for measuring their abilities. Stata 14 introduced a suite of commands for fitting IRT models using maximum likelihood; see, for example, the blog post Spotlight on irt by Rafal Raciborski and the [IRT] Item Response Theory manual for more details. In this post, we demonstrate how to fit Bayesian binary IRT models by using the redefine() option introduced for the bayesmh command in Stata 14.1. We also use the likelihood option dbernoulli(), available as of the update on 03 Mar 2016, for fitting the Bernoulli distribution. If you are not familiar with the concepts and jargon of Bayesian statistics, you may want to watch the introductory videos on the Stata YouTube channel before proceeding.
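The binary IRT models discussed in these posts can be summarized as follows. This is a sketch in our own notation, not taken from the post. Let subject $j$ have latent ability $\theta_j$, and let item $i$ have discrimination $a_i$, difficulty $b_i$, lower asymptote (guessing) $c_i$, and upper asymptote $d_i$. The 2PL model is

$$F_{ij} = \Pr(y_{ij}=1\mid\theta_j) = \frac{\exp\{a_i(\theta_j-b_i)\}}{1+\exp\{a_i(\theta_j-b_i)\}}$$

The 1PL model constrains $a_i = a$ for all items. The 3PL and 4PL models rescale the same logistic curve to lie between the asymptotes:

$$\text{3PL:}\ \Pr(y_{ij}=1\mid\theta_j) = c_i + (1-c_i)\,F_{ij}, \qquad \text{4PL:}\ \Pr(y_{ij}=1\mid\theta_j) = c_i + (d_i-c_i)\,F_{ij}$$

The 5PL model adds a further asymmetry parameter, which we do not write out here.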

We use the abridged version of the mathematics and science data from De Boeck and Wilson (2004), masc1. The dataset includes 800 student responses to 9 test questions intended to measure mathematical ability.

The irt suite fits IRT models using data in the wide form, with one observation per subject and items recorded in separate variables. To fit IRT models using bayesmh, we need data in the long form, where items are recorded as multiple observations per subject. We thus reshape the dataset into long form: we have a single binary response variable, y, and two index variables, item and id, which identify the items and subjects, respectively. This allows us to …
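Here is a sketch of the wide-to-long reshape just described. It assumes the masc1 example dataset stores the nine items as q1 to q9. The names y, item, and id match the text:

```stata
* Sketch only: assumes the masc1 example dataset with items q1-q9
webuse masc1, clear

generate id = _n                      // one identifier per student
reshape long q, i(id) j(item)         // one observation per student-item pair
rename q y                            // single binary response variable
```

After the reshape, the 800 students times 9 items give 7,200 observations.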

Categories: Statistics Tags:

## Bayesian modeling: Beyond Stata’s built-in models

This post was written jointly with Nikolay Balov, Senior Statistician and Software Developer, StataCorp.

A question on Statalist motivated us to write this blog entry.

A user asked if the churdle command (http://www.stata.com/stata14/hurdle-models/) for fitting hurdle models, new in Stata 14, can be combined with the bayesmh command (http://www.stata.com/stata14/bayesian-analysis/) for fitting Bayesian models, also new in Stata 14:

http://www.statalist.org/forums/forum/general-stata-discussion/general/1290426-comibining-bayesmh-and-churdle

Our initial reaction to this question was ‘No’ or, more precisely, ‘Not easily’—hurdle models are not among the likelihood models supported by bayesmh. One can write a program to compute the log likelihood of the double hurdle model and use this program with bayesmh (in the spirit of http://www.stata.com/stata14/bayesian-evaluators/), but this may seem like a daunting task if you are not familiar with Stata programming.

And then we realized, why not simply call churdle from the evaluator to compute the log likelihood? All we need is for churdle to evaluate the log likelihood at specific values of model parameters without performing iterations. This can be achieved by specifying churdle’s options from() and iterate(0).


## Stata 14 announced, ships

We’ve just announced the release of Stata 14. Stata 14 ships and downloads starting now.

I just posted on Statalist about it. Here’s a copy of what I wrote.

Stata 14 is now available. You heard it here first.

There’s a long tradition that Statalisters hear about Stata’s new releases first. The new forum is celebrating its first birthday, but it is a continuation of the old Statalist, so the tradition continues, updated for the modern world, where everything happens more quickly. You are hearing about Stata 14 roughly a microsecond before the rest of the world. Traditions are important.

Here’s yet another example of everything happening faster in the modern world. Rather than the announcement preceding shipping by a few weeks as in previous releases, Stata 14 ships and downloads starting now. Or rather, a microsecond from now.

Some things from the past are worth preserving, however, and one is that I get to write about the new release in my own idiosyncratic way. So let me get the marketing stuff out of the way and then I can tell you about a few things that especially interest me and might interest you.

MARKETING BEGINS.

Here’s a partial list of what’s new, a.k.a. the highlights:

• Unicode
• More than 2 billion observations (Stata/MP)
• Bayesian analysis
• IRT (Item Response Theory)
• Panel-data survival models
• Treatment effects
  • Treatment effects for survival models
  • Endogenous treatments
  • Probability weights
  • Balance analysis
• Multilevel mixed-effects survival models
• Small-sample inference for multilevel models
• SEM (structural equation modeling)
  • Survival models
  • Satorra-Bentler scaled chi-squared test
  • Survey data
  • Multilevel weights
• Power and sample size
  • Survival models
  • Contingency (epidemiological) tables
• Markov-switching regression models
• Tests for structural breaks in time-series
• Fractional outcome regression models
• Hurdle models
• Censored Poisson regression
• Survey support & multilevel weights for multilevel models
• New random-number generators
• Estimated marginal means and marginal effects
• Tables for multiple outcomes and levels
• Integration over unobserved and latent variables
• ICD-10
• Stata in Spanish and in Japanese

The above list is not complete; it lists about 30% of what’s new.

For all the details about Stata 14, including purchase and update information, and links to distributors outside of the US, visit stata.com/stata14.

If you are outside of the US, you can order from your authorized Stata distributor. They will supply codes so that you can access and download from stata.com.

MARKETING ENDS.

I want to write about three of the new features: Unicode, more than 2 billion observations, and Bayesian analysis.

Unicode is the modern way that computers encode characters such as the letters in what you are now reading. Unicode encodes all the world’s characters, meaning I can write Hello, Здравствуйте, こんにちは, and lots more besides. Well, the forum software is modern and I always could write those words here. Now I can write them in Stata, too.

For those who care, Stata uses Unicode’s UTF-8 encoding.
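UTF-8 is a variable-width encoding. ASCII characters take one byte each, while characters such as Cyrillic or Japanese take two or three. The Python illustration below, which is not Stata code, counts characters and UTF-8 bytes for the greetings above.

```python
greetings = ["Hello", "Здравствуйте", "こんにちは"]

# Map each word to (number of characters, number of UTF-8 bytes).
sizes = {word: (len(word), len(word.encode("utf-8"))) for word in greetings}

for word, (n_chars, n_bytes) in sizes.items():
    print(f"{word}: {n_chars} characters, {n_bytes} bytes")
```

The gap between characters and bytes is why string lengths and byte counts can differ once text goes beyond plain ASCII.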

Anyway, you can use Unicode characters in your data, of course; in your variable labels, of course; and in your value labels, of course. What you might not expect is that you can use Unicode in your variable names, macro names, and everywhere else Stata wants a name or identifier.

Here’s the auto data in Japanese:

Your use of Unicode may not be as extreme as the above. It might be enough just to make tables and graphs labeled in languages other than English. If so, just set the variable labels and value labels. It doesn’t matter whether the variables are named übersetzung and kofferraum or gear_ratio and trunkspace or 変速比 and トランク.

I want to remind English speakers that Unicode includes mathematical symbols. You can use them in titles, axis labels, and the like.

Few good things come without cost. If you have been using Extended ASCII to circumvent Stata’s plain ASCII limitations, those files need to be translated to Unicode if the strings in them are to display correctly in Stata 14. This includes .dta files, do-files, ado-files, help files, and the like. It’s easier to do than you might expect. A new unicode analyze command will tell you whether you have files that need fixing and, if so, the new unicode translate command will fix them for you. It’s almost as easy as typing

. unicode translate *

This command translates your files, and that has got to concern you. What if it mistranslates them? What if the power fails? Relax. unicode translate makes backups of the originals, and it keeps the backups until you delete them, which you have to do by typing

. unicode erasebackups, badidea

Yes, the option really is named badidea and it is not optional. Another unicode command can restore the backups.

The difficult part of translating your existing files is not performing the translation; it’s determining which Extended ASCII encoding your files used so that the translation can be performed. We have advice on that in the help files but, even so, some of you will only be able to narrow down the encoding to a few choices. The good news is that it is easy to try each one. You just type

. unicode retranslate *

It won’t take long to figure out which encoding works best.
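The source encoding matters because the same Extended ASCII byte stands for different characters under different code pages. The Python sketch below decodes one hypothetical byte string under a few candidate encodings. It is the same kind of trial and error described above. It illustrates the idea only and says nothing about how unicode retranslate works internally.

```python
# Hypothetical bytes written by a program using an Extended ASCII code page.
raw = b"\xfcbersetzung"

# Decode the same bytes under each candidate code page: byte 0xFC becomes
# 'ü' under latin-1 and cp1252 but 'ь' under cp1251 (Cyrillic).
for encoding in ["latin-1", "cp1252", "cp1251"]:
    print(f"{encoding:8} -> {raw.decode(encoding)}")

# Once the right code page is chosen, translating means re-encoding as UTF-8;
# the 'ü' now occupies two bytes (0xC3 0xBC).
translated = raw.decode("cp1252").encode("utf-8")
print(translated)
```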

Stata/MP now allows you to process datasets containing more than 2.1 billion observations. This sounds exciting, but I suspect it will interest only a few of you. How many of us have datasets with more than 2.1 billion observations? And even if you do, you will need a computer with lots of memory. This feature is useful if you have access to a computer with 512 gigabytes, 1 terabyte, or 1.5 terabytes of memory. With smaller computers, you are unlikely to have room for 2.1 billion observations. It’s exciting that such computers are available.

We increased the limit on only Stata/MP because, to exploit the higher limit, you need multiple processors. It’s easy to misjudge how much larger a 2-billion-observation dataset is than a 2-million-observation one. On my everyday 16-gigabyte computer, which is nothing special, I just fit a linear regression with six RHS variables on 2 million observations. It ran in 1.2 seconds. I used Stata/SE, and the 1.2 seconds felt fast. So, if my computer had more memory, how long would it take to fit a model on 2 billion observations? If run time grows in proportion to the number of observations, 1,000 times the data takes 1,000 times as long: 1,200 seconds, which is to say, 20 minutes! You need Stata/MP. With ideal scaling, Stata/MP4 will reduce that to 5 minutes, and Stata/MP32 will reduce that to 37.5 seconds.
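The back-of-the-envelope arithmetic above can be reproduced in a few lines of Python. The sketch makes two idealized assumptions: run time grows linearly with the number of observations, and Stata/MP speeds up in proportion to the number of cores. Real speedups depend on the command and the hardware. The memory line adds a third assumption, that the data are seven variables (the outcome plus six regressors) stored as 8-byte doubles.

```python
base_obs = 2_000_000
base_seconds = 1.2            # measured time for 2 million observations (from the text)
target_obs = 2_000_000_000

# Linear scaling: 1,000 times the observations, 1,000 times the time.
single_core_seconds = base_seconds * target_obs / base_obs

for cores in (1, 4, 32):
    # Ideal parallel speedup: time divided by the number of cores.
    seconds = single_core_seconds / cores
    print(f"{cores:>2} cores: {seconds:7.1f} s = {seconds / 60:5.2f} min")

# Hypothetical memory need: 7 double-precision variables at 8 bytes each.
bytes_needed = target_obs * 7 * 8
print(f"data alone: about {bytes_needed / 1e9:.0f} GB")
```

Even under these generous assumptions, the data alone exceed 100 GB. That is why the extra observations pay off only on machines with very large memory.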

By the way, if you intend to use more than 2 billion observations, be sure to click on help obs_advice, which appears in the start-up notes after Stata launches. You will get better performance if you set min_memory and segmentsize to larger values. We tell you what values to set.

There’s quite a good discussion about dealing with more than 2 billion observations at stata.com/stata14/huge-datasets.

After that, it’s statistics, statistics, statistics.

Which new statistics will interest you obviously depends on your field. We’ve gone deeper into a number of fields. Treatment effects for survival models is just one example. Multilevel survival models is another. Markov-switching models is yet another. Well, you can read the list above.

Two of the new statistical features are worth mentioning, however, because they simply weren’t there previously. They are Bayesian analysis and IRT models, which are admittedly two very different things.

IRT is a highlight of the release, and for some of you it will be the highlight, so I mention it here and will just tell you to see stata.com/stata14/irt for more information.

Bayesian analysis is the other highlight as far as I’m concerned, and it will interest a lot of you because it cuts across fields. Many of you are already knowledgeable about this and I can just hear you asking, “Does Stata include …?” So here’s the high-speed summary:

Stata fits continuous-, binary-, ordinal-, and count-outcome models. And linear and nonlinear models. And generalized nonlinear models. Univariate, multivariate, and multiple-equation. It provides 10 likelihood models and 18 prior distributions. It also allows for user-defined likelihoods combined with built-in priors, built-in likelihoods combined with user-defined priors, and a roll-your-own programming approach to calculate the posterior density directly. MCMC methods are provided, including Adaptive Metropolis-Hastings (MH), Adaptive MH with Gibbs updates, and full Gibbs sampling for certain likelihoods and priors.

It’s also easy to use and that’s saying something.

There’s a great example of the new Bayes features in The Stata News. I mention this because including the example there is nearly a proof of ease of use. The example looks at the number of disasters in the British coal mining industry. There was a fairly abrupt decrease in the rate sometime between 1887 and 1895, which you can see if you eyeball a graph. In the example, we model the number of disasters before the change point as one Poisson process and the number after as another Poisson process, and then we fit a model for the two Poisson parameters and the date of change. For the change point, the example uses a uniform prior on [1851, 1962], the range of the data, and obtains a posterior mean estimate of 1890.4 and a 95% credible interval of [1886, 1896], which agrees with our visual assessment.
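The sketch below illustrates the change-point idea. It is written in Python rather than Stata and uses synthetic counts instead of the coal-mining data. It also swaps MCMC for an exact calculation. Each Poisson rate gets a hypothetical Gamma(1, 1) prior, and the rates are integrated out analytically using the Poisson-Gamma conjugate result. Combining that with a uniform prior on the change year gives the posterior over the change point directly.

```python
import math

# Synthetic annual counts (hypothetical): 3 per year for 10 years, then 1 per year.
years = list(range(1900, 1920))
counts = [3] * 10 + [1] * 10

A, B = 1.0, 1.0  # hypothetical Gamma(shape=A, rate=B) prior on each Poisson rate

def log_marginal(segment):
    """Log marginal likelihood of Poisson counts with the rate integrated out.

    Omits the product of 1/y! terms, which is identical for every change point
    and therefore cancels in the posterior.
    """
    n, s = len(segment), sum(segment)
    return A * math.log(B) - math.lgamma(A) + math.lgamma(A + s) - (A + s) * math.log(B + n)

# Change at index k: years[:k] follow the first rate, years[k:] the second,
# so years[k] is the first year of the new regime.
log_post = {}
for k in range(1, len(counts)):
    # The uniform prior over k adds only a constant, so it is omitted.
    log_post[years[k]] = log_marginal(counts[:k]) + log_marginal(counts[k:])

# Normalize on the log scale for numerical stability.
top = max(log_post.values())
weights = {yr: math.exp(lp - top) for yr, lp in log_post.items()}
total = sum(weights.values())
posterior = {yr: w / total for yr, w in weights.items()}

mode = max(posterior, key=posterior.get)
mean = sum(yr * p for yr, p in posterior.items())
print("posterior mode:", mode)
print("posterior mean:", round(mean, 1))
```

With only one discrete unknown, this exact enumeration is cheap. MCMC earns its keep when the model has more parameters or lacks a conjugate structure to exploit.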

I hope something I’ve written above interests you. Visit stata.com/stata14 for more information.

‒ Bill
wgould@stata.com