Author Archive

Announcing StataNow

One of the most exciting times for us at StataCorp (and hopefully for you as well) is when we get to announce a new version of Stata, full of new features. Now, we hope to experience that feeling with you much more often.

Historically, we have released a new major version of Stata roughly every two years. We will continue to do that, but most users will now have access to StataNow – a continuous-release Stata. StataNow gives you access to new features now, as soon as they are ready from the development, testing, and documentation groups. The features in StataNow are some of the same features that will eventually appear in the next major release of Stata. StataNow users will get additional features on a continuous basis throughout the lifetime of a release.

You can read more about StataNow, including how to get it, and you can see its initial set of additional features. But let me tell you a little more about it here.

Many of you create features in Stata that you share with others via your own sites, the SSC archive, and the Stata Journal. And all of you write your own do-files as you perform your analyses in Stata. Knowing this, let me share with you a few technical details about StataNow.

First, StataNow is Stata. To be exact, the current Stata that most of you have is Stata 18.0. StataNow is Stata 18.5 (which we will call StataNow 18.5 from now on). When you are using StataNow, you should start your programs and do-files with version 18.5, just as you previously started them with version 18.0. Why is the version number different? Because StataNow is newer than Stata 18.0, and it is possible something in it will need to be version-controlled differently than in Stata 18. This is no different than when a new release comes out and it has a different version, 16.0, 17.0, 18.0, etc. As always, StataNow is backward compatible, so any programs, do-files, datasets, and so on from earlier versions will work, without changes, in StataNow.
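As a minimal sketch of that convention, a do-file written for StataNow 18.5 might open as follows. Only the version line comes from the text above; the dataset and model are placeholders chosen for illustration (auto.dta ships with Stata).

```stata
* Hypothetical do-file header; the analysis below is only a placeholder
version 18.5            // interpret this file under StataNow 18.5 rules

sysuse auto, clear      // example dataset that ships with Stata
regress mpg weight      // any analysis would follow here
```

In Stata 18.0, the same file would start with version 18.0 instead.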

What if we need to version-control something simultaneously in both Stata and StataNow? We would then release Stata 18.1 and StataNow 18.6.

The documentation and help files for Stata 18.0 and StataNow 18.5 are the same. StataNow features are included in them and clearly marked as such.

The dataset format in StataNow is the same as in Stata.

What are the new features in StataNow, and how often will we add features to StataNow? See the current set of new features. There is no set schedule for releasing new features, but we anticipate new features will be released fairly often – several times a year. We will release no new feature before its time, which means that anything released in StataNow is fully official, tested, validated, certified, and documented, just like all the features we put out in a new release of Stata.

When Stata 19 eventually comes out, it will of course include all the features that have come out along the way in StataNow as well as some additional new ones. Users of StataNow will automatically be able to upgrade to Stata 19 — actually, they will upgrade to StataNow 19.5 when Stata 19.0 comes out, and over time StataNow 19.5 will get additional features as soon as they are ready from the Stata elves.

We are excited to be able to give you the new features we add to Stata on a continuous basis, getting them into your hands sooner!

Stata 18 released

Stata 17 released

We just announced Stata 17. Visit stata.com/new-in-stata to read all about its 29 major new features.

Looking over this list of features, someone suggested that a potential marketing tagline for Stata 17 could be “Better. Faster. Stronger.” I thought Daft Punk might not like it if we used that, so we aren’t, but really, it is a great overall description of the new version.

I’ll share my thoughts on some of the new features below.

Customizable tables

There has been a long tradition in the Stata user community of commands that build various tables. These are among some of the most-used community-contributed commands! There has been an equally long tradition of the Stata user community asking us to provide more official features to assist with flexible table creation and export. Stata’s table command has been completely revamped, and a new collect command allows you to gather and manage results from multiple commands, which can then be shown in tabular form. Excel, HTML, LaTeX, Markdown, PDF, Stata SMCL, Word, and plain text are supported as export formats. I suspect almost all users will be adding this to their Stata repertoire.
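A rough sketch of how the two commands fit together, assuming the auto dataset that ships with Stata. The output file names are hypothetical, and the layout shown is just one of many possibilities.

```stata
sysuse auto, clear

* Summary statistics with the revamped table command
table foreign, statistic(mean mpg) statistic(sd mpg)

* table leaves its results in a collection, which collect can export
collect export mpg_summary.html, replace

* Gather coefficients and standard errors from two regressions
collect clear
collect _r_b _r_se: regress mpg weight
collect _r_b _r_se: regress mpg weight displacement

* Arrange covariates in rows and one column group per model, then export
collect layout (colname) (cmdset#result)
collect export mpg_models.docx, replace
```

The export format follows from the file extension, so switching to another supported format should only require changing the file name.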

Bayesian econometrics

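In Stata 17, we have added many features for Bayesian econometrics, including Bayesian VAR models (with IRF analysis and dynamic forecasts), Bayesian panel-data models, and Bayesian DSGE models.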

Many of you have been asking us for Bayesian VAR models. That’s not surprising. VAR models have many parameters but often not enough data to estimate them reliably. The Bayesian approach provides a solution by incorporating specialized priors to allow you to obtain more stable parameter estimates.

As with classical VAR models, you can perform IRF analysis and obtain dynamic forecasts but now within the Bayesian paradigm.

Bayesian panel-data models are appealing when you have few panels or when you would like to study and compare panel-specific effects.

With Bayesian DSGE models, prior distributions give you a natural way to incorporate knowledge about model parameters that is motivated by the economic theory.

As with Stata’s other Bayesian features, our aim is to make specification of these models as intuitive as possible and as similar to the specifications of the frequentist counterparts as possible.

All the above is in addition to other existing Bayesian features of interest to econometricians such as Bayesian generalized linear models and Bayesian sample-selection models.

Bayesian multilevel models

Stata users span many disciplines. In addition to the new Bayesian features above that will be of most interest to econometricians, Stata 17 also adds Bayesian multilevel modeling with support for nonlinear, joint, and SEM-like models, among others. One notable feature is the ability to fit multivariate nonlinear models containing random effects, such as multivariate nonlinear growth models.

Give me more speed

As computer capabilities have grown, so have dataset sizes. With larger datasets and more computationally intensive methods comes the above request, “Give me more speed.” Sometimes, our developers say, “I’m giving her all she’s got, Captain!”, but for Stata 17, they gave us all more.

We achieved speed gains in part through careful algorithm selection and implementation and in part through integration of the Intel Math Kernel Library (MKL) to underpin many of Mata’s linear algebra functions and operators.

Read the details, including tables of specific speed gains.

Difference-in-differences (DID) models

DID and difference-in-difference-in-differences (DDD) models are appealing to many disciplines, including econometrics, epidemiology, political science, public policy, and many more. If you are studying the effect of a treatment (such as a drug regimen or policy) in observational data and are concerned that the effect may be influenced by time or by some other group effects, DID and DDD models provide intuitive methods to control for such unobserved effects.
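To make the idea concrete, here is a small sketch on simulated data using the didregress command. Everything about the data is an assumption made up for illustration: eight groups, ten periods, a treatment that switches on for groups 5 to 8 after period 5, and a true effect of 0.5.

```stata
clear
set seed 12345
set obs 400

* Hypothetical panel: 8 groups, 10 periods, 5 observations per cell
generate grp = ceil(_n/50)
bysort grp: generate period = ceil(_n/5)

* Groups 5-8 are treated from period 6 onward
generate treated = grp > 4 & period > 5

* Outcome with group and time effects plus a treatment effect of 0.5
generate y = 1 + 0.5*treated + 0.1*period + 0.2*grp + rnormal()

* DID estimate of the average treatment effect on the treated
didregress (y) (treated), group(grp) time(period)
```

With so few groups the standard errors here are only illustrative; the point is the shape of the specification, with the outcome in the first set of parentheses and the treatment indicator in the second.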

New meta-analysis features

Stata 17 adds the following to the excellent (IMHO) meta-analysis suite introduced in Stata 16:

If you recognize the above, you know you want them!

And we have made these new features just as easy to use as the rest of the meta suite.

Interval-censored Cox model

The Cox proportional hazards model is used routinely by researchers from many disciplines to analyze right-censored event-time data, where the time to an event of interest is either observed exactly or known only to exceed the censoring time. In Stata 17, you can use it with interval-censored event-time data too!

With interval-censored event-time data, we only know that the time to an event of interest lies in an interval. For instance, think of time to cancer recurrence or to an infection or, for that matter, to any asymptomatic disease that can be detected only through periodic examinations. These are all interval-censored data, and you now have a new powerful tool to analyze them.

New lasso features

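Stata 17 adds the following to the popular lasso features first introduced in Stata 16: treatment-effects lasso, lasso with clustered observations, and BIC-based selection of penalty parameters.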

In fact, treatment-effects lasso combines two popular features: treatment effects and lasso. You can now incorporate many (hundreds, thousands, and more) covariates in your treatment-effects models.

You can account for clustered observations in your lasso analysis. And you can use the BIC criterion to select lasso penalty parameters.

Integration with other software and languages

Two of Stata’s great features are its extensibility and reproducibility. Stata 17 builds on that tradition by greatly enhancing its interoperability with Python and Java, adding support for Jupyter Notebook, adding JDBC support, and giving you experimental access to the H2O platform.

You could already call Python code from Stata code. Now you can call Stata from any stand-alone Python environment. Do you write Python code in Jupyter Notebook, Spyder IDE, or PyCharm IDE? Now you can call Stata directly from those environments, passing data, metadata, and results between Stata and Python seamlessly. Even your Stata graphs will show up directly in Jupyter Notebook. Our nickname for all the ways you can connect Python and Stata is ‘PyStata’.

For Java, you could already compile Java code into .jar files and call those as plugins from Stata. Now you can embed Java code directly in your Stata do-files and ado-files, just like Mata code and Python code. Stata will compile and execute your Java code on the fly. You can interchange data, metadata, and results at will.

One of the first steps of any analysis is importing your data. Stata 17 supports JDBC for importing data from and writing data to databases that provide JDBC drivers. An important advantage JDBC has over ODBC is that JDBC drivers are platform independent, so if a database vendor provides a JDBC driver, it will work seamlessly on Windows, Mac, and Linux.

Finally, some of our developers have been experimenting with connecting Stata to H2O, a scalable and distributed open-source machine-learning and predictive analytics platform. We decided to release our experiment to you. Is this something you’d like us to do more with? We look forward to your feedback.

New date and time functions

Stata 17 adds a plethora of date and time convenience functions in three main areas:

  • Datetime durations, such as ages
  • Relative dates, such as the next birthday relative to a reference date
  • Datetime components, functions that extract various components from datetime values

You’ll undoubtedly find these make your life easier when working with date and time values.
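As a quick sketch of the three areas, here is one function from each. The dates are arbitrary illustrations, and this is only a small sample of the new functions rather than a complete tour.

```stata
* Arbitrary example dates: a date of birth and a reference date
local dob = mdy(7, 4, 1990)
local ref = mdy(4, 20, 2021)

* Duration: age in completed years on the reference date
display age(`dob', `ref')

* Relative date: the first birthday after the reference date
display %td nextbirthday(`dob', `ref')

* Component: the month of the reference date
display datepart(`ref', "month")
```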

New Do-file Editor features

Don’t miss the enhancements in the Do-file Editor, including persistent bookmarks that are saved with your code, a Navigation Control providing quick access to those bookmarks and defined programs, syntax highlighting support for Java and XML (in addition to the existing support for Stata ado, Python, and Markdown), and autocompletion of quotes, parentheses, and brackets around a selection.

And there’s more

Health scientists who deal with ordinal outcomes with an overabundance of values in the lowest category will want to try the new ziologit command.

Those of you interested in panel data and categorical outcomes will be pleased to know that you can now analyze both together easily with the new xtmlogit command.

And for those of you interested in nonparametric tests of trend, we have added three new tests to the existing nptrend command.

Finally, Stata 17 runs fully natively on Apple’s new M1 Macs, known as Apple Silicon. Stata ships as a universal application that has everything necessary to run natively on both M1 Macs and Intel-based Macs for the best performance no matter your choice of hardware platform.

It has been a lot of fun to see this release come together at StataCorp, and it is a tremendous pleasure to be able to release it to you.

Compatibility and reproducibility

I saw a tweet the other day where someone claimed that StataCorp ensures that the dataset format in Stata X is always different from Stata X-1.

This reminded me of an email I wrote a few years ago to a user who had questions about backward compatibility and reproducibility. I’m going to use large parts of that email in this blog post to share my thoughts on those topics.

I understand the frustration of incompatibilities between software versions. While it may not ease the inevitable difficulties that arise, I would like to explain our efforts in this regard. Read more…

Automating web downloads and file unzipping

Andrew J. Dyck wrote a nice post on his blog on how to Download and unzip data files from Stata. He writes:

Recently, I’ve been using Stata’s -shp2dta- command to convert some shapefiles to stata format, grabbing Lat/Lon data and merging into another dataset. There were several compressed shapefiles I wanted to download contained in a directory from the web. I could manually download each file and uncompress each one but that would be time consuming. Also, when the maps are updated, I’d have to do the download/uncompress all over again. I’ve found that the process can be automated from within Stata by using a combination of -shell- and some handy terminal commands. …

You should read the rest of his post. He goes on to show how you can script with Stata to automate shelling out to download and unzip a series of files from a website, and he introduces you to some cool Unix-like utilities for Windows.

We here at StataCorp use Stata for tasks like this all the time. In fact, we have built some tools into Stata to allow you to do much of what Andrew described without ever having to leave or shell out of Stata. Read more…
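The teaser above does not name the built-in tools, so the following is only a guess at the general approach, using Stata's copy and unzipfile commands. The URL and file name are hypothetical placeholders and would need to be replaced with a real location.

```stata
* Hypothetical URL and file name; substitute a real location before running
local url "https://www.example.com/maps/region1.zip"

* Download the archive into the current working directory
copy "`url'" "region1.zip", replace

* Extract its contents, overwriting any earlier copies
unzipfile "region1.zip", replace
```

Wrapping these two lines in a foreach loop over a list of file names would cover the several-files case described in the quoted post.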

Categories: Programming

Big computers

We here at Stata are often asked to make recommendations on the “best” computer on which to run Stata, and such discussions sometimes pop up on Statalist. Of course, there is no simple answer, as it depends on the analyses a given user wishes to run, the size of their datasets, and their budget. And we do not recommend particular computer or operating system vendors. Many manufacturers use similar components in their computers, and the choice of operating system comes down to the user’s personal preference. We take pride in making sure Stata works well regardless of operating system and hardware configuration.

For some users, the analyses they wish to run are demanding, the datasets they have are huge, and their budgets are large. For these users, it is useful to know what kind of off-the-shelf hardware they can easily get their hands on. To give you an idea of what is available, HP makes a server with up to 1 TB of memory. Yes, 1 terabyte! This computer can be configured and ordered online at hp.com. Read more…

Categories: Hardware