Python Archives - The Stata Blog

Announcing StataNow

30 April 2024 Alan Riley, President

One of the most exciting times for us at StataCorp (and hopefully for you as well) is when we get to announce a new version of Stata, full of new features. Now, we hope to experience that feeling with you much more often.

Historically, we have released a new major version of Stata roughly every two years. We will continue to do that, but most users will now have access to StataNow, a continuous-release Stata. StataNow gives you access to new features now, as soon as they are ready from the development, testing, and documentation groups. Features released in StataNow will also appear in the next major release of Stata, and StataNow users will get additional features on a continuous basis throughout the lifetime of a release.

You can read more about StataNow, including how to get it, and you can see its initial set of additional features. But let me tell you a little more about it here.

Many of you create features in Stata that you share with others via your own sites, the SSC archive, and the Stata Journal. And all of you write your own do-files as you perform your analyses in Stata. Knowing this, let me share with you a few technical details about StataNow.

First, StataNow is Stata. To be exact, the current Stata that most of you have is Stata 18.0. StataNow is Stata 18.5 (which we will call StataNow 18.5 from now on). When you are using StataNow, you should start your programs and do-files with version 18.5, just as you previously started them with version 18.0. Why is the version number different? Because StataNow is newer than Stata 18.0, and something in it may need to be version-controlled differently than in Stata 18. This is no different from what happens when a new release comes out with a new version number: 16.0, 17.0, 18.0, and so on. As always, StataNow is backward compatible, so any programs, do-files, datasets, and so on from earlier versions will work, without changes, in StataNow.

What if we need to version-control something simultaneously in both Stata and StataNow? We would then release Stata 18.1 and StataNow 18.6.

The documentation and help files for Stata 18.0 and StataNow 18.5 are the same. StataNow features are included in them and clearly marked as such.

The dataset format in StataNow is the same as in Stata.

What are the new features in StataNow, and how often will we add features to StataNow? See the current set of new features. There is no set schedule for releasing new features, but we anticipate releasing them fairly often, several times a year. We will release no new feature before its time, which means that anything released in StataNow is fully official, tested, validated, certified, and documented, just like all the features we put out in a new release of Stata.

When Stata 19 eventually comes out, it will of course include all the features that have come out along the way in StataNow as well as some additional new ones. Users of StataNow will automatically be able to upgrade to Stata 19 — actually, they will upgrade to StataNow 19.5 when Stata 19.0 comes out, and over time StataNow 19.5 will get additional features as soon as they are ready from the Stata elves.

We are excited to be able to give you the new features we add to Stata on a continuous basis, getting them into your hands sooner!

Categories: New Products, Stata Products Tags: biostatistics, econometrics, Mata, new release, Python, StataNow, statistics, time series

A Stata command to run ChatGPT

25 July 2023 Chuck Huber, Director of Statistical Outreach

Artificial intelligence (AI) is a popular topic in the media these days, and ChatGPT is perhaps the best-known AI tool. I recently tweeted that I had written a Stata command called chatgpt for myself that runs ChatGPT. I promised to explain how I did it, so here is the explanation.

Categories: Programming Tags: ado, artificial intelligence, chatgpt, programming, PyStata, Python, stata

Stata 17 released

20 April 2021 Alan Riley, President

We just announced Stata 17. Visit stata.com/new-in-stata to read all about its 29 major new features.

They are

Looking over this list of features, someone suggested that a potential marketing tagline for Stata 17 could be “Better. Faster. Stronger.” I thought Daft Punk might not like it if we used that, so we aren’t, but really, it is a great overall description of the new version.

I’ll share my thoughts on some of the new features below.

Customizable tables

There has been a long tradition in the Stata user community of commands that build various tables. These are among the most-used community-contributed commands! There has been an equally long tradition of the Stata user community asking us to provide more official features to assist with flexible table creation and export. Stata’s table command has been completely revamped, and a new collect command allows you to gather and manage results from multiple commands, which can then be shown in tabular form. Excel, HTML, LaTeX, Markdown, PDF, Stata SMCL, Word, and plain text are supported as export formats. I suspect almost all users will be adding this to their Stata repertoire.

Bayesian econometrics

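In Stata 17, we have added many features for Bayesian econometrics, including Bayesian vector autoregressive (VAR) models, Bayesian panel-data models, and Bayesian dynamic stochastic general equilibrium (DSGE) models.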

Many of you have been asking us for Bayesian VAR models. That’s not surprising. VAR models have many parameters but often not enough data to estimate them reliably. The Bayesian approach addresses this by incorporating specialized priors that yield more stable parameter estimates.

As with classical VAR models, you can perform impulse-response function (IRF) analysis and obtain dynamic forecasts, but now within the Bayesian paradigm.

Bayesian panel-data models are appealing when you have few panels or when you would like to study and compare panel-specific effects.

With Bayesian DSGE models, prior distributions give you a natural way to incorporate knowledge about model parameters that is motivated by economic theory.

As with Stata’s other Bayesian features, our aim is to make specification of these models as intuitive as possible and as similar to the specifications of the frequentist counterparts as possible.

All the above is in addition to other existing Bayesian features of interest to econometricians such as Bayesian generalized linear models and Bayesian sample-selection models.

Bayesian multilevel models

Stata users span many disciplines. In addition to the new Bayesian features above that will be of most interest to econometricians, Stata 17 also adds Bayesian multilevel modeling with support for nonlinear, joint, SEM-like, and even more models. One notable feature is the ability to fit multivariate nonlinear models containing random effects such as multivariate nonlinear growth models.

Give me more speed

As computer capabilities have grown, so have dataset sizes. With larger datasets and more computationally intensive methods comes the above request, “Give me more speed.” Sometimes, our developers say, “I’m giving her all she’s got, Captain!”, but for Stata 17, they gave us all more.

We achieved speed gains in part through careful algorithm selection and implementation and in part through integration of the Intel Math Kernel Library (MKL) to underpin many of Mata’s linear algebra functions and operators.

Read the details, including tables of specific speed gains.

Difference-in-differences (DID) models

DID and difference-in-difference-in-differences (DDD) models are appealing to many disciplines, including econometrics, epidemiology, political science, public policy, and many more. If you are studying the effect of a treatment (such as a drug regimen or policy) in observational data and are concerned that the effect may be influenced by time or by some other group effects, DID and DDD models provide intuitive methods to control for such unobserved effects.
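The simplest DID design rests on a short piece of arithmetic. Here is a minimal Python sketch of the textbook 2×2 estimator. It assumes one treated group, one control group, one pre-period, and one post-period. The function name did_estimate and the outcome values are hypothetical, and the sketch ignores covariates and standard errors entirely, so it is an illustration of the idea rather than a stand-in for Stata's DID commands.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Textbook 2x2 difference-in-differences estimate from raw outcomes.

    The control group's pre-to-post change stands in for the change the
    treated group would have seen without treatment, so subtracting it
    removes time effects shared by both groups.
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Made-up outcomes: the treated group rises by 5, the control group by 2.
effect = did_estimate([10, 12], [15, 17], [9, 11], [11, 13])
print(effect)  # 3
```

A DDD design takes this subtraction one step further. It computes the DID within a subgroup that should be affected by the treatment and subtracts the DID within a subgroup that should not be affected.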

New meta-analysis features

Stata 17 adds the following to the excellent (IMHO) meta-analysis suite introduced in Stata 16:

If you recognize the above, you know you want them!

And we have made these new features just as easy to use as the rest of the meta suite.

Interval-censored Cox model

The Cox proportional hazards model is used routinely by researchers from many disciplines to analyze right-censored event-time data, where time to an event of interest is observed exactly. In Stata 17, you can use it with interval-censored event-time data too!

With interval-censored event-time data, we know only that the time to an event of interest lies in an interval. For instance, think of time to cancer recurrence, to an infection, or, for that matter, to any asymptomatic disease that can be detected only through periodic examinations. These are all interval-censored data, and you now have a powerful new tool to analyze them.
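To make the data structure concrete, here is a minimal Python sketch that turns a periodic-examination record into the interval that brackets the unobserved event time. The function name and its conventions are illustrative assumptions, not the input format Stata's interval-censored Cox command expects. Those conventions are: 0 as the lower bound when the very first exam is positive, and infinity as the upper bound when no exam is positive.

```python
import math


def censoring_interval(exam_times, first_positive=None):
    """Return (left, right) bounds on an event time seen only at periodic exams.

    exam_times: exam times in increasing order (e.g., months since baseline).
    first_positive: time of the first exam that detected the event, or None
        if every exam was negative.
    """
    if first_positive is None:
        # Never detected: the event, if any, happens after the last exam
        # (ordinary right-censoring).
        return (exam_times[-1], math.inf)
    i = exam_times.index(first_positive)  # raises ValueError if not an exam time
    # The event happened after the last negative exam and no later than the
    # first positive one; if the first exam is positive, fall back to time 0.
    left = exam_times[i - 1] if i > 0 else 0
    return (left, first_positive)


exams = [6, 12, 18, 24]
print(censoring_interval(exams, 18))  # (12, 18)
print(censoring_interval(exams))      # (24, inf)
```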

New lasso features

Stata 17 adds the following to the popular lasso features first introduced in Stata 16:

In fact, treatment-effects lasso combines two popular features: treatment effects and lasso. You can now incorporate many (hundreds, thousands, and more) covariates in your treatment-effects models.

You can account for clustered observations in your lasso analysis. And you can use the Bayesian information criterion (BIC) to select lasso penalty parameters.
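As a rough illustration of BIC-based selection, the sketch below scores each fit along a lasso path and keeps the penalty with the smallest BIC. It makes three assumptions:

- Errors are Gaussian.
- The number of nonzero coefficients serves as the model's degrees of freedom, a common approximation for the lasso.
- The (penalty, RSS, df) values are made up.

Stata's own BIC computation may be defined differently.

```python
import math


def gaussian_bic(rss, n, df):
    """BIC for a Gaussian linear fit, dropping terms that are constant across fits."""
    return n * math.log(rss / n) + df * math.log(n)


def select_lambda_by_bic(path, n):
    """Return the penalty from `path` whose fit has the smallest BIC.

    path: (lam, rss, df) tuples, one per penalty value, where df counts the
    nonzero coefficients selected at that penalty.
    """
    best = min(path, key=lambda fit: gaussian_bic(fit[1], n, fit[2]))
    return best[0]


# Made-up lasso path for n = 100 observations: (penalty, RSS, nonzero coefs).
path = [(1.0, 200.0, 1), (0.5, 120.0, 3), (0.1, 110.0, 10)]
print(select_lambda_by_bic(path, n=100))  # 0.5
```

The df·ln(n) term charges each additional nonzero coefficient, which is how BIC trades goodness of fit against sparsity.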

Integration with other software and languages

Two of Stata’s great features are its extensibility and reproducibility. Stata 17 builds on that tradition by greatly enhancing its interoperability with Python and Java, adding support for Jupyter Notebook, adding JDBC support, and giving you experimental access to the H2O platform.

You could already call Python code from Stata code. Now you can call Stata from any stand-alone Python environment. Do you write Python code in Jupyter Notebook, Spyder IDE, or PyCharm IDE? Now you can call Stata directly from those environments, passing data, metadata, and results between Stata and Python seamlessly. Even your Stata graphs will show up directly in Jupyter Notebook. Our nickname for all the ways you can connect Python and Stata is ‘PyStata’.

For Java, you could already compile Java code into .jar files and call those as plugins from Stata. Now you can embed Java code directly in your Stata do-files and ado-files, just like Mata code and Python code. Stata will compile and execute your Java code on the fly. You can interchange data, metadata, and results at will.

One of the first steps of any analysis is importing your data. Stata 17 supports JDBC for importing data from and writing data to databases that provide JDBC drivers. An important advantage JDBC has over ODBC is that JDBC drivers are platform independent, so if a database vendor provides a JDBC driver, it will work seamlessly on Windows, Mac, and Linux.

Finally, some of our developers have been experimenting with connecting Stata to H2O, a scalable and distributed open-source machine-learning and predictive analytics platform. We decided to release our experiment to you. Is this something you’d like us to do more with? We look forward to your feedback.

New date and time functions

Stata 17 adds a plethora of date and time convenience functions in three main areas:

Datetime durations, such as ages
Relative dates, such as the next birthday relative to a reference date
Datetime components, functions that extract various components from datetime values

You’ll undoubtedly find these make your life easier when working with date and time values.
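For a feel of the three areas, here is a small Python sketch that uses only the standard datetime module:

- completed age, as a duration;
- the next birthday relative to a reference date;
- day of year, as a component.

The function names are hypothetical, and this is not Stata's implementation. It also assumes one particular convention for February 29 birthdays, moving them to February 28 in non-leap years. Other conventions, such as March 1, are equally possible.

```python
from datetime import date


def birthday_in_year(birth: date, year: int) -> date:
    """Birthday falling in `year`; Feb 29 maps to Feb 28 in non-leap years (assumed)."""
    try:
        return birth.replace(year=year)
    except ValueError:  # Feb 29 in a non-leap year
        return date(year, 2, 28)


def age_in_years(birth: date, ref: date) -> int:
    """Duration: completed years of age on the reference date."""
    years = ref.year - birth.year
    if birthday_in_year(birth, ref.year) > ref:
        years -= 1  # this year's birthday has not happened yet
    return years


def next_birthday(birth: date, ref: date) -> date:
    """Relative date: first birthday strictly after the reference date."""
    candidate = birthday_in_year(birth, ref.year)
    if candidate <= ref:
        candidate = birthday_in_year(birth, ref.year + 1)
    return candidate


def day_of_year(d: date) -> int:
    """Component: 1 for January 1, up to 365 or 366."""
    return d.timetuple().tm_yday


dob = date(2000, 2, 29)
print(age_in_years(dob, date(2021, 2, 27)))   # 20
print(next_birthday(dob, date(2021, 1, 1)))   # 2021-02-28
print(day_of_year(date(2021, 3, 1)))          # 60
```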

New Do-file Editor features

Don’t miss the enhancements in the Do-file Editor, including persistent bookmarks that are saved with your code, a Navigation Control providing quick access to those bookmarks and defined programs, syntax highlighting support for Java and XML (in addition to the existing support for Stata ado, Python, and Markdown), and autocompletion of quotes, parentheses, and brackets around a selection.

And there’s more

Health scientists who deal with ordinal outcomes with an overabundance of values in the lowest category will want to try the new ziologit command.

Those of you interested in panel data and categorical outcomes will be pleased to know that you can now analyze both together easily with the new xtmlogit command.

And for those of you interested in nonparametric tests of trend, we have added three new tests to the existing nptrend command.

Finally, Stata 17 runs natively on Apple’s new M1 Macs, which are built on Apple silicon. Stata ships as a universal application that has everything necessary to run natively on both M1 Macs and Intel-based Macs for the best performance no matter your choice of hardware platform.

It has been a lot of fun to see this release come together at StataCorp, and it is a tremendous pleasure to be able to release it to you.

Categories: New Products Tags: Bayesian, biostatistics, data science, econometrics, java, Jupyter Notebook, lasso, meta-analysis, new release, Python, Stata 17, statistics

Stata/Python integration part 9: Using the Stata Function Interface to copy data from Python to Stata

19 November 2020 Chuck Huber, Director of Statistical Outreach

In my previous post, we learned how to use the Stata Function Interface (SFI) module to copy data from Stata to Python. In this post, I will show you how to use the SFI module to copy data from Python to Stata. We will be using the yfinance module to download financial data from the Yahoo! Finance website. You can install this module in your Python environment by typing pip install yfinance. Our goal is to use Python to download historical data for the Dow Jones Industrial Average (DJIA) and then use Stata to graph them.
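The post downloads the DJIA data with yfinance. The sketch below skips the download and shows the two steps the copy itself needs. First, Stata stores daily dates as the number of days since 1 January 1960, so Python dates must be converted before they are stored. Second, the values are written into Stata variables. The copy_to_stata() helper is hypothetical and runs only inside Stata's embedded Python, where the sfi module lives. It assumes the sfi.Data methods setObsTotal(), addVarLong(), addVarDouble(), store(), and setVarFormat() behave as described in the SFI reference; please verify them there before relying on this.

```python
from datetime import date

# Stata's %td daily dates count days from 01jan1960 (day 0).
STATA_EPOCH = date(1960, 1, 1)


def to_stata_daily(d: date) -> int:
    """Convert a Python date to a Stata daily date value."""
    return (d - STATA_EPOCH).days


def copy_to_stata(dates, closes):
    """Create variables `date` and `close` in the current Stata dataset.

    Hypothetical helper: works only when called from Python inside Stata,
    because the sfi module ships with Stata rather than with Python.
    """
    from sfi import Data  # imported here so this file also loads outside Stata

    Data.setObsTotal(len(dates))          # one observation per date
    Data.addVarLong("date")               # integer storage for day counts
    Data.addVarDouble("close")
    Data.store("date", None, [to_stata_daily(d) for d in dates])  # None: all obs (assumed)
    Data.store("close", None, list(closes))
    Data.setVarFormat("date", "%td")      # display the day counts as dates


print(to_stata_daily(date(2020, 1, 1)))  # 21915
```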

Categories: Programming Tags: Python, stocks, yahoo finance

Stata/Python integration part 8: Using the Stata Function Interface to copy data from Stata to Python

5 November 2020 Chuck Huber, Director of Statistical Outreach

In my previous posts, I used the pandas read_stata() function to read Stata datasets into pandas data frames. This works well when you want to read an entire Stata dataset into Python. But sometimes we wish to read a subset of the variables, the observations, or both from a Stata dataset into Python. In this post, I will introduce you to the Stata Function Interface (SFI) module and show you how to use it to read partial datasets into a pandas data frame.
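For comparison, here is roughly how far the read_stata() route gets you. In pandas, read_stata() accepts a columns argument that keeps only the variables you name, but in this sketch the observations are filtered only after the file has been read. This is a minimal sketch: read_subset() is a hypothetical helper. The tiny dataset is made up and written to a temporary file only so that the example is self-contained.

```python
import os
import tempfile

import pandas as pd


def read_subset(path, columns, keep_rows):
    """Read only `columns` from a .dta file, then keep rows where keep_rows(df) is True."""
    df = pd.read_stata(path, columns=columns)        # variables trimmed at read time
    return df[keep_rows(df)].reset_index(drop=True)  # observations filtered afterward


# Made-up data written to a temporary .dta file so the example runs anywhere.
demo = pd.DataFrame({
    "year": [2018, 2019, 2020, 2021],
    "close": [100.0, 110.0, 120.0, 130.0],
    "volume": [1.5, 2.5, 3.5, 4.5],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dta")
    demo.to_stata(path, write_index=False)
    recent = read_subset(path, ["year", "close"], lambda d: d["year"] >= 2020)

print(recent)
```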

Categories: Programming Tags: Python, stocks, yahoo finance

Stata/Python integration part 7: Machine learning with support vector machines

13 October 2020 Chuck Huber, Director of Statistical Outreach

Machine learning, deep learning, and artificial intelligence are collections of algorithms used to identify patterns in data. These algorithms have exotic-sounding names like “random forests”, “neural networks”, and “spectral clustering”. In this post, I will show you how to use one of these algorithms, called a support vector machine (SVM). I don’t have space to explain an SVM in detail, but I will provide some references for further reading at the end. I am going to give you a brief introduction and show you how to implement an SVM with Python.

Our goal is to use an SVM to differentiate between people who are likely to have diabetes and those who are not, using age and HbA1c level as predictors. Age is measured in years, and HbA1c is a blood test that measures glucose control. The graph in the post displays diabetics with red dots and nondiabetics with blue dots. An SVM model predicts that older people with higher HbA1c levels, in the red-shaded area of the graph, are more likely to have diabetes, while younger people with lower HbA1c levels, in the blue-shaded area, are less likely to have diabetes.
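To show the mechanics without depending on any particular machine-learning package, here is a swapped-in alternative. It is a linear SVM trained by stochastic subgradient descent on the hinge loss, written in pure Python. The (age, HbA1c) values are made up, and the hyperparameters lam, lr, and epochs are arbitrary illustrative choices. A kernel SVM can draw curved boundaries like the shaded regions described above, but this sketch draws a straight line.

```python
# Pure-Python linear SVM: stochastic subgradient descent on the hinge loss.


def standardize(rows):
    """Z-score each column; return the scaled rows plus the means and SDs used."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 for c, m in zip(cols, means)]
    scaled = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    return scaled, means, sds


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Approximately minimize lam/2*||w||^2 + average hinge loss; y must be +1/-1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:
                # Inside the margin or misclassified: shrink w and step toward the point.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi  # bias is left unregularized
            else:
                # Comfortably correct: only the L2 penalty acts.
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b


def predict(point, w, b, means, sds):
    """Classify a raw (age, HbA1c) pair: +1 = likely diabetic, -1 = not."""
    z = [(v - m) / s for v, m, s in zip(point, means, sds)]
    return 1 if sum(wj * zj for wj, zj in zip(w, z)) + b >= 0 else -1


# Made-up (age in years, HbA1c in %) pairs; +1 = diabetic, -1 = nondiabetic.
data = [(30, 5.0), (35, 5.2), (40, 5.4), (45, 5.1), (50, 5.5),
        (55, 7.0), (60, 7.5), (65, 6.8), (70, 8.0), (48, 7.2)]
labels = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]

Z, means, sds = standardize(data)
w, b = train_linear_svm(Z, labels)
print(predict((70, 9.0), w, b, means, sds))
```

Standardizing first matters because age and HbA1c live on very different scales. Without it, the margin would be dominated by whichever variable has the larger numbers.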

Categories: Programming Tags: artificial intelligence, cross validation, machine learning, Python, support vector machines

Stata/Python integration part 6: Working with APIs and JSON data

29 September 2020 Chuck Huber, Director of Statistical Outreach

Data are everywhere. Many government agencies, financial institutions, universities, and social media platforms provide access to their data through an application programming interface (API). APIs often return the requested data in JavaScript Object Notation (JSON) format. In this post, I will show you how to use Python to request data with API calls and how to work with the resulting JSON data.
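Here is a minimal sketch of the two halves of that workflow, using only Python's standard library. The first half builds a request URL from query parameters; the second pulls fields out of the JSON that comes back. The endpoint is assumed to be openFDA's drug adverse-event endpoint. The sample response is made up and only loosely shaped like an openFDA count query, so check the provider's documentation for the real parameters and response layout. fetch_json() needs network access and is not called below.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for illustration; consult the API's documentation.
BASE_URL = "https://api.fda.gov/drug/event.json"


def build_request_url(base, **params):
    """Append URL-encoded query parameters (e.g., search, limit) to an endpoint."""
    return f"{base}?{urlencode(params)}"


def fetch_json(url, timeout=30):
    """Send the request and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


def extract_field(payload, key):
    """Collect one field from every record in the payload's 'results' list."""
    return [record.get(key) for record in payload.get("results", [])]


# Made-up response text standing in for a real API reply.
sample_text = """
{"meta": {"disclaimer": "illustrative only"},
 "results": [{"term": "HEADACHE", "count": 12},
             {"term": "NAUSEA", "count": 7}]}
"""
payload = json.loads(sample_text)
print(build_request_url(BASE_URL, limit=5))
print(extract_field(payload, "term"))  # ['HEADACHE', 'NAUSEA']
```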

Categories: Programming Tags: api, data science, json, openFDA, Python

Stata/Python integration part 5: Three-dimensional surface plots of marginal predictions

14 September 2020 Chuck Huber, Director of Statistical Outreach

In my first four posts about Stata and Python, I showed you how to set up Stata to use Python, three ways to use Python in Stata, how to install Python packages, and how to use Python packages. It might be helpful to read those posts before you continue with this post if you are not familiar with Python. Now, I’d like to shift our focus to some practical uses of Python within Stata. This post will demonstrate how to use Stata to estimate marginal predictions from a logistic regression model and use Python to create a three-dimensional surface plot of those predictions.
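In the post, Stata estimates the marginal predictions and Python draws the surface. The sketch below reproduces the idea in Python alone, under two assumptions. The logistic model's only covariates are two continuous variables, hypothetically named age and bmi here, and the coefficients are made up. In that special case, the marginal prediction at each grid point is simply the inverse logit of the linear predictor. plot_surface() needs NumPy and Matplotlib and is left uncalled, so the grid computation runs with the standard library alone.

```python
import math


def invlogit(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def prediction_grid(b0, b_age, b_bmi, ages, bmis):
    """Pr(y=1) for every (age, bmi) pair; rows follow bmis, columns follow ages."""
    return [[invlogit(b0 + b_age * a + b_bmi * m) for a in ages] for m in bmis]


def plot_surface(ages, bmis, probs):
    """Draw the grid as a 3-D surface (requires NumPy and Matplotlib)."""
    import numpy as np
    import matplotlib.pyplot as plt

    A, M = np.meshgrid(ages, bmis)  # both shaped len(bmis) x len(ages), like probs
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(A, M, np.array(probs), cmap="viridis")
    ax.set_xlabel("age")
    ax.set_ylabel("bmi")
    ax.set_zlabel("Pr(y=1)")
    return fig


# Made-up coefficients and grid values.
ages = list(range(20, 81, 10))
bmis = list(range(18, 41, 2))
probs = prediction_grid(-8.0, 0.05, 0.15, ages, bmis)
# fig = plot_surface(ages, bmis, probs); fig.savefig("surface.png")
```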

Stata/Python integration part 4: How to use Python packages

10 September 2020 Chuck Huber, Director of Statistical Outreach

In my last post, I showed you how to use pip to install four popular packages for Python. Today, I want to show you the basics of how to import and use Python packages. We will learn some important Python concepts and jargon along the way. I will be using the pandas package in the examples, but the ideas and syntax are the same for other Python packages.
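Python offers three common ways to bring a package into your code, and they differ only in which names you type afterward. Here is a minimal sketch with pandas and a made-up two-row data frame. It illustrates general Python import syntax and is not taken from the post itself.

```python
import pandas                     # the whole package, used by its full name
import pandas as pd               # the same package under a shorter alias
from pandas import DataFrame      # a single name copied into this namespace

# All three spellings point to the same class object.
print(pandas.DataFrame is pd.DataFrame is DataFrame)  # True

# Made-up data; DataFrame is a class, and df is an object (an instance of it).
df = pd.DataFrame({"make": ["AMC", "Buick"], "mpg": [22, 17]})

# .mean() is a method: a function attached to an object, called with a dot.
print(df["mpg"].mean())  # 19.5
```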

Categories: Programming Tags: programming, Python

Stata/Python integration part 3: How to install Python packages

1 September 2020 Chuck Huber, Director of Statistical Outreach

In my last post, I showed you three ways to use Python within Stata. The examples were simple, but they allowed us to start using Python. At this point, you could write your own Python programs within Stata. But the real power of Python lies in the thousands of freely available packages. Today, I want to show you how to download and install Python packages.
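Besides typing pip at a command line, you can check for and install a package from Python itself. This is a minimal sketch, assuming pip is available for the interpreter in use; the helper names are hypothetical. Running pip through sys.executable ties the install to the interpreter executing the code. That helps avoid installing into a different Python from the one Stata is set up to use. A package's pip name and its import name can differ; scikit-learn, for example, is imported as sklearn.

```python
import importlib.util
import subprocess
import sys
from typing import Optional


def is_installed(import_name: str) -> bool:
    """True if Python can locate the top-level module without importing it."""
    return importlib.util.find_spec(import_name) is not None


def pip_install_command(pip_name: str) -> list:
    """Build a pip command bound to the interpreter running this code."""
    return [sys.executable, "-m", "pip", "install", pip_name]


def ensure_installed(import_name: str, pip_name: Optional[str] = None) -> None:
    """Install with pip only if `import_name` cannot be found (needs network access)."""
    if not is_installed(import_name):
        subprocess.run(pip_install_command(pip_name or import_name), check=True)


print(is_installed("json"))  # True: json ships with Python
# ensure_installed("sklearn", "scikit-learn")  # would call pip if missing
```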