Archive

Posts Tagged ‘Python’

A Stata command to run ChatGPT

Artificial intelligence (AI) is a popular topic in the media these days, and ChatGPT is, perhaps, the most well-known AI tool. I recently tweeted that I had written a Stata command called chatgpt for myself that runs ChatGPT. I promised to explain how I did it, so here is the explanation. Read more…

Stata 17 released

We just announced Stata 17. Visit stata.com/new-in-stata to read all about its 29 major new features.

They are

Looking over this list of features, someone suggested that a potential marketing tagline for Stata 17 could be “Better. Faster. Stronger.” I thought Daft Punk might not like it if we used that, so we aren’t, but really, it is a great overall description of the new version.

I’ll share my thoughts on some of the new features below.

 

Customizable tables

 
There has been a long tradition in the Stata user community of commands that build various tables. These are among some of the most-used community-contributed commands! There has been an equally long tradition of the Stata user community asking us to provide more official features to assist with flexible table creation and export. Stata’s table command has been completely revamped, and a new collect command allows you to gather and manage results from multiple commands, which can then be shown in tabular form. Excel, HTML, LaTeX, Markdown, PDF, Stata SMCL, Word, and plain text are supported as export formats. I suspect almost all users will be adding this to their Stata repertoire.

 

Bayesian econometrics

 
In Stata 17, we have added many features for Bayesian econometrics, including

Many of you have been asking us for Bayesian VAR models. That’s not surprising. VAR models have many parameters but often not enough data to estimate them reliably. The Bayesian approach provides a solution by incorporating specialized priors to allow you to obtain more stable parameter estimates.

As with classical VAR models, you can perform IRF analysis and obtain dynamic forecasts but now within the Bayesian paradigm.

Bayesian panel-data models are appealing when you have few panels or when you would like to study and compare panel-specific effects.

With Bayesian DSGE models, prior distributions give you a natural way to incorporate knowledge about model parameters that is motivated by the economic theory.

As with Stata’s other Bayesian features, our aim is to make specification of these models as intuitive as possible and as similar to the specifications of the frequentist counterparts as possible.

All the above is in addition to other existing Bayesian features of interest to econometricians such as Bayesian generalized linear models and Bayesian sample-selection models.

 

Bayesian multilevel models

 
Stata users span many disciplines. In addition to the new Bayesian features above that will be of most interest to econometricians, Stata 17 also adds Bayesian multilevel modeling with support for nonlinear, joint, SEM-like, and even more models. One notable feature is the ability to fit multivariate nonlinear models containing random effects such as multivariate nonlinear growth models.

 

Give me more speed

 
As computer capabilities have grown, so have dataset sizes. With larger datasets and more computationally intensive methods comes the above request, “Give me more speed.” Sometimes, our developers say, “I’m giving her all she’s got, Captain!”, but for Stata 17, they gave us all more.

We achieved speed gains in part through careful algorithm selection and implementation and in part through integration of the Intel Math Kernel Library (MKL) to underpin many of Mata’s linear algebra functions and operators.

Read the details, including tables of specific speed gains.

 

Difference-in-differences (DID) models

 
DID and difference-in-difference-in-differences (DDD) models are appealing to many disciplines, including econometrics, epidemiology, political science, public policy, and many more. If you are studying the effect of a treatment (such as a drug regimen or policy) in observational data and are concerned that the effect may be influenced by time or by some other group effects, DID and DDD models provide intuitive methods to control for such unobserved effects.

 

New meta-analysis features

 
Stata 17 adds the following to the excellent (IMHO) meta-analysis suite introduced in Stata 16:

If you recognize the above, you know you want them!

And we have made these new features just as easy to use as the rest of the meta suite.

 

Interval-censored Cox model

 
The Cox proportional hazards model is used routinely by researchers from many disciplines to analyze right-censored event-time data, where time to an event of interest is observed exactly. In Stata 17, you can use it with interval-censored event-time data too!

With interval-censored event-time data, we only know that the time to an event of interest lies in an interval. For instance, think of time to cancer recurrence or to an infection or, for that matter, to any asymptomatic disease that can be detected only through periodic examinations. These are all interval-censored data, and you now have a new powerful tool to analyze them.

 

New lasso features

 
Stata 17 adds the following to the popular lasso features first introduced in Stata 16:

In fact, treatment-effects lasso combines two popular features: treatment effects and lasso. You can now incorporate many (hundreds, thousands, and more) covariates in your treatment-effects models.

You can account for clustered observations in your lasso analysis. And you can use the BIC criterion to select lasso penalty parameters.

 

Integration with other software and languages

 
Two of Stata’s great features are its extensibility and reproducibility. Stata 17 builds on that tradition by greatly enhancing its interoperability with Python and Java, adding support for Jupyter Notebook, adding JDBC support, and giving you experimental access to the H2O platform.

You could already call Python code from Stata code. Now you can call Stata from any stand-alone Python environment. Do you write Python code in Jupyter Notebook, Spyder IDE, or PyCharm IDE? Now you can call Stata directly from those environments, passing data, metadata, and results between Stata and Python seamlessly. Even your Stata graphs will show up directly in Jupyter Notebook. Our nickname for all the ways you can connect Python and Stata is ‘PyStata‘.

For Java, you could already compile Java code into .jar files and call those as plugins from Stata. Now you can embed Java code directly in your Stata do-files and ado-files, just like Mata code and Python code. Stata will compile and execute your Java code on the fly. You can interchange data, metadata, and results at will.

One of the first steps of any analysis is importing your data. Stata 17 supports JDBC for importing data from and writing data to databases that provide JDBC drivers. An important advantage JDBC has over ODBC is that JDBC drivers are platform independent, so if a database vendor provides a JDBC driver, it will work seamlessly on Windows, Mac, and Linux.

Finally, some of our developers have been experimenting with connecting Stata to H2O, a scalable and distributed open-source machine-learning and predictive analytics platform. We decided to release our experiment to you. Is this something you’d like us to do more with? We look forward to your feedback.

 

New date and time functions

 
Stata 17 adds a plethora of date and time convenience functions in three main areas:

  • Datetime durations, such as ages
  • Relative dates, such as the next birthday relative to a reference date
  • Datetime components, functions that extract various components from datetime values

You’ll undoubtedly find these make your life easier when working with date and time values.

 

New Do-file Editor features

 
Don’t miss the enhancements in the Do-file Editor, including persistent bookmarks that are saved with your code, a Navigation Control providing quick access to those bookmarks and defined programs, syntax highlighting support for Java and XML (in addition to the existing support for Stata ado, Python, and Markdown), and autocompletion of quotes, parentheses, and brackets around a selection.

 

And there’s more

 
Health scientists who deal with ordinal outcomes with an overabundance of values in the lowest category will want to try the new ziologit command.

Those of you interested in panel data and categorical outcomes will be pleased to know that you can now analyze both together easily with the new xtmlogit command.

And for those of you interested in nonparametric tests of trend, we have added three new tests to the existing nptrend command.

Finally, Stata 17 runs fully natively on Apple’s new M1 Macs, known as Apple Silicon. Stata ships as a universal application that has everything necessary to run natively on both M1 Macs and Intel-based Macs for the best performance no matter your choice of hardware platform.

It has been a lot of fun to see this release come together at StataCorp, and it is a tremendous pleasure to be able to release it to you.

Stata/Python integration part 9: Using the Stata Function Interface to copy data from Python to Stata

In my previous post, we learned how to use the Stata Function Interface (SFI) module to copy data from Stata to Python. In this post, I will show you how to use the SFI module to copy data from Python to Stata. We will be using the yfinance module to download financial data from the Yahoo! finance website. You can install this module in your Python environment by typing pip install yfinance. Our goal is to use Python to download historical data for the Dow Jones Industrial Average (DJIA) and use Stata to create the following graph. Read more…

Categories: Programming Tags: , ,

Stata/Python integration part 8: Using the Stata Function Interface to copy data from Stata to Python

In my previous posts, I used the read_stata() method to read Stata datasets into pandas data frames. This works well when you want to read an entire Stata dataset into Python. But sometimes we wish to read a subset of the variables or observations, or both, from a Stata dataset into Python. In this post, I will introduce you to the Stata Function Interface (SFI) module and show you how to use it to read partial datasets into a pandas data frame. Read more…

Categories: Programming Tags: , ,

Stata/Python integration part 7: Machine learning with support vector machines

Machine learning, deep learning, and artificial intelligence are a collection of algorithms used to identify patterns in data. These algorithms have exotic-sounding names like “random forests”, “neural networks”, and “spectral clustering”. In this post, I will show you how to use one of these algorithms called a “support vector machines” (SVM). I don’t have space to explain an SVM in detail, but I will provide some references for further reading at the end. I am going to give you a brief introduction and show you how to implement an SVM with Python.

Our goal is to use an SVM to differentiate between people who are likely to have diabetes and those who are not. We will use age and HbA1c level to differentiate between people with and without diabetes. Age is measured in years, and HbA1c is a blood test that measures glucose control. The graph below displays diabetics with red dots and nondiabetics with blue dots. An SVM model predicts that older people with higher levels of HbA1c in the red-shaded area of the graph are more likely to have diabetes. Younger people with lower HbA1c levels in the blue-shaded area are less likely to have diabetes. Read more…

Stata/Python integration part 6: Working with APIs and JSON data

Data are everywhere. Many government agencies, financial institutions, universities, and social media platforms provide access to their data through an application programming interface (API). APIs often return the requested data in a JavaScript Object Notation (JSON) file. In this post, I will show you how to use Python to request data with API calls and how to work with the resulting JSON data. Read more…

Categories: Programming Tags: , , , ,

Stata/Python integration part 5: Three-dimensional surface plots of marginal predictions

In my first four posts about Stata and Python, I showed you how to set up Stata to use Python, three ways to use Python in Stata, how to install Python packages, and how to use Python packages. It might be helpful to read those posts before you continue with this post if you are not familiar with Python. Now, I’d like to shift our focus to some practical uses of Python within Stata. This post will demonstrate how to use Stata to estimate marginal predictions from a logistic regression model and use Python to create a three-dimensional surface plot of those predictions.

graph1
Read more…

Stata/Python integration part 4: How to use Python packages

In my last post, I showed you how to use pip to install four popular packages for Python. Today I want to show you the basics of how to import and use Python packages. We will learn some important Python concepts and jargon along the way. I will be using the pandas package in the examples below, but the ideas and syntax are the same for other Python packages. Read more…

Categories: Programming Tags: ,

Stata/Python integration part 3: How to install Python packages

In my last post, I showed you three ways to use Python within Stata. The examples were simple but they allowed us to start using Python. At this point, you could write your own Python programs within Stata. But the real power of Python lies in the thousands of freely available packages. Today, I want to show you how to download and install Python packages. Read more…

Categories: Programming Tags: ,

Stata/Python integration part 2: Three ways to use Python in Stata

In my last post, I showed you how to install Python and set up Stata to use Python. Now, we’re ready to use Python. There are three ways to use Python within Stata: calling Python interactively, including Python code in do-files and ado-files, and executing Python script files. Each is useful in different circumstances, so I will demonstrate all three. The examples are intentionally simple and somewhat silly. I’ll show you some more complex examples in future posts, but I want to keep things simple in this post. Read more…

Categories: Programming Tags: ,