Testing model specification and using the program version of gmm

This post was written jointly with Joerg Luedicke, Senior Social Scientist and Statistician, StataCorp.

The gmm command estimates the parameters of a model using the generalized method of moments (GMM). GMM can estimate the parameters of models that have more moment conditions than parameters; such models are overidentified. The specification of these models can be evaluated using Hansen’s J statistic (Hansen, 1982).

We use gmm to estimate the parameters of a Poisson model with an endogenous regressor. More instruments than regressors are available, so the model is overidentified. We then use estat overid to calculate Hansen’s J statistic and test the validity of the overidentifying restrictions.

In previous posts (see Estimating parameters by maximum likelihood and method of moments using mlexp and gmm and Understanding the generalized method of moments (GMM): A simple example), the interactive version of gmm has been used to estimate simple single-equation models. For more complex models, it can be easier to use the moment-evaluator program version of gmm. We demonstrate how to use this version of gmm.

Poisson model with endogenous regressors

In this post, the Poisson regression of \(y_i\) on exogenous \({\bf x}_i\) and endogenous \({\bf y}_{2,i}\) has the form
\begin{equation*}
E(y_i \vert {\bf x}_i,{\bf y}_{2,i},\epsilon_i)= \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}) + \epsilon_i
\end{equation*}
where \(\epsilon_i\) is a zero-mean error term. The endogenous regressors \({\bf y}_{2,i}\) may be correlated with \(\epsilon_i\). This is the same formulation used by ivpoisson with additive errors; see [R] ivpoisson for more details. For more information on Poisson models with endogenous regressors, see Mullahy (1997), Cameron and Trivedi (2013), Windmeijer and Santos Silva (1997), and Wooldridge (2010).

Moment conditions are expectations of functions of the data and the parameters that equal zero at the true parameter values. GMM finds the parameter values that come closest to satisfying the sample analogs of the moment conditions. In this model, we define moment conditions using an error function,
\begin{equation*}
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) = y_i - \exp({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i})
\end{equation*}

Let \({\bf x}_{2,i}\) be additional exogenous variables. These are not correlated with \(\epsilon_i\), but are correlated with \({\bf y}_{2,i}\). Combining them with \({\bf x}_i\), we have the instruments \({\bf z}_i = (\begin{matrix} {\bf x}_{i} & {\bf x}_{2,i}\end{matrix})\). So the moment conditions are
\begin{equation*}
E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}
\end{equation*}

Suppose there are \(k\) parameters in \({\boldsymbol \beta}_1\) and \({\boldsymbol \beta}_2\) and \(q\) instruments. When \(q>k\), there are more moment conditions than parameters, and the model is overidentified. The sample moment conditions cannot generally all be set to zero, so GMM chooses the estimates that make a weighted quadratic form of the sample moments as small as possible. GMM minimizes
\[
Q({{\boldsymbol \beta}_1},{\boldsymbol \beta}_2) = \left\{\frac{1}{N}\sum\nolimits_i {{\bf z}}_i
u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}
{\bf W}
\left\{\frac{1}{N}\sum\nolimits_i {{\bf z}}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)\right\}'
\]
for \(q\times q\) weight matrix \({\bf W}\).

Overidentification test

When the model is correctly specified,
\begin{equation*}
E({\bf z}_i u_i({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)) = {\bf 0}
\end{equation*}

In this case, an optimal weight matrix \({\bf W}\) is the inverse of the covariance matrix of the moment conditions. Here we have
\[
{\bf W}^{-1} = E\{{\bf z}_i' u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2)
u_{i}({\boldsymbol \beta}_1,{\boldsymbol \beta}_2) {\bf z}_i\}
\]

Hansen’s test evaluates the null hypothesis that an overidentified model is correctly specified. The test statistic \(J = N Q(\hat{\boldsymbol \beta}_1, \hat{\boldsymbol \beta}_2)\) is used. If \({\bf W}\) is an optimal weight matrix, under the null hypothesis, Hansen’s J statistic has a \(\chi^2(q-k)\) distribution.

The two-step and iterated estimators used by gmm provide estimates of the optimal \({\bf W}\). For overidentified models, the estat overid command calculates Hansen’s J statistic after these estimators are used.

Moment-evaluator program

We define a program that can be called by gmm in calculating the moment conditions for Poisson models with endogenous regressors. See Programming an estimation command in Stata: A map to posted entries for more information about programming in Stata. The program calculates the error function \(u_i\), and gmm generates the moment conditions by multiplying by the instruments \({\bf z}_i\).

To minimize the weighted criterion, gmm needs the derivatives of the moment conditions with respect to the parameters. By the chain rule, these are the derivatives of the error functions multiplied by the instruments. Users may specify these derivatives themselves, or gmm will calculate them numerically. Correctly specifying the derivatives yields gains in speed and numerical stability.

When linear forms of the parameters are estimated, users may specify derivatives to gmm in terms of the linear form (prediction). The chain rule is then used by gmm to determine the derivatives of the error function \(u_i\) with respect to the parameters. Our error function \(u_i\) is a function of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).
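
Concretely, if we write the linear prediction as \(\xi_i = {\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\), then \(u_i = y_i - \exp(\xi_i)\), and the chain rule gives
\begin{equation*}
\frac{\partial u_i}{\partial \xi_i} = -\exp(\xi_i)
\end{equation*}
This derivative with respect to the linear prediction is the quantity that the moment-evaluator program below computes when its derivatives() option is specified.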

The program gmm_ivpois calculates the error function \(u_i\) and the derivative of \(u_i\) in terms of the linear prediction \({\boldsymbol \beta}_1{\bf x}_i + {\boldsymbol \beta}_2{\bf y}_{2,i}\).

program gmm_ivpois
    version 14.1
    syntax varlist [if], at(name) depvar(varlist) rhs(varlist) ///
           [derivatives(varlist)]
    tempvar m
    quietly gen double `m' = 0 `if'
    local i = 1
    foreach var of varlist `rhs' {
        quietly replace `m' = `m' + `var'*`at'[1,`i'] `if'
        local i = `i' + 1
    }
    quietly replace `m' = `m' + `at'[1,`i'] `if'
    quietly replace `varlist' = `depvar' - exp(`m') `if'
    if "`derivatives'" == "" {
         exit
    }
    replace `derivatives' = -exp(`m')
end

Lines 3–4 of gmm_ivpois contain the syntax statement that parses the arguments to the program. All moment-evaluator programs must accept a varlist, the if condition, and the at() option. The varlist corresponds to variables that store the values of the error functions. The program gmm_ivpois will calculate the error function and store it in the specified varlist. The at() option is specified with the name of a matrix that contains the model parameters. The if condition specifies the observations for which estimation is performed.

The program also requires the options depvar() and rhs(). The name of the dependent variable is specified in the depvar() option. The regressors are specified in the rhs() option.

On line 4, derivatives() is optional. The variable name specified here corresponds to the derivative of the error function with respect to the linear prediction.

The linear prediction of the regressors is stored in the temporary variable m over lines 6–12. On line 13, we give the value of the error function to the specified varlist. Lines 14–16 allow the program to exit if derivatives() is not specified. Otherwise, on line 17, we store the value of the derivative of the error function with respect to the linear prediction in the variable specified in derivatives().

The data

We simulate data from a Poisson regression with an endogenous covariate, and then we use gmm and the gmm_ivpois program to estimate the parameters of the regression. We will then use estat overid to check the specification of the model. We simulate a random sample of 3,000 observations.

. set seed  45

. set obs 3000
number of observations (_N) was 0, now 3,000

. generate x = rnormal()*.8 + .5

. generate z = rchi2(1)

. generate w = rnormal()*.5

. matrix cm = (1, .9 \ .9, 1)

. matrix sd = (.5,.8)

. drawnorm e u, corr(cm) sd(sd)

We generate the exogenous covariates \(x\), \(z\), and \(w\). The variable \(x\) will be a regressor, while \(z\) and \(w\) will be extra instruments. Then we use drawnorm to draw the errors \(e\) and \(u\). The errors are positively correlated.

. generate y2 = exp(.2*x + .1*z + .3*w -1 + u)

. generate y = exp(.5*x + .2*y2+1) + e

We generate the endogenous regressor \(y2\) as a lognormal regression on the instruments. The outcome of interest \(y\) has an exponential mean on \(x\) and \(y2\), with \(e\) as an additive error. As \(e\) is correlated with \(u\), \(y2\) is correlated with \(e\).

Estimating the model parameters

Now we use gmm to estimate the parameters of the Poisson regression with endogenous covariates. The name of our moment-evaluator program is listed to the right of gmm. The instruments that gmm will use to form the moment conditions are listed in instruments(). We specify the options depvar() and rhs() with the appropriate variables. They will be passed on to gmm_ivpois.

In the parameters() option, we specify the parameters as a linear form named y; the haslfderivatives option informs gmm that gmm_ivpois provides derivatives with respect to this linear form. The option nequations() tells gmm how many error functions to expect.

. gmm gmm_ivpois, depvar(y) rhs(x y2)             ///
>         haslfderivatives instruments(x z w)     ///                            
>         parameters({y: x y2 _cons}) nequations(1)

Step 1
Iteration 0:   GMM criterion Q(b) =  14.960972
Iteration 1:   GMM criterion Q(b) =  3.3038486
Iteration 2:   GMM criterion Q(b) =  .59045217
Iteration 3:   GMM criterion Q(b) =  .00079862
Iteration 4:   GMM criterion Q(b) =  .00001419
Iteration 5:   GMM criterion Q(b) =  .00001418

Step 2
Iteration 0:   GMM criterion Q(b) =   .0000567
Iteration 1:   GMM criterion Q(b) =  .00005648
Iteration 2:   GMM criterion Q(b) =  .00005648

GMM estimation

Number of parameters =   3
Number of moments    =   4
Initial weight matrix: Unadjusted                 Number of obs   =      3,000
GMM weight matrix:     Robust

------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .5006366   .0033273   150.46   0.000     .4941151     .507158
          y2 |   .2007893   .0075153    26.72   0.000     .1860597    .2155189
       _cons |   1.000717   .0063414   157.81   0.000      .988288    1.013146
------------------------------------------------------------------------------
Instruments for equation 1: x z w _cons

The parameter estimates are close to the values of 0.5, 0.2, and 1 used to generate the data, and each is statistically significant. However, the model could still be misspecified.
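
For comparison with the interactive version of gmm mentioned at the start of this post, the same model could be specified with a substitutable expression along the following lines. This is only a sketch: the parameter names xb and b0 are our own choices, and we rely on the defaults, which give the two-step estimator with a robust weight matrix.

gmm (y - exp({xb: x y2} + {b0})), instruments(x z w)

The moment-evaluator program pays off as models grow to include more equations or analytic derivatives.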

Overidentification test

We use estat overid to compute Hansen’s J statistic.

. estat overid

  Test of overidentifying restriction:

  Hansen's J chi2(1) = .169449 (p = 0.6806)

The J statistic equals 0.17. In addition to computing Hansen’s J, estat overid reports a test of the null hypothesis that the model is correctly specified. Here we have one more instrument than parameters, so the J statistic has a \(\chi^2(1)\) distribution under the null hypothesis. The probability of obtaining a \(\chi^2(1)\) value greater than 0.17 is given in parentheses. This probability, the p-value of the test, is large, so we fail to reject the null hypothesis that the model is properly specified.
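
As a check, the J statistic is just the sample size times the minimized step-2 criterion reported above: 3,000 times 0.00005648 is about 0.169. Assuming gmm stores the final criterion in e(Q) and the number of observations in e(N) (see the stored results in [R] gmm), the statistic can be reproduced by hand:

display e(N)*e(Q)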

Conclusion

We have demonstrated how to estimate the parameters of a Poisson regression with an endogenous regressor using the moment-evaluator program version of gmm. We have also demonstrated how to use estat overid to test for model misspecification after estimation of an overidentified model in gmm. See [R] gmm and [R] gmm postestimation for more information.

References

Cameron, A. C., and P. K. Trivedi. 2013. Regression Analysis of Count Data. 2nd ed. New York: Cambridge University Press.

Hansen, L. P. 1982. Large sample properties of generalized method of moments estimators. Econometrica 50: 1029–1054.

Mullahy, J. 1997. Instrumental-variable estimation of count data models: Applications to models of cigarette smoking behavior. Review of Economics and Statistics 79: 586–593.

Windmeijer, F., and J. M. C. Santos Silva. 1997. Endogeneity in count data models: An application to demand for health care. Journal of Applied Econometrics 12: 281–294.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, MA: MIT Press.

Programming an estimation command in Stata: Handling factor variables in optimize()

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss a method for handling factor variables when performing nonlinear optimization using optimize(). After illustrating the issue caused by factor variables, I present a method and apply it to an example using optimize().

How poisson handles factor variables

Consider the Poisson regression in which I include a full set of indicator variables created from the categorical variable kids and a constant term.

Example 1: Collinear factor variables

. clear all

. use accident3

. poisson accidents cvalue ibn.kids traffic, coeflegend
note: 3.kids omitted because of collinearity

Iteration 0:   log likelihood = -546.35782
Iteration 1:   log likelihood = -545.11016
Iteration 2:   log likelihood = -545.10898
Iteration 3:   log likelihood = -545.10898

Poisson regression                              Number of obs     =        505
                                                LR chi2(5)        =     361.62
                                                Prob > chi2       =     0.0000
Log likelihood = -545.10898                     Pseudo R2         =     0.2491

------------------------------------------------------------------------------
   accidents |      Coef.  Legend
-------------+----------------------------------------------------------------
      cvalue |  -.6582924  _b[cvalue]
             |
        kids |
          0  |   3.233932  _b[0bn.kids]
          1  |   1.571582  _b[1.kids]
          2  |   1.659241  _b[2.kids]
          3  |          0  _b[3o.kids]
             |
     traffic |   .1383977  _b[traffic]
       _cons |  -2.518175  _b[_cons]
------------------------------------------------------------------------------

The full set of indicator variables is collinear with the constant term. The output shows that no variables were dropped; instead, the name 3o.kids indicates that 3.kids was omitted. Omitted variables are not dropped; their coefficients are constrained to zero.

Specifying variables as omitted instead of dropping them allows postestimation features such as margins to work properly.

For the case in example 1, poisson is maximizing the log-likelihood function subject to the constraint that \(\beta_{3.kids}=0\). In terms of the parameter vector \(\betab\), I represent this constraint by

\[
\left[\begin{matrix}
0&0&0&0&1&0&0
\end{matrix}\right]
\betab' = 0
\]

where \(\betab=(
\beta_{cvalue},
\beta_{0.kids},
\beta_{1.kids},
\beta_{2.kids},
\beta_{3.kids},
\beta_{traffic},
\beta_{\_cons})\).

In general, I can represent \(q\) linear equality constraints on a \(1\times k\) parameter vector as

\[
{\bf C}\betab’ = {\bf c}
\]

where \({\bf C}\) is a \(q\times k\) matrix and \({\bf c}\) is a \(q\times 1\) vector. These constraints are conveniently represented as \(\widetilde{\bf C}=\left[{\bf C},{\bf c}\right]\).

I now show how to use optimize() to solve optimization problems subject to linear equality constraints by putting \(\widetilde{\bf C}\) into the optimize object. In code block 1, I use optimize() to maximize the Poisson log-likelihood function for the problem in example 1. Code block 1 augments example 3 in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters by using optimize_init_constraints() to impose a linear equality constraint on the coefficient vector.

Code block 1: Linear equality constraints in optimize()

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}

y  = st_data(., "accidents")
X  = st_data(., "cvalue ibn.kids traffic")
X  = X,J(rows(X), 1, 1)

C  = e(5, 7)
c  = 0
Ct = C,c

S  = optimize_init()
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, X)
optimize_init_evaluator(S, &plleval3())
optimize_init_evaluatortype(S, "gf0")
optimize_init_params(S, J(1, 7, .01))
optimize_init_constraints(S, Ct)

bh = optimize(S)
optimize_result_params(S)
end

Only line 13, lines 16–18, and line 26 differ from the code in example 3 in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters. Line 13 illustrates that st_data() can create the indicator variables from the factor variable ibn.kids. Lines 16–18 define the constraint matrix \(\widetilde{\bf C}\) for this problem. Line 26 puts \(\widetilde{\bf C}\) into the optimize() object S, which causes optimize() to maximize the Poisson log-likelihood function with evaluator plleval3() subject to constraints specified in matrix Ct.

Example 2 illustrates that code block 1 reproduces the point estimates reported in example 1.

Example 2: Linear equality constraints in optimize()

. do pc

. mata:
------------------------------------------------- mata (type end to exit) -----
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y  = st_data(., "accidents")

: X  = st_data(., "cvalue ibn.kids traffic")

: X  = X,J(rows(X), 1, 1)

:
: C  = e(5, 7)

: c  = 0

: Ct = C,c

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 7, .01))

: optimize_init_constraints(S, Ct)

:
: bh = optimize(S)
Iteration 0:   f(p) = -845.47138
Iteration 1:   f(p) = -572.68676
Iteration 2:   f(p) = -545.68381
Iteration 3:   f(p) = -545.11241
Iteration 4:   f(p) = -545.10898
Iteration 5:   f(p) = -545.10898
: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------
  1 |  -.6582923624    3.233932519    1.571581623     1.65924145
    +-------------------------------------------------------------
                  5              6              7
     ----------------------------------------------+
  1               0      .13839766   -2.518174926  |
     ----------------------------------------------+

: end
-------------------------------------------------------------------------------

.
end of do-file

Code block 1 shows how to use a linear equality constraint to handle collinear variables when we know which variables are omitted. In the code for an estimation command, we must

  1. find which variables will be omitted, and
  2. create the constraint matrix \(\widetilde{\bf C}\) that imposes the constraints implied by omitting these variables.

Example 3 illustrates that _rmcoll stores in r(varlist) a list of variable names that identifies which variables will be omitted, thereby solving problem 1.

Example 3: Using _rmcoll to identify omitted variables

. _rmcoll cvalue ibn.kids traffic, expand
note: 3.kids omitted because of collinearity

. return list

scalars:
          r(k_omitted) =  1

macros:
            r(varlist) : "cvalue 0bn.kids 1.kids 2.kids 3o.kids traffic"

. local cnames "`r(varlist)' _cons"

I specified the option expand so that _rmcoll would expand any factor variables. The expanded variable list in the local r(varlist) identifies 3.kids as a variable that must be omitted. I then put this expanded variable list, augmented by the name _cons, in the local macro cnames.

Here is an outline for the solution to problem 2 that I present in examples 4–6.

  • In example 4, I create the Stata vector bt, whose column names are contained in cnames.
  • In example 5, I use _ms_omit_info to create the Stata vector bto, which indicates which variables will be omitted from bt.
  • In example 6, I create a Mata matrix specifying the constraints from bto.

Now for the details, beginning with example 4.

Example 4: Putting the coefficient names on a Stata vector

. matrix bt = J(1, 7, 0)

. matrix colnames bt = `cnames'

. matrix list bt

bt[1,7]
                   0.       1.       2.      3o.
     cvalue     kids     kids     kids     kids  traffic    _cons
r1        0        0        0        0        0        0        0

cnames contains the names of the coefficients for this problem, so I create a conformable row vector bt, make cnames the column names on bt, and display bt. The values in bt do not matter; the column names are the important information.

In example 5, _ms_omit_info uses the column names on a Stata vector to create the vector r(omit), which specifies which variables are omitted.

Example 5: Creating a vector that indicates omitted variables

. matrix bt = J(1, 7, 0)

. matrix colnames bt = `cnames'

. _ms_omit_info bt

. return list

scalars:
             r(k_omit) =  1

matrices:
               r(omit) :  1 x 7

. matrix bto = r(omit)

. matrix list bto

bto[1,7]
    c1  c2  c3  c4  c5  c6  c7
r1   0   0   0   0   1   0   0

An element of r(omit) is 1 if the corresponding variable is omitted. An element of r(omit) is 0 if the corresponding variable is not omitted. I put a copy of r(omit) in bto.

In example 6, I create a constraint matrix from bto. The loop in example 6 will create the constraint matrix implied by any r(omit) vector created by _ms_omit_info.

Example 6: Creating a constraint matrix from r(omit)

. mata:
------------------------------------------------- mata (type end to exit) -----
: mo = st_matrix("bto")

: ko = sum(mo)

: p  = cols(mo)

: if (ko>0) {
>     Cm   = J(0, p, .)
>     for(j=1; j<=p; j++) {
>         if (mo[j]==1) {
>             Cm  = Cm \ e(j, p)
>         }
>     }
>     Cm = Cm, J(ko, 1, 0)
> }
> else {
>     Cm = J(0,p+1,.)
> }

: "Constraint matrix is "
  Constraint matrix is 

: Cm
       1   2   3   4   5   6   7   8
    +---------------------------------+
  1 |  0   0   0   0   1   0   0   0  |
    +---------------------------------+

: end
-------------------------------------------------------------------------------

After copying bto to the Mata vector mo, I put the number of constraints in the scalar ko and the number of parameters in p. If there are constraints, I initialize Cm to be a matrix with zero rows and p columns, use a for loop to iteratively append a new row corresponding to each constraint identified in mo, and finish by appending a ko \(\times\) 1 column of zeros onto Cm. If there are no constraints, I put a matrix with zero rows and p+1 columns in Cm.

Regardless of whether there are any omitted variables, I can put the Cm matrix created by the method in example 6 into an optimize() object. If there are no omitted variables, Cm will have zero rows, and no constraints will be imposed. If there are omitted variables, Cm will have ko rows, and the constraints for the omitted variables will be imposed.
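
For readers who prefer vectorized Mata, the loop in example 6 can be replaced by a single call to select(). The sketch below is not part of the original code; it assumes the Stata vector bto from example 5 is in memory and builds the same Cm.

mata:
// Vectorized alternative to the loop in example 6 (a sketch, not the post's code)
mo = st_matrix("bto")          // 1 x p vector flagging omitted coefficients
ko = sum(mo)                   // number of omitted coefficients
p  = cols(mo)
if (ko > 0) {
    Cm = select(I(p), mo')     // rows of I(p) that correspond to omitted coefficients
    Cm = Cm, J(ko, 1, 0)       // append the zero right-hand-side column
}
else {
    Cm = J(0, p+1, .)          // no omitted coefficients: a matrix with zero rows
}
end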

Code block 2 combines these pieces into a coherent example.

Code block 2: Putting it all together

clear all
use accident3
local depvar    "accidents"
local indepvars "cvalue ibn.kids traffic"
_rmcoll `indepvars', expand
local cnames "`r(varlist)' _cons"
local p   : word count `cnames'
matrix bt = J(1, `p', 0)
matrix colnames bt = `cnames'
_ms_omit_info bt
matrix bto = r(omit)

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}

y  = st_data(., "`depvar'")
X  = st_data(., "`indepvars'")
X  = X,J(rows(X), 1, 1)

mo = st_matrix("bto")
ko = sum(mo)
p  = cols(mo)
if (ko>0) {
    Ct   = J(0, p, .)
    for(j=1; j<=p; j++) {
        if (mo[j]==1) {
            Ct  = Ct \ e(j, p)
        }
    }
    Ct = Ct, J(ko, 1, 0)
}
else {
    Ct = J(0,p+1,.)
}

S  = optimize_init()
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, X)
optimize_init_evaluator(S, &plleval3())
optimize_init_evaluatortype(S, "gf0")
optimize_init_params(S, J(1, 7, .01))
optimize_init_constraints(S, Ct)

bh = optimize(S)
optimize_result_params(S)
end

Lines 1–2 drop all the objects that I created in previous examples and read the accident3 dataset into memory. Lines 3–4 create locals to hold the dependent variable and the independent variables.

Lines 5–6 use _rmcoll to identify which variables should be omitted and put the list of names in the local macro cnames, as in example 3. Lines 7–9 create bt, whose column names specify which variables should be omitted, as in example 4. Lines 10–11 create bto, whose entries specify which variables should be omitted, as in example 5. Lines 28–42 create the constraint matrix Ct from bto, as in example 6.

Line 50 puts Ct into the optimize() object.

Example 7 illustrates that the code in code block 2 reproduces the previously obtained results.

Example 7: Putting it all together

. do pc2

. clear all

. use accident3

. local depvar    "accidents"

. local indepvars "cvalue ibn.kids traffic"

. _rmcoll `indepvars', expand
note: 3.kids omitted because of collinearity

. local cnames "`r(varlist)' _cons"

. local p   : word count `cnames'

. matrix bt = J(1, `p', 0)

. matrix colnames bt = `cnames'

. _ms_omit_info bt

. matrix bto = r(omit)

.
. mata:
------------------------------------------------- mata (type end to exit) -----
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y  = st_data(., "`depvar'")

: X  = st_data(., "`indepvars'")

: X  = X,J(rows(X), 1, 1)

:
: mo = st_matrix("bto")

: ko = sum(mo)

: p  = cols(mo)

: if (ko>0) {
>     Ct   = J(0, p, .)
>     for(j=1; j<=p; j++) {
>         if (mo[j]==1) {
>             Ct  = Ct \ e(j, p)
>         }
>     }
>     Ct = Ct, J(ko, 1, 0)
> }
> else {
>     Ct = J(0,p+1,.)
> }

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 7, .01))

: optimize_init_constraints(S, Ct)

:
: bh = optimize(S)
Iteration 0:   f(p) = -845.47138
Iteration 1:   f(p) = -572.68676
Iteration 2:   f(p) = -545.68381
Iteration 3:   f(p) = -545.11241
Iteration 4:   f(p) = -545.10898
Iteration 5:   f(p) = -545.10898

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------
  1 |  -.6582923624    3.233932519    1.571581623     1.65924145
    +-------------------------------------------------------------
                  5              6              7
     ----------------------------------------------+
  1               0      .13839766   -2.518174926  |
     ----------------------------------------------+

: end
-------------------------------------------------------------------------------

.
end of do-file

Done and undone

I discussed a method for handling factor variables when performing nonlinear optimization using optimize(). In my next post, I implement these methods in an estimation command for Poisson regression.

Handling gaps in time series using business calendars

Time-series data, such as financial data, often have known gaps because there are no observations on days such as weekends or holidays. Using regular Stata datetime formats with time-series data that have gaps can result in misleading analysis. Rather than treating these gaps as missing values, we should adjust our calculations appropriately. I illustrate a convenient way to work with irregularly spaced dates by using Stata’s business calendars.

In nasdaq.dta, I have daily data on the NASDAQ index from February 5, 1971 to March 23, 2015 that I downloaded from the St. Louis Federal Reserve Economic Database (FRED).

. use http://www.stata.com/data/nasdaq

. describe

Contains data from http://www.stata.com/data/nasdaq.dta
  obs:        11,132                          
 vars:             2                          29 Jan 2016 16:21
 size:       155,848                          
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
date            str10   %10s                  Daily date
index           float   %9.0g                 NASDAQ Composite Index (1971=100)
-------------------------------------------------------------------------------
Sorted by: 

date is the time variable in our data; it is a string variable with the date stored in year, month, day order. I use the date() function to convert the string daily date to a Stata numeric date and store the values in mydate. To find out more about converting string dates to numeric, you can read A Tour of Datetime in Stata.

. generate mydate = date(date,"YMD")

. format %td mydate

I tsset these data with mydate as the time variable and then list the first five observations, along with the first lag of index.

. tsset mydate
        time variable:  mydate, 05feb1971 to 23mar2015, but with gaps
                delta:  1 day

. list date mydate index l.index in 1/5

     +------------------------------------------+
     |                                        L.|
     |       date      mydate    index    index |
     |------------------------------------------|
  1. | 1971-02-05   05feb1971      100        . |
  2. | 1971-02-08   08feb1971   100.84        . |
  3. | 1971-02-09   09feb1971   100.76   100.84 |
  4. | 1971-02-10   10feb1971   100.69   100.76 |
  5. | 1971-02-11   11feb1971   101.45   100.69 |
     +------------------------------------------+

The first observation on l.index is missing; I expect this because there are no observations prior to the first observation on index. However, the second observation on l.index is also missing. As you may have already noticed, the dates are irregularly spaced in my dataset—the first observation corresponds to a Friday and the second observation to a Monday.

I get missing data in this case because mydate is a regular date, and tsset-ing by a regular date will treat all weekends and other holidays as if they are missing in the dataset instead of ignoring them in calculations. To avoid the problem of gaps inherent in business data, I can create a business calendar. Business calendars specify which dates are omitted. For daily financial data, a business calendar specifies the weekends and holidays for which the markets were closed.

Creating business calendars

Business calendars are defined in files named calname.stbcal. You can create your own calendars, use the ones provided by StataCorp, or obtain them directly from other users or via the SSC. Calendars can also be created automatically from the current dataset using the bcal create command.
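
As an aside, a calendar that matches the dates already in the dataset could also be generated automatically along the following lines. This is only a sketch: nasdaq2 is a hypothetical calendar name, and the options shown should be checked against help bcal before use.

bcal create nasdaq2, from(mydate) replace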

Every stbcal-file requires you to specify the following four things:

  • the version of Stata being used
  • the range of the calendar
  • the center date of the calendar
  • the dates to be omitted

I begin by creating nasdaq.stbcal, which will omit Saturdays and Sundays of every month. I do this using the Do-file editor, but you can use any text editor.

version 14.1
purpose "Converting daily financial data into business calendar dates"
dateformat dmy
range 05feb1971 23mar2015
centerdate 05feb1971
omit dayofweek (Sa Su)

The first line specifies the current version of Stata I am using. The second line is optional, but the text typed there will display if I type bcal describe nasdaq and is useful for record keeping when I have multiple calendars. Line 3 specifies the display date format and is also optional. Line 4 specifies the range of dates in the dataset.

Line 5 specifies the center date to be 05feb1971. I picked the first date in the sample, but I could have picked any date in the range specified for the business calendar. centerdate does not need to be a date at the center of the sample. For example, Stata’s default %td calendar uses 01jan1960 as its center.

The last line specifies that all Saturdays and Sundays be omitted. Later, I will show several variations of the omit command to omit other holidays. Once I have a business calendar, I can use it to convert regular dates to business dates, share the file with colleagues, and make further changes to my calendar.

Using a business calendar

. bcal load nasdaq
loading ./nasdaq.stbcal ...

     1. version 14.1
     2. purpose "Converting daily financial data into business calendar dates"
     3. dateformat dmy
     4. range 05feb1971 23mar2015
     5. centerdate 05feb1971
     6. omit dayofweek (Sa Su)

(calendar loaded successfully)

. generate bcaldate = bofd("nasdaq",mydate)

. assert !missing(bcaldate) if !missing(mydate)

To create business dates using bofd(), I specified two arguments: the name of the business calendar and the name of the variable containing regular dates. The assert statement verifies that all dates recorded in mydate appear in the business calendar. This is a way of checking that I created my calendar for the complete date range—the bofd() function returns a missing value when mydate does not appear on the specified calendar.

Business dates have a specific display format, %tbcalname, which in my case is %tbnasdaq. In order to display business dates in a Stata date format I will apply this format to bcaldate just as I would for a regular date.

. format %tbnasdaq bcaldate

. list in 1/5

     +---------------------------------------------+
     |       date    index      mydate    bcaldate |
     |---------------------------------------------|
  1. | 1971-02-05      100   05feb1971   05feb1971 |
  2. | 1971-02-08   100.84   08feb1971   08feb1971 |
  3. | 1971-02-09   100.76   09feb1971   09feb1971 |
  4. | 1971-02-10   100.69   10feb1971   10feb1971 |
  5. | 1971-02-11   101.45   11feb1971   11feb1971 |
     +---------------------------------------------+

Although mydate and bcaldate look similar, they have different encodings. Now, I can tsset on the business date bcaldate and list the first five observations with the lag of index recalculated.

. tsset bcaldate
        time variable:  bcaldate, 05feb1971 to 23mar2015, but with gaps
                delta:  1 day

. list bcaldate index l.index in 1/5

     +-----------------------------+
     |                           L.|
     |  bcaldate    index    index |
     |-----------------------------|
  1. | 05feb1971      100        . |
  2. | 08feb1971   100.84      100 |
  3. | 09feb1971   100.76   100.84 |
  4. | 10feb1971   100.69   100.76 |
  5. | 11feb1971   101.45   100.69 |
     +-----------------------------+

As expected, the issue of gaps due to weekends is now resolved. Because I have a calendar that excludes Saturdays and Sundays, bcaldate skipped the weekend between 05feb1971 and 08feb1971 when calculating the lagged index value and will do the same for any subsequent weekends in the data.

Excluding specific dates

So far I have not excluded gaps in the data due to other major holidays, such as Thanksgiving and Christmas. Stata has several variations on the omit command that let you exclude specific dates. For example, I use the omit command to omit the Thanksgiving holiday (the fourth Thursday of November in the U.S.) by adding the following statement in my business calendar.

omit dowinmonth +4 Th of Nov

dowinmonth stands for day of week in month and +4 Th of Nov refers to the fourth Thursday of November. This rule is applied to every year in the data.

Another major holiday is Christmas, with the NASDAQ closed on the 25th of December every year. I can omit this holiday in the calendar as

omit date 25dec*

The * in the statement above indicates that December 25 should be omitted for every year in my nasdaq calendar.

This rule alone is not enough, because the 25th may fall on a weekend, in which case the holiday is observed on the preceding Friday or the following Monday. To capture these cases, I add the following statements:

omit date 25dec* and (-1) if dow(Sa)
omit date 25dec* and (+1) if dow(Su)

The first statement omits December 24 if Christmas is on a Saturday, and the second statement omits December 26 if Christmas is on a Sunday.
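
The same pattern extends to other fixed-date holidays. For example, a sketch for New Year’s Day would be the statements below; the actual NASDAQ holiday schedule should be checked before adding such rules to the calendar.

omit date 1jan*
omit date 1jan* and (-1) if dow(Sa)
omit date 1jan* and (+1) if dow(Su)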

Encodings

I mentioned earlier that the encodings of the regular date mydate and the business date bcaldate are different. To see the encodings of my date variables, I apply a numeric display format and list the first five observations.

. format %8.0g mydate bcaldate

. list in 1/5

     +-----------------------------------------+
     |       date    index   mydate   bcaldate |
     |-----------------------------------------|
  1. | 1971-02-05      100     4053          0 |
  2. | 1971-02-08   100.84     4056          1 |
  3. | 1971-02-09   100.76     4057          2 |
  4. | 1971-02-10   100.69     4058          3 |
  5. | 1971-02-11   101.45     4059          4 |
     +-----------------------------------------+

The variable bcaldate starts with 0 because this was the centerdate in my calendar nasdaq.stbcal. The business date encoding is consecutive without gaps, which is why using lags or any time-series operators will yield correct values.
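
To confirm that the two encodings refer to the same calendar days, the business dates can be mapped back to regular dates with the dofb() function. The following lines are a sketch that was not run above; backdate is a hypothetical variable created only for this check.

generate backdate = dofb(bcaldate, "nasdaq")   // regular date implied by each business date
format %td backdate
assert backdate == mydate                      // the round trip recovers the original dates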

Summary

Using regular dates with time-series data instead of business dates may be misleading in case there are gaps in the data. In this post, I showed a convenient way to work with business dates by creating a business calendar. Once I loaded a calendar file into Stata, I created business dates using the bofd() function. I also showed some variations of the omit command used in business calendars to accommodate specific gaps due to different holidays.

Programming an estimation command in Stata: A poisson command using Mata

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes ordinary least-squares (OLS) results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

I build on previous posts. I use the structure of Stata programs that use Mata work functions that I discussed previously in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. You should be familiar with Poisson regression and using optimize(), which I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

A poisson command with Mata computations

The Stata command mypoisson1 computes the results in Mata. The syntax of the mypoisson1 command is

mypoisson1 depvar indepvars [if] [in] [, noconstant]

where indepvars can contain time-series variables. mypoisson1 does not allow for factor variables because they complicate the program. I discuss these complications, and present solutions, in my next post.

In the remainder of this post, I discuss the code for mypoisson1.ado. I recommend that you click on the file name to download the code. To avoid scrolling, view the code in the do-file editor, or your favorite text editor, to see the line numbers.

Code block 1: mypoisson1.ado

*! version 1.0.0  31Jan2016
program define mypoisson1, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts min=2) [if] [in] [, noCONStant ]
    marksample touse

    gettoken depvar indepvars : varlist

    _rmcoll `indepvars', `constant' forcedrop
    local indepvars  "`r(varlist)'"

    tempname b V N rank

    mata: mywork("`depvar'", "`indepvars'", "`touse'", "`constant'", ///
       "`b'", "`V'", "`N'", "`rank'")

    if "`constant'" == "" {
        local cnames "`indepvars' _cons"
    }
    else {
        local cnames "`indepvars'"
    }
    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N       = `N'
    ereturn scalar rank    = `rank'
    ereturn local  cmd     "mypoisson1"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars,
             string scalar touse,   string scalar constant,
             string scalar bname,   string scalar Vname,
             string scalar nname,   string scalar rname)
{

    real vector y, b, mo_v, cv
    real matrix X, V, Cm
    real scalar n, p, rank, ko

    y = st_data(., depvar, touse)
    n = rows(y)
    X = st_data(., indepvars, touse)
    if (constant == "") {
        X = X,J(n, 1, 1)
    }
    p = cols(X)

    S  = optimize_init()
    optimize_init_argument(S, 1, y)
    optimize_init_argument(S, 2, X)
    optimize_init_evaluator(S, &plleval2())
    optimize_init_params(S, J(1, p, .01))

    b    = optimize(S)
    V    = optimize_result_V_oim(S)
    rank = p - diag0cnt(invsym(V))

    st_matrix(bname, b)
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, rank)
}

void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}

end

mypoisson1.ado has the structure of Stata programs that compute their results in Mata, which I discussed in Programming an estimation command in Stata: A first ado-command using Mata and Programming an estimation command in Stata: An OLS command using Mata. Lines 1–35 define the ado-command mypoisson1. Lines 37–83 define the Mata work function mywork() used in mypoisson1 and the evaluator function plleval2() used in mywork().

The ado-command mypoisson1 has four parts:

  1. Lines 5–13 parse what the user typed, identify the sample, drop collinear variables from the list of independent variables, and create temporary names for Stata objects returned by our Mata work function.
  2. Lines 15–16 call the Mata work function.
  3. Lines 18–31 post the results returned by the Mata work function to e().
  4. Line 33 displays the results.

The Mata work function mywork() has four parts.

  1. Lines 39–42 parse the arguments.
  2. Lines 45–47 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 49–65 compute the results.
  4. Lines 67–70 copy the computed results to Stata, using the names that were passed in arguments.

Now, I discuss the ado-code in some detail. Lines 2–35 are almost the same as lines 2–35 of myregress11, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. Most of the differences arise because myregress11 handles factor variables and mypoisson1 does not. Two minor differences arise because myregress11 stores the residual degrees of freedom and mypoisson1 does not.

myregress11 uses the matrix inverter to handle cases of collinear independent variables. In the Poisson-regression case discussed here, collinear independent variables leave the unconstrained log-likelihood function without a unique maximum, so optimize() cannot converge. mypoisson1 drops collinear independent variables to avoid this problem. I discuss a better solution in my next post.

Lines 10–11 use _rmcoll to drop the collinear independent variables and store the list of linearly independent variables in the local macro indepvars.
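
To see what forcedrop does, consider the following sketch, which is not part of mypoisson1; cvalue2 is a hypothetical, perfectly collinear copy created only for illustration. Unlike the expand option used in the factor-variable post, forcedrop removes the collinear variables from the returned list instead of marking them as omitted.

generate cvalue2 = cvalue                      // hypothetical collinear copy
_rmcoll cvalue cvalue2 kids traffic, forcedrop
display "`r(varlist)'"                         // cvalue2 is dropped from the returned list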

Now, I discuss the Mata work function mywork() in some detail. The mywork() defined on lines 39–71 is remarkably similar to the mywork() function defined on lines 39–72 of myregress11.ado, discussed in Programming an estimation command in Stata: An OLS command using Mata. In mypoisson1.ado, lines 57–65 use optimize() to compute the results. In myregress11.ado, lines 58–64 use matrix computations to compute the results.

Lines 73–81 define the evaluator function plleval2(), which is used by optimize() to compute the results, as I discussed in Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters.

Examples 1 and 2 illustrate that mypoisson1 produces the same results as poisson.

Example 1: mypoisson1
(Uses accident3.dta)

. clear all

. use accident3

. mypoisson1 accidents cvalue kids traffic
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943554   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506596
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082077
       _cons |   .5743543   .2839519     2.02   0.043     .0178187     1.13089
------------------------------------------------------------------------------

Example 2: poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

Done and undone

I discussed mypoisson1, which computes Poisson-regression results in Mata. I highlighted how similar the code in mypoisson1.ado is to the code in myregress11.ado.

I also discussed that mypoisson1 drops collinear independent variables and noted that mypoisson1 does not handle factor variables. In my next post, I discuss a solution to the problems caused by collinear independent variables and discuss a command that handles factor variables.

Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

\(
\newcommand{\xb}{{\bf x}}
\newcommand{\betab}{\boldsymbol{\beta}}\)I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the variance–covariance of the estimator (VCE) based on independent and identically distributed (IID) observations or on robust methods.

Using optimize()

There are many optional choices that one may make when solving a nonlinear optimization problem, but there are very few that one must make. The optimize*() functions in Mata handle this problem by making a set of default choices for you, requiring that you specify a few things, and allowing you to change any of the default choices.

When I use optimize() to solve a nonlinear optimization problem, I do four steps.

  1. I create an optimize() object

      
    : S = optimize_init()

    which contains all the default choices

  2. I use some of the optimize_init_*(S) functions to put information about my optimization problem into S.
  3. I use

      
    : betahat = optimize(S)

    to perform the optimization.

  4. I use some of the optimize_result_*(S) functions to get the results, which optimize(S) stored in S. (A small self-contained sketch of all four steps follows this list.)
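
Here is a small self-contained sketch of the four steps applied to a toy objective function, \(f(p) = -(p-3)^2\), whose maximum is at \(p=3\). It is not part of the Poisson example that follows; it only shows the mechanics.

mata:
// Toy example of the four optimize() steps (a sketch, not the post's code)
void toyeval(real scalar todo, real vector p, val, grad, hess)
{
    val = -(p[1] - 3)^2                     // objective function, maximized at p = 3
}
S = optimize_init()                         // step 1: create the object with default choices
optimize_init_evaluator(S, &toyeval())      // step 2: tell S which evaluator to call
optimize_init_params(S, 0)                  //         and supply a starting value
phat = optimize(S)                          // step 3: perform the optimization
optimize_result_params(S)                   // step 4: retrieve results stored in S
end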

Consider maximizing the log-likelihood function of a Poisson model. The contribution of the \(i\)th observation to the log-likelihood is

\[
f_i(\betab) = y_i{\bf x}_i\betab' - \exp({\bf x}_i\betab') - \ln(y_i!)
\]

where \(y_i\) is the dependent variable, \({\bf x}_i\) is the vector of covariates, and \(\betab\) is the row vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop \(\ln(y_i!)\), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that the values of the log-likelihood functions are comparable across models.

Code block 1 copies the data from Stata to Mata and computes the Poisson log-likelihood function at the vector of parameter values b, which has been set to the arbitrary starting values of .01 for each parameter.

Code block 1: Computing the Poisson log-likelihood in Mata

clear all
use accident3
mata:
     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
     b = J(1, cols(X), .01)
    xb = X*b'
    f  = sum(-exp(xb) + y:*xb - lnfactorial(y))
end

The Mata function plleval() in code block 2 puts the value of the Poisson log-likelihood function at the vector of parameter values b into val.

Code block 2: An evaluator function for the Poisson log-likelihood

mata:
void plleval(real scalar todo, real vector b, val, grad, hess)
{
    real vector  y, xb
    real matrix  X

     y = st_data(., "accidents")
     X = st_data(., "cvalue kids traffic")
     X = X,J(rows(X), 1, 1)
    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

plleval() has the default syntax of an evaluator function. Evaluator functions must follow a known syntax so that optimize() can call them, which it must do to find the maximum. After describing the default syntax, I show how to use evaluators that take extra arguments.

plleval() is void; it returns nothing. The real scalar todo allows optimize() to tell the evaluator function what it must compute. The real vector b is the current value of the parameter vector. val is not typed because, no matter what it contains on input, it will contain the value of the objective function on output. grad is not typed because it will optionally contain the vector of first derivatives of the objective function at the current value of b on output. hess is not typed because it will optionally contain the matrix of second derivatives of the objective function at the current value of b on output. As plleval() illustrates, the evaluator function must put the value of the objective function into the third argument, but it need not compute either the vector of first derivatives or the matrix of second derivatives.

In example 1, I use optimize() to maximize the Poisson log-likelihood function computed in plleval().

Example 1: Using optimize() to estimate Poisson parameters

(Uses accident3.dta)

. clear all

. use accident3

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval(real scalar todo, real vector b, val, grad, hess)
> {
>     real vector  y, xb
>     real matrix  X
> 
>      y = st_data(., "accidents")
>      X = st_data(., "cvalue kids traffic")
>      X = X,J(rows(X), 1, 1)
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

: 
: S  = optimize_init()

: optimize_init_evaluator(S, &plleval())

: optimize_init_params(S, J(1, 4, .01))

: bh = optimize(S)
Iteration 0:   f(p) = -851.18669  
Iteration 1:   f(p) = -556.66874  
Iteration 2:   f(p) = -555.81708  
Iteration 3:   f(p) = -555.81538  
Iteration 4:   f(p) = -555.81538  

: bh
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval(), I use optimize_init() to create the optimize() object S. I must put information about how to call plleval() and the vector of starting values into S. Typing

optimize_init_evaluator(S, &plleval())

puts the address of the evaluator function plleval() into S; preceding the function name with an ampersand (&) yields its address. optimize() requires the address rather than the name because the address makes the function faster to find. Typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), into S.

Typing

bh = optimize(S)

causes optimize() to solve the optimization problem described in S, and it causes optimize() to put the vector of optimal parameters in bh. optimize() produces the default iteration log, because we did not change the default specification in S.

When optimize() has completed, the results are in S. For example, I display the bh returned by optimize() and use optimize_result_params(S) to display the result stored in S. I further illustrate by displaying the standard errors; optimize_result_V_oim() retrieves the observed-information-matrix (OIM) estimator of the variance–covariance of the estimator (VCE). Many other results are stored in S; type help mf optimize and look at the optimize_result*() functions for details.
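
For instance, a few results that I often retrieve after optimize() has run are shown below. These functions are documented in help mf optimize; the lines are a sketch rather than output from the session above.

optimize_result_value(S)          // value of the objective function at the optimum
optimize_result_converged(S)      // 1 if convergence was declared, 0 otherwise
optimize_result_iterations(S)     // number of iterations performed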

Comparing the results in example 1 with those reported by poisson in example 2 shows that they agree.

Example 2: Results from poisson

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605
Iteration 1:   log likelihood =  -555.8154
Iteration 2:   log likelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

plleval() is slow because it copies the data from Stata to Mata every time optimize() calls it. I would much rather pass the data to the evaluator function, but this requires putting information about the syntax of the new evaluator function in S. For example, I would like to use the evaluator function plleval2(). In example 3, I use optimize_init_argument() to put information into S about the extra arguments accepted by the new evaluator function plleval2().

Code block 3: Passing data to the Poisson evaluator function

mata:
void plleval2(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = sum(-exp(xb) + y:*xb - lnfactorial(y))
}
end

Line 3 declares the extra arguments, the real vector y and the real matrix X. The extra arguments come between the inputs that must always be present, the real scalar todo and the real vector b, and the always-present outputs: val, grad, and hess.

Example 3 uses optimize() to maximize the Poisson objective function coded in plleval2().

Example 3: Using optional arguments to pass data

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval2(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = sum(-exp(xb) + y:*xb - lnfactorial(y))
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval2())

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81708
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871399   -1.009017051    .1467114648    .5743542793  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483931   .0807960852   .0313761961   .2839519366  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval2(), I copy the data from Stata to Mata, and I use optimize_init() to put the default choices into the optimize() object S. When I typed

optimize_init_argument(S, 1, y)

I put information into S specifying that optimize() should pass y as the first extra argument to the evaluator function. When I typed

optimize_init_argument(S, 2, X)

I put information into S specifying that optimize() should pass X as the second extra argument to the evaluator function.

Analogous to example 1, typing

optimize_init_evaluator(S, &plleval2())

puts the address of plleval2() into S, and typing

optimize_init_params(S, J(1, 4, .01))

puts the vector of starting values, J(1, 4, .01), in S.

The results are the same as those in example 1.

Vector of observation-level contributions and robust VCE estimation

Robust estimators for the VCE of an estimator use the structure of observation-level contributions; see Wooldridge (2010, chapters 12 and 13) or Cameron and Trivedi (2005, chapter 5). When the evaluator function gives optimize() a vector of observation-level contributions, instead of a scalar summation, optimize() can use this structure to compute robust or cluster-robust estimators of the VCE.

Consider plleval3(), which puts the vector of observation-level contributions into val.

Code block 4: A vector of observation-level contributions

mata:
void plleval3(real scalar todo, real vector b,     ///
              real vector y,    real matrix X,     ///
              val, grad, hess)
{
    real vector  xb

    xb = X*b'
   val = -exp(xb) + y:*xb - lnfactorial(y)
}
end

To use plleval3(), I must put information in the optimize() object stating that the evaluator function computes a vector of observation-level contributions. In example 4, I use optimize_init_evaluatortype() to put this information into the optimize() object S.

Example 4: Robust VCE estimation

. mata:
------------------------------------------------- mata (type end to exit) ------
: void plleval3(real scalar todo, real vector b,     ///
>               real vector y,    real matrix X,     ///
>               val, grad, hess)
> {
>     real vector  xb
>
>     xb = X*b'
>    val = -exp(xb) + y:*xb - lnfactorial(y)
> }
note: argument todo unused
note: argument grad unused
note: argument hess unused

:
:
: y = st_data(., "accidents")

: X = st_data(., "cvalue kids traffic")

: X = X,J(rows(X), 1, 1)

:
: S  = optimize_init()

: optimize_init_argument(S, 1, y)

: optimize_init_argument(S, 2, X)

: optimize_init_evaluator(S, &plleval3())

: optimize_init_evaluatortype(S, "gf0")

: optimize_init_params(S, J(1, 4, .01))

:
: bh = optimize(S)
Iteration 0:   f(p) = -851.18669
Iteration 1:   f(p) = -556.66874
Iteration 2:   f(p) = -555.81731
Iteration 3:   f(p) = -555.81538
Iteration 4:   f(p) = -555.81538

: optimize_result_params(S)
                  1              2              3              4
    +-------------------------------------------------------------+
  1 |  -.6558871527   -1.009017051    .1467114658    .5743542978  |
    +-------------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_oim(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .0706483832   .0807960809    .031376176   .2839517337  |
    +---------------------------------------------------------+

: sqrt(diagonal(optimize_result_V_robust(S)))'
                 1             2             3             4
    +---------------------------------------------------------+
  1 |  .1096020124    .188666044    .092431746   .6045057623  |
    +---------------------------------------------------------+

: end
--------------------------------------------------------------------------------

After defining plleval3(), I copy the data, create the optimize() object S, put the specifications for the extra arguments y and X in S, and put the address of plleval3() into S. Typing

optimize_init_evaluatortype(S, "gf0")

puts in S the information that the evaluator function returns a vector of observation-level contributions and that it computes zero derivatives; that is, the evaluator function is of type "gf0". Given the vector structure, I can type

optimize_result_V_robust(S)

to compute a robust estimator of the VCE.

sqrt(diagonal(optimize_result_V_robust(S)))'

returns the robust standard errors, which are the same as those reported by poisson in example 5.

Example 5: Robust VCE estimation by poisson

. poisson accidents cvalue kids traffic, vce(robust)

Iteration 0:   log pseudolikelihood = -555.86605
Iteration 1:   log pseudolikelihood =  -555.8154
Iteration 2:   log pseudolikelihood = -555.81538

Poisson regression                              Number of obs     =        505
                                                Wald chi2(3)      =      99.76
                                                Prob > chi2       =     0.0000
Log pseudolikelihood = -555.81538               Pseudo R2         =     0.2343

------------------------------------------------------------------------------
             |               Robust
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .1096019    -5.98   0.000    -.8707029   -.4410712
        kids |  -1.009017    .188666    -5.35   0.000    -1.378795   -.6392382
     traffic |   .1467115   .0924316     1.59   0.112    -.0344512    .3278741
       _cons |    .574354   .6045047     0.95   0.342    -.6104535    1.759162
------------------------------------------------------------------------------
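Because plleval3() computes observation-level contributions, the same machinery extends to a cluster-robust VCE. Below is a minimal sketch that is not part of the original examples; it assumes the dataset contains a hypothetical numeric cluster identifier named cid and that optimize_init_cluster() is given the vector of cluster values before the optimization is rerun.

mata:
cvar = st_data(., "cid")                        // hypothetical cluster variable
optimize_init_cluster(S, cvar)                  // declare the clusters to optimize()
bh = optimize(S)
sqrt(diagonal(optimize_result_V_robust(S)))'    // cluster-robust standard errors
end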

Done and undone

I showed how to use optimize() to maximize a Poisson log-likelihood function. I also showed how to obtain a robust estimator of the VCE by coding the evaluator function to compute a vector of observation-level contributions. In my next post, I show how to write a Stata command that uses Mata to estimate the parameters of a Poisson regression model.

References

Cameron, A. C., and P. K. Trivedi. 2005. Microeconometrics: Methods and Applications. Cambridge: Cambridge University Press.

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.

Programming an estimation command in Stata: A review of nonlinear optimization using Mata

\(\newcommand{\betab}{\boldsymbol{\beta}}
\newcommand{\xb}{{\bf x}}
\newcommand{\yb}{{\bf y}}
\newcommand{\gb}{{\bf g}}
\newcommand{\Hb}{{\bf H}}
\newcommand{\thetab}{\boldsymbol{\theta}}
\newcommand{\Xb}{{\bf X}}
\)

I review the theory behind nonlinear optimization and get more practice in Mata programming by implementing an optimizer in Mata. In real problems, I recommend using the optimize() function or moptimize() function instead of the one I describe here. In subsequent posts, I will discuss optimize() and moptimize(). This post will help you develop your Mata programming skills and will improve your understanding of how optimize() and moptimize() work.

A quick review of nonlinear optimization

We want to maximize a real-valued function \(Q(\thetab)\), where \(\thetab\) is a \(p\times 1\) vector of parameters. Minimization is done by maximizing \(-Q(\thetab)\). We require that \(Q(\thetab)\) be twice continuously differentiable, so that we can use a second-order Taylor series to approximate \(Q(\thetab)\) in a neighborhood of the point \(\thetab_s\),

\[
Q(\thetab) \approx Q(\thetab_s) + \gb_s'(\thetab -\thetab_s)
+ \frac{1}{2} (\thetab -\thetab_s)'\Hb_s (\thetab -\thetab_s)
\tag{1}
\]

where \(\gb_s\) is the \(p\times 1\) vector of first derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\) and \(\Hb_s\) is the \(p\times p\) matrix of second derivatives of \(Q(\thetab)\) evaluated at \(\thetab_s\), known as the Hessian matrix.

Nonlinear maximization algorithms start with a vector of initial values and produce a sequence of updated values that converge to the parameter vector that maximizes the objective function. The algorithms I discuss here can only find local maxima. The function in figure 1 has a local maximum at .2 and another at 1.5. The global maximum is at .2.

Figure 1: Local maxima

Each update is produced by finding the \(\thetab\) that maximizes the approximation on the right-hand side of equation (1) and letting it be \(\thetab_{s+1}\). To find the \(\thetab\) that maximizes the approximation, we set to \({\bf 0}\) the derivative of the right-hand side of equation (1) with respect to \(\thetab\),

\[
\gb_s + \Hb_s (\thetab -\thetab_s) = {\bf 0}
\tag{2}
\]

Replacing \(\thetab\) with \(\thetab_{s+1}\) and solving yields the update rule for \(\thetab_{s+1}\).

\[
\thetab_{s+1} = \thetab_s - \Hb_s^{-1} \gb_s
\tag{3}
\]

Note that the update is uniquely defined only if the Hessian matrix \(\Hb_s\) is full rank. To ensure that we have a local maximum, we will require that the Hessian be negative definite at the optimum, which also implies that the symmetric Hessian is full rank.

The update rule in equation (3) does not guarantee that \(Q(\thetab_{s+1})>Q(\thetab_s)\). We want to accept only those \(\thetab_{s+1}\) that do produce such an increase, so in practice, we use

\[
\thetab_{s+1} = \thetab_s - \lambda \Hb_s^{-1} \gb_s
\tag{4}
\]

where \(\lambda\) is the step size. In the algorithm presented here, we start with \(\lambda\) equal to \(1\) and, if necessary, decrease \(\lambda\) until we find a value that yields an increase.

The previous sentence is vague. I clarify it by writing an algorithm in Mata. Suppose that real scalar Q( real vector theta ) is a Mata function that returns the value of the objective function at a value of the parameter vector theta. For the moment, suppose that g_s is the vector of derivatives at the current theta, denoted by theta_s, and that Hi_s is the inverse of the Hessian matrix at theta_s. These definitions allow us to define the update function

Code block 1: Candidate rule for parameter vector

real vector tupdate(                 ///
	real scalar lambda,          ///
	real vector theta_s,         ///
	real vector g_s,             ///
	real matrix Hi_s)
{
	return (theta_s - lambda*Hi_s*g_s)
}

For specified values of lambda, theta_s, g_s, and Hi_s, tupdate() returns a candidate value for theta_s1. But we only accept candidate values of theta_s1 that yield an increase, so instead of using tupdate() to get an update, we would use GetUpdate().

Code block 2: Update function for parameter vector

real vector GetUpdate(            ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    real scalar lambda
    real vector theta_s1

    lambda   = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( Q(theta_s1) < Q(theta_s) ) {
        lambda   = lambda/2
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}

GetUpdate() starts by getting a candidate value for theta_s1 when lambda = 1. GetUpdate() returns this candidate theta_s1 if it produces an increase in Q(). Otherwise, GetUpdate() divides lambda by 2 and gets another candidate theta_s1 until it finds a candidate that produces an increase in Q(). GetUpdate() returns the first candidate that produces an increase in Q().

While these functions clarify the ambiguities in the original vague statement, GetUpdate() makes the unwise assumption that there is always a lambda for which the candidate theta_s1 produces an increase in Q(). The version of GetUpdate() in code block 3 does not make this assumption; it exits with an error if lambda becomes too small (less than \(10^{-11}\)).

Code block 3: A better update function for parameter vector

real vector GetUpdate(            ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    real scalar lambda
    real vector theta_s1

    lambda   = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( Q(theta_s1) < Q(theta_s) ) {
        lambda   = lambda/2
        if (lambda < 1e-11) {
            printf("{red}Cannot find parameters that produce an increase.\n")
            exit(error(3360))
        }
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}

An outline of our algorithm for nonlinear optimization is the following:

  1. Select initial values for the parameter vector.
  2. If the current parameters set the vector of derivatives of Q() to zero, go to (3); otherwise go to (A).
    A. Use GetUpdate() to get new parameter values.
    B. Calculate g_s and Hi_s at the parameter values from (A).
    C. Go to (2).
  3. Display results.

Code block 4 contains a Mata version of this algorithm.

Code block 4: Pseudocode for Newton–Raphson algorithm

theta_s  =  J(p, 1, .01)
GetDerives(theta_s, g_s, Hi_s)
gz = g_s'*Hi_s*g_s
while (abs(gz) > 1e-13) {
    theta_s = GetUpdate(theta_s, g_s, Hi_s)
    GetDerives(theta_s, g_s, Hi_s)
    gz      = g_s'*Hi_s*g_s
    printf("gz is now %8.7g\n", gz)
}
printf("Converged value of theta is\n")
theta_s

Line 1 puts the vector of starting values, a \(p\times 1\) vector with each element equal to .01, in theta_s. Line 2 uses GetDerives() to put the vector of first derivatives into g_s and the inverse of the Hessian matrix into Hi_s. In GetDerives(), I use cholinv() to calculate Hi_s. cholinv() returns missing values if the matrix is not positive definite. By calculating Hi_s = -1*cholinv(-H_s), I ensure that Hi_s contains missing values whenever the Hessian is not negative definite and full rank.

Line 3 calculates how different the vector of first derivatives is from 0. Instead of using a sum of squares, obtainable by g_s'g_s, I weight the first derivatives by the inverse of the Hessian matrix, which puts the \(p\) first derivatives on a similar scale and ensures that the Hessian matrix is negative definite at convergence. (If the Hessian matrix is not negative definite, GetDerives() puts a matrix of missing values into Hi_s, which makes gz missing, and a missing gz exceeds the tolerance.)
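To see the mechanics, here is a minimal sketch, not part of the original code, showing that cholinv() returns missing values when its argument is not positive definite, so that -1*cholinv(-H_s) is missing exactly when H_s is not negative definite:

mata:
H1 = (-2, 0 \ 0, -3)        // negative definite
H2 = ( 2, 0 \ 0, -3)        // not negative definite
-1*cholinv(-H1)             // the inverse of H1
-1*cholinv(-H2)             // a matrix of missing values
end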

To flesh out the details, we need a specific problem. Consider maximizing the log-likelihood function of a Poisson model, which has a simple functional form. The contribution of each observation to the log-likelihood is

\[
f_i(\betab) = y_i\xb_i\betab - \exp(\xb_i\betab) - \ln( y_i !)
\]

where \(y_i\) is the dependent variable, \({\bf x}_i\) is the vector of covariates, and \(\betab\) is the vector of parameters that we select to maximize the log-likelihood function given by \(F(\betab) =\sum_i f_i(\betab)\). I could drop ln(y_i!), because it does not depend on the parameters. I include it to make the value of the log-likelihood function the same as that reported by Stata. Stata includes these terms so that log-likelihood-function values are comparable across models.

The pll() function in code block 5 computes the Poisson log-likelihood function from the vector of observations on the dependent variable y, the matrix of observations on the covariates X, and the vector of parameter values b.

Code block 5: A function for the Poisson log-likelihood function

// Compute Poisson log-likelihood 
mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
    real vector  xb

    xb = X*b
    return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}
end

The vector of first derivatives is

\[
\frac{\partial F(\betab)}{\partial \betab}
= \sum_{i=1}^N (y_i - \exp(\xb_i\betab))\xb_i
\]

which I can compute in Mata as quadcolsum((y-exp(X*b)):*X), and the Hessian matrix is

\[
\sum_{i=1}^N\frac{\partial^2 f_i(\betab)}{\partial\betab\,\partial\betab^\prime}
= - \sum_{i=1}^N\exp(\xb_i\betab)\xb_i^\prime \xb_i
\tag{5}
\]

which I can compute in Mata as -quadcross(X, exp(X*b), X).
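As a quick check that these expressions are right, the analytic gradient can be compared with a central finite-difference approximation. The following is a minimal sketch that is not part of the original post; it assumes that pll() from code block 5 has been defined and that y and X have been created as in code block 6 below.

mata:
b0   = J(cols(X), 1, .01)                       // test point
g    = (quadcolsum((y - exp(X*b0)):*X))'        // analytic gradient at b0
h    = 1e-6
gnum = J(cols(X), 1, .)
for (j=1; j<=cols(X); j++) {
    e       = J(cols(X), 1, 0)
    e[j]    = h
    gnum[j] = (pll(y, X, b0 + e) - pll(y, X, b0 - e))/(2*h)
}
mreldif(g, gnum)                                // should be close to zero
end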

Here is some code that implements this Newton-Raphson (NR) algorithm for the Poisson regression problem.

Code block 6: pnr1.do
(Uses accident3.dta)

// Newton-Raphson for Poisson log-likelihood
clear all
use accident3

mata:
real scalar pll(real vector y, real matrix X, real vector b)
{
    real vector  xb
    xb = X*b
    return(sum(-exp(xb) + y:*xb - lnfactorial(y)))
}

void GetDerives(real vector y, real matrix X, real vector theta, g, Hi)
{
	real vector exb

	exb = exp(X*theta)
	g   =  (quadcolsum((y - exb):*X))'
	Hi  = quadcross(X, exb, X)
	Hi = -1*cholinv(Hi)
}

real vector tupdate(                 ///
	real scalar lambda,          ///
	real vector theta_s,          ///
	real vector g_s,              ///
	real matrix Hi_s)
{
	return (theta_s - lambda*Hi_s*g_s)
}

real vector GetUpdate(            ///
    real vector y,                ///
    real matrix X,                ///
    real vector theta_s,          ///
    real vector g_s,              ///
    real matrix Hi_s)
{
    real scalar lambda 
    real vector theta_s1

    lambda = 1
    theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    while ( pll(y, X, theta_s1) <= pll(y, X, theta_s) ) {
        lambda   = lambda/2
        if (lambda < 1e-11) {
            printf("{red}Cannot find parameters that produce an increase.\n")
            exit(error(3360))
        }
        theta_s1 = tupdate(lambda, theta_s, g_s, Hi_s)
    }
    return(theta_s1)
}


y = st_data(., "accidents")
X = st_data(., "cvalue kids traffic")
X = X,J(rows(X), 1, 1)

b  =  J(cols(X), 1, .01)
GetDerives(y, X, b, g=., Hi=.)
gz = .
while (abs(gz) > 1e-11) {
	bs1 = GetUpdate(y, X, b, g, Hi)
	b   = bs1
	GetDerives(y, X, b, g, Hi)
	gz = g'*Hi*g
	printf("gz is now %8.7g\n", gz)
}
printf("Converged value of beta is\n")
b

end

Line 3 reads in the downloadable accident3.dta dataset before dropping down to Mata. I use variables from this dataset on lines 56 and 57.

Lines 6–11 define pll(), which returns the value of the Poisson log-likelihood function, given the vector of observations on the dependent variable y, the matrix of covariate observations X, and the current parameters b.

Lines 13–21 put the vector of first derivatives in g and the inverse of the Hessian matrix in Hi. Equation (5) specifies a matrix that is negative definite as long as the covariates are not linearly dependent. As discussed above, cholinv() returns a matrix of missing values if its argument is not positive definite. I therefore compute the negative of the Hessian on line 19 and multiply by \(-1\) on line 20, after inverting with cholinv().

Lines 23–30 implement the tupdate() function previously discussed.

Lines 32–53 implement the GetUpdate() function previously discussed, with the caveats that this version handles the data and uses pll() to compute the value of the objective function.

Lines 56–58 get the data from Stata and join a column of ones to X for the constant term.

Lines 60–71 implement the NR algorithm discussed above for this Poisson regression problem.

Running pnr1.do produces

Example 1: NR algorithm for Poisson

. do pnr1

. // Newton-Raphson for Poisson log-likelihood
. clear all

. use accident3

. 
. mata:

[Output Omitted]

: b  =  J(cols(X), 1, .01)

: GetDerives(y, X, b, g=., Hi=.)

: gz = .

: while (abs(gz) > 1e-11) {
>         bs1 = GetUpdate(y, X, b, g, Hi)
>         b   = bs1
>         GetDerives(y, X, b, g, Hi)
>         gz = g'*Hi*g
>         printf("gz is now %8.7g\n", gz)
> }
gz is now -119.201
gz is now -26.6231
gz is now -2.02142
gz is now -.016214
gz is now -1.3e-06
gz is now -8.3e-15

: printf("Converged value of beta is\n")
Converged value of beta is

: b
                  1
    +----------------+
  1 |  -.6558870685  |
  2 |  -1.009016966  |
  3 |   .1467114652  |
  4 |   .5743541223  |
    +----------------+

: 
: end
--------------------------------------------------------------------------------
. 
end of do-file

The point estimates in example 1 are equivalent to those produced by poisson.

Example 2: poisson results

. poisson accidents cvalue kids traffic

Iteration 0:   log likelihood = -555.86605  
Iteration 1:   log likelihood =  -555.8154  
Iteration 2:   log likelihood = -555.81538  

Poisson regression                              Number of obs     =        505
                                                LR chi2(3)        =     340.20
                                                Prob > chi2       =     0.0000
Log likelihood = -555.81538                     Pseudo R2         =     0.2343

------------------------------------------------------------------------------
   accidents |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      cvalue |  -.6558871   .0706484    -9.28   0.000    -.7943553   -.5174188
        kids |  -1.009017   .0807961   -12.49   0.000    -1.167374   -.8506594
     traffic |   .1467115   .0313762     4.68   0.000     .0852153    .2082076
       _cons |    .574354   .2839515     2.02   0.043     .0178193    1.130889
------------------------------------------------------------------------------

Done and undone

I implemented a simple nonlinear optimizer to practice Mata programming and to review the theory behind nonlinear optimization. In future posts, I implement a command for Poisson regression that uses the optimizer in optimize().

Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata-based OLS command

I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the variance-covariance of the estimator (VCE). I then discuss myregress12.ado, which performs its computations in Mata and computes VCE estimators based on independently and identically distributed (IID) observations, robust methods, or cluster-robust methods.

myregress12.ado performs ordinary least-squares (OLS) regression, and it extends myregress11.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata. To get the most out of this post, you should be familiar with Programming an estimation command in Stata: Using a subroutine to parse a complex option and Programming an estimation command in Stata: Computing OLS objects in Mata.

Parsing the vce() option

I used ado-subroutines to simplify the parsing of the options vce(robust) and vce(cluster cvarname) in myregress10.ado; see Programming an estimation command in Stata: Using a subroutine to parse a complex option. Part of the point was to illustrate how to write ado-subroutines and the programming tricks that I used in these subroutines.

Here I use the undocumented command _vce_parse to simplify the parsing. There are many undocumented commands designed to help Stata programmers. They are undocumented in that they are tersely documented in the system help but not documented in the manuals. In addition, the syntax or behavior of these commands may change over Stata releases, although this rarely happens.

_vce_parse helps Stata programmers parse the vce() option. To see how it works, consider the problem of parsing the syntax of myregress12.

myregress12 depvar [indepvars] [if] [in] [, vce(robust | cluster clustervar) noconstant]

where indepvars can contain factor variables or time-series variables.

I can use the syntax command to put whatever the user specifies in the option vce() into the local macro vce, but I still must (1) check that what was specified makes sense and (2) create local macros that the code can use to do the right thing. Examples 1–7 create the local macro vce, simulating what syntax would do, and then use _vce_parse to perform tasks (1) and (2).

I begin with the case in which the user specified vce(robust); here the local macro vce would contain the word robust.

Example 1: parsing vce(robust)

. clear all

. sysuse auto
(1978 Automobile Data)

. local vce "robust"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

The command

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

has two pieces. The piece before the colon (:) specifies the rules; the piece after the colon specifies what the user typed. Each piece can have a Stata object followed by some options; note the commas before optlist(Robust) and before vce(`vce'). In the case at hand, the second piece only contains what the user specified – vce(robust) – and the first piece only contains the options optlist(Robust) and argoptlist(CLuster). The option optlist(Robust) specifies that the vce() option in the second piece may contain the option robust and that its minimal abbreviation is r. Note how the word Robust in optlist(Robust) mimics how syntax specifies minimum abbreviations. The option argoptlist(CLuster) specifies that the vce() option in the second piece may contain cluster clustervar, that the minimum abbreviation of cluster is cl, and that it will put the argument clustervar into a local macro.

After the command,

_vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

I use return list to show what _vce_parse stored in r(). Because local macro vce contains “robust”, _vce_parse

  1. puts the word robust in the local macro r(robust);
  2. puts what the user typed, vce(robust), in the local macro r(vceopt); and
  3. puts the type of VCE, robust, in the local macro r(vce).

Examples 2 and 3 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(rob) or vce(r), which are valid abbreviations for vce(robust).

Example 2: parsing vce(rob)

. local vce "rob"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Example 3: parsing vce(r)

. local vce "r"

. _vce_parse , optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
             r(vceopt) : "vce(robust)"
                r(vce) : "robust"

Now, consider parsing the option vce(cluster clustervar). Because the cluster variable clustervar may contain missing values, _vce_parse may need to update a sample-identification variable before it stores the name of the cluster variable in a local macro. In example 4, I use the command

_vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

to handle the case when the user specifies vce(cluster rep78). The results from the tabulate and summarize commands illustrate that _vce_parse updates the sample-identification variable mytouse to account for the missing observations in rep78.

Example 4: parsing vce(cluster rep78)

. generate byte mytouse = 1

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |         74      100.00      100.00
------------+-----------------------------------
      Total |         74      100.00

. summarize rep78

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       rep78 |         69    3.405797    .9899323          1          5

. local vce "cluster rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

. tabulate mytouse

    mytouse |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |          5        6.76        6.76
          1 |         69       93.24      100.00
------------+-----------------------------------
      Total |         74      100.00

I use return list to show what _vce_parse stored in r(). Because local macro vce contains cluster rep78, _vce_parse

  1. puts the word robust in the local macro r(robust);
  2. puts the name of the cluster variable, rep78, in the local macro r(cluster);
  3. puts what the user typed, vce(cluster rep78), in the local macro r(vceopt);
  4. puts the argument to the cluster option, rep78, in the local macro r(vceargs); and
  5. puts the type of VCE, cluster, in the local macro r(vce).

Examples 5 and 6 illustrate that _vce_parse stores the same values in these local macros when the user specifies vce(clus rep78) or vce(cl rep78), which are valid abbreviations for vce(cluster rep78).

Example 5: parsing vce(clus rep78)

. local vce "clus rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Example 6: parsing vce(cl rep78)

. local vce "cl rep78"

. _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vce(`vce')

. return list

macros:
             r(robust) : "robust"
            r(cluster) : "rep78"
             r(vceopt) : "vce(cluster rep78)"
            r(vceargs) : "rep78"
                r(vce) : "cluster"

Having illustrated how _vce_parse handles the cases in which the user specifies something valid, I show in example 7 that it also produces a standard error message when the user specifies something invalid.

Example 7: parsing vce(silly)

. local vce "silly"

. capture noisily _vce_parse mytouse, optlist(Robust) argoptlist(CLuster) : , vc
> e(`vce')
vcetype 'silly' not allowed

. return list

_vce_parse can parse other types of vce() options; to see them type help _vce_parse.

Also, remember to type help undocumented when you are looking for a programmer's tool.
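Putting the pieces together, here is a minimal sketch of a hypothetical ado-program, myvce_demo, that does nothing except parse and display what _vce_parse returns; the full estimation command that actually uses these results appears in the next section.

program define myvce_demo
    version 14.1
    syntax varlist(numeric) [if] [in] [, vce(string) ]
    marksample touse

    // parse the vce() option and display the parsed results
    _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
    display "type of VCE:      `r(vce)'"
    display "cluster variable: `r(cluster)'"
end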

The code for myregress12

Here is the code for myregress12.ado, which uses _vce_parse. I describe how it works below.

I recommend that you click on the file name to download the code for myregress12.ado. To avoid scrolling, view the code in the Do-file Editor, or your favorite text editor, to see the line numbers.

Code block 1: myregress12.ado

*! version 12.0.0  16Jan2016
program define myregress12, eclass sortpreserve
    version 14.1

    syntax varlist(numeric ts fv) [if] [in] [, noCONStant vce(string) ]
    marksample touse

    _vce_parse `touse' , optlist(Robust) argoptlist(CLuster) : , vce(`vce')
    local vce        "`r(vce)'"
    local clustervar "`r(cluster)'"
    if "`vce'" == "robust" | "`vce'" == "cluster" {
        local vcetype "Robust"
    }
    if "`clustervar'" != "" {
        capture confirm numeric variable `clustervar'
        if _rc {
            display in red "invalid vce() option"
            display in red "cluster variable {bf:`clustervar'} is " ///
                "string variable instead of a numeric variable"
            exit(198)
        }
        sort `clustervar'
    }

    gettoken depvar indepvars : varlist
    _fv_check_depvar `depvar'

    fvexpand `indepvars' 
    local cnames `r(varlist)'

    tempname b V N rank df_r

    mata: mywork("`depvar'", "`cnames'", "`touse'", "`constant'",    ///
       "`vce'", "`clustervar'",                                  /// 
       "`b'", "`V'", "`N'", "`rank'", "`df_r'") 

    if "`constant'" == "" {
        local cnames `cnames' _cons
    }

    matrix colnames `b' = `cnames'
    matrix colnames `V' = `cnames'
    matrix rownames `V' = `cnames'

    ereturn post `b' `V', esample(`touse') buildfvinfo
    ereturn scalar N        = `N'
    ereturn scalar rank     = `rank'
    ereturn scalar df_r     = `df_r'
    ereturn local  vce      "`vce'"
    ereturn local  vcetype  "`vcetype'"
    ereturn local  clustvar "`clustervar'"
    ereturn local  cmd      "myregress12"

    ereturn display

end

mata:

void mywork( string scalar depvar,  string scalar indepvars, 
             string scalar touse,   string scalar constant,  
             string scalar vcetype, string scalar clustervar,
             string scalar bname,   string scalar Vname,     
             string scalar nname,   string scalar rname,     
             string scalar dfrname) 
{

    real vector    y, b, e, e2, cvar, ei 
    real matrix    X, XpXi, M, V, info, xi 
    real scalar    n, p, k, nc, i, dfr

    y    = st_data(., depvar, touse)
    X    = st_data(., indepvars, touse)
    n    = rows(X)

    if (constant == "") {
        X    = X,J(n,1,1)
    }

    XpXi = quadcross(X, X)
    XpXi = invsym(XpXi)
    b    = XpXi*quadcross(X, y)
    e    = y - X*b
    e2   = e:^2
    p    = cols(X)
    k    = p - diag0cnt(XpXi)
    if (vcetype == "robust") {
        M    = quadcross(X, e2, X)
        dfr  = n - k
        V    = (n/dfr)*XpXi*M*XpXi
    }
    else if (vcetype == "cluster") {
        cvar = st_data(., clustervar, touse)
        info = panelsetup(cvar, 1)
        nc   = rows(info)
        M    = J(k, k, 0)
        dfr  = nc - 1
        for(i=1; i<=nc; i++) {
            xi = panelsubmatrix(X,i,info)
            ei = panelsubmatrix(e,i,info)
            M  = M + xi'*(ei*ei')*xi
        }
        V    = ((n-1)/(n-k))*(nc/(nc-1))*XpXi*M*XpXi
    }
    else {                 // vcetype must be IID
        dfr  = n - k
        V    = (quadsum(e2)/dfr)*XpXi
    }

    st_matrix(bname, b')
    st_matrix(Vname, V)
    st_numscalar(nname, n)
    st_numscalar(rname, k)
    st_numscalar(dfrname, dfr)

}

end

Let’s break this 118-line program into familiar pieces. Lines 2-56 define the ado-command, and lines 58-118 define the Mata work function that is used by the ado-command. Despite the addition of details to handle the parsing and computation of a robust or cluster-robust VCE, the structures of the ado-command and of the Mata work function are the same as they were in myregress11.ado; see Programming an estimation command in Stata: An OLS command using Mata.

The ado-command has four parts.

  1. Lines 5-31 parse what the user typed, identify the sample, and create temporary names for the results returned by our Mata work function.
  2. Lines 33-35 call the Mata work function.
  3. Lines 37-52 post the results returned by the Mata work function to e().
  4. Line 54 displays the results.

The Mata function mywork() also has four parts.

  1. Lines 60-65 parse the arguments.
  2. Lines 68-70 declare vectors, matrices, and scalars that are local to mywork().
  3. Lines 80-108 compute the results.
  4. Lines 110-114 copy the computed results to Stata, using the names that were passed in the arguments.

Now, I address the details of the ado-code, although I do not discuss the details of myregress12.ado that I already covered when describing myregress11.ado in Programming an estimation command in Stata: An OLS command using Mata. Line 5 allows the user to specify the vce() option, and line 8 uses _vce_parse to parse what the user specifies. Lines 9 and 10 put the type of VCE found by _vce_parse in the local macro vce and the name of the cluster variable, if specified, in the local macro clustervar. Lines 11–13 put Robust in the local macro vcetype if the specified vce is either robust or cluster. If there is a cluster variable, lines 14–23 check that it is numeric and use it to sort the data.

Line 34 passes the new arguments for the type of VCE and the name of the cluster variable to the Mata work function mywork().

Lines 49–51 store the type of VCE, the output label for the VCE type, and the name of the cluster variable in e(), respectively.

Now, I address the details of the Mata work function mywork(), discussing only what I have added relative to the mywork() in myregress11.ado. Line 62 declares the new arguments. The string scalar vcetype is empty, contains “robust”, or contains “cluster”. The string scalar clustervar is either empty or contains the name of the cluster variable.

Lines 68–70 declare the local-to-the-function vectors cvar and ei and the local-to-the-function matrices M, info, and xi that are needed now but not previously.

Lines 87, 91–92, 104–105, and 108 specify if-else blocks to compute the correct VCE. Lines 88–90 compute a robust estimator of the VCE if vcetype contains “robust”. Lines 93–103 compute a cluster-robust estimator of the VCE if vcetype contains “cluster”. Lines 106–107 compute an IID-based estimator of the VCE if vcetype contains neither “robust” nor “cluster”.
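As a quick check that is not part of the original post, one might install myregress12.ado and compare its output with official regress on the auto data; the point estimates and cluster-robust standard errors from the two commands can then be compared side by side.

sysuse auto, clear
myregress12 mpg weight displacement, vce(cluster rep78)
regress     mpg weight displacement, vce(cluster rep78)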

Done and undone

I introduced the undocumented command _vce_parse and discussed the code for myregress12.ado, which uses Mata to compute OLS point estimates and an IID-based VCE, a robust VCE, or a cluster-robust VCE.

The structure of the code is the same as the one that I used in myregress11.ado and in mymean8.ado, which I discussed in Programming an estimation command in Stata: An OLS command using Mata and in Programming an estimation command in Stata: A first ado-command using Mata. That the structure remains the same makes it easier to handle the details that arise in more complicated problems.

Bayesian binary item response theory models using bayesmh

This post was written jointly with Yulia Marchenko, Executive Director of Statistics, StataCorp.

Table of Contents

Overview
1PL model
2PL model
3PL model
4PL model
5PL model
Conclusion

Overview

Item response theory (IRT) is used for modeling the relationship between the latent abilities of a group of subjects and the examination items used for measuring their abilities. Stata 14 introduced a suite of commands for fitting IRT models using maximum likelihood; see, for example, the blog post Spotlight on irt by Rafal Raciborski and the [IRT] Item Response Theory manual for more details. In this post, we demonstrate how to fit Bayesian binary IRT models by using the redefine() option introduced for the bayesmh command in Stata 14.1. If you are not familiar with the concepts and jargon of Bayesian statistics, you may want to watch the introductory videos on the Stata YouTube channel before proceeding.

Introduction to Bayesian analysis, part 1 : The basic concepts
Introduction to Bayesian analysis, part 2: MCMC and the Metropolis-Hastings algorithm

We use the abridged version of the mathematics and science data from De Boeck and Wilson (2004), masc1. The dataset includes 800 student responses to 9 test questions intended to measure mathematical ability.

The irt suite fits IRT models using data in the wide form – one observation per subject with items recorded in separate variables. To fit IRT models using bayesmh, we need the data in the long form, where items are recorded as multiple observations per subject. We thus reshape the dataset into long form: we have a single binary response variable, y, and two index variables, item and id, which identify the items and subjects, respectively. This allows us to formulate our IRT models as multilevel models. The following commands load and prepare the dataset.

. webuse masc1
(Data from De Boeck & Wilson (2004))

. generate id = _n

. quietly reshape long q, i(id) j(item)

. rename q y

To ensure that we include all levels of item and id in our models, we use fvset base none to keep the base categories.

. fvset base none id item

In what follows, we present eight Bayesian binary IRT models increasing in complexity and explanatory power. We perform Bayesian model comparison to gain insight into what would be the more appropriate model for the data at hand.

For high-dimensional models such as IRT models, you may see differences in the estimation results between different platforms or different flavors of Stata because of the nature of the Markov chain Monte Carlo (MCMC) sampling and finite numerical precision. These differences are not a source of concern; they will be within the range of the MCMC variability and will lead to similar inferential conclusions. The differences will diminish as the MCMC sample size increases. The results in this post are obtained from Stata/SE on the 64-bit Linux platform using the default 10,000 MCMC sample size.

Let the items be indexed by \(i=1,\dots,9\) and the subjects by \(j=1,\dots,800\). Let \(\theta_j\) be the latent mathematical ability of subject \(j\), and let \(Y_{ij}\) be the response of subject \(j\) to item \(i\).

Back to table of contents

1PL model

In the one-parameter logistic (1PL) model, the probability of getting a correct response is modeled as an inverse-logit function of location parameters \(b_i\), also called item difficulties, and a common slope parameter \(a\), also called item discrimination:

\[
P(Y_{ij}=1) = {\rm InvLogit}\{a(\theta_j-b_i)\} =
\frac{\exp\{a(\theta_j-b_i)\}}{1+\exp\{a(\theta_j-b_i)\}}
\]

Typically, the abilities are assumed to be normally distributed:
\[
\theta_j \sim {\rm N}(0,1)
\]
In a multilevel framework, the \(\theta_j\)’s represent random effects. In a Bayesian framework, we use the term “random effects” to refer to the parameters corresponding to levels of grouping variables identifying the hierarchy of the data.

A Bayesian formulation of the 1PL model also requires prior specification for the model parameters \(a\) and \(b_i\). The discrimination parameter \(a\) is assumed to be positive and is often modeled in the log scale. Because we have no prior knowledge about the discrimination and difficulty parameters, we assume that the prior distributions of \(\ln(a)\) and \(b_i\) have support on the whole real line, are symmetric, and are centered at 0. A normal prior distribution is thus a natural choice. We furthermore assume that \(\ln(a)\) and \(b_i\) are close to 0 and have prior variance of 1, which is an entirely subjective decision. We thus assign \(\ln(a)\) and \(b_i\) standard normal prior distributions:

\[\ln(a) \sim {\rm N}(0, 1)\] \[b_i \sim {\rm N}(0, 1) \]

To specify the likelihood function of the 1PL model in bayesmh, we use a nonlinear equation specification for the response variable y. The direct nonlinear specification for this model is

bayesmh y = ({discrim}*({subj:i.id}-{diff:i.item})), likelihood(logit) ...

where {discrim} is the discrimination parameter \(a\), {subj:i.id} are latent abilities \(\theta_j\), and {diff:i.item} are item difficulties \(b_i\). The logit model is used for the probability of a success, \(P(Y_{ij}=1)\). Specification {subj:i.id} in the above nonlinear expression is viewed as a substitutable expression for linear combinations of indicators associated with the id variable and parameters \(\theta_j\). This specification may be computationally prohibitive with a large number of subjects. A more efficient solution is to use the redefine() option to include subject random effects \(\theta_j\) in the model. The same argument may apply to the {diff:i.item} specification when there are many items. Thus, it may be computationally convenient to treat the \(b_i\) parameters as “random effects” in the specification and use the redefine() option to include them in the model.

A more efficient specification is thus

bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) ///
               redefine(subj:i.id) redefine(diff:i.item) ...

where {subj:} and {diff:} in the nonlinear specification now represent the \(\theta_j\) and \(b_i\) parameters, respectively, without using expansions into linear combinations of indicator variables.

Below, we show the full bayesmh specification of the 1PL model and the output summary. In our examples, we treat the abilities {subj:i.id} as nuisance parameters and exclude them from the final results. The discrimination model parameter {discrim} must be positive and is thus initialized with 1. A longer burn-in period, burnin(5000), allows for longer adaptation of the MCMC sampler, which is needed given the large number of parameters in the model. Finally, the estimation results are stored for later model comparison.

. set seed 14

. bayesmh y = ({discrim}*({subj:}-{diff:})), likelihood(logit) ///
>         redefine(diff:i.item) redefine(subj:i.id)            ///
>         prior({subj:i.id},    normal(0, 1))                  ///
>         prior({discrim},      lognormal(0, 1))               ///
>         prior({diff:i.item},  normal(0, 1))                  ///
>         init({discrim} 1) exclude({subj:i.id})               ///
>         burnin(5000) saving(sim1pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ logit({discrim}*(xb_subj-xb_diff))

Priors: 
  {diff:i.item} ~ normal(0,1)                                              (1)
    {subj:i.id} ~ normal(0,1)                                              (2)
      {discrim} ~ lognormal(0,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_diff.
(2) Parameters are elements of the linear form xb_subj.

Bayesian logistic regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3074
                                                 Efficiency:  min =     .02691
                                                              avg =     .06168
Log marginal likelihood =          .                          max =     .09527
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.6934123   .0998543   .003576  -.6934789  -.8909473  -.4917364
          2  | -.1234553   .0917187   .002972  -.1241642  -.3030341   .0597863
          3  | -1.782762   .1323252    .00566  -1.781142   -2.05219  -1.534451
          4  |  .3152835   .0951978   .003289   .3154714   .1279147   .4981263
          5  |  1.622545    .127213   .005561   1.619388   1.377123   1.883083
          6  |  .6815517   .0978777   .003712   .6788345   .4911366    .881128
          7  |  1.303482   .1173994   .005021   1.302328   1.084295   1.544913
          8  | -2.353975   .1620307   .008062  -2.351207  -2.672983  -2.053112
          9  | -1.168668   .1120243   .004526  -1.163922  -1.392936  -.9549209
-------------+----------------------------------------------------------------
     discrim |  .8644787   .0439804   .002681   .8644331   .7818035   .9494433
------------------------------------------------------------------------------

file sim1pl.dta saved

. estimates store est1pl

The sampling efficiency is acceptable, about 6% on average, with no indication of convergence problems. Although detailed convergence inspection of all parameters is outside the scope of this post, we recommend that you do so by using, for example, the bayesgraph diagnostics command.
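For example, a minimal sketch of such a check for the common discrimination parameter is shown below; these commands are not part of the original output. bayesgraph diagnostics plots the trace, histogram, autocorrelation, and density of the parameter, and bayesstats ess reports its effective sample size.

bayesgraph diagnostics {discrim}
bayesstats ess {discrim}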

Though we used informative priors for the model parameters, the estimation results from our Bayesian model are not that different from the maximum likelihood estimates obtained using the irt 1pl command (see example 1 in [IRT] irt 1pl). For example, the posterior mean estimate for {discrim} is 0.86 with an MCMC standard error of 0.003, whereas irt 1pl reports 0.85 with a standard error of 0.05.

The log-marginal likelihood is reported missing because we have excluded the {subj:i.id} parameters from the simulation results and the Laplace-Metropolis estimator of the log-marginal likelihood is not available in such cases. This estimator requires simulation results for all model parameters to compute the log-marginal likelihood.

Back to table of contents

2PL model

The two-parameter logistic (2PL) model extends the 1PL model by allowing for item-specific discrimination. The probability of correct response is now modeled as a function of item-specific slope parameters \(a_i\):
\[
P(Y_{ij}=1) = {\rm InvLogit}\{a_i(\theta_j-b_i)\} =
\frac{\exp\{a_i(\theta_j-b_i)\}}{1+\exp\{a_i(\theta_j-b_i)\}}
\]

The prior specification for \(\theta_j\) stays the same as in the 1PL model. We will, however, apply more elaborate prior specifications for the \(a_i\)’s and \(b_i\)’s. It is a good practice to use proper prior specifications without overwhelming the evidence from the data. The impact of the priors can be controlled by introducing additional hyperparameters. For example, Kim and Bolt (2007) proposed the use of a normal prior for the difficulty parameters with unknown mean and variance. Extending this approach to the discrimination parameters as well, we apply a hierarchical Bayesian model in which the \(\ln(a_i)\) and \(b_i\) parameters have the following prior specifications:

\[ \ln(a_i) \sim {\rm N}(\mu_a, \sigma_a^2) \] \[ b_i \sim {\rm N}(\mu_b, \sigma_b^2) \]

The mean hyperparameters, \(\mu_a\) and \(\mu_b\), and variance hyperparameters, \(\sigma_a^2\) and \(\sigma_b^2\), require informative prior specifications. We assume that the means are centered at 0 with a variation of 0.1:
\[
\mu_a, \mu_b \sim {\rm N}(0, 0.1)
\]

To lower the variability of the \(\ln(a_i)\) and \(b_i\) parameters, we apply an inverse-gamma prior with shape 10 and scale 1 for the variance parameters:

\[
\sigma_a^2, \sigma_b^2 \sim {\rm InvGamma}(10, 1)
\]

Thus, the prior mean of \(\sigma_a^2\) and \(\sigma_b^2\) is \(1/(10-1) \approx 0.1\).

In the bayesmh specification, the hyperparameters \(\mu_a\), \(\mu_b\), \(\sigma_a^2\), and \(\sigma_b^2\) are denoted as {mu_a}, {mu_b}, {var_a}, and {var_b}, respectively. We use the redefine(discrim:i.item) option to include in the model the discrimination parameters \(a_i\), referred to as {discrim:} in the likelihood specification.

Regarding the MCMC simulation, we change some of the default options. The hyperparameters {mu_a}, {mu_b}, {var_a}, and {var_b} are placed in separate blocks to improve the simulation efficiency. The discrimination parameters {discrim:i.item} must be positive and are thus initialized with 1s.

. set seed 14

. bayesmh y = ({discrim:}*({subj:}-{diff:})), likelihood(logit) ///
>         redefine(discrim:i.item) redefine(diff:i.item)        ///
>         redefine(subj:i.id)                                   ///
>         prior({subj:i.id},      normal(0, 1))                 ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))   ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))      ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))               ///
>         prior({var_a} {var_b},  igamma(10, 1))                ///
>         block({mu_a mu_b var_a var_b}, split)                 ///
>         init({discrim:i.item} 1)                              ///
>         exclude({subj:i.id}) burnin(5000) saving(sim2pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ logit(xb_discrim*(xb_subj-xb_diff))

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
       {subj:i.id} ~ normal(0,1)                                           (3)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_subj.

Bayesian logistic regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3711
                                                 Efficiency:  min =     .01617
                                                              avg =     .04923
Log marginal likelihood =          .                          max =      .1698
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.430976   .1986011   .010953   1.413063   1.089405   1.850241
          2  |  .6954823   .1081209   .004677   .6897267   .4985004   .9276975
          3  |  .9838528   .1343908   .009079   .9780275   .7506566   1.259427
          4  |  .8167792   .1169157   .005601   .8136229   .5992495   1.067578
          5  |  .9402715   .1351977   .010584   .9370298   .6691103   1.214885
          6  |  .9666747   .1420065   .008099   .9616285   .7038868   1.245007
          7  |  .5651287   .0864522   .006201   .5617302   .3956216   .7431265
          8  |  1.354053   .2048404   .015547   1.344227   .9791096   1.761437
          9  |  .7065096   .1060773   .006573   .6999745   .5102749   .9271799
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.5070314   .0784172   .003565   -.507922   -.671257  -.3596057
          2  | -.1467198    .117422   .003143  -.1456633  -.3895978   .0716841
          3  | -1.630259   .1900103   .013494  -1.612534  -2.033169  -1.304171
          4  |  .3273735   .1073891   .003565   .3231703   .1248782   .5492114
          5  |  1.529584   .1969554    .01549   1.507982   1.202271   1.993196
          6  |  .6325194    .115724   .005613   .6243691   .4272131   .8851649
          7  |  1.827013   .2884057   .019582    1.79828   1.349654   2.490633
          8  | -1.753744   .1939559   .014743  -1.738199  -2.211475  -1.438146
          9  | -1.384486   .2059005   .012105  -1.361195  -1.838918  -1.059687
-------------+----------------------------------------------------------------
        mu_a | -.1032615   .1148176   .003874   -.102376  -.3347816   .1277031
       var_a |  .1129835   .0356735   .001269   .1056105    .063403   .1981331
        mu_b | -.0696525   .2039387   .004949   -.072602  -.4641566   .3298393
       var_b |  .6216005   .2023137   .008293   .5843444   .3388551   1.101153
------------------------------------------------------------------------------

file sim2pl.dta saved

. estimates store est2pl

The average simulation efficiency is about 5%, but some of the parameters converge more slowly than others; for example, {diff:7.item} has the largest MCMC standard error (0.02) among the difficulty parameters. If this were a rigorous study, we would recommend longer simulations, with MCMC sample sizes of at least 50,000, to lower the MCMC standard errors.
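A minimal sketch of such a longer run is below; apart from the burnin(), mcmcsize(), and saving() options, the specification is identical to the call above, and sim2pl_long is a hypothetical filename.

set seed 14
bayesmh y = ({discrim:}*({subj:}-{diff:})), likelihood(logit) ///
        redefine(discrim:i.item) redefine(diff:i.item)        ///
        redefine(subj:i.id)                                   ///
        prior({subj:i.id},      normal(0, 1))                 ///
        prior({discrim:i.item}, lognormal({mu_a}, {var_a}))   ///
        prior({diff:i.item},    normal({mu_b}, {var_b}))      ///
        prior({mu_a} {mu_b},    normal(0, 0.1))               ///
        prior({var_a} {var_b},  igamma(10, 1))                ///
        block({mu_a mu_b var_a var_b}, split)                 ///
        init({discrim:i.item} 1)                              ///
        exclude({subj:i.id}) burnin(10000) mcmcsize(50000)    ///
        saving(sim2pl_long, replace)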

We can compare the 1PL and 2PL models by using the deviance information criterion (DIC) available with the bayesstats ic command.

. bayesstats ic est1pl est2pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8122.428
      est2pl |  8055.005
------------------------

DIC is often used in Bayesian model selection as an alternative to AIC and BIC criteria and can be easily obtained from an MCMC sample. Larger MCMC samples produce more reliable DIC estimates. Because different MCMC samples produce different sample DIC values and the sample approximation error in calculating DIC is not known, one should not rely solely on DIC when choosing a model.
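
For reference, under its usual definition, DIC combines the posterior mean deviance \(\bar{D}\), a measure of fit, with the effective number of parameters \(p_D\), a measure of complexity,

\[
{\rm DIC} = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta})
\]

where \(D(\bar{\theta})\) is the deviance evaluated at the posterior mean of the parameters.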

Lower DIC values indicate better fit. The DIC of the 2PL model (8,055) is markedly lower than the DIC of the 1PL model (8,122), implying better fit of the 2PL model.

3PL model

The three-parameter logistic (3PL) model introduces lower asymptote parameters \(c_i\), also called guessing parameters. The probability of giving a correct response is given by

\[
P(Y_{ij}=1) = c_i + (1-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\} ,\ c_i > 0
\]

The guessing parameters may be difficult to estimate using maximum likelihood. Indeed, the irt 3pl command with the sepguessing option fails to converge, as you can verify by typing

. irt 3pl q1-q9, sepguessing

on the original dataset.

It is thus important to specify an informative prior for \(c_i\). We assume that the prior mean of the guessing parameters is about 0.1 and thus apply
\[
c_i \sim {\rm InvGamma}(10, 1)
\]
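
This choice matches the intended prior mean: an inverse-gamma distribution with shape \(\alpha\) and scale \(\beta\) has mean \(\beta/(\alpha-1)\) for \(\alpha>1\), so here

\[
E(c_i) = \frac{1}{10-1} \approx 0.11
\]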

Similarly to the discrimination and difficulty parameters, the \(c_i\)’s are introduced as random-effects parameters in the bayesmh specification and are referred to as {gues:} in the likelihood specification.

Unlike with the 1PL and 2PL models, we cannot use the likelihood(logit) option to model the probability of success because the probability of a correct response is no longer an inverse-logit transformation of the parameters. Instead, we use likelihood(binlogit(1), noglmtransform) to model the probability of success of a Bernoulli outcome directly.

To have a valid initialization of the MCMC sampler, we assign the \(c_i\)'s a positive starting value of 0.1.

. set seed 14

. bayesmh y = ({gues:}+(1-{gues:})*invlogit({discrim:}*({subj:}-{diff:}))), ///
>                 likelihood(binlogit(1), noglmtransform)                   ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(subj:i.id)                      ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b}, split)                             ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1)                        ///
>         exclude({subj:i.id}) burnin(5000) saving(sim3pls, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(xb_gues+(1-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)),1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3496
                                                 Efficiency:  min =      .0148
                                                              avg =     .03748
Log marginal likelihood =          .                          max =      .2044
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.712831   .2839419   .018436   1.681216   1.232644   2.351383
          2  |  .8540871   .1499645   .008265   .8414399   .6058463   1.165732
          3  |  1.094723   .1637954    .01126   1.081756    .817031   1.454845
          4  |  1.090891   .2149095   .013977   1.064651   .7488589   1.588164
          5  |  1.363236   .2525573   .014858   1.338075   .9348136   1.954695
          6  |  1.388325   .3027436   .024245   1.336303   .9466695   2.068181
          7  |  .9288217   .2678741   .021626   .8750048   .5690308   1.603375
          8  |  1.457763   .2201065    .01809   1.438027   1.068937   1.940431
          9  |  .7873631    .127779   .007447   .7796568    .563821    1.06523
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.2933734   .0976177   .006339  -.2940499  -.4879558  -.0946848
          2  |  .2140365    .157158   .008333   .2037788  -.0553537   .5550411
          3  | -1.326351   .1981196   .013101  -1.326817  -1.706671  -.9307443
          4  |  .6367877   .1486799   .007895   .6277349   .3791045   .9509913
          5  |  1.616056   .1799378    .00966   1.606213   1.303614   2.006817
          6  |  .8354059    .124184    .00656   .8191839    .614221   1.097801
          7  |  2.066205   .3010858   .018377   2.034757   1.554484   2.709601
          8  | -1.555583   .1671435   .012265   -1.54984   -1.89487  -1.267001
          9  | -.9775626   .2477279   .016722  -.9936727  -1.431964  -.4093629
-------------+----------------------------------------------------------------
gues         |
        item |
          1  |  .1078598   .0337844     .0019   .1020673   .0581353   .1929404
          2  |  .1128113   .0372217   .002162   .1065996   .0596554   .2082417
          3  |   .123031   .0480042   .002579   .1127147   .0605462   .2516237
          4  |  .1190103   .0390721   .002369   .1123544   .0617698   .2095427
          5  |  .0829503   .0185785   .001275   .0807116   .0514752   .1232547
          6  |  .1059315   .0289175   .001708   .1022741   .0584959   .1709483
          7  |  .1235553   .0382661   .002964   .1186648   .0626495   .2067556
          8  |  .1142118   .0408348   .001733   .1062507   .0592389   .2134006
          9  |  .1270767   .0557821   .003939    .113562   .0621876   .2825752
-------------+----------------------------------------------------------------
        mu_a |   .109161   .1218499   .005504   .1126253   -.135329   .3501061
       var_a |   .108864   .0331522   .001053   .1030106   .0604834   .1860996
        mu_b |  .0782094   .1974657   .004367   .0755023  -.3067717   .4638104
       var_b |  .5829738   .1803167   .006263   .5562159   .3260449   1.034225
------------------------------------------------------------------------------

file sim3pls.dta saved

. estimates store est3pls

The estimated posterior means of the \(c_i\)’s range between 0.08 and 0.13. Clearly, the introduction of guessing parameters has an impact on the item discrimination and difficulty parameters. For example, the estimated posterior means of \(\mu_a\) and \(\mu_b\) shift from -0.10 and -0.07, respectively, for the 2PL model to 0.11 and 0.08, respectively, for the 3PL model.

Because the estimated guessing parameters are not that different, one may ask whether item-specific guessing parameters are really necessary. To answer this question, we fit a model with a common guessing parameter, {gues}, and compare it with the previous model.

. set seed 14

. bayesmh y = ({gues}+(1-{gues})*invlogit({discrim:}*({subj:}-{diff:}))), ///
>                 likelihood(binlogit(1), noglmtransform)                 ///
>         redefine(discrim:i.item) redefine(diff:i.item)                  ///
>         redefine(subj:i.id)                                             ///
>         prior({subj:i.id},      normal(0, 1))                           ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))             ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                ///
>         prior({gues},           igamma(10, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                         ///
>         prior({var_a} {var_b},  igamma(10, 1))                          ///
>         block({mu_a mu_b var_a var_b gues}, split)                      ///
>         init({discrim:i.item} 1 {gues} 0.1)                             ///
>         exclude({subj:i.id}) burnin(5000) saving(sim3pl, replace)
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial({gues}+(1-{gues})*invlogit(xb_discrim*(xb_subj-xb_diff)),1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
       {subj:i.id} ~ normal(0,1)                                           (3)
            {gues} ~ igamma(10,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3753
                                                 Efficiency:  min =     .01295
                                                              avg =     .03714
Log marginal likelihood =          .                          max =      .1874
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
discrim      |
        item |
          1  |  1.692894   .2748163   .021944   1.664569   1.232347   2.299125
          2  |  .8313512   .1355267    .00606   .8218212   .5928602   1.125729
          3  |  1.058833   .1611742   .014163   1.054126   .7676045   1.393611
          4  |  1.041808   .1718472   .008782   1.029867   .7398569   1.397073
          5  |  1.534997   .3208687   .023965   1.497019   1.019998   2.266078
          6  |   1.38296   .2581948   .019265   1.355706   .9559487   1.979358
          7  |  .8310222   .1698206   .012896   .8107371   .5736484   1.248736
          8  |  1.442949   .2266268   .017562   1.431204   1.066646   1.930829
          9  |    .77944   .1159669   .007266   .7750891   .5657258   1.014941
-------------+----------------------------------------------------------------
diff         |
        item |
          1  | -.3043161   .0859905   .005373  -.2968324  -.4870583  -.1407109
          2  |  .1814508   .1289251   .006543   .1832146  -.0723988   .4313265
          3  | -1.391216   .1924384   .014986  -1.373093  -1.809343  -1.050919
          4  |  .5928491   .1262631   .006721   .5829347    .356614    .857743
          5  |  1.617348   .1929263   .011604   1.601534   1.293032   2.061096
          6  |   .817635   .1172884   .006125    .812838   .5990503   1.064322
          7  |  2.006949   .2743517    .01785   1.981052   1.556682   2.594236
          8  | -1.576235   .1747855   .013455  -1.559435  -1.952676  -1.272108
          9  | -1.039362   .1840773    .01138   -1.02785  -1.432058  -.7160181
-------------+----------------------------------------------------------------
        gues |  .1027336   .0214544   .001753   .1022211   .0627299   .1466367
        mu_a |  .1009741    .123915   .006567   .0965353  -.1343028   .3510697
       var_a |  .1121003   .0344401   .001154   .1059563   .0628117   .1970842
        mu_b |  .0632173   .1979426   .004572   .0666684  -.3292497   .4482957
       var_b |  .5861236   .1818885   .006991   .5574743   .3239369   1.053172
------------------------------------------------------------------------------

file sim3pl.dta saved

. estimates store est3pl

We can again compare the two 3PL models by using the bayesstats ic command:

. bayesstats ic est3pls est3pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est3pls |  8049.425
      est3pl |  8049.426
------------------------

Although the estimated DICs of the two 3PL models are essentially the same, we decide for demonstration purposes to proceed with the model with item-specific guessing parameters.

4PL model

The four-parameter logistic (4PL) model extends the 3PL model by adding item-specific upper asymptote parameters \(d_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i){\rm InvLogit}\{a_i(\theta_j-b_i)\}
,\ c_i < d_i < 1
\]

The \(d_i\) parameter can be viewed as an upper limit on the probability of correct response to the \(i\)th item. The probability of giving correct answers by subjects with very high ability can thus be no greater than \(d_i\). We restrict the \(d_i\)'s to the (0.8,1) range and assign them a \({\rm Uniform}(0.8,1)\) prior. For other parameters, we use the same priors as in the 3PL model. In the bayesmh specification of the model, the condition \(c_i < d_i\) is incorporated in the likelihood, and the condition \(d_i < 1\) is implied by the specified prior for the \(d_i\)'s. We initialize the \(d_i\)'s to 0.9. We use the notable option to suppress the long table output.
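
The constraint is imposed through Stata's cond() function: the likelihood expression is multiplied by cond({gues:}<{d:},1,.), which equals 1 when \(c_i < d_i\) and missing otherwise, and bayesmh treats a missing likelihood value as an invalid proposal and rejects it. A quick illustration of the mechanism:

. display cond(0.1 < 0.9, 1, .)
1

. display cond(0.95 < 0.9, 1, .)
.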

. set seed 14

. bayesmh y =                                                              ///
>       (({gues:}+({d:}-{gues:})*invlogit({discrim:}*({subj:}-{diff:})))*  ///
>         cond({gues:}<{d:},1,.)), likelihood(binlogit(1), noglmtransform) ///
>         redefine(discrim:i.item) redefine(diff:i.item)                   ///
>         redefine(gues:i.item)    redefine(d:i.item)  redefine(subj:i.id) ///
>         prior({subj:i.id},      normal(0, 1))                            ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))              ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                 ///
>         prior({gues:i.item},    igamma(10, 1))                           ///
>         prior({d:i.item},       uniform(0.8, 1))                         ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                          ///
>         prior({var_a} {var_b},  igamma(10, 1))                           ///
>         block({mu_a mu_b var_a var_b}, split)                            ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d:i.item} 0.9)        ///
>         exclude({subj:i.id}) burnin(5000) saving(sim4pls, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
        {d:i.item} ~ uniform(0.8,1)                                        (4)
       {subj:i.id} ~ normal(0,1)                                           (5)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+(xb_d-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* con
          d(xb_gues<xb_d,1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_d.
(5) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3639
                                                 Efficiency:  min =    .004825
                                                              avg =      .0219
Log marginal likelihood =          .                          max =      .1203
Note: There is a high autocorrelation after 500 lags.

file sim4pls.dta saved

. estimates store est4pls
 

We use bayesstats summary to display results of selected model parameters.

. bayesstats summary {d:i.item} {mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
d            |
        item |
          1  |  .9598183   .0255321   .001948   .9621874   .9044441   .9981723
          2  |  .9024564   .0565702   .007407   .9019505   .8066354   .9944216
          3  |  .9525519   .0281878   .002845   .9551054   .8972454   .9971564
          4  |  .8887963   .0561697   .005793   .8859503   .8036236   .9916784
          5  |  .8815547   .0588907   .007215   .8708021   .8031737   .9926549
          6  |  .8891188   .0586482   .006891    .881882   .8024593   .9935512
          7  |   .874271   .0561718   .008087   .8635082   .8018176   .9880433
          8  |  .9663644   .0147606   .001121   .9667563   .9370666   .9950912
          9  |   .889164   .0486038   .005524   .8834207   .8084921   .9857415
-------------+----------------------------------------------------------------
        mu_a |  .3336887   .1436216   .009742    .334092   .0562924   .6164115
       var_a |  .1221547   .0406908   .002376   .1144729   .0642768   .2229326
        mu_b | -.0407488   .1958039   .005645  -.0398847  -.4220523   .3323791
       var_b |  .4991736   .1612246    .00629   .4660071   .2802531   .9023824
------------------------------------------------------------------------------

The bayesmh command issued a note indicating high autocorrelation for some of the model parameters. This may be related to slow MCMC convergence or to more substantial problems in the model specification. It is thus worthwhile to inspect the autocorrelation of the individual parameters. We can do so by using the bayesstats ess command. Parameters with a lower effective sample size (ESS) have higher autocorrelation, and vice versa.

. bayesstats ess {d:i.item} {mu_a var_a mu_b var_b}

Efficiency summaries    MCMC sample size =    10,000
 
----------------------------------------------------
             |        ESS   Corr. time    Efficiency
-------------+--------------------------------------
d            |
        item |
          1  |     171.82        58.20        0.0172
          2  |      58.33       171.43        0.0058
          3  |      98.17       101.87        0.0098
          4  |      94.02       106.36        0.0094
          5  |      66.62       150.11        0.0067
          6  |      72.44       138.05        0.0072
          7  |      48.25       207.26        0.0048
          8  |     173.30        57.70        0.0173
          9  |      77.41       129.19        0.0077
-------------+--------------------------------------
        mu_a |     217.35        46.01        0.0217
       var_a |     293.34        34.09        0.0293
        mu_b |    1203.20         8.31        0.1203
       var_b |     656.92        15.22        0.0657
----------------------------------------------------

We observe that the parameters with ESS lower than 200 are all among the upper asymptote parameters \(d_i\). This may be caused, for example, by overparameterization of the likelihood model and subsequent nonidentifiability, which is not resolved by the specified priors.

We can also fit a model with a common upper asymptote parameter, \(d\), and compare it with the model with the item-specific upper asymptote.

. set seed 14

. bayesmh y =                                                              ///
>         (({gues:}+({d}-{gues:})*invlogit({discrim:}*({subj:}-{diff:})))* ///
>         cond({gues:}<{d},1,.)), likelihood(binlogit(1), noglmtransform)  ///
>         redefine(discrim:i.item) redefine(diff:i.item)                   ///
>         redefine(gues:i.item)    redefine(subj:i.id)                     ///
>         prior({subj:i.id},      normal(0, 1))                            ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))              ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                 ///
>         prior({gues:i.item},    igamma(10, 1))                           ///
>         prior({d},              uniform(0.8, 1))                         ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                          ///
>         prior({var_a} {var_b},  igamma(10, 1))                           ///
>         block({mu_a mu_b var_a var_b d}, split)                          ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d} 0.9)               ///
>         exclude({subj:i.id}) burnin(5000) saving(sim4pl, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)
               {d} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*invlogit(xb_discrim*(xb_subj-xb_diff)))* cond
          (xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3877
                                                 Efficiency:  min =      .0107
                                                              avg =     .03047
Log marginal likelihood =          .                          max =      .1626

file sim4pl.dta saved

. estimates store est4pl

. bayesstats summary {d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
           d |  .9664578   .0144952   .001293   .9668207   .9371181   .9924572
        mu_a |  .2206696   .1387873    .01113   .2208302  -.0483587   .4952625
       var_a |  .1245785   .0391551   .001806   .1188779   .0658243   .2187058
        mu_b |  .0371722   .2020157    .00501   .0331742  -.3481366   .4336587
       var_b |  .5603447   .1761812   .006817   .5279243   .3157048   .9805077
------------------------------------------------------------------------------

We now compare the two 4PL models by using the bayesstats ic command:

. bayesstats ic est4pls est4pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est4pls |  8050.805
      est4pl |  8037.075
------------------------

The DIC of the more complex 4PL model (8,051) is substantially higher than the DIC of the simpler model (8,037). This and the potential nonidentifiability of the more complex est4pls model, indicated by high autocorrelation in the simulated MCMC sample, compel us to proceed with the model with a common upper asymptote, est4pl.

The posterior distribution of \(d\) has an estimated 95% equal-tailed credible interval of (0.93, 0.99) and is concentrated about 0.97. The \({\rm Uniform}(0.8,1)\) prior on \(d\) does not seem to be too restrictive. The estimated DIC of the est4pl model (8,037) is lower than the DIC of the est3pls 3PL model from the previous section (8,049), implying that the introduction of the upper asymptote parameter \(d\) does improve the model fit.
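
This cross-model comparison can be obtained directly from the stored estimates; the command is shown without output because the relevant DIC values were already quoted above:

. bayesstats ic est3pls est4pl, diconly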

5PL model

The five-parameter logistic (5PL) model extends the 4PL model by adding item-specific asymmetry parameters \(e_i\):
\[
P(Y_{ij}=1) = c_i + (d_i-c_i)\big[{\rm InvLogit}\{a_i(\theta_j-b_i)\}\big]^{e_i}
,\ c_i < d_i < 1, \ 0 < e_i < 1
\]

In the previous section, we found the 4PL model with common upper asymptote \(d\), est4pl, to be the best one so far. We thus consider here a 5PL model with a common upper asymptote \(d\).

Typically, we expect the \(e_i\) parameters to be close to 1. Similarly to the upper asymptote parameter \(d\), the \(e_i\) parameters are assumed to be in the (0.8,1) range and are assigned a \({\rm Uniform}(0.8,1)\) prior. We initialize the \(e_i\)'s to 0.9. We again use the notable option to suppress the long table output, and we display a subset of results by using bayesstats summary. (We could have used bayesmh's noshow() option instead to achieve the same result.)

. set seed 14

. bayesmh y =                                                               ///
>   (({gues:}+({d}-{gues:})*(invlogit({discrim:}*({subj:}-{diff:})))^{e:})* ///
>         cond({gues:}<{d},1,.)), likelihood(binlogit(1), noglmtransform)   ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(e:i.item)  redefine(subj:i.id)  ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d},              uniform(0.8, 1))                          ///
>         prior({e:i.item},       uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b d}, split)                           ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d} {e:i.item} 0.9)     ///
>         exclude({subj:i.id}) burnin(5000) saving(sim5pls, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
        {e:i.item} ~ uniform(0.8,1)                                        (4)
       {subj:i.id} ~ normal(0,1)                                           (5)
               {d} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^xb_e
          )* cond(xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_e.
(5) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3708
                                                 Efficiency:  min =    .007341
                                                              avg =     .02526
Log marginal likelihood =          .                          max =      .1517

file sim5pls.dta saved

. estimates store est5pls

. bayesstats summary {e:i.item} {d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
e            |
        item |
          1  |   .897859   .0578428   .006083   .8939272   .8050315   .9957951
          2  |  .9042669   .0585023   .005822     .90525   .8053789   .9956565
          3  |    .88993   .0562398   .005013    .887011    .803389   .9930454
          4  |  .9010241   .0574186   .006492   .9042044   .8030981   .9925598
          5  |  .9126369   .0545625    .00521   .9178927   .8098596   .9964487
          6  |  .9037269   .0583833   .006814   .9086704   .8054932   .9961268
          7  |  .9136308   .0558911   .005373   .9203899   .8112029    .996217
          8  |   .889775   .0568656   .005119   .8849938    .803912   .9938777
          9  |  .8808435    .056257   .004743   .8727194   .8030522   .9904972
-------------+----------------------------------------------------------------
           d |  .9671374   .0144004   .001165   .9670598   .9382404   .9933374
        mu_a |  .2770211   .1353777    .00832   .2782552   .0141125   .5418087
       var_a |   .122635   .0404159   .002148   .1160322   .0666951   .2208711
        mu_b |  .1211885   .1929743   .004955   .1199136  -.2515431    .503733
       var_b |  .5407642   .1747674   .006353   .5088269   .3016315   .9590086
------------------------------------------------------------------------------

We also want to compare the above model with a simpler one using a common asymmetry parameter \(e\).

. set seed 14

. bayesmh y =                                                               ///
>    (({gues:}+({d}-{gues:})*(invlogit({discrim:}*({subj:}-{diff:})))^{e})* ///
>         cond({gues:}<{d},1,.)), likelihood(binlogit(1), noglmtransform)   ///
>         redefine(discrim:i.item) redefine(diff:i.item)                    ///
>         redefine(gues:i.item)    redefine(subj:i.id)                      ///
>         prior({subj:i.id},      normal(0, 1))                             ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))               ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                  ///
>         prior({gues:i.item},    igamma(10, 1))                            ///
>         prior({d} {e},          uniform(0.8, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                           ///
>         prior({var_a} {var_b},  igamma(10, 1))                            ///
>         block({mu_a mu_b var_a var_b d e}, split)                         ///
>         init({discrim:i.item} 1 {gues:i.item} 0.1 {d e} 0.9)              ///
>         exclude({subj:i.id}) burnin(5000) saving(sim5pl, replace) notable
  
Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood: 
  y ~ binomial(<expr1>,1)

Priors: 
  {discrim:i.item} ~ lognormal({mu_a},{var_a})                             (1)
     {diff:i.item} ~ normal({mu_b},{var_b})                                (2)
     {gues:i.item} ~ igamma(10,1)                                          (3)
       {subj:i.id} ~ normal(0,1)                                           (4)
             {d e} ~ uniform(0.8,1)

Hyperpriors: 
    {mu_a mu_b} ~ normal(0,0.1)
  {var_a var_b} ~ igamma(10,1)

Expression: 
  expr1 : (xb_gues+({d}-xb_gues)*(invlogit(xb_discrim*(xb_subj-xb_diff)))^{e})
          * cond(xb_gues<{d},1,.)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_discrim.
(2) Parameters are elements of the linear form xb_diff.
(3) Parameters are elements of the linear form xb_gues.
(4) Parameters are elements of the linear form xb_subj.

Bayesian binomial regression                     MCMC iterations  =     15,000
Random-walk Metropolis-Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =      7,200
                                                 Acceptance rate  =      .3805
                                                 Efficiency:  min =    .008179
                                                              avg =     .02768
Log marginal likelihood =          .                          max =     .08904

file sim5pl.dta saved

. estimates store est5pl

. bayesstats summary {e d mu_a var_a mu_b var_b}

Posterior summary statistics                      MCMC sample size =    10,000
 
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
           e |  .9118363   .0558178   .004194   .9175841   .8063153   .9960286
           d |  .9655166   .0147373   .001495   .9659029   .9354708   .9924492
        mu_a |  .2674271   .1368926   .008485    .270597   .0102798   .5443345
       var_a |  .1250759   .0428095   .002635   .1173619   .0654135   .2340525
        mu_b |  .1015121   .2048178   .006864    .103268  -.3052377   .4934158
       var_b |  .5677309   .1824591   .006981   .5331636   .3079868   1.016762
------------------------------------------------------------------------------

We use bayesstats ic to compare the DIC values of the two 5PL models:

. bayesstats ic est5pls est5pl, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
     est5pls |  8030.894
      est5pl |  8034.517
------------------------

The estimated DIC of the more complex est5pls model (8,031) is lower than the DIC of the simpler model (8,035), suggesting a better fit.

Conclusion

Finally, we compare all eight fitted models.

. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, ///
>         diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8122.428
      est2pl |  8055.005
      est3pl |  8049.426
     est3pls |  8049.425
      est4pl |  8037.075
     est4pls |  8050.805
      est5pl |  8034.517
     est5pls |  8030.894
------------------------

The est5pls model has the lowest overall DIC. To confirm this result, we run another set of simulations with a larger MCMC sample size of 50,000. (We simply added the mcmcsize(50000) option to the bayesmh specification of the above eight models.) The following DIC values, based on the larger MCMC sample size, are more reliably estimated.
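
For example, under this approach the 3PL model with a common guessing parameter is refit with exactly the same specification as before, only with the mcmcsize(50000) option added (a sketch; output omitted):

. set seed 14

. bayesmh y = ({gues}+(1-{gues})*invlogit({discrim:}*({subj:}-{diff:}))), ///
>                 likelihood(binlogit(1), noglmtransform)                 ///
>         redefine(discrim:i.item) redefine(diff:i.item)                  ///
>         redefine(subj:i.id)                                             ///
>         prior({subj:i.id},      normal(0, 1))                           ///
>         prior({discrim:i.item}, lognormal({mu_a}, {var_a}))             ///
>         prior({diff:i.item},    normal({mu_b}, {var_b}))                ///
>         prior({gues},           igamma(10, 1))                          ///
>         prior({mu_a} {mu_b},    normal(0, 0.1))                         ///
>         prior({var_a} {var_b},  igamma(10, 1))                          ///
>         block({mu_a mu_b var_a var_b gues}, split)                      ///
>         init({discrim:i.item} 1 {gues} 0.1)                             ///
>         exclude({subj:i.id}) burnin(5000) mcmcsize(50000)               ///
>         saving(sim3pl, replace)

. estimates store est3pl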

. bayesstats ic est1pl est2pl est3pl est3pls est4pl est4pls est5pl est5pls, ///
>         diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
      est1pl |  8124.015
      est2pl |  8052.068
      est3pl |  8047.067
     est3pls |  8047.738
      est4pl |  8032.417
     est4pls |  8049.712
      est5pl |  8031.375
     est5pls |  8031.905
------------------------

Again, the 5PL models have the lowest DIC values and seem to provide the best fit. However, the DIC differences between models est4pl, est5pl, and est5pls are minimal and may very well be within the estimation error. Regardless, these three models appear to be better than the simpler 1PL, 2PL, and 3PL models.

Additional model checking may be needed to assess the models' fit, and we should not rely solely on the DIC values to make our final model selection. A practitioner may still prefer the simpler est4pl 4PL model to the 5PL models even though it has a slightly higher DIC. In fact, given that the posterior mean estimate of the upper asymptote parameter \(d\) is 0.96 with a 95% equal-tailed credible interval of (0.94, 0.99), some practitioners may prefer the even simpler est3pl 3PL model.

Programming an estimation command in Stata: A map to posted entries

I have posted 20 entries to the Stata blog about programming an estimation command in Stata. They are best read in order. The comprehensive list below allows you to read them from first to last, at your own pace.

  1. Programming estimators in Stata: Why you should

    To help you write Stata commands that people want to use, I illustrate how Stata syntax is predictable and give an overview of the estimation-postestimation structure that you will want to emulate in your programs.

  2. Programming an estimation command in Stata: Where to store your stuff

    I discuss the difference between scripts and commands, and I introduce some essential programming concepts and constructions that I use to write the scripts and commands.

  3. Programming an estimation command in Stata: Global macros versus local macros

    I discuss a pair of examples that illustrate the differences between global macros and local macros.

  4. Programming an estimation command in Stata: A first ado-command

    I discuss the code for a simple estimation command to focus on the details of how to implement an estimation command. The command that I discuss estimates the mean by the sample average. I begin by reviewing the formulas and a do-file that implements them. I subsequently introduce ado-file programming and discuss two versions of the command. Along the way, I illustrate some of the postestimation features that work after the command.

  5. Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects

    I present the formulas for computing the ordinary least-squares (OLS) estimator, and I discuss some do-file implementations of them. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors. I introduce the Stata matrix commands and matrix functions that I use in ado-commands that I discuss in upcoming posts.

  6. Programming an estimation command in Stata: A first command for OLS

    I show how to write a Stata estimation command that implements the OLS estimator by explaining the code.

  7. Programming an estimation command in Stata: A better OLS command

    I use the syntax command to improve the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: A first command for OLS. I show how to require that all variables be numeric variables and how to make the command accept time-series operated variables.

  8. Programming an estimation command in Stata: Allowing for sample restrictions and factor variables

    I modify the OLS command discussed in Programming an estimation command in Stata: A better OLS command to allow for sample restrictions, to handle missing values, to allow for factor variables, and to deal with perfectly collinear variables.

  9. Programming an estimation command in Stata: Allowing for options

    I make three improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for sample restrictions and factor variables. First, I allow the user to request a robust estimator of the variance-covariance of the estimator. Second, I allow the user to suppress the constant term. Third, I store the residual degrees of freedom in e(df_r) so that test will use the t or F distribution instead of the normal or chi-squared distribution to compute the p-value of Wald tests.

  10. Programming an estimation command in Stata: Using a subroutine to parse a complex option

    I make two improvements to the command that implements the OLS estimator that I discussed in Programming an estimation command in Stata: Allowing for options. First, I add an option for a cluster-robust estimator of the variance-covariance of the estimator (VCE). Second, I make the command accept the modern syntax for either a robust or a cluster-robust estimator of the VCE. In the process, I use subroutines in my ado-program to facilitate the parsing, and I discuss some advanced parsing tricks.

  11. Programming an estimation command in Stata: Mata 101

    I introduce Mata, the matrix programming language that is part of Stata.

  12. Programming an estimation command in Stata: Mata functions

    I show how to write a function in Mata, the matrix programming language that is part of Stata.

  13. Programming an estimation command in Stata: A first ado-command using Mata

    I discuss a sequence of ado-commands that use Mata to estimate the mean of a variable. The commands illustrate a general structure for Stata/Mata programs.

  14. Programming an estimation command in Stata: Computing OLS objects in Mata

    I present the formulas for computing the OLS estimator and show how to compute them in Mata. This post is a Mata version of Programming an estimation command in Stata: Using Stata matrix commands and functions to compute OLS objects. I discuss the formulas and the computation of independence-based standard errors, robust standard errors, and cluster-robust standard errors.

  15. Programming an estimation command in Stata: An OLS command using Mata

    I discuss a command that computes OLS results in Mata, paying special attention to the structure of Stata programs that use Mata work functions.

  16. Programming an estimation command in Stata: Adding robust and cluster-robust VCEs to our Mata-based OLS command

    I show how to use the undocumented command _vce_parse to parse the options for robust or cluster-robust estimators of the VCE. I then discuss myregress12.ado, which performs its computations in Mata and computes an IID-based, a robust, or a cluster-robust estimator of the VCE.

  17. Programming an estimation command in Stata: A review of nonlinear optimization using Mata

    I review the theory behind nonlinear optimization and get some practice in Mata programming by implementing an optimizer in Mata. This post is designed to help you develop your Mata programming skills and to improve your understanding of how the Mata optimization suites optimize() and moptimize() work.

  18. Programming an estimation command in Stata: Using optimize() to estimate Poisson parameters

    I show how to use optimize() in Mata to maximize a Poisson log-likelihood function and to obtain estimators of the VCE based on IID observations or on robust methods.

  19. Programming an estimation command in Stata: A poisson command using Mata

    I discuss mypoisson1, which computes Poisson-regression results in Mata. The code in mypoisson1.ado is remarkably similar to the code in myregress11.ado, which computes OLS results in Mata, as I discussed in Programming an estimation command in Stata: An OLS command using Mata.

  20. Programming an estimation command in Stata: Handling factor variables in optimize()

    I discuss a method for handling factor variables when performing nonlinear optimization using optimize(). After illustrating the issue caused by factor variables, I present a method and apply it to an example using optimize().

regress, probit, or logit?


In a previous post I illustrated that the probit model and the logit model produce statistically equivalent estimates of marginal effects. In this post, I compare the marginal effect estimates from a linear probability model (linear regression) with marginal effect estimates from probit and logit models.

My simulations show that when the true model is a probit or a logit, using a linear probability model can produce inconsistent estimates of the marginal effects of interest to researchers. The conclusions hinge on the probit or logit model being the true model.

Simulation results

For all simulations below, I use a sample size of 10,000 and 5,000 replications. The true data-generating processes (DGPs) are constructed using one discrete covariate and one continuous covariate. I study the average effect of a change in the continuous variable on the conditional probability (AME) and the average effect of a change in the discrete covariate on the conditional probability (ATE). I also look at the effect of a change in the continuous variable on the conditional probability, evaluated at the mean value of the covariates (MEM), and the effect of a change in the discrete covariate on the conditional probability, evaluated at the mean value of the covariates (TEM).
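
In Stata, these four quantities correspond to margins, dydx() with and without the atmeans option. A minimal sketch for the logit case (x1 and x2 are the covariate names used in the simulation code later in this post):

logit y x1 i.x2, vce(robust)
margins, dydx(*)             // AME of x1 and ATE of x2
margins, dydx(*) atmeans     // MEM of x1 and TEM of x2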

In Table 1, I present the results of a simulation when the true DGP satisfies the assumptions of a logit model. I show the average of the AME and ATE estimates and the rejection rate of the true null hypothesis at the 5% level. I also provide approximate true values of the AME and ATE, which I obtain by computing the ATE and AME, at the true values of the coefficients, using a sample of 20 million observations. I will provide more details on the simulation in a later section.

Table 1: Average Marginal and Treatment Effects: True DGP Logit

Simulation Results for N=10,000 and 5,000 Replications
Statistic             Approximate True Value    Logit    Regress (LPM)
AME of x1                       -.084           -.084        -.094
5% Rejection Rate                                .050         .99
ATE of x2                        .092            .091         .091
5% Rejection Rate                                .058         .058

From Table 1, we see that the logit model estimates are close to the true value and that the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 99% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a logit.

For the MEM and TEM, we have the following:

Table 2: Marginal and Treatment Effects at Mean Values: True DGP Logit

Simulation Results for N=10,000 and 5,000 Replications
Statistic             Approximate True Value    Logit    Regress (LPM)
MEM of x1                       -.099           -.099        -.094
5% Rejection Rate                                .054         .618
TEM of x2                        .109            .109         .092
5% Rejection Rate                                .062         .073

Again, the logit estimates behave as expected. For the linear probability model, the rejection rate of the true null hypothesis is 62% for the MEM. For the TEM, the rejection rate is 7.3%, and the estimated effect is smaller than the true effect.

For the AME and ATE, when the true DGP is a probit, we have the following:

Table 3: Average Marginal and Treatment Effects: True DGP Probit

Simulation Results for N=10,000 and 5,000 Replications
Statistic             Approximate True Value    Probit   Regress (LPM)
AME of x1                       -.094           -.094        -.121
5% Rejection Rate                                .047        1
ATE of x2                        .111            .111         .111
5% Rejection Rate                                .065         .061

The probit model estimates are close to the true value, and the rejection rate of the true null hypothesis is close to 5%. For the linear probability model, the rejection rate is 100% for the AME. For the ATE, the rejection rate and point estimates are close to what is estimated using a probit.

For the MEM and TEM, we have the following:

Table 4: Marginal and Treatment Effects at Mean Values: True DGP Probit

Simulation Results for N=10,000 and 5,000 Replications
Statistic             Approximate True Value    Probit   Regress (LPM)
MEM of x1                       -.121           -.122        -.121
5% Rejection Rate                                .063         .054
TEM of x2                        .150            .150         .110
5% Rejection Rate                                .059         .158

For the MEM, the probit and linear probability model produce reliable inference. For the TEM, the probit marginal effects behave as expected, but the linear probability model has a rejection rate of 16%, and the point estimates are not close to the true value.

Simulation design

Below is the code I used to generate the data for my simulations. In the first part (see the // 1. comment in the code), I generate outcome variables that satisfy the assumptions of the logit model, y, and the probit model, yp. In the second part (// 2.), I compute the marginal effects for the logit and probit models. I have a continuous and a discrete covariate. For the discrete covariate, the marginal effect is a treatment effect. In the third part (// 3.), I compute the marginal effects evaluated at the means. I will use these estimates later to compute approximations to the true values of the effects.

program define mkdata
    syntax, [n(integer 1000)]
    clear
    quietly set obs `n'
    // 1. Generating data from probit, logit, and misspecified 
    generate x1    = rchi2(2)-2
    generate x2    = rbeta(4,2)>.2
    generate u     = runiform()
    generate e     = ln(u) -ln(1-u) 
    generate ep    = rnormal()
    generate xb    = .5*(1 - x1 + x2)
    generate y     =  xb + e > 0
    generate yp    = xb + ep > 0 
    // 2. Computing probit & logit marginal and treatment effects 
    generate m1   = exp(xb)*(-.5)/(1+exp(xb))^2
    generate m2   = exp(1 -.5*x1)/(1+ exp(1 -.5*x1 )) - ///
	              exp(.5 -.5*x1)/(1+ exp(.5 -.5*x1 ))
    generate m1p  = normalden(xb)*(-.5)
    generate m2p  = normal(1 -.5*x1 ) - normal(.5 -.5*x1)
    // 3. Computing marginal and treatment effects at means
    quietly mean x1 x2 
    matrix A        = r(table)
    scalar a        = .5 -.5*A[1,1] + .5*A[1,2]
    scalar b1       =  1 -.5*A[1,1]
    scalar b0       = .5 -.5*A[1,1]
    generate mean1  = exp(a)*(-.5)/(1+exp(a))^2
    generate mean2  = exp(b1)/(1+ exp(b1)) - exp(b0)/(1+ exp(b0))
    generate mean1p = normalden(a)*(-.5)
    generate mean2p = normal(b1) - normal(b0)
end

I approximate the true marginal effects using a sample of 20 million observations. This is a reasonable strategy in this case. For example, take the average marginal effect for a continuous covariate, \(x_{k}\), in the case of the probit model:

\begin{equation*}
\frac{1}{N}\sum_{i=1}^N \phi\left({\bf x}_{i}{\boldsymbol \beta}\right)\beta_{k}
\end{equation*}

The expression above is an approximation of \(E\left(\phi\left({\bf x}_{i}{\boldsymbol \beta}\right)\beta_{k}\right)\). To obtain this expected value, we would need to integrate over the distribution of all the covariates. This is not practical and would limit my choice of covariates. Instead, I draw a sample of 20 million observations, compute \(\frac{1}{N}\sum_{i=1}^N \phi\left({\bf x}_{i}{\boldsymbol \beta}\right)\beta_{k}\), and take it to be the true value. I follow the same logic for the other marginal effects.

Below is the code I use to compute the approximate true marginal effects. I draw the 20 million observations, compute the averages that I will use in my simulation, and create locals for each approximate true value.
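
The locals `L', `N', and `R' that appear below and in the simulation loop are assumed to have been set beforehand; a minimal sketch consistent with the sample sizes stated above:

local L = 20000000    // observations used to approximate the true effects
local N = 10000       // sample size per replication
local R = 5000        // number of replications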

. mkdata, n(`L')
(2 missing values generated)

. local values "m1 m2 mean1 mean2 m1p m2p mean1p mean2p"

. local means  "mx1 mx2 meanx1 meanx2 mx1p mx2p meanx1p meanx2p"

. local n : word count `values'

. 
. forvalues i= 1/`n' {
  2.         local a: word `i' of `values'
  3.         local b: word `i' of `means'
  4.         sum `a', meanonly
  5.         local `b' = r(mean)
  6. }

Now, I am ready to run all the simulations that I used to produce the results in the previous section. The code that I used for the simulations for the TEM and the MEM when the true DGP is a logit is given by:

. postfile lpm y1l y1l_r y1lp y1lp_r y2l y2l_r y2lp y2lp_r ///
>                 using simslpm, replace 

. forvalues i=1/`R' {
  2.         quietly {
  3.                 mkdata, n(`N')
  4.                 logit  y x1 i.x2, vce(robust) 
  5.                 margins, dydx(*) atmeans post  vce(unconditional)
  6.                 local y1l = _b[x1]
  7.                 test _b[x1] = `meanx1'
  8.                 local y1l_r   = (r(p)<.05) 
  9.                 local y2l = _b[1.x2]
 10.                 test _b[1.x2] = `meanx2'
 11.                 local y2l_r   = (r(p)<.05) 
 12.                 regress  y x1 i.x2, vce(robust) 
 13.                 margins, dydx(*) atmeans post  vce(unconditional)
 14.                 local y1lp = _b[x1]
 15.                 test _b[x1] = `meanx1'
 16.                 local y1lp_r   = (r(p)<.05) 
 17.                 local y2lp = _b[1.x2]
 18.                 test _b[1.x2] = `meanx2'
 19.                 local y2lp_r   = (r(p)<.05) 
 20.                 post lpm (`y1l') (`y1l_r') (`y1lp') (`y1lp_r') ///
>                          (`y2l') (`y2l_r') (`y2lp') (`y2lp_r')
 21.         }
 22. }

. postclose lpm

. use simslpm, clear 

. sum 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         y1l |      5,000   -.0985646      .00288  -.1083639  -.0889075
       y1l_r |      5,000       .0544     .226828          0          1
        y1lp |      5,000   -.0939211    .0020038  -.1008612  -.0868043
      y1lp_r |      5,000       .6182    .4858765          0          1
         y2l |      5,000    .1084959     .065586  -.1065291   .3743112
-------------+---------------------------------------------------------
       y2l_r |      5,000       .0618     .240816          0          1
        y2lp |      5,000    .0915894     .055462  -.0975456   .3184061
      y2lp_r |      5,000       .0732    .2604906          0          1

For the results for the AME and the ATE when the true DGP is a logit, I use margins without the atmeans option. The other cases are similar. I use robust standard errors for all computations because my likelihood model is an approximation to the true likelihood, and I use the option vce(unconditional) to account for the fact that I am using two-step M-estimation. See Wooldridge (2010) for more details on two-step M-estimation.
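
As a sketch, the main change relative to the loop above for the AME and ATE results is in the margins calls, which drop atmeans (the test statements would then compare against the corresponding AME and ATE true values):

margins, dydx(*) post vce(unconditional)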

You can obtain the code used to produce these results here.

Conclusion

Using a probit or a logit model yields equivalent marginal effects. I provide evidence that the same cannot be said of the marginal effect estimates of the linear probability model when compared with those of the logit and probit models.

Acknowledgment

This post was inspired by a question posed by Stephen Jenkins after my previous post.

Reference

Wooldridge, J. M. 2010. Econometric Analysis of Cross Section and Panel Data. 2nd ed. Cambridge, Massachusetts: MIT Press.