Teach statistical concepts with Stata


My first statistics course consisted primarily of plugging numbers into formulas. I did not leave that course with any real idea of how statistics differed from basic algebra. The next course I took is what put it all together, and I’ve loved statistics ever since. In that course, we examined the relationships between a population and its samples. I learned that statistics enables us to make inferences about an entire population from just a small amount of data. Now that is cool! Read more…

A new update to StataNow has just been released


We just released more new features in StataNow. When you update your copy of StataNow, you will have access to the following: Read more…

Essential tools for data quality checks

Before we fit statistical models with our datasets, we typically go through a few checks to confirm that our data are accurate and complete. Regardless of whether you have obtained data from an organization or built the dataset yourself, it is worthwhile to check for data entry errors. Below, we will show you four essential Stata commands for performing quality checks on your data: duplicates, isid, assert, and misstable.
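As a minimal sketch of how these four commands might be used together, the do-file fragment below runs each one on the auto dataset that ships with Stata. The dataset and the specific conditions checked (make as an identifier, positive prices, a 0/1 foreign indicator) are assumptions chosen for illustration and are not taken from the post itself.

```stata
* Load an example dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* duplicates: report observations that are exact copies across all variables
duplicates report

* duplicates: report repeated values of a candidate identifier
duplicates report make

* isid: exit with an error unless make uniquely identifies the observations
isid make

* assert: stop with an error if the condition fails for any observation
assert price > 0
assert inlist(foreign, 0, 1)

* misstable: summarize the variables that contain missing values
misstable summarize
```

Because isid and assert stop with an error when a check fails, placing them near the top of a do-file means later commands do not run on data that violate those checks.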

Read more…

A new update to StataNow has just been released

An update to StataNow is now available. You can update your copy of StataNow to access the latest features, including the following: Read more…

A new update to StataNow has just been released


We just released another update to StataNow. Make sure your copy of StataNow is up to date so that you have the latest features, including the following: Read more…

Heterogeneous treatment-effect estimation with S-, T-, and X-learners using H2OML

Motivation

In an era of large-scale experimentation and rich observational data, the one-size-fits-all paradigm is giving way to individualized decision-making. Whether targeting messages to voters, assigning medical treatments to patients, or recommending products to consumers, practitioners increasingly seek to tailor interventions based on individual characteristics. This shift hinges on understanding how treatment effects vary across individuals: the question is not just whether interventions work on average, but for whom they work best. Read more…

Stata commands to run ChatGPT, Claude, Gemini, and Grok

I wrote a blog post in 2023 titled “A Stata command to run ChatGPT”, and it remains popular. Unfortunately, OpenAI has since changed its API, and the chatgpt command in that post no longer runs. In this post, I will show you how to update the API code and how to write similar Stata commands that use Claude, Gemini, and Grok like this: Read more…

A new update to StataNow has just been released

A new update to StataNow has just been released. With new statistical features and interface improvements, there is something for everyone. We are excited to share the new features with you. Read more…

Looking ahead to the 2025 Stata Conference: A celebration of data, discovery, and 40 years of Stata

Excitement is building for the 2025 Stata Conference, where researchers, analysts, and data scientists from around the world will come together to share ideas, showcase their work, and explore the frontiers of statistical analysis using Stata.

Set to take place in Nashville, TN, on 31 July–1 August, this year’s conference will once again highlight the creativity and rigor that define the Stata user community. From clever programming tips to pioneering research, the event promises a wide range of presentations that reflect the diversity of disciplines and applications where Stata is making an impact. Read more…

Prediction intervals with gradient boosting machine

Introduction

Machine learning methods, such as ensemble decision trees, are widely used to predict outcomes based on data. However, these methods often focus on providing point predictions, which limits their ability to quantify prediction uncertainty. In many applications, such as healthcare and finance, the goal is not only to predict accurately but also to assess the reliability of those predictions. Prediction intervals, which provide lower and upper bounds such that the true response lies within them with high probability, are a reliable tool for quantifying prediction uncertainty. An ideal prediction interval should meet several criteria: it should offer valid coverage (defined below) without relying on strong distributional assumptions, be informative by being as narrow as possible for each observation, and be adaptive, providing wider intervals for observations that are “difficult” to predict and narrower intervals for “easy” ones. Read more…