I have a confession. I wasn’t excited about the addition of **frames** to Stata 16. Yes, **frames** has been one of the most requested features for many years, and our website analytics show that **frames** is wildly popular. Adding **frames** was a smart decision and our customers are excited. But I have used Stata for over 20 years, and I have been perfectly happy using one dataset at a time. So I ignored **frames**.

Then I started working on an example for lasso using genetic data. I simulated patient data along with genetic data for each of 22 chromosomes saved in 22 separate datasets. Working with 23 datasets became cumbersome, so I thought I’d check out **frames**. I began by reading the manual and then tinkered with my genetic data. Along the way, I discovered a feature of **frames** that completely blew my mind. I’m going to show you that feature below, and I expect that it will blow your mind as well.

This blog post is not meant to be an introduction to **frames**. There is a detailed introduction to frames in the Stata 16 manual that will make you an expert. I simply want to show you some of the useful things that you can do with **frames**, including the following: Read more…

In my last three posts, I showed you how to calculate power for a *t* test using Monte Carlo simulations, how to integrate your simulations into Stata’s **power** command, and how to do this for linear and logistic regression models. In today’s post, I’m going to show you how to estimate power for multilevel/longitudinal models using simulations. You may want to review my earlier post titled “How to simulate multilevel/longitudinal data” before you read this post. Read more…

In my last two posts, I showed you how to calculate power for a *t* test using Monte Carlo simulations and how to integrate your simulations into Stata’s **power** command. In today’s post, I’m going to show you how to do these tasks for linear and logistic regression models. The strategy and overall structure of the programs for linear and logistic regression are similar to the *t* test examples. The parts that will change are the simulation of the data and the models used to test the null hypothesis. Read more…

In my last post, I showed you how to calculate power for a *t* test using Monte Carlo simulations. In this post, I will show you how to integrate your simulations into Stata’s **power** command so that you can easily create custom tables and graphs for a range of parameter values. Read more…

Power and sample-size calculations are an important part of planning a scientific study. You can use Stata’s **power** commands to calculate power and sample-size requirements for dozens of commonly used statistical tests. But there are no simple formulas for more complex models such as multilevel/longitudinal models and structural equation models (SEMs). Monte Carlo simulations are one way to calculate power and sample-size requirements for complex models, and Stata provides all the tools you need to do this. You can even integrate your simulations into Stata’s **power** commands so that you can easily create custom tables and graphs for a range of parameter values. Read more…

Data management and data cleaning are critically important steps in any data analysis. Many of us learned this lesson the hard way. Have you ever fit a model that includes age as a covariate and forgotten to convert the missing value codes of -99 to missing values? I have. Or maybe you overlooked a data entry error that resulted in an age of 354 that should have been 54. I’ve done that too. Careful data management and cleaning can help us avoid these kinds of embarrassing mistakes.

I recently recorded a series of data management videos for the Stata Youtube Channel. You can click on the links below to watch the videos. I included topics that I think are important, but the list is far from exhaustive. If you would like to see videos on additional topics, please leave your suggestion in the comments below.

**Data management playlist**

You can learn more about these topics and many others in the Data Management Reference Manual.

In my last post, I demonstrated how to use **putexcel** to recreate common Stata output in Microsoft Excel. Today I want to show you how to create custom reports for arbitrary variables. I am going to create tables that combine cell counts with row percentages, and means with standard deviations. But you could modify the examples below to include column percentages, percentiles, standard errors, confidence intervals or any statistic. I am also going to pass the variable names into my programs using local macros. This will allow me to create the same report for arbitrary variables by simply assigning new variable names to the macros. You could extend this idea by creating a do-file for each report and passing the variable names into the do-files. This is another important step toward our goal of automating the creation of reports in Excel.

Today’s blog post is Read more…

In my last post, I showed how to use **putexcel** to write simple expressions to Microsoft Excel and format the resulting text and cells. Today, I want to show you how to write more complex expressions such as macros, graphs, and matrices. I will even show you how to write formulas to Excel to create calculated cells. These are important steps toward our goal of automating the creation of reports in Excel.

Before we begin the examples, Read more…

For a long time, I have wanted to type a Stata command like this,

. ExcelTable race, cont(age height weight) cat(sex diabetes)
The Excel table table.xlsx was created successfully

and get an Excel table that looks like this:

So I wrote a program called **ExcelTable** for my own use Read more…

In this blog post, I’d like to give you a relatively nontechnical introduction to Markov chain Monte Carlo, often shortened to “MCMC”. MCMC is frequently used for fitting Bayesian statistical models. There are different variations of MCMC, and I’m going to focus on the Metropolis–Hastings (M–H) algorithm. In the interest of brevity, I’m going to omit some details, and I strongly encourage you to read the [BAYES] manual before using MCMC in practice.

Let’s continue with the coin toss example from my previous post Introduction to Bayesian statistics, part 1: The basic concepts. We are interested in the posterior distribution of the parameter \(\theta\), which is the probability that a coin toss results in “heads”. Our prior distribution is a flat, uninformative beta distribution with parameters 1 and 1. And we will use a binomial likelihood function to quantify the data from our experiment, which resulted in 4 heads out of 10 tosses. Read more…