Archive for the ‘Statistics’ Category

Using the lasso for inference in high-dimensional models

Why use lasso to do inference about coefficients in high-dimensional models?

High-dimensional models, which have too many potential covariates for the sample size at hand, are increasingly common in applied research. The lasso, discussed in the previous post, can be used to estimate the coefficients of interest in a high-dimensional model. This post discusses the Stata 16 commands that perform this estimation.

An introduction to the lasso in Stata

Why is the lasso interesting?

The least absolute shrinkage and selection operator (lasso) estimates model coefficients, and these estimates can be used to select which covariates should be included in a model. The lasso is used for outcome prediction and for inference about causal parameters. In this post, we provide an introduction to the lasso and discuss using the lasso for prediction. In the next post, we discuss using the lasso for inference about causal parameters.
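To make the link between shrinkage and covariate selection concrete, here is a minimal sketch in Python (not Stata, and not how Stata's lasso command is implemented). It assumes the special case of orthonormal covariates and the objective ½‖y − Xb‖² + λ‖b‖₁, where the lasso estimate is simply the OLS estimate pulled toward zero by λ and set to exactly zero when it is smaller than λ in absolute value. The function names, the coefficient values, and the penalty are hypothetical, chosen only for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 if |value| <= penalty."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_orthonormal(ols_coefficients, penalty):
    """Lasso estimates under the ASSUMPTION of orthonormal covariates.

    In that special case each lasso coefficient is the soft-thresholded
    OLS coefficient. General designs need an iterative solver instead.
    """
    return {name: soft_threshold(b, penalty) for name, b in ols_coefficients.items()}


# Hypothetical OLS estimates for four covariates (illustrative values only).
ols = {"x1": 2.0, "x2": -0.3, "x3": 0.8, "x4": 0.1}

lasso = lasso_orthonormal(ols, penalty=0.5)

# Covariates whose coefficients were not shrunk to exactly zero are "selected".
selected = [name for name, b in lasso.items() if b != 0.0]

print(lasso)
print(selected)
```

Coefficients smaller than the penalty in absolute value drop out entirely, which is why the same estimates can double as a covariate-selection rule.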

Fun with frames

I have a confession. I wasn’t excited about the addition of frames to Stata 16. Yes, frames has been one of the most requested features for many years, and our website analytics show that frames is wildly popular. Adding frames was a smart decision and our customers are excited. But I have used Stata for over 20 years, and I have been perfectly happy using one dataset at a time. So I ignored frames.

Then I started working on an example for lasso using genetic data. I simulated patient data along with genetic data for each of 22 chromosomes saved in 22 separate datasets. Working with 23 datasets became cumbersome, so I thought I’d check out frames. I began by reading the manual and then tinkered with my genetic data. Along the way, I discovered a feature of frames that completely blew my mind. I’m going to show you that feature below, and I expect that it will blow your mind as well.

This blog post is not meant to be an introduction to frames. There is a detailed introduction to frames in the Stata 16 manual that will make you an expert. I simply want to show you some of the useful things that you can do with frames, including the following: …

Calculating power using Monte Carlo simulations, part 4: Multilevel/longitudinal models

In my last three posts, I showed you how to calculate power for a t test using Monte Carlo simulations, how to integrate your simulations into Stata’s power command, and how to do this for linear and logistic regression models. In today’s post, I’m going to show you how to estimate power for multilevel/longitudinal models using simulations. You may want to review my earlier post titled “How to simulate multilevel/longitudinal data” before you read this post.

Calculating power using Monte Carlo simulations, part 3: Linear and logistic regression

In my last two posts, I showed you how to calculate power for a t test using Monte Carlo simulations and how to integrate your simulations into Stata’s power command. In today’s post, I’m going to show you how to do these tasks for linear and logistic regression models. The strategy and overall structure of the programs for linear and logistic regression are similar to the t test examples. The parts that will change are the simulation of the data and the models used to test the null hypothesis.

Calculating power using Monte Carlo simulations, part 2: Running your simulation using power

In my last post, I showed you how to calculate power for a t test using Monte Carlo simulations. In this post, I will show you how to integrate your simulations into Stata’s power command so that you can easily create custom tables and graphs for a range of parameter values.

Calculating power using Monte Carlo simulations, part 1: The basics

Power and sample-size calculations are an important part of planning a scientific study. You can use Stata’s power commands to calculate power and sample-size requirements for dozens of commonly used statistical tests. But there are no simple formulas for more complex models such as multilevel/longitudinal models and structural equation models (SEMs). Monte Carlo simulations are one way to calculate power and sample-size requirements for complex models, and Stata provides all the tools you need to do this. You can even integrate your simulations into Stata’s power commands so that you can easily create custom tables and graphs for a range of parameter values.
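The core idea can be sketched in a few lines: simulate many datasets under an assumed true effect, test the null hypothesis on each one, and report the proportion of rejections as the estimated power. The sketch below is written in Python rather than Stata and is not the code from the post. It assumes a one-sample test of a zero mean with normally distributed data, and it uses the normal critical value as an approximation to the t critical value, which is reasonable only for moderately large samples. The function name, parameter names, and default values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def simulate_power(n, true_mean, sd=1.0, alpha=0.05, reps=2000, seed=12345):
    """Estimate power by Monte Carlo simulation for a test of H0: mean = 0.

    ASSUMPTION: the two-sided normal critical value stands in for the
    t critical value, so this is an approximation for moderately large n.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Step 1: simulate one dataset under the assumed true effect.
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        # Step 2: compute the test statistic for H0: mean = 0.
        standard_error = statistics.stdev(sample) / n ** 0.5
        statistic = statistics.fmean(sample) / standard_error
        # Step 3: record whether the null hypothesis is rejected.
        if abs(statistic) > critical:
            rejections += 1
    # Step 4: power is the proportion of simulated datasets that reject H0.
    return rejections / reps


power = simulate_power(n=100, true_mean=0.3)
print(f"Estimated power: {power:.3f}")
```

Calling the function with `true_mean=0.0` estimates the type I error rate instead, which is a quick check that the simulation behaves as intended. Swapping in a different data-generating step and a different test is what extends the approach to more complex models.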

Exploring results of nonparametric regression models

In his blog post, Enrique Pinzon discussed how to perform regression when we don’t want to make any assumptions about functional form: use the npregress command. He concluded by asking and answering a few questions about the results using the margins and marginsplot commands.

Recently, I have been thinking about all the different types of questions that we could answer using margins after nonparametric regression, or really after any type of regression. margins and marginsplot are powerful tools for exploring the results of a model and drawing many kinds of inferences. In this post, I will show you how to ask and answer very specific questions and how to explore the entire response surface based on the results of your nonparametric regression.

Dynamic stochastic general equilibrium models for policy analysis

What are DSGE models?

Dynamic stochastic general equilibrium (DSGE) models are used by macroeconomists to model multiple time series. A DSGE model is based on economic theory. A theory will have equations for how individuals or sectors in the economy behave and how the sectors interact. What emerges is a system of equations whose parameters can be linked back to the decisions of economic actors. In many economic theories, individuals take actions based partly on the values they expect variables to take in the future, not just on the values those variables take in the current period. The strength of DSGE models is that they incorporate these expectations explicitly, unlike other models of multiple time series.

DSGE models are often used in the analysis of shocks or counterfactuals. A researcher might subject the model economy to an unexpected change in policy or the environment and see how variables respond. For example, what is the effect of an unexpected rise in interest rates on output? Or a researcher might compare the responses of economic variables with different policy regimes. For example, a model might be used to compare outcomes under a high-tax versus a low-tax regime. A researcher would explore the behavior of the model under different settings for tax rate parameters, holding other parameters constant.

In this post, I show you how to estimate the parameters of a DSGE model, how to create and interpret an impulse response, and how to compare the impulse response estimated from the data with an impulse response generated by a counterfactual policy regime.

Ermistatas and Stata’s new ERMs commands

Ermistatas is our most popular t-shirt these days. See it and you will understand why.

[Image: the Ermistatas t-shirt design]

We call the character Ermistatas and he is thinking—Ermistatas cogitatu. Notice the electricity bolts being emitted and received by his three antennae.

The shirt is popular even among those who do not use Stata, and it’s leading them to ask questions: “Who or what is Ermistatas, and why is he, she, or it deserving of a t-shirt?” Then they add, “And why three and not the usual two antennae?”

Ermistatas is the creation of our arts-and-graphics department to represent Stata 15’s new commands for fitting Extended Regression Models, a term we coined. We call them ERMs for short. The new commands are …