Bayesian inference using multiple Markov chains

Overview

Markov chain Monte Carlo (MCMC) is the principal tool for performing Bayesian inference. MCMC is a stochastic procedure that utilizes Markov chains simulated from the posterior distribution of model parameters to compute posterior summaries and make predictions. Because MCMC is stochastic and depends on initial values, verifying Markov chain convergence can be difficult; visual inspection of trace and autocorrelation plots is often used. A more formal method for checking convergence relies on simulating and comparing results from multiple Markov chains; see, for example, Gelman and Rubin (1992) and Gelman et al. (2013). Using multiple chains, rather than a single chain, makes diagnosing convergence easier.
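To make the idea of comparing chains concrete, here is a minimal sketch of the basic Gelman–Rubin statistic (often written Rhat) for a single parameter. It is written in Python purely for illustration and is not Stata code. The function name `gelman_rubin` and the toy "chains" (independent normal draws rather than real MCMC output) are assumptions made for this example, and the formula is the textbook form; it is not claimed to reproduce the exact values reported by bayesstats grubin.

```python
import random
import statistics


def gelman_rubin(chains):
    """Basic Gelman-Rubin statistic (Rhat) for one parameter.

    chains: list of equal-length lists of draws, one list per chain.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means (between-chain variability)
    b_over_n = statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    var_plus = (n - 1) / n * w + b_over_n
    return (var_plus / w) ** 0.5


# Toy data (hypothetical): three "chains" of independent N(0, 1) draws
rng = random.Random(16)
agreeing = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(3)]
# Shift the third chain to mimic a chain stuck in a different region
disagreeing = agreeing[:2] + [[x + 5 for x in agreeing[2]]]

print(gelman_rubin(agreeing))     # close to 1 when the chains agree
print(gelman_rubin(disagreeing))  # well above 1 when one chain disagrees
```

Values close to 1 suggest that the chains are sampling the same distribution; a value well above 1 is a sign that at least one chain has not converged.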

As of Stata 16, bayesmh and its bayes prefix commands support a new option, nchains(), for simulating multiple Markov chains. There is also a new convergence diagnostic command, bayesstats grubin. All Bayesian postestimation commands now support multiple chains. In this blog post, I show you how to check MCMC convergence and improve your Bayesian inference using multiple chains through a series of examples. I also show you how to speed up your sampling by running multiple Markov chains in parallel. Read more…

Adding recession shading to time-series graphs

Introduction

Sometimes, I like to augment a time-series graph with shading that indicates periods of recession. In this post, I will show you a simple way to add recession shading to graphs using data provided by import fred. This post also demonstrates how to build a complex graph in Stata, beginning with the basic pieces and finishing with a polished product.

Read more…
Categories: Graphics

Stata Certified Gift Guide 2019

The holidays are fast approaching, and if you’re like most people, you’re still not exactly sure what gift or gifts to get those special people in your life. Enter the Stata Certified Gift Guide. We polled our team and compiled their favorites into the ultimate gift guide for data lovers! Sure, you could go the typical gift card route, but where’s the fun in that?

Power Nap Pillow
$99.00
Sometimes, you just need to close the door and take a power nap.

Read more…

Stata in the Cloud

As more organizations move their IT, data management, and data analysis needs to the Cloud, I often have to answer these questions:

  1. Can Stata run in the Cloud?
  2. Am I allowed to run my copy of Stata in the Cloud?
  3. What is the best setup for Stata in the Cloud?
  4. How does Stata perform in the Cloud?

Read more…

Using the lasso for inference in high-dimensional models

Why use lasso to do inference about coefficients in high-dimensional models?

High-dimensional models, which have too many potential covariates for the sample size at hand, are increasingly common in applied research. The lasso, discussed in the previous post, can be used to estimate the coefficients of interest in a high-dimensional model. This post discusses commands in Stata 16 that estimate the coefficients of interest in a high-dimensional model. Read more…

An introduction to the lasso in Stata

Why is the lasso interesting?

The least absolute shrinkage and selection operator (lasso) estimates model coefficients, and these estimates can be used to select which covariates should be included in a model. The lasso is used for outcome prediction and for inference about causal parameters. In this post, we provide an introduction to the lasso and discuss using the lasso for prediction. In the next post, we discuss using the lasso for inference about causal parameters. Read more…

Fun with frames

I have a confession. I wasn’t excited about the addition of frames to Stata 16. Yes, frames has been one of the most requested features for many years, and our website analytics show that frames is wildly popular. Adding frames was a smart decision and our customers are excited. But I have used Stata for over 20 years, and I have been perfectly happy using one dataset at a time. So I ignored frames.

Then I started working on an example for lasso using genetic data. I simulated patient data along with genetic data for each of 22 chromosomes saved in 22 separate datasets. Working with 23 datasets became cumbersome, so I thought I’d check out frames. I began by reading the manual and then tinkered with my genetic data. Along the way, I discovered a feature of frames that completely blew my mind. I’m going to show you that feature below, and I expect that it will blow your mind as well.

This blog post is not meant to be an introduction to frames. There is a detailed introduction to frames in the Stata 16 manual that will make you an expert. I simply want to show you some of the useful things that you can do with frames, including the following: Read more…

Calculating power using Monte Carlo simulations, part 4: Multilevel/longitudinal models

In my last three posts, I showed you how to calculate power for a t test using Monte Carlo simulations, how to integrate your simulations into Stata’s power command, and how to do this for linear and logistic regression models. In today’s post, I’m going to show you how to estimate power for multilevel/longitudinal models using simulations. You may want to review my earlier post titled “How to simulate multilevel/longitudinal data” before you read this post. Read more…

Calculating power using Monte Carlo simulations, part 3: Linear and logistic regression

In my last two posts, I showed you how to calculate power for a t test using Monte Carlo simulations and how to integrate your simulations into Stata’s power command. In today’s post, I’m going to show you how to do these tasks for linear and logistic regression models. The strategy and overall structure of the programs for linear and logistic regression are similar to the t test examples. The parts that will change are the simulation of the data and the models used to test the null hypothesis. Read more…
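The general recipe behind simulation-based power is short: simulate data under an assumed alternative, test the null hypothesis, repeat many times, and report the proportion of rejections. The sketch below illustrates that recipe in Python rather than Stata and is not the code from these posts. The function `simulated_power` and its parameters are hypothetical names chosen for this example, and the test uses a normal critical value as a large-sample approximation to the t test.

```python
import random
import statistics


def simulated_power(n, true_mean, sd, alpha=0.05, reps=2000, seed=1):
    """Estimate power of a two-sided one-sample test of H0: mean = 0."""
    rng = random.Random(seed)
    # Normal critical value; approximates the t critical value for large n
    crit = statistics.NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Simulate one dataset under the assumed alternative
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        se = statistics.stdev(sample) / n ** 0.5
        # Reject H0 when the test statistic exceeds the critical value
        if abs(statistics.fmean(sample) / se) > crit:
            rejections += 1
    # Power is estimated by the proportion of rejections
    return rejections / reps


print(simulated_power(n=100, true_mean=0.0, sd=1.0))  # rejection rate under H0
print(simulated_power(n=100, true_mean=0.5, sd=1.0))  # power under the alternative
```

For a regression model, the same loop applies; only the data-generating step and the fitted model used for the test would change.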

Compatibility and reproducibility

30 July 2019

I saw a tweet the other day where someone claimed that StataCorp ensures that the dataset format in Stata X is always different from Stata X-1.

This reminded me of an email I wrote a few years ago to a user who had questions about backward compatibility and reproducibility. I’m going to use large parts of that email in this blog post to share my thoughts on those topics.

I understand the frustration of incompatibilities between software versions. While it may not ease the inevitable difficulties that arise, I would like to explain our efforts in this regard. Read more…