Archive for the ‘Statistics’ Category

Nonlinear multilevel mixed-effects models

You have a model that is nonlinear in the parameters. Perhaps it is a model of tree growth and therefore asymptotes to a maximum value. Perhaps it is a model of serum concentrations of a drug that rise rapidly to a peak concentration and then decay exponentially. Easy enough: use nonlinear regression ([R] nl) to fit your model. But … what if you have repeated measures for each tree or repeated blood serum levels for each patient? You might want to account for the correlation within tree or patient. You might even believe that each tree has its own asymptote. You need nonlinear mixed-effects models—also called nonlinear hierarchical models or nonlinear multilevel models. Read more…
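
For a sense of what fitting such a model looks like, here is a minimal sketch using menl, the nonlinear mixed-effects command introduced in Stata 15. The variable names (circumf, age, tree) are placeholders for a tree-growth dataset; the logistic curve rises to the asymptote {b1}, and adding the random effect {U[tree]} lets each tree have its own asymptote.

    // Fixed-effects-only logistic growth curve
    menl circumf = {b1}/(1 + exp(-(age - {b2})/{b3}))

    // Let the asymptote vary by tree via the random effect {U[tree]}
    menl circumf = ({b1} + {U[tree]})/(1 + exp(-(age - {b2})/{b3}))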

Bayesian logistic regression with Cauchy priors using the bayes prefix

Introduction

Stata 15 provides a convenient and elegant way of fitting Bayesian regression models: simply prefix the estimation command with bayes. You can choose from 45 supported estimation commands. All of Stata’s existing Bayesian features are supported by the new bayes prefix. You can use default priors for model parameters or select from many prior distributions. I will demonstrate the bayes prefix by fitting a Bayesian logistic regression model and explore Cauchy priors (available as of the July 20, 2017, update) for regression coefficients. Read more…
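
As a sketch of the syntax, suppose we have a binary outcome y and covariates x1 and x2 (all placeholder names). Prefixing logit with bayes fits the model with the default normal priors; the prior() option swaps in Cauchy(0, 2.5) priors for the coefficients.

    // Bayesian logistic regression with default priors
    bayes, rseed(17): logit y x1 x2

    // Same model with Cauchy(0, 2.5) priors on the coefficients of x1 and x2
    bayes, prior({y: x1 x2}, cauchy(0, 2.5)) rseed(17): logit y x1 x2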

Estimating the parameters of DSGE models

Introduction

Dynamic stochastic general equilibrium (DSGE) models are used in macroeconomics to model the joint behavior of aggregate time series like inflation, interest rates, and unemployment. They are used to analyze policy, for example, to answer the question, “What is the effect of a surprise rise in interest rates on inflation and output?” To answer that question we need a model of the relationship among interest rates, inflation, and output. DSGE models are distinguished from other models of multiple time series by their close connection to economic theory. Macroeconomic theories consist of systems of equations that are derived from models of the decisions of households, firms, policymakers, and other agents. These equations form the DSGE model. Because the DSGE model is derived from theory, its parameters can be interpreted directly in terms of the theory.

In this post, I build a small DSGE model that is similar to models used for monetary policy analysis. I show how to estimate the parameters of this model using the new dsge command in Stata 15. I then shock the model with a contraction in monetary policy and graph the response of model variables to the shock. Read more…
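
As a sketch of what the dsge syntax looks like (the equations and names below illustrate the style, not the exact model estimated in the post), each equation goes in parentheses, E(F.x) denotes the expected value of a variable next period, parameters appear in braces, and state equations describe how the shocks evolve:

    // p = inflation, x = output gap, r = interest rate; u and g are shocks
    dsge (p = {beta}*E(F.p) + {kappa}*x)               ///
         (x = E(F.x) - (r - E(F.p) - g), unobserved)   ///
         (r = 1/{beta}*p + u)                          ///
         (F.u = {rhou}*u, state)                       ///
         (F.g = {rhog}*g, state)

    // Impulse responses to the monetary-policy shock u
    irf set dsge_irfs, replace
    irf create model1
    irf graph irf, impulse(u) response(x p r)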

Nonparametric regression: Like parametric regression, but not

Initial thoughts

Nonparametric regression is similar to linear regression, Poisson regression, and logit or probit regression; it predicts the mean of an outcome given a set of covariates. If you work with the parametric models mentioned above or other models that predict means, you already understand nonparametric regression and can work with it.

The main difference between parametric and nonparametric models is the assumptions about the functional form of the mean conditional on the covariates. Parametric models assume the mean is a known function of \(\mathbf{x}\beta\). Nonparametric regression makes no assumptions about the functional form.

In practice, this means that nonparametric regression yields consistent estimates of the mean function that are robust to functional form misspecification. But we do not need to stop there. With npregress, introduced in Stata 15, we may obtain estimates of how the mean changes when we change discrete or continuous covariates, and we can use margins to answer other questions about the mean function.

Below I illustrate how to use npregress and how to interpret its results. As you will see, the results are interpreted in the same way you would interpret the results of a parametric model using margins. Read more…
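
A minimal sketch of that workflow, with y, x1, and a discrete covariate a standing in for your variables (all names hypothetical):

    // Kernel-based nonparametric regression of y on continuous x1 and factor a
    npregress kernel y x1 i.a, reps(200)

    // Average marginal effect of x1 on the mean of y
    margins, dydx(x1)

    // Expected mean of y at selected values of x1
    margins, at(x1 = (10 20 30))

Here reps(200) requests bootstrap replications so that standard errors can be reported for the estimates and for the subsequent margins results.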

Stata 15 announced, available now

We announced Stata 15 today. It’s a big deal because this is Stata’s biggest release ever.

I posted to Statalist this morning and listed sixteen of the most important new features. Here on the blog I will say more about them, and you can learn even more by visiting our website and seeing the Stata 15 features page.

I go into depth below on each of the sixteen highlighted features.

Read more…

Estimation under omitted confounders, endogeneity, omitted variable bias, and related problems

Initial thoughts

Estimating causal relationships from data is one of the fundamental endeavors of researchers, but causality is elusive. In the presence of omitted confounders, endogeneity, omitted variables, or a misspecified model, estimates of predicted values and effects of interest are inconsistent; causality is obscured.

One alternative is a controlled experiment to estimate causal relations. Yet conducting a controlled experiment may be infeasible; policymakers cannot randomize taxation, for example. In the absence of experimental data, an option is to use instrumental variables or a control-function approach.

Stata has many built-in estimators to implement these potential solutions and tools to construct estimators for situations that are not covered by built-in estimators. Below I illustrate both possibilities for a linear model and, in a later post, will talk about nonlinear models. Read more…
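
To make both routes concrete for a linear model, here is a sketch with outcome y, exogenous covariate x1, endogenous regressor w, and instruments z1 and z2 (all names hypothetical):

    // Built-in estimator: two-stage least squares
    ivregress 2sls y x1 (w = z1 z2)

    // Control-function approach constructed by hand:
    // include the first-stage residual as an additional regressor
    regress w x1 z1 z2
    predict double vhat, residuals
    regress y x1 w vhat

In the linear case, the two approaches yield the same point estimate of the coefficient on w, but the control-function standard errors need to be adjusted, for example, by bootstrapping both stages.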

Understanding truncation and censoring

Truncation and censoring are two distinct phenomena that cause our samples to be incomplete. These phenomena arise in medical sciences, engineering, social sciences, and other research fields. If we ignore truncation or censoring when analyzing our data, our estimates of population parameters will be inconsistent.

Truncation or censoring happens during the sampling process. Let’s begin by defining left-truncation and left-censoring:

Our data are left-truncated when individuals below a threshold are not present in the sample. For example, if we want to study the size of certain fish based on the specimens captured with a net, fish smaller than the net grid won’t be present in our sample.

Our data are left-censored at \(\kappa\) if every individual with a value below \(\kappa\) is present in the sample, but the actual value is unknown. This happens, for example, when we have a measuring instrument that cannot detect values below a certain level.

We will focus our discussion on left-truncation and left-censoring, but the concepts we will discuss generalize to all types of censoring and truncation—right, left, and interval.

When performing estimations with truncated or censored data, we need to use tools that account for that type of incomplete data. For truncated linear regression, we can use the truncreg command, and for censored linear regression, we can use the intreg or tobit command.

In this blog post, we will analyze the characteristics of truncated and censored data and discuss using truncreg and tobit to account for the incomplete data. Read more…
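
As a sketch of the syntax, with y the outcome and x1 and x2 covariates (placeholder names), both commands take the truncation or censoring point in the ll() option:

    // Linear regression when y is truncated from below at 0
    truncreg y x1 x2, ll(0)

    // Linear regression when y is censored from below at 0
    tobit y x1 x2, ll(0)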

Introduction to Bayesian statistics, part 2: MCMC and the Metropolis–Hastings algorithm

In this blog post, I’d like to give you a relatively nontechnical introduction to Markov chain Monte Carlo, often shortened to “MCMC”. MCMC is frequently used for fitting Bayesian statistical models. There are different variations of MCMC, and I’m going to focus on the Metropolis–Hastings (M–H) algorithm. In the interest of brevity, I’m going to omit some details, and I strongly encourage you to read the [BAYES] manual before using MCMC in practice.

Let’s continue with the coin toss example from my previous post Introduction to Bayesian statistics, part 1: The basic concepts. We are interested in the posterior distribution of the parameter \(\theta\), which is the probability that a coin toss results in “heads”. Our prior distribution is a flat, uninformative beta distribution with parameters 1 and 1. And we will use a binomial likelihood function to quantify the data from our experiment, which resulted in 4 heads out of 10 tosses. Read more…
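
In bayesmh terms, that setup can be written as in the sketch below, where y is a 0/1 variable holding the 10 tosses (4 ones and 6 zeros):

    // Bernoulli likelihood for the tosses with a flat beta(1,1) prior on theta
    bayesmh y, likelihood(dbernoulli({theta})) prior({theta}, beta(1,1)) rseed(14)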

Introduction to Bayesian statistics, part 1: The basic concepts

In this blog post, I’d like to give you a relatively nontechnical introduction to Bayesian statistics. The Bayesian approach to statistics has become increasingly popular, and you can fit Bayesian models using the bayesmh command in Stata. This blog entry will provide a brief introduction to the concepts and jargon of Bayesian statistics and the bayesmh syntax. In my next post, I will introduce the basics of Markov chain Monte Carlo (MCMC) using the Metropolis–Hastings algorithm. Read more…

Long-run restrictions in a structural vector autoregression

\(\def\bfA{{\bf A}}
\def\bfB{{\bf B}}
\def\bfC{{\bf C}}\)

Introduction

In this blog post, I describe Stata’s capabilities for estimating and analyzing vector autoregression (VAR) models with long-run restrictions by replicating some of the results of Blanchard and Quah (1989). Read more…
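
In Stata, long-run restrictions are imposed through svar's lreq() option: you supply a matrix whose zero entries are the constraints on the long-run impact matrix and whose missing entries are left free. A sketch for a bivariate Blanchard-Quah-style system in output growth and unemployment (d.lngdp and unrate are placeholder variable names):

    // Long-run impact matrix: the second shock has no long-run effect on output,
    // so the (1,2) element is constrained to zero
    matrix C = (., 0 \ ., .)

    // Structural VAR with the long-run restriction
    svar d.lngdp unrate, lags(1/8) lreq(C)

    // Structural impulse-response functions
    irf create longrun, set(bq, replace)
    irf graph sirf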