Nonparametric regression: Like parametric regression, but not

Initial thoughts

Nonparametric regression is similar to linear regression, Poisson regression, and logit or probit regression; it predicts a mean of an outcome for a set of covariates. If you work with the parametric models mentioned above or other models that predict means, you already understand nonparametric regression and can work with it.

The main difference between parametric and nonparametric models lies in the assumptions they make about the functional form of the mean conditional on the covariates. Parametric models assume the mean is a known function of \(\mathbf{x}\beta\). Nonparametric regression makes no assumptions about the functional form.
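
To make the contrast concrete, here are the conditional means assumed by the parametric models mentioned above, next to the nonparametric case. The symbols \(\Phi\) (the standard normal distribution function) and \(g\) (an unknown function) are notation introduced for this sketch.

\begin{equation*}
\begin{array}{ll}
\text{linear regression:} & E(y|\mathbf{x}) = \mathbf{x}\beta \\
\text{Poisson regression:} & E(y|\mathbf{x}) = \exp(\mathbf{x}\beta) \\
\text{logit:} & E(y|\mathbf{x}) = \frac{\exp(\mathbf{x}\beta)}{1+\exp(\mathbf{x}\beta)} \\
\text{probit:} & E(y|\mathbf{x}) = \Phi(\mathbf{x}\beta) \\
\text{nonparametric regression:} & E(y|\mathbf{x}) = g(\mathbf{x})
\end{array}
\end{equation*}

In the first four rows, only \(\beta\) is unknown. In the last row, the function \(g\) itself is what gets estimated.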

In practice, this means that nonparametric regression yields consistent estimates of the mean function that are robust to functional form misspecification. But we do not need to stop there. With npregress, introduced in Stata 15, we may obtain estimates of how the mean changes when we change discrete or continuous covariates, and we can use margins to answer other questions about the mean function.

Below I illustrate how to use npregress and how to interpret its results. As you will see, the results are interpreted in the same way you would interpret the results of a parametric model using margins.

Regression example

To illustrate, I will simulate data where the true model satisfies the linear regression assumptions. I will use a continuous covariate and a discrete covariate. The outcome changes for different values of the discrete covariate as follows:

\begin{equation*}
y = \left\{
\begin{array}{cccccccl}
10 & + & x^3 & & & + &\varepsilon & \text{if} \quad a=0 \\
10 & + & x^3 & - & 10x &+ & \varepsilon & \text{if} \quad a=1 \\
10 & + & x^3 & + & 3x &+ & \varepsilon & \text{if} \quad a=2 \\
\end{array}\right.
\end{equation*}

Here, \(x\) is the continuous covariate and \(a\) is the discrete covariate with values 0, 1, and 2. I generate data using the code below:

clear

set seed 111
set obs 1000

generate x   = rnormal(1,1)
generate a   = int(runiform()*3)
generate e   = rnormal()
generate gx  = 10 + x^3 if a==0
replace  gx  = 10 + x^3 - 10*x if a==1
replace  gx  = 10 + x^3 + 3*x  if a==2
generate  y  = gx + e

Often the mean function is not known to the researchers. If I knew the true functional relationship between \(y\), \(a\), and \(x\), I could use regress to estimate the mean function. For now, I assume I do know the true relationship and estimate the mean function by typing

. regress y c.x#c.x#c.x c.x#i.a

Then I calculate the average of the mean function, the average marginal effect of \(x\), and average treatment effects of \(a\).

The average of the mean function is estimated to be \(12.02\), which I obtained by typing

. margins

Predictive margins                              Number of obs     =      1,000
Model VCE    : OLS

Expression   : Linear prediction, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   12.02269   .0313691   383.26   0.000     11.96114    12.08425
------------------------------------------------------------------------------

The average marginal effect of \(x\) is estimated to be \(3.96\), which I obtained by typing

. margins, dydx(x)

Average marginal effects                        Number of obs     =      1,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : x

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   3.957383   .0313871   126.08   0.000      3.89579    4.018975
------------------------------------------------------------------------------

The average treatment effect of \(a=1\), relative to \(a=0\), is estimated to be \(-9.78\). The average treatment effect of \(a=2\), relative to \(a=0\), is estimated to be \(3.02\). I obtained these by typing

. margins, dydx(a)

Average marginal effects                        Number of obs     =      1,000
Model VCE    : OLS

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.a 2.a

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           a |
          1  |  -9.776916   .0560362  -174.47   0.000    -9.886879   -9.666953
          2  |   3.019998   .0519195    58.17   0.000     2.918114    3.121883
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

I now use npregress to estimate the mean function, making no assumptions about the functional form:

. npregress kernel y x i.a, vce(bootstrap, reps(100) seed(111))
(running npregress on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

Bandwidth
------------------------------------
             |      Mean     Effect
-------------+----------------------
Mean         |
           x |  .3630656   .5455175
           a |  3.05e-06   3.05e-06
------------------------------------

Local-linear regression                    Number of obs      =          1,000
Continuous kernel : epanechnikov           E(Kernel obs)      =            363
Discrete kernel   : liracine               R-squared          =         0.9888
Bandwidth         : cross validation
------------------------------------------------------------------------------
             |   Observed   Bootstrap                          Percentile
           y |   Estimate   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Mean         |
           y |   12.34335   .3195918    38.62   0.000     11.57571    12.98202
-------------+----------------------------------------------------------------
Effect       |
           x |   3.619627   .2937529    12.32   0.000     3.063269    4.143166
             |
           a |
   (1 vs 0)  |  -9.881542   .3491042   -28.31   0.000     -10.5277   -9.110781
   (2 vs 0)  |   3.168084   .2129506    14.88   0.000      2.73885    3.570004
------------------------------------------------------------------------------
Note: Effect estimates are averages of derivatives for continuous covariates
      and averages of contrasts for factor covariates.
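
The header of the output above reports a local-linear regression with a continuous kernel for \(x\), a discrete (Li-Racine) kernel for \(a\), and bandwidths chosen by cross-validation. As a rough sketch of what that means, and not a statement of the exact npregress implementation, a textbook local-linear estimator of the mean at a point \((x, a)\) solves a kernel-weighted least-squares problem. The symbols \(\gamma_0\), \(\gamma_1\), \(K\), \(k\), \(h_x\), and \(h_a\) are notation assumed here for illustration.

\begin{equation*}
\min_{\gamma_0,\gamma_1} \sum_{i=1}^{N} \left\{ y_i - \gamma_0 - \gamma_1 (x_i - x) \right\}^2 K\left( \frac{x_i - x}{h_x} \right) k(a_i, a, h_a)
\end{equation*}

In this sketch, \(\hat{\gamma}_0\) estimates the mean at \((x, a)\) and \(\hat{\gamma}_1\) estimates its derivative with respect to \(x\). The discrete kernel \(k\) gives full weight to observations with \(a_i = a\) and a weight governed by \(h_a\) to the others. Because the bandwidth reported for \(a\) is essentially zero (3.05e-06), observations from other levels of \(a\) receive almost no weight, so the fit behaves much like a separate local-linear regression within each level of \(a\).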

The average of the mean estimate is \(12.34\), the average marginal effect of \(x\) is estimated to be \(3.62\), the average treatment effect of \(a=1\) is estimated to be \(-9.88\), and the average treatment effect of \(a=2\) is estimated to be \(3.17\). All values are reasonably close to the ones I obtained using regress when I assumed I knew the true mean function.

Furthermore, the confidence interval for each estimate includes both the true parameter value I simulated and the regress parameter estimate. This highlights another important point. In general, the confidence intervals I obtain from npregress are wider than those from regress with the correctly specified model. This is not surprising. Nonparametric regression is consistent, but it cannot be more efficient than fitting a correctly specified parametric model.
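
The true values referred to above can be worked out from the data-generating process. The derivation below relies on the simulation design shown earlier: \(x\) is normal with mean 1 and variance 1, and \(a\) is independent of \(x\) and takes the values 0, 1, and 2 with probability 1/3 each, so that \(E(x)=1\), \(E(x^2)=2\), and \(E(x^3)=4\). These are population quantities; their sample analogues for the 1,000 simulated observations will differ somewhat.

\begin{equation*}
\begin{array}{lcl}
E(y) & = & 10 + E(x^3) + \frac{1}{3}(0 - 10 + 3)E(x) = 14 - \frac{7}{3} \approx 11.67 \\
E\left\{\partial E(y|x,a)/\partial x\right\} & = & 3E(x^2) + \frac{1}{3}(0 - 10 + 3) = 6 - \frac{7}{3} \approx 3.67 \\
\text{ATE}(a=1 \text{ vs } a=0) & = & E(-10x) = -10 \\
\text{ATE}(a=2 \text{ vs } a=0) & = & E(3x) = 3
\end{array}
\end{equation*}

Each of these values falls inside the corresponding percentile interval in the npregress output.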

In this example, using npregress gives results comparable to using regress and margins with knowledge of the functional form of the mean. The point estimates are similar, and the results have the same interpretation.
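
To see what the functional-form assumption buys in the parametric case, one could also fit a deliberately misspecified model to the same simulated data. The specification below, linear in \(x\) with no interaction between \(x\) and \(a\), is a hypothetical choice made for illustration and is not part of the original example. No output is shown here; the point is to compare its margins results with the two sets of estimates above.

* Hypothetical misspecified model: linear in x, no x-by-a interaction
regress y x i.a

* Same questions as before: average effect of x and contrasts of a
margins, dydx(*)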

Binary outcome example

Above I presented a result for a continuous outcome. However, the outcome does not need to be continuous. I can estimate a conditional mean, which is the same as the conditional probability, for binary outcomes.
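
The equality between the conditional mean and the conditional probability follows directly from the definition of an expectation for an outcome that takes only the values 0 and 1:

\begin{equation*}
E(y|x,a) = 1 \times P(y=1|x,a) + 0 \times P(y=0|x,a) = P(y=1|x,a)
\end{equation*}

This holds whatever the distribution of the error term, so an estimate of the mean function is also an estimate of the probability of a positive outcome.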

The true model is given by

\begin{equation*}
y = \left\{
\begin{array}{cl}
1 & \text{if} \quad -1 + x - a + \varepsilon > 0\\
0 & \text{otherwise}
\end{array}\right.
\end{equation*}

where

\begin{equation*}
\varepsilon | x, a \sim \mathrm{Logistic} \left(0, \frac{\pi}{\sqrt{3}} \right)
\end{equation*}

and \(a\) again takes on the discrete values 0, 1, and 2. The results of estimation using logit are

. quietly logit y x i.a

. margins

Predictive margins                              Number of obs     =      1,000
Model VCE    : OIM

Expression   : Pr(y), predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |       .486      .0137    35.47   0.000     .4591485    .5128515
------------------------------------------------------------------------------

. margins, dydx(*)

Average marginal effects                        Number of obs     =      1,000
Model VCE    : OIM

Expression   : Pr(y), predict()
dy/dx w.r.t. : x 1.a 2.a

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   .1984399   .0117816    16.84   0.000     .1753483    .2215315
             |
           a |
          1  |  -.1581501   .0347885    -4.55   0.000    -.2263344   -.0899658
          2  |   -.363564   .0319078   -11.39   0.000     -.426102   -.3010259
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

The average of the conditional mean estimate is \(0.486\), which is the same as the average probability of a positive outcome; the average marginal effect of \(x\) is estimated to be \(0.198\), the average treatment effect of \(a=1\) is estimated to be \(-0.158\), and the average treatment effect of \(a=2\) is estimated to be \(-0.364\).
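
The post does not show the code that generated the binary outcome. A sketch consistent with the model above might look like the following. Several pieces are assumptions: the seed, the sample size, and especially the distribution of \(x\), which is simply carried over from the regression example as a placeholder because the design used for this example is not stated. The sketch also reads the logistic distribution above as the standard logistic (mean 0, standard deviation \(\pi/\sqrt{3}\)) and draws it by the inverse-CDF method. The variable name u is introduced here for illustration. Given these assumptions, the code should not be expected to reproduce the numbers in the tables.

clear

* Seed, sample size, and covariate distributions are placeholders
set seed 111
set obs 1000

generate x = rnormal(1,1)
generate a = int(runiform()*3)

* Standard logistic draw: inverse CDF applied to a uniform(0,1) draw
generate u = runiform()
generate e = ln(u/(1-u))

* Binary outcome from the latent-index model
generate y = (-1 + x - a + e > 0)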

Let’s see if npregress can obtain similar results without knowing the functional form is logistic.

. npregress kernel y x i.a, vce(bootstrap, reps(100) seed(111))
(running npregress on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

Bandwidth
------------------------------------
             |      Mean     Effect
-------------+----------------------
Mean         |
           x |  .4321719   1.410937
           a |        .4         .4
------------------------------------

Local-linear regression                    Number of obs      =          1,000
Continuous kernel : epanechnikov           E(Kernel obs)      =            432
Discrete kernel   : liracine               R-squared          =         0.2545
Bandwidth         : cross validation
------------------------------------------------------------------------------
             |   Observed   Bootstrap                          Percentile
           y |   Estimate   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
Mean         |
           y |   .4840266   .0160701    30.12   0.000     .4507854    .5158817
-------------+----------------------------------------------------------------
Effect       |
           x |   .2032644   .0143028    14.21   0.000     .1795428    .2350924
             |
           a |
   (1 vs 0)  |  -.1745079   .0214352    -8.14   0.000    -.2120486   -.1249168
   (2 vs 0)  |  -.3660315   .0331167   -11.05   0.000    -.4321482    -.300859
------------------------------------------------------------------------------
Note: Effect estimates are averages of derivatives for continuous covariates and
      averages of contrasts for factor covariates.

The average of the conditional mean estimate is \(0.484\), the average marginal effect of \(x\) is estimated to be \(0.203\), the average treatment effect of \(a=1\) is estimated to be \(-0.175\), and the average treatment effect of \(a=2\) is estimated to be \(-0.366\). So, yes, it can.

Answering other questions

npregress provides marginal effects and average treatment effect estimates as part of its output, but I can also obtain answers to other relevant questions using margins.

Let’s go back to the regression example.

Say I wanted to see the mean function at different values of the covariate \(x\), averaging over \(a\). I could type:

. margins, at(x=(1(.5)3)) vce(bootstrap, reps(100) seed(111))
(running margins on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

Predictive margins                              Number of obs     =      1,000
                                                Replications      =        100

Expression   : mean function, predict()

1._at        : x               =           1

2._at        : x               =         1.5

3._at        : x               =           2

4._at        : x               =         2.5

5._at        : x               =           3

------------------------------------------------------------------------------
             |   Observed   Bootstrap                          Percentile
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   9.309943   .1538459    60.51   0.000     9.044058    9.689572
          2  |   10.96758   .2336364    46.94   0.000     10.53089    11.52332
          3  |   14.78267    .311172    47.51   0.000     14.21305    15.50895
          4  |   21.50949   .3955136    54.38   0.000     20.86696    22.34698
          5  |   32.16382    .529935    60.69   0.000     31.10559    33.25611
------------------------------------------------------------------------------

and then, using marginsplot, I obtain the following graph:

Figure 1: Mean outcome at different values of x

As \(x\) increases, so does the outcome. The increase is nonlinear. It is much greater for larger values of \(x\) than for smaller ones.

I could instead trace the mean function over values of \(x\) separately for each level of \(a\), rather than averaging over \(a\). To do so, I type

. margins a, at(x=(-1(1)3)) vce(bootstrap, reps(100) seed(111))

and then use marginsplot to visualize the results:

Figure 2: Mean outcome at different values of x for fixed values of a

I see that the effect on the mean, as \(x\) increases, differs for different values of \(a\). Because our model has only two covariates, the graph above maps the whole mean function.
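
Both figures come from running marginsplot right after the corresponding margins command. A minimal sketch for the second figure follows. The recast() and recastci() options are a styling choice assumed here, drawing lines with shaded confidence bands, and are not necessarily what was used for the figure above.

* Mean function by level of a, evaluated on a grid of x values
margins a, at(x=(-1(1)3)) vce(bootstrap, reps(100) seed(111))

* Plot the margins as lines with shaded confidence bands
marginsplot, recast(line) recastci(rarea)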

I could even ask what the average effect of a 10% increase in \(x\) is. By “average” in this case, I mean giving each observation in the dataset a 10% larger \(x\). Perhaps \(x\) is a rebate and I wonder what would happen if that rebate were increased by 10%. I type

. margins, at(x=generate(x*1.1)) at(x=generate(x)) 
>         contrast(at(r) nowald) vce(bootstrap, reps(100) seed(111))
(running margins on estimation sample)

Bootstrap replications (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
..................................................    50
..................................................   100

Contrasts of predictive margins

                                                Number of obs     =      1,000
                                                Replications      =        100

Expression   : mean function, predict()

1._at        : x               = x*1.1

2._at        : x               = x

--------------------------------------------------------------
             |   Observed   Bootstrap          Percentile
             |   Contrast   Std. Err.     [95% Conf. Interval]
-------------+------------------------------------------------
         _at |
   (2 vs 1)  |  -1.088438   .0944531      -1.31468    -.915592
--------------------------------------------------------------
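
A note on reading this table: the contrast is labeled (2 vs 1), so it is the second at() scenario (the original \(x\)) minus the first (the 10% larger \(x\)). The effect of the increase is therefore the reported contrast with its sign flipped, about \(1.09\). As a rough check against the simulation design, the true average effect can be computed from the population moments \(E(x)=1\) and \(E(x^3)=4\) implied by \(x \sim N(1,1)\), with \(a\) independent of \(x\) and equally likely to be 0, 1, or 2. The name \(m\) for the mean function is notation assumed here.

\begin{equation*}
E\{m(1.1x, a) - m(x, a)\} = (1.1^3 - 1)E(x^3) + 0.1 \times \frac{1}{3}(0 - 10 + 3)E(x) = 1.324 - 0.233 \approx 1.09
\end{equation*}

The nonparametric estimate is very close to this population value.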

I can use margins and npregress together to obtain effects at different points in my data, to obtain average effects over my population, or to answer any question that would make sense with a parametric model in Stata.

Closing remarks

npregress estimates a mean function for all types of outcomes: continuous, binary, count, and more. Its results are interpreted in the same way, and are just as useful, as the results from margins after fitting a parametric model. What makes npregress special is that we do not need to assume a functional form. With parametric models, our inferences will likely be meaningless if we do not know the true functional form. With npregress, our inferences are valid regardless of the true functional form.

  • Kevin Denny

    This looks very useful but I’m surprised it doesn’t allow one to estimate a semi-parametric model. If you have lots of variables you may just want to treat some non-parametrically: it’s a compromise given that non-p models require large datasets (curse of dimensionality). The downloads nonpar and xtnonpar for the partial linear model are quite good.

  • Alvaro Fuentes

    There is a small typo in the margins command before the last marginsplot. It should read at(x=(1(.5)3)) instead of at(x1=(1(.5)3)).

  • Enrique Pinzon

    You are correct. Thank you very much!

  • SLIM HADDAD

    A question, please, from a not very experienced user: Can estimates be adjusted for longitudinal / clustered data? Options such as vce(cluster var) do not seem to work with npregress. Thanks in advance, S.

  • Shyam Kumar Basnet

    A question on number of observations:
    I am running the “npregress” command with bandwidth specification: cross validation, Discrete kernel: liracine, and continuous kernel: epanechnikov. I have a total of 239 observations, but this regression takes into account only 216 observations. As a result, I will have only 216 predicted values for the dependent variable. Could you please suggest what the reason could be, or how I can address this problem? Thanks in advance, Shyam