Home > Reporting > Customizable tables in Stata 17, part 5: Tables for one regression model

Customizable tables in Stata 17, part 5: Tables for one regression model

In my last post, I showed you how to use the new and improved table command with the command() option to create a table of statistical tests. In this post, I want to show you how to use the command() option to create a table for a single regression model. Our goal is to create the table in the Microsoft Word document below.

graph1

Create the basic table

Let’s begin by typing webuse nhanes2l to open the NHANES dataset, and let’s type describe to examine some of the variables.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe highbp age sex diabetes

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
highbp          byte    %8.0g               * High blood pressure
age             byte    %9.0g                 Age (years)
sex             byte    %9.0g      sex        Sex
diabetes        byte    %12.0g     diabetes   Diabetes status

The dataset includes age, sex, an indicator for high blood pressure (highbp), and an indicator for diabetes (diabetes). We would like to fit a logistic regression model for the binary outcome highbp and create a table of the odds ratios, standard errors, z statistics, p-values, and confidence intervals. Note that I have used Stata’s factor-variable notation in the example below to include the main effect of the continuous variable age, the main effect of the categorical variables sex and diabetes, and the interaction of age and sex.

. logistic highbp c.age##i.sex i.diabetes

Logistic regression                                    Number of obs =  10,349
                                                       LR chi2(4)    = 1691.59
                                                       Prob > chi2   =  0.0000
Log likelihood = -6203.8722                            Pseudo R2     =  0.1200

------------------------------------------------------------------------------
      highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   1.034281   .0018566    18.78   0.000     1.030648    1.037926
             |
         sex |
     Female  |   .1549363   .0223461   -12.93   0.000     .1167849    .2055511
             |
   sex#c.age |
     Female  |   1.028856   .0027958    10.47   0.000     1.023391    1.034351
             |
    diabetes |
   Diabetic  |   1.521011    .154103     4.14   0.000     1.247073    1.855124
       _cons |   .1730928   .0157789   -19.24   0.000      .144772    .2069537
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Let’s begin by placing the logistic regression command in the command() option of a table command. There are no row dimensions, and the column dimensions are command and result.

. table () (command result),
>          command(logistic highbp c.age##i.sex i.diabetes)

-----------------------------------------------------------------------
                             |  logistic highbp c.age##i.sex i.diabetes
-----------------------------+-----------------------------------------
Age (years)                  |                                 1.034281
Sex=Male                     |                                        1
Sex=Female                   |                                 .1549363
Sex=Male # Age (years)       |                                        1
Sex=Female # Age (years)     |                                 1.028856
Diabetes status=Not diabetic |                                        1
Diabetes status=Diabetic     |                                 1.521011
Intercept                    |                                 .1730928
-----------------------------------------------------------------------

By default, the table displays the coefficients, which are actually odds ratios because we used the logistic command.

table automatically created a collection named Table, and we can view the dimensions by typing collect dims.

. collect dims

Collection dimensions
Collection: Table
-----------------------------------------
                   Dimension   No. levels
-----------------------------------------
Layout, style, header, label
                      cmdset   1
                       coleq   2
                     colname   13
           colname_remainder   2
                     command   1
                    diabetes   2
               program_class   1
                      result   43
                 result_type   3
                       roweq   1
                     rowname   2
                         sex   2
                     statcmd   1

Style only
                border_block   4
                   cell_type   4
-----------------------------------------

The dimension result has 43 levels. Let’s type collect label list to view the levels and their labels.

. collect label list result, all

  Collection: Table
   Dimension: result
       Label: Result
Level labels:
           N  Number of observations
       N_cdf  Number of completely determined failures
       N_cds  Number of completely determined successes
        _r_b  Coefficient
       _r_ci  __LEVEL__% CI
       _r_df  df
       _r_lb  __LEVEL__% lower bound
        _r_p  p-value
       _r_se  Std. error
       _r_ub  __LEVEL__% upper bound
        _r_z  z
       (output omitted)

We are interested in the levels that begin with _r, so I have omitted much of the output. The levels that begin with _r are the contents of the table of coefficients. For example, the level _r_b contains coefficients (that is, odds ratios), the level _r_se contains the standard errors, and so forth. Note that the confidence interval is stored in a single level, _r_ci, and also in separate levels for the upper and lower bounds, _r_lb and r_ub, respectively.

Let’s add the odds ratio, standard error, and confidence interval to our table by including _r_b _r_se _r_ci to our command() option. We will add the z test and p-value later.

. table () (command result),
>          command(_r_b _r_se _r_ci
>                  : logistic highbp c.age##i.sex i.diabetes)

-------------------------------------------------------------------------------
                             |      logistic highbp c.age##i.sex i.diabetes
                             |  Coefficient   Std. error          95% CI
-----------------------------+-------------------------------------------------
Age (years)                  |     1.034281     .0018566   1.030648    1.037926
Sex=Male                     |            1            0
Sex=Female                   |     .1549363     .0223461   .1167849    .2055511
Sex=Male # Age (years)       |            1            0
Sex=Female # Age (years)     |     1.028856     .0027958   1.023391    1.034351
Diabetes status=Not diabetic |            1            0
Diabetes status=Diabetic     |     1.521011      .154103   1.247073    1.855124
Intercept                    |     .1730928     .0157789    .144772    .2069537
-------------------------------------------------------------------------------

Next let’s customize the display of the numbers in our table. I’ve used nformat() to display the odds ratios, standard errors, and condidence interval with two digits to the right of the decimal. I’ve used sformat() to place square brackets around the confidence interval, and I’ve used cidelimiter() to place a comma between the lower and upper bounds of the confidence interval.

. table () (command result),
>          command(_r_b _r_se _r_ci
>                  : logistic highbp c.age##i.sex i.diabetes)
>          nformat(%5.2f  _r_b _r_se _r_ci )
>          sformat("[%s]"  _r_ci )
>          cidelimiter(,)

---------------------------------------------------------------------------
                             |    logistic highbp c.age##i.sex i.diabetes
                             |    Coefficient    Std. error       95% CI
-----------------------------+---------------------------------------------
Age (years)                  |           1.03          0.00    [1.03, 1.04]
Sex=Male                     |           1.00          0.00
Sex=Female                   |           0.15          0.02    [0.12, 0.21]
Sex=Male # Age (years)       |           1.00          0.00
Sex=Female # Age (years)     |           1.03          0.00    [1.02, 1.03]
Diabetes status=Not diabetic |           1.00          0.00
Diabetes status=Diabetic     |           1.52          0.15    [1.25, 1.86]
Intercept                    |           0.17          0.02    [0.14, 0.21]
---------------------------------------------------------------------------

The column of odds ratios is labeled Coefficient, and we can change it to Odds Ratio using collect label levels.

. collect label levels result _r_b "Odds Ratio", modify

. collect preview

---------------------------------------------------------------------------
                             |    logistic highbp c.age##i.sex i.diabetes
                             |    Odds Ratio     Std. error       95% CI
-----------------------------+---------------------------------------------
Age (years)                  |          1.03           0.00    [1.03, 1.04]
Sex=Male                     |          1.00           0.00
Sex=Female                   |          0.15           0.02    [0.12, 0.21]
Sex=Male # Age (years)       |          1.00           0.00
Sex=Female # Age (years)     |          1.03           0.00    [1.02, 1.03]
Diabetes status=Not diabetic |          1.00           0.00
Diabetes status=Diabetic     |          1.52           0.15    [1.25, 1.86]
Intercept                    |          0.17           0.02    [0.14, 0.21]
---------------------------------------------------------------------------

The dimension command has one level that is labeled with our logistic regression command. We can also modify its label using collect label levels.

. collect label levels command 1
>          "Logistic Regression Model for Hypertension", modify

. collect preview

------------------------------------------------------------------------------
                             |    Logistic Regression Model for Hypertension
                             |     Odds Ratio      Std. error       95% CI
-----------------------------+------------------------------------------------
Age (years)                  |           1.03            0.00     [1.03, 1.04]
Sex=Male                     |           1.00            0.00
Sex=Female                   |           0.15            0.02     [0.12, 0.21]
Sex=Male # Age (years)       |           1.00            0.00
Sex=Female # Age (years)     |           1.03            0.00     [1.02, 1.03]
Diabetes status=Not diabetic |           1.00            0.00
Diabetes status=Diabetic     |           1.52            0.15     [1.25, 1.86]
Intercept                    |           0.17            0.02     [0.14, 0.21]
------------------------------------------------------------------------------

By default, table displays the base level, also known as the ‘referent category’ or ‘referent group’, of factor variables. For example, the row labeled Sex=Male is the base level for the factor variable i.sex. The category Male is used in the denominator of the odds ratio. We can hide the base levels of all factor variables, including interactions, by typing collect style showbase off.

. collect style showbase off

. collect preview

--------------------------------------------------------------------------
                         |    Logistic Regression Model for Hypertension
                         |     Odds Ratio      Std. error       95% CI
-------------------------+------------------------------------------------
Age (years)              |           1.03            0.00     [1.03, 1.04]
Sex=Female               |           0.15            0.02     [0.12, 0.21]
Sex=Female # Age (years) |           1.03            0.00     [1.02, 1.03]
Diabetes status=Diabetic |           1.52            0.15     [1.25, 1.86]
Intercept                |           0.17            0.02     [0.14, 0.21]
--------------------------------------------------------------------------

Next let’s use collect style row to customize the row labels. By default, the variables and categories are displayed side by side with a “binder” character. For example, Sex=Female displays the variable Sex followed by the binder = followed by the category Female. The option stack displays the variable name once and then displays each category below the variable name. The option nobinder removes the binder character, =. Interactions are displayed using the # character and we can use the delimiter(” x “) option to change the interaction delimiter to x.

. collect style row stack, nobinder delimiter(" x ")

. collect preview

-------------------------------------------------------------------
                  |    Logistic Regression Model for Hypertension
                  |     Odds Ratio      Std. error       95% CI
------------------+------------------------------------------------
Age (years)       |           1.03            0.00     [1.03, 1.04]
Sex               |
  Female          |           0.15            0.02     [0.12, 0.21]
Sex x Age (years) |
  Female          |           1.03            0.00     [1.02, 1.03]
Diabetes status   |
  Diabetic        |           1.52            0.15     [1.25, 1.86]
Intercept         |           0.17            0.02     [0.14, 0.21]
-------------------------------------------------------------------

We removed the vertical line from the tables in my previous posts, and we can do the same thing here using collect style cell to remove the right border from the first column.

. collect style cell border_block, border(right, pattern(nil))

. collect preview

-----------------------------------------------------------------
                     Logistic Regression Model for Hypertension
                      Odds Ratio      Std. error       95% CI
-----------------------------------------------------------------
Age (years)                 1.03            0.00     [1.03, 1.04]
Sex
  Female                    0.15            0.02     [0.12, 0.21]
Sex x Age (years)
  Female                    1.03            0.00     [1.02, 1.03]
Diabetes status
  Diabetic                  1.52            0.15     [1.25, 1.86]
Intercept                   0.17            0.02     [0.14, 0.21]
-----------------------------------------------------------------

We could stop here and export our table to a Microsoft Word document. But you may wish to include columns for the z statistic and the p-value in your table. I have added those columns in the code block below using the levels _r_z and _r_p.

table () (command result),                          ///
command(_r_b _r_se _r_z _r_p _r_ci                  ///
: logistic highbp c.age##i.sex i.diabetes)          ///
nformat(%5.2f  _r_b _r_se _r_ci )                   ///
nformat(%5.4f  _r_p)                                ///
sformat("[%s]"  _r_ci )                             ///
cidelimiter(,)

collect label levels result _r_b "Odds Ratio", modify
collect label levels command 1 "Logistic Regression Model for Hypertension", modify
collect style showbase off
collect style row stack, delimiter(" x ") nobinder
collect style cell border_block, border(right, pattern(nil))
. collect preview

----------------------------------------------------------------------------
                           Logistic Regression Model for Hypertension
                   Odds Ratio   Std. error        z   p-value      95% CI
----------------------------------------------------------------------------
Age (years)              1.03         0.00    18.78    0.0000   [1.03, 1.04]
Sex
  Female                 0.15         0.02   -12.93    0.0000   [0.12, 0.21]
Sex x Age (years)
  Female                 1.03         0.00    10.47    0.0000   [1.02, 1.03]
Diabetes status
  Diabetic               1.52         0.15     4.14    0.0000   [1.25, 1.86]
Intercept                0.17         0.02   -19.24    0.0000   [0.14, 0.21]
----------------------------------------------------------------------------

And now we can export our final table to a Microsoft Word document using putdocx.

putdocx clear
putdocx begin
putdocx paragraph, style(Title)
putdocx text ("Hypertension in the United States")
putdocx paragraph, style(Heading1)
putdocx text ("The National Health and Nutrition Examination Survey (NHANES)")
putdocx paragraph
putdocx text ("Hypertension is a major cause of morbidity and mortality in ")
putdocx text ("the United States.  This report will explore the predictors ")
putdocx text ("of hypertension using the NHANES dataset.")
collect style putdocx, layout(autofitcontents)               ///
title("Table 4: Logistic Regression Model for Hypertension Status")
putdocx collect
putdocx save MyTable4.docx, replace

graph1

Conclusion

In this post, we learned how to use the command() option with the table command to create a table from a logistic regression model. The steps would be nearly identical for other regression models such as linear regression or probit regression.

First, specify the column dimensions column and result. Second, select the columns, such as _r_b and _r_ci, then place your regression command in the command() option. Then customize the display of the row and column labels and the numbers as you wish.

I will show you how to use collect to create a table for several regression models in my next post.