## Customizable tables in Stata 17, part 5: Tables for one regression model

In my last post, I showed you how to use the new and improved **table** command with the **command()** option to create a table of statistical tests. In this post, I want to show you how to use the **command()** option to create a table for a single regression model. Our goal is to create the table in the Microsoft Word document below.

__Create the basic table__

Let’s begin by typing **webuse nhanes2l** to open the NHANES dataset, and let’s type **describe** to examine some of the variables.

. webuse nhanes2l (Second National Health and Nutrition Examination Survey) . describe highbp age sex diabetes Variable Storage Display Value name type format label Variable label ------------------------------------------------------------------------------- highbp byte %8.0g * High blood pressure age byte %9.0g Age (years) sex byte %9.0g sex Sex diabetes byte %12.0g diabetes Diabetes status

The dataset includes **age**, **sex**, an indicator for high blood pressure (**highbp**), and an indicator for diabetes (**diabetes**). We would like to fit a logistic regression model for the binary outcome **highbp** and create a table of the odds ratios, standard errors, *z* statistics, *p*-values, and confidence intervals. Note that I have used Stata’s factor-variable notation in the example below to include the main effect of the continuous variable **age**, the main effect of the categorical variables **sex** and **diabetes**, and the interaction of **age** and **sex**.

. logistic highbp c.age##i.sex i.diabetes Logistic regression Number of obs = 10,349 LR chi2(4) = 1691.59 Prob > chi2 = 0.0000 Log likelihood = -6203.8722 Pseudo R2 = 0.1200 ------------------------------------------------------------------------------ highbp | Odds ratio Std. err. z P>|z| [95% conf. interval] -------------+---------------------------------------------------------------- age | 1.034281 .0018566 18.78 0.000 1.030648 1.037926 | sex | Female | .1549363 .0223461 -12.93 0.000 .1167849 .2055511 | sex#c.age | Female | 1.028856 .0027958 10.47 0.000 1.023391 1.034351 | diabetes | Diabetic | 1.521011 .154103 4.14 0.000 1.247073 1.855124 _cons | .1730928 .0157789 -19.24 0.000 .144772 .2069537 ------------------------------------------------------------------------------ Note: _cons estimates baseline odds.

Let’s begin by placing the logistic regression command in the **command()** option of a **table** command. There are no row dimensions, and the column dimensions are **command** and **result**.

. table () (command result), > command(logistic highbp c.age##i.sex i.diabetes) ----------------------------------------------------------------------- | logistic highbp c.age##i.sex i.diabetes -----------------------------+----------------------------------------- Age (years) | 1.034281 Sex=Male | 1 Sex=Female | .1549363 Sex=Male # Age (years) | 1 Sex=Female # Age (years) | 1.028856 Diabetes status=Not diabetic | 1 Diabetes status=Diabetic | 1.521011 Intercept | .1730928 -----------------------------------------------------------------------

By default, the table displays the coefficients, which are actually odds ratios because we used the **logistic** command.

**table** automatically created a collection named Table, and we can view the dimensions by typing **collect dims**.

. collect dims Collection dimensions Collection: Table ----------------------------------------- Dimension No. levels ----------------------------------------- Layout, style, header, label cmdset 1 coleq 2 colname 13 colname_remainder 2 command 1 diabetes 2 program_class 1 result 43 result_type 3 roweq 1 rowname 2 sex 2 statcmd 1 Style only border_block 4 cell_type 4 -----------------------------------------

The dimension **result** has 43 levels. Let’s type **collect label list** to view the levels and their labels.

. collect label list result, all Collection: Table Dimension: result Label: Result Level labels: N Number of observations N_cdf Number of completely determined failures N_cds Number of completely determined successes _r_b Coefficient _r_ci __LEVEL__% CI _r_df df _r_lb __LEVEL__% lower bound _r_p p-value _r_se Std. error _r_ub __LEVEL__% upper bound _r_z z (output omitted)

We are interested in the levels that begin with **_r**, so I have omitted much of the output. The levels that begin with **_r** are the contents of the table of coefficients. For example, the level **_r_b** contains coefficients (that is, odds ratios), the level **_r_se** contains the standard errors, and so forth. Note that the confidence interval is stored in a single level, **_r_ci**, and also in separate levels for the upper and lower bounds, **_r_lb** and **r_ub**, respectively.

Let’s add the odds ratio, standard error, and confidence interval to our table by including **_r_b _r_se _r_ci** to our **command()** option. We will add the *z* test and *p*-value later.

. table () (command result), > command(_r_b _r_se _r_ci > : logistic highbp c.age##i.sex i.diabetes) ------------------------------------------------------------------------------- | logistic highbp c.age##i.sex i.diabetes | Coefficient Std. error 95% CI -----------------------------+------------------------------------------------- Age (years) | 1.034281 .0018566 1.030648 1.037926 Sex=Male | 1 0 Sex=Female | .1549363 .0223461 .1167849 .2055511 Sex=Male # Age (years) | 1 0 Sex=Female # Age (years) | 1.028856 .0027958 1.023391 1.034351 Diabetes status=Not diabetic | 1 0 Diabetes status=Diabetic | 1.521011 .154103 1.247073 1.855124 Intercept | .1730928 .0157789 .144772 .2069537 -------------------------------------------------------------------------------

Next let’s customize the display of the numbers in our table. I’ve used **nformat()** to display the odds ratios, standard errors, and condidence interval with two digits to the right of the decimal. I’ve used **sformat()** to place square brackets around the confidence interval, and I’ve used **cidelimiter()** to place a comma between the lower and upper bounds of the confidence interval.

. table () (command result), > command(_r_b _r_se _r_ci > : logistic highbp c.age##i.sex i.diabetes) > nformat(%5.2f _r_b _r_se _r_ci ) > sformat("[%s]" _r_ci ) > cidelimiter(,) --------------------------------------------------------------------------- | logistic highbp c.age##i.sex i.diabetes | Coefficient Std. error 95% CI -----------------------------+--------------------------------------------- Age (years) | 1.03 0.00 [1.03, 1.04] Sex=Male | 1.00 0.00 Sex=Female | 0.15 0.02 [0.12, 0.21] Sex=Male # Age (years) | 1.00 0.00 Sex=Female # Age (years) | 1.03 0.00 [1.02, 1.03] Diabetes status=Not diabetic | 1.00 0.00 Diabetes status=Diabetic | 1.52 0.15 [1.25, 1.86] Intercept | 0.17 0.02 [0.14, 0.21] ---------------------------------------------------------------------------

The column of odds ratios is labeled **Coefficient**, and we can change it to **Odds Ratio** using **collect label levels**.

. collect label levels result _r_b "Odds Ratio", modify . collect preview --------------------------------------------------------------------------- | logistic highbp c.age##i.sex i.diabetes | Odds Ratio Std. error 95% CI -----------------------------+--------------------------------------------- Age (years) | 1.03 0.00 [1.03, 1.04] Sex=Male | 1.00 0.00 Sex=Female | 0.15 0.02 [0.12, 0.21] Sex=Male # Age (years) | 1.00 0.00 Sex=Female # Age (years) | 1.03 0.00 [1.02, 1.03] Diabetes status=Not diabetic | 1.00 0.00 Diabetes status=Diabetic | 1.52 0.15 [1.25, 1.86] Intercept | 0.17 0.02 [0.14, 0.21] ---------------------------------------------------------------------------

The dimension **command** has one level that is labeled with our logistic regression command. We can also modify its label using **collect label levels**.

. collect label levels command 1 > "Logistic Regression Model for Hypertension", modify . collect preview ------------------------------------------------------------------------------ | Logistic Regression Model for Hypertension | Odds Ratio Std. error 95% CI -----------------------------+------------------------------------------------ Age (years) | 1.03 0.00 [1.03, 1.04] Sex=Male | 1.00 0.00 Sex=Female | 0.15 0.02 [0.12, 0.21] Sex=Male # Age (years) | 1.00 0.00 Sex=Female # Age (years) | 1.03 0.00 [1.02, 1.03] Diabetes status=Not diabetic | 1.00 0.00 Diabetes status=Diabetic | 1.52 0.15 [1.25, 1.86] Intercept | 0.17 0.02 [0.14, 0.21] ------------------------------------------------------------------------------

By default, **table** displays the base level, also known as the ‘referent category’ or ‘referent group’, of factor variables. For example, the row labeled **Sex=Male** is the base level for the factor variable **i.sex**. The category **Male** is used in the denominator of the odds ratio. We can hide the base levels of all factor variables, including interactions, by typing **collect style showbase off**.

. collect style showbase off . collect preview -------------------------------------------------------------------------- | Logistic Regression Model for Hypertension | Odds Ratio Std. error 95% CI -------------------------+------------------------------------------------ Age (years) | 1.03 0.00 [1.03, 1.04] Sex=Female | 0.15 0.02 [0.12, 0.21] Sex=Female # Age (years) | 1.03 0.00 [1.02, 1.03] Diabetes status=Diabetic | 1.52 0.15 [1.25, 1.86] Intercept | 0.17 0.02 [0.14, 0.21] --------------------------------------------------------------------------

Next let’s use **collect style row** to customize the row labels. By default, the variables and categories are displayed side by side with a “binder” character. For example, **Sex=Female** displays the variable **Sex** followed by the binder **=** followed by the category **Female**. The option **stack** displays the variable name once and then displays each category below the variable name. The option **nobinder** removes the binder character, **=**. Interactions are displayed using the **#** character and we can use the **delimiter(” x “)** option to change the interaction delimiter to **x**.

. collect style row stack, nobinder delimiter(" x ") . collect preview ------------------------------------------------------------------- | Logistic Regression Model for Hypertension | Odds Ratio Std. error 95% CI ------------------+------------------------------------------------ Age (years) | 1.03 0.00 [1.03, 1.04] Sex | Female | 0.15 0.02 [0.12, 0.21] Sex x Age (years) | Female | 1.03 0.00 [1.02, 1.03] Diabetes status | Diabetic | 1.52 0.15 [1.25, 1.86] Intercept | 0.17 0.02 [0.14, 0.21] -------------------------------------------------------------------

We removed the vertical line from the tables in my previous posts, and we can do the same thing here using **collect style cell** to remove the right border from the first column.

. collect style cell border_block, border(right, pattern(nil)) . collect preview ----------------------------------------------------------------- Logistic Regression Model for Hypertension Odds Ratio Std. error 95% CI ----------------------------------------------------------------- Age (years) 1.03 0.00 [1.03, 1.04] Sex Female 0.15 0.02 [0.12, 0.21] Sex x Age (years) Female 1.03 0.00 [1.02, 1.03] Diabetes status Diabetic 1.52 0.15 [1.25, 1.86] Intercept 0.17 0.02 [0.14, 0.21] -----------------------------------------------------------------

We could stop here and export our table to a Microsoft Word document. But you may wish to include columns for the *z* statistic and the *p*-value in your table. I have added those columns in the code block below using the levels **_r_z** and **_r_p**.

table () (command result), /// command(_r_b _r_se _r_z _r_p _r_ci /// : logistic highbp c.age##i.sex i.diabetes) /// nformat(%5.2f _r_b _r_se _r_ci ) /// nformat(%5.4f _r_p) /// sformat("[%s]" _r_ci ) /// cidelimiter(,) collect label levels result _r_b "Odds Ratio", modify collect label levels command 1 "Logistic Regression Model for Hypertension", modify collect style showbase off collect style row stack, delimiter(" x ") nobinder collect style cell border_block, border(right, pattern(nil))

. collect preview ---------------------------------------------------------------------------- Logistic Regression Model for Hypertension Odds Ratio Std. error z p-value 95% CI ---------------------------------------------------------------------------- Age (years) 1.03 0.00 18.78 0.0000 [1.03, 1.04] Sex Female 0.15 0.02 -12.93 0.0000 [0.12, 0.21] Sex x Age (years) Female 1.03 0.00 10.47 0.0000 [1.02, 1.03] Diabetes status Diabetic 1.52 0.15 4.14 0.0000 [1.25, 1.86] Intercept 0.17 0.02 -19.24 0.0000 [0.14, 0.21] ----------------------------------------------------------------------------

And now we can export our final table to a Microsoft Word document using **putdocx**.

putdocx clear putdocx begin putdocx paragraph, style(Title) putdocx text ("Hypertension in the United States") putdocx paragraph, style(Heading1) putdocx text ("The National Health and Nutrition Examination Survey (NHANES)") putdocx paragraph putdocx text ("Hypertension is a major cause of morbidity and mortality in ") putdocx text ("the United States. This report will explore the predictors ") putdocx text ("of hypertension using the NHANES dataset.") collect style putdocx, layout(autofitcontents) /// title("Table 4: Logistic Regression Model for Hypertension Status") putdocx collect putdocx save MyTable4.docx, replace

__Conclusion__

In this post, we learned how to use the **command()** option with the **table** command to create a table from a logistic regression model. The steps would be nearly identical for other regression models such as linear regression or probit regression.

First, specify the column dimensions **column** and **result**. Second, select the columns, such as **_r_b** and **_r_ci**, then place your regression command in the **command()** option. Then customize the display of the row and column labels and the numbers as you wish.

I will show you how to use **collect** to create a table for several regression models in my next post.