## Customizable tables in Stata 17, part 6: Tables for multiple regression models

In my last post, I showed you how to create a table of statistical tests using the **command()** option in the new and improved **table** command. In this post, I will show you how to gather information and create tables using the new **collect** suite of commands. Our goal is to fit three logistic regression models and create the table in the Adobe PDF document below.

__Create the basic table__

Let’s begin by typing **webuse nhanes2l** to open the NHANES dataset and then typing **describe** to examine some of the variables.

    . webuse nhanes2l
    (Second National Health and Nutrition Examination Survey)

    . describe highbp age sex diabetes

    Variable      Storage   Display    Value
        name         type    format    label      Variable label
    -------------------------------------------------------------------------------
    highbp          byte    %8.0g               * High blood pressure
    age             byte    %9.0g                 Age (years)
    sex             byte    %9.0g      sex        Sex
    diabetes        byte    %12.0g     diabetes   Diabetes status

The dataset includes **age**, **sex**, an indicator for high blood pressure (**highbp**), and an indicator for diabetes (**diabetes**).

__A new strategy for building tables__

We will fit three logistic regression models for the binary outcome **highbp**. For each model, we will use the **logistic** command to estimate the odds ratios and standard errors. Then we will use **estat ic** to compute Akaike’s information criterion (AIC) and Schwarz’s Bayesian information criterion (BIC) for each model. Our final table will include information for three models from six different commands.

Given the relative complexity of our table, we are going to use a new strategy to build it. We will use **collect get** to gather information from each command. Then we will use **collect layout** to define the layout of our table. Let’s do a simple example to illustrate this strategy before we begin the full table.

Let’s type **collect get:** before our first logistic regression command.

    . collect get: logistic highbp c.age i.sex

    Logistic regression                                     Number of obs = 10,351
                                                            LR chi2(2)    = 1563.54
                                                            Prob > chi2   = 0.0000
    Log likelihood = -6268.9975                             Pseudo R2     = 0.1109

    ------------------------------------------------------------------------------
          highbp | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             age |   1.049042   .0013945    36.02   0.000     1.046313    1.051779
                 |
             sex |
         Female  |    .648767   .0280172   -10.02   0.000     .5961141    .7060706
           _cons |   .0887874   .0063561   -33.83   0.000     .0771641    .1021615
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.

Next let’s type **collect layout** to define the layout of a table with row dimension **colname** and column dimension **result**. I have used square brackets to include only the levels **_r_b** and **_r_se** from the dimension **result**. This will add columns for the coefficients and standard errors, respectively. Because we fit the model with **logistic**, the column labeled “Coefficient” contains the odds ratios shown in the output above.

    . collect layout (colname) (result[_r_b _r_se])

    Collection: default
          Rows: colname
       Columns: result[_r_b _r_se]
       Table 1: 4 x 2

    ------------------------------------
                | Coefficient Std. error
    ------------+-----------------------
    Age (years) |    1.049042   .0013945
    Male        |           1          0
    Female      |     .648767   .0280172
    Intercept   |    .0887874   .0063561
    ------------------------------------

That was easy! We created a basic table of regression output with two commands. The output tells us that **collect get** created a new collection named **default**.

Let’s repeat this strategy and add some options. Let’s begin by typing **collect clear** to clear any collections from Stata’s memory. Then let’s use **collect create** to create a new collection named **MyModels**.

    collect clear
    collect create MyModels

Next let’s use the **collect get** option **name()** to gather the results from our logistic regression model in our collection named **MyModels**. Note that I have typed **collect** rather than **collect get**. The word “get” is not necessary.

    collect, name(MyModels): logistic highbp c.age i.sex

I would also like to specify a new dimension and level for the results of my logistic regression model. I can do this using the **tag()** option. The basic syntax is **tag(***dimension***[***level***])**. The example below stores the results of the logistic regression model to the level **(1)** in the dimension **model**.

    collect, name(MyModels)      ///
             tag(model[(1)])     ///
           : logistic highbp c.age i.sex

The example above stores all the results from the model. But we will only need the coefficients and standard errors. We can specify a list of results to be automatically reported in the table by including those results after **collect**. The example below collects only the coefficients (**_r_b**) and the standard errors (**_r_se**) from the logistic regression model.

    collect _r_b _r_se,          ///
            name(MyModels)       ///
            tag(model[(1)])      ///
          : logistic highbp c.age i.sex

Now we can use **collect layout** to create a table from the results we stored in level **(1)** of dimension **model** in the collection **MyModels**.

    . collect layout (colname#result) (model[(1)]), name(MyModels)

    Collection: MyModels
          Rows: colname#result
       Columns: model[(1)]
       Table 1: 12 x 1

    ------------------------
                  |      (1)
    --------------+---------
    Age (years)   |
      Coefficient | 1.049042
      Std. error  | .0013945
    Male          |
      Coefficient |        1
      Std. error  |        0
    Female        |
      Coefficient |  .648767
      Std. error  | .0280172
    Intercept     |
      Coefficient | .0887874
      Std. error  | .0063561
    ------------------------

You may be wondering how I selected the row and column dimensions for **collect layout**. I could explain why this particular example worked. But it may not work for your tables. So let’s walk through the steps I used to figure it out.

__Some details about collect layout__

Let’s begin by typing **collect dims** to view a list of the dimensions in our collection.

    . collect dims

    Collection dimensions
    Collection: MyModels
    -----------------------------------------
                       Dimension   No. levels
    -----------------------------------------
    Layout, style, header, label
                          cmdset   1
                           coleq   1
                         colname   8
               colname_remainder   1
                           model   1
                   program_class   1
                          result   43
                     result_type   3
                         rowname   1
                             sex   2

    Style only
                    border_block   4
                       cell_type   4
    -----------------------------------------

The dimension **result** catches my eye because of the name and because it has 43 levels. We can view a list of the levels by typing **collect levelsof result**.

    . collect levelsof result

    Collection: MyModels
     Dimension: result
        Levels: N N_cdf N_cds _r_b _r_ci _r_df _r_lb _r_p _r_se _r_ub _r_z chi2
                chi2type cmd cmdline converged depvar df_m estat_cmd ic k k_dv
                k_eq k_eq_model ll ll_0 marginsnotok marginsok ml_method mns opt
                p predict properties r2_p rank rc rules technique title user vce
                which

The dimension **result** contains estimates of the coefficients, standard errors, and many other statistical results from our model. Let’s use **collect layout** to create a table for the dimension **result**.

    . collect layout (result), name(MyModels)

    Collection: MyModels
          Rows: result

    Your layout specification does not uniquely match any items. Dimension colname
    might help uniquely match items.

That didn’t work. But the output suggests that including the dimension **colname** might help. The dimension named **colname** has eight levels and we can view a list of the levels by typing **collect levelsof colname**.

    . collect levelsof colname

    Collection: MyModels
     Dimension: colname
        Levels: age 1.sex 2.sex c1 c2 c3 c4 _cons

The dimension **colname** includes the variable names, including factor variables, from our logistic regression model. It also contains levels named **c1**, **c2**, **c3**, and **c4**. Let’s add the row dimension **colname** and see what happens.

    . collect layout (colname) (result), name(MyModels)

    Collection: MyModels
          Rows: colname
       Columns: result
       Table 1: 4 x 2

    ------------------------------------
                | Coefficient Std. error
    ------------+-----------------------
    Age (years) |    1.049042   .0013945
    Male        |           1          0
    Female      |     .648767   .0280172
    Intercept   |    .0887874   .0063561
    ------------------------------------

That worked, and we have a table! But the table raises an important question. The dimension **result** has 43 levels, and the dimension **colname** included levels like **c1**. Why aren’t all of those levels displayed in the table?

The answer is that **collect layout** only includes cells where there is a value associated with each level of both the row and the column dimensions. Recall that we requested that only the coefficients (**_r_b**) and standard errors (**_r_se**) from our model be displayed. And those coefficients and standard errors were only collected for the levels **age**, **1.sex**, **2.sex**, and **_cons** for the dimension **colname**.

Once we understand this concept, we can explore other layouts for our table. For example, we could stack the coefficients and standard errors under each variable in our model.

    . collect layout (colname#result) (), name(MyModels)

    Collection: MyModels
          Rows: colname#result
       Table 1: 12 x 1

    ------------------------
    Age (years)   |
      Coefficient | 1.049042
      Std. error  | .0013945
    Male          |
      Coefficient |        1
      Std. error  |        0
    Female        |
      Coefficient |  .648767
      Std. error  | .0280172
    Intercept     |
      Coefficient | .0887874
      Std. error  | .0063561
    ------------------------

We will eventually create a similar column of results for each of our three models. Recall that we created the dimension **model** with **collect, tag()**. Let’s view the levels of the dimension **model** by typing **collect levelsof model**.

    . collect levelsof model

    Collection: MyModels
     Dimension: model
        Levels: (1)

For now, the dimension **model** has one level named **(1)**, and we can specify **model** as our column dimension.

    . collect layout (colname#result) (model[(1)]), name(MyModels)

    Collection: MyModels
          Rows: colname#result
       Columns: model[(1)]
       Table 1: 12 x 1

    ------------------------
                  |      (1)
    --------------+---------
    Age (years)   |
      Coefficient | 1.049042
      Std. error  | .0013945
    Male          |
      Coefficient |        1
      Std. error  |        0
    Female        |
      Coefficient |  .648767
      Std. error  | .0280172
    Intercept     |
      Coefficient | .0887874
      Std. error  | .0063561
    ------------------------

This approach to building tables in steps can be helpful if you are unsure how to begin. Start by typing **collect dims** to view the dimensions in the collection. Then use **collect levelsof** to view the levels of each dimension. Then experiment with **collect layout** to design your table. The output of **collect layout** will often provide helpful instructions.
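Here is a minimal sketch of that exploratory workflow on a fresh collection. It is not the code used for the final table: I am assuming a linear regression of **bpsystol** (another variable in the same dataset that is not used elsewhere in this post) purely for illustration, and the dimensions and levels you see will depend on the command you collect from.

```stata
* Start with the dataset and an empty default collection
webuse nhanes2l, clear
collect clear

* Collect every result from an illustrative model (assumed for this sketch)
collect get: regress bpsystol c.age i.sex

* Step 1: list the dimensions in the collection
collect dims

* Step 2: inspect the levels of the dimensions that look useful
collect levelsof result
collect levelsof colname

* Step 3: experiment with a layout built from those dimensions
collect layout (colname) (result[_r_b _r_se])
```

If **collect layout** reports that the specification does not uniquely match any items, the message usually names a dimension to add, just as it did in the example above.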

__Collecting results from multiple commands__

Recall that we would also like to include the AIC and BIC for each model in our table, and we can estimate them by typing **estat ic** after we fit the model.

    . estat ic

    Akaike's information criterion and Bayesian information criterion

    -----------------------------------------------------------------------------
           Model |          N   ll(null)  ll(model)      df        AIC        BIC
    -------------+---------------------------------------------------------------
               . |     10,351  -7050.765  -6268.998       3      12544   12565.73
    -----------------------------------------------------------------------------
    Note: BIC uses N = number of observations. See [R] BIC note.

The estimates of the AIC and BIC are stored in a matrix named **r(S)**.

    . return list

    matrices:
                      r(S) :  1 x 6

And we can view the matrix by typing **matlist r(S)**.

    . matlist r(S)

                 |         N        ll0         ll         df        AIC        BIC
    -------------+------------------------------------------------------------------
               . |     10351  -7050.765  -6268.998          3      12544   12565.73

We can refer to the AIC and BIC in the matrix **r(S)** using matrix subscripting. The general syntax to refer to an element in a matrix is *matname***[***row***,***column***]**. Using this syntax, we can refer to the BIC as **r(S)[1,6]**. Column 6 is named **BIC**, so we can also refer to the BIC as **r(S)[1,"BIC"]**.

    . display r(S)[1,"BIC"]
    12565.73
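If you want to see how the column name maps to a column number, here is a small sketch that copies **r(S)** to a regular matrix and looks up the positions with the **colnumb()** function. The matrix name **S** and the local macro names are my own choices for illustration; they are not part of the table-building code in this post.

```stata
webuse nhanes2l, clear
quietly logistic highbp c.age i.sex
quietly estat ic

* Copy r(S) so that it survives later r-class commands
matrix S = r(S)

* Look up the column positions by name (hypothetical local macro names)
local aic = colnumb(S, "AIC")
local bic = colnumb(S, "BIC")

* Subscript by number using the positions we just looked up
display "AIC is in column `aic': " S[1, `aic']
display "BIC is in column `bic': " S[1, `bic']
```

Referring to the columns by name rather than by number keeps the code readable and protects it if the column order ever differs.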

Let’s **collect** the AIC and BIC and store them in level **(1)** of dimension **model** in our **MyModels** collection.

    collect AIC=r(S)[1,"AIC"]    ///
            BIC=r(S)[1,"BIC"],   ///
            name(MyModels)       ///
            tag(model[(1)])      ///
          : estat ic

Then we can include them in our table by adding **result[AIC BIC]** to the row dimension of **collect layout**.

    . collect layout (colname#result result[AIC BIC]) (model[(1)]), name(MyModels)

    Collection: MyModels
          Rows: colname#result result[AIC BIC]
       Columns: model[(1)]
       Table 1: 14 x 1

    ------------------------
                  |      (1)
    --------------+---------
    Age (years)   |
      Coefficient | 1.049042
      Std. error  | .0013945
    Male          |
      Coefficient |        1
      Std. error  |        0
    Female        |
      Coefficient |  .648767
      Std. error  | .0280172
    Intercept     |
      Coefficient | .0887874
      Std. error  | .0063561
    AIC           |    12544
    BIC           | 12565.73
    ------------------------

Notice that I did not include the “#” operator when I added the row dimension **result[AIC BIC]**. This is because AIC and BIC are not nested within each level of the dimension **colname**. I simply wanted to add rows for AIC and BIC at the bottom of the table.

__Adding more models to the table__

Let’s add a second model to our table. Notice that the commands below are nearly identical to the commands I used above. There are only two differences. First, I have used factor-variable notation to add the interaction of **age** and **sex** to the logistic regression model. And second, I am storing the results to level **(2)** of the dimension **model**.

    collect _r_b _r_se,          ///
            name(MyModels)       ///
            tag(model[(2)])      ///
          : logistic highbp c.age##i.sex
    collect AIC=r(S)[1,"AIC"]    ///
            BIC=r(S)[1,"BIC"],   ///
            name(MyModels)       ///
            tag(model[(2)])      ///
          : estat ic

We can use **collect layout** to make sure that it worked.

    . collect layout (colname#result result[AIC BIC]) (model), name(MyModels)

    Collection: MyModels
          Rows: colname#result result[AIC BIC]
       Columns: model
       Table 1: 20 x 2

    ----------------------------------------
                         |      (1)      (2)
    ---------------------+------------------
    Age (years)          |
      Coefficient        | 1.049042 1.035184
      Std. error         | .0013945 .0018459
    Male                 |
      Coefficient        |        1        1
      Std. error         |        0        0
    Female               |
      Coefficient        |  .648767 .1556985
      Std. error         | .0280172 .0224504
    Male # Age (years)   |
      Coefficient        |                 1
      Std. error         |                 0
    Female # Age (years) |
      Coefficient        |          1.028811
      Std. error         |           .002794
    Intercept            |
      Coefficient        | .0887874 .1690035
      Std. error         | .0063561 .0153794
    AIC                  |    12544 12434.34
    BIC                  | 12565.73 12463.32
    ----------------------------------------

That worked, so let’s add a third model to our table. Let’s add the variable **diabetes** to our second model. And we will store the results to level **(3)** of the dimension **model**.

    collect _r_b _r_se,          ///
            name(MyModels)       ///
            tag(model[(3)])      ///
          : logistic highbp c.age##i.sex i.diabetes
    collect AIC=r(S)[1,"AIC"]    ///
            BIC=r(S)[1,"BIC"],   ///
            name(MyModels)       ///
            tag(model[(3)])      ///
          : estat ic

Let’s use **collect layout** again to make sure that it worked.

    . collect layout (colname#result result[AIC BIC]) (model), name(MyModels)

    Collection: MyModels
          Rows: colname#result result[AIC BIC]
       Columns: model
       Table 1: 26 x 3

    -------------------------------------------------
                         |      (1)      (2)      (3)
    ---------------------+---------------------------
    Age (years)          |
      Coefficient        | 1.049042 1.035184 1.034281
      Std. error         | .0013945 .0018459 .0018566
    Male                 |
      Coefficient        |        1        1        1
      Std. error         |        0        0        0
    Female               |
      Coefficient        |  .648767 .1556985 .1549363
      Std. error         | .0280172 .0224504 .0223461
    Male # Age (years)   |
      Coefficient        |                 1        1
      Std. error         |                 0        0
    Female # Age (years) |
      Coefficient        |          1.028811 1.028856
      Std. error         |           .002794 .0027958
    Not diabetic         |
      Coefficient        |                          1
      Std. error         |                          0
    Diabetic             |
      Coefficient        |                   1.521011
      Std. error         |                    .154103
    Intercept            |
      Coefficient        | .0887874 .1690035 .1730928
      Std. error         | .0063561 .0153794 .0157789
    AIC                  |    12544 12434.34 12417.74
    BIC                  | 12565.73 12463.32 12453.97
    -------------------------------------------------
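The three pairs of **collect** commands differ only in the model specification and in the level of the dimension **model**, so they could also be written as a loop. The sketch below is one way to do that, starting from an empty collection. The local macros **rhs1**, **rhs2**, and **rhs3** are hypothetical names that I introduce here to hold the right-hand side of each model.

```stata
webuse nhanes2l, clear
collect clear
collect create MyModels

* Hypothetical local macros holding the right-hand side of each model
local rhs1 "c.age i.sex"
local rhs2 "c.age##i.sex"
local rhs3 "c.age##i.sex i.diabetes"

forvalues m = 1/3 {
    // Odds ratios and standard errors, stored in the level for this model
    collect _r_b _r_se,              ///
            name(MyModels)           ///
            tag(model[(`m')])        ///
          : logistic highbp `rhs`m''
    // AIC and BIC from estat ic, stored in the same level
    collect AIC=r(S)[1,"AIC"]        ///
            BIC=r(S)[1,"BIC"],       ///
            name(MyModels)           ///
            tag(model[(`m')])        ///
          : estat ic
}

* Same layout as above, now with one column per model
collect layout (colname#result result[AIC BIC]) (model), name(MyModels)
```

Writing the commands out one model at a time, as I did above, is easier to follow when you are learning. A loop like this is mainly useful once the list of models starts to grow.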

Now we have the basic layout of our table. All we need to do now is customize the layout and export it to an Adobe PDF document.

__Use collect style to format the table__

I will use **collect style showbase**, **collect style row**, **collect style cell**, and **collect style header** to customize the layout of our table. The commands in the code block below are the same commands I used in previous posts, so I won’t explain each step here. But I have included comments to refresh our memory.

    // TURN OFF BASE LEVELS FOR FACTOR VARIABLES
    collect style showbase off
    // CHANGE THE INTERACTION DELIMITER
    collect style row stack, spacer delimiter(" x ")
    // REMOVE THE VERTICAL LINE
    collect style cell border_block, border(right, pattern(nil))
    // FORMAT THE NUMBERS
    collect style cell, nformat(%5.2f)
    collect style cell result[AIC BIC], nformat(%8.0f)
    // PUT PARENTHESES AROUND THE STANDARD ERRORS
    collect style cell result[_r_se], sformat("(%s)")
    // LABEL AIC AND BIC
    collect style header result[AIC BIC], level(label)

Let’s type **collect preview** to check our work so far.

    . collect preview

    -----------------------------------------
                            (1)    (2)    (3)
    -----------------------------------------
    Age (years)
      Coefficient          1.05   1.04   1.03
      Std. error         (0.00) (0.00) (0.00)
    Female
      Coefficient          0.65   0.16   0.15
      Std. error         (0.03) (0.02) (0.02)
    Female x Age (years)
      Coefficient                 1.03   1.03
      Std. error                (0.00) (0.00)
    Diabetic
      Coefficient                        1.52
      Std. error                       (0.15)
    Intercept
      Coefficient          0.09   0.17   0.17
      Std. error         (0.01) (0.02) (0.02)
    AIC                   12544  12434  12418
    BIC                   12566  12463  12454
    -----------------------------------------
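The formats passed to **nformat()** are ordinary Stata display formats, so you can check what a format will do to a number before you apply it to a table. The lines below are only a quick check with **display** and **string()** using values from our models. They mimic the effect of **nformat()** and **sformat()** and do not change the collection.

```stata
* %5.2f rounds to two decimal places, as used for the odds ratios
display %5.2f 1.049042

* %8.0f rounds to a whole number, as used for the AIC and BIC
display %8.0f 12565.73

* string() applies a numeric format and returns text, which we can
* wrap in parentheses much like sformat("(%s)") does for standard errors
display "(" + strtrim(string(.0280172, "%5.2f")) + ")"
```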

Next I will use some options that are unique to this table. First, I will use the **collect style cell** option **halign()** to center the items and column headers in the table.

    . collect style cell cell_type[item column-header], halign(center)

    . collect preview

    -----------------------------------------
                          (1)    (2)    (3)
    -----------------------------------------
    Age (years)
      Coefficient         1.05   1.04   1.03
      Std. error         (0.00) (0.00) (0.00)
    Female
      Coefficient         0.65   0.16   0.15
      Std. error         (0.03) (0.02) (0.02)
    Female x Age (years)
      Coefficient                1.03   1.03
      Std. error                (0.00) (0.00)
    Diabetic
      Coefficient                       1.52
      Std. error                       (0.15)
    Intercept
      Coefficient         0.09   0.17   0.17
      Std. error         (0.01) (0.02) (0.02)
    AIC                  12544  12434  12418
    BIC                  12566  12463  12454
    -----------------------------------------

Then I will use the **collect style header** option **level()** to hide the labels for the row dimension **result**.

    . collect style header result, level(hide)

    . collect preview

    -----------------------------------------
                          (1)    (2)    (3)
    -----------------------------------------
    Age (years)           1.05   1.04   1.03
                         (0.00) (0.00) (0.00)
    Female                0.65   0.16   0.15
                         (0.03) (0.02) (0.02)
    Female x Age (years)         1.03   1.03
                                (0.00) (0.00)
    Diabetic                            1.52
                                       (0.15)
    Intercept             0.09   0.17   0.17
                         (0.01) (0.02) (0.02)
    AIC                  12544  12434  12418
    BIC                  12566  12463  12454
    -----------------------------------------

And finally, I will use the **collect style column** option **extraspace()** to add an extra space between the columns. I think this makes it easier to read the table.

    . collect style column, extraspace(1)

    . collect preview

    ----------------------------------------------
                             (1)     (2)     (3)
    ----------------------------------------------
    Age (years)              1.05    1.04    1.03
                            (0.00)  (0.00)  (0.00)
    Female                   0.65    0.16    0.15
                            (0.03)  (0.02)  (0.02)
    Female x Age (years)             1.03    1.03
                                    (0.00)  (0.00)
    Diabetic                                 1.52
                                            (0.15)
    Intercept                0.09    0.17    0.17
                            (0.01)  (0.02)  (0.02)
    AIC                     12544   12434   12418
    BIC                     12566   12463   12454
    ----------------------------------------------

We did it! We collected the results from our models and customized the layout.

__Export the table to an Adobe PDF document__

I showed you how to export your tables to a Microsoft Word document in my previous posts. Let’s try something new and export our table to an Adobe PDF document. Most of the **putpdf** commands are identical to their corresponding **putdocx** commands, with the obvious exception that they begin with **putpdf** rather than **putdocx**. But there are a few important differences.

First, I have replaced **putdocx paragraph, style()** with **putpdf paragraph, font() halign()**. The first instance sets the font to a 26-point Calibri Light font and centers the text horizontally on the page. The second instance sets the font to a 14-point Calibri Light font and begins the text on the left of the page. The third instance does not specify a **font()** or **halign()** option, so the default 11-point Helvetica font is used.

Second, I have replaced the **collect style putdocx** option **layout(autofitcontents)** with the **collect style putpdf** options **width()** and **indent()**. The **width(60%)** option sets the width of the table to 60% of the full width of the page. The **indent(1 in)** option indents the table one inch from the left side of the page.

And third, I have used the **note()** option with **collect style putpdf** to add a note to the table to tell the reader that the table displays odds ratios with standard errors in parentheses.

    putpdf clear
    putpdf begin
    putpdf paragraph, font("Calibri Light",26) halign(center)
    putpdf text ("Hypertension in the United States")
    putpdf paragraph, font("Calibri Light",14) halign(left)
    putpdf text ("The National Health and Nutrition Examination Survey (NHANES)")
    putpdf paragraph
    putpdf text ("Hypertension is a major cause of morbidity and mortality in ")
    putpdf text ("the United States. This report will explore the predictors ")
    putpdf text ("of hypertension using the NHANES dataset.")
    collect style putpdf, width(60%) indent(1 in)                                ///
            title("Table 3: Logistic Regression Models for Hypertension Status") ///
            note("Note: Odds ratio (standard error)")
    putpdf collect
    putpdf save MyTable3.pdf, replace
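If you only need the table itself, without the surrounding title and paragraphs, **collect export** can write a collection straight to a PDF file. The sketch below is a minimal, self-contained example of that alternative, not the approach used for the report above. It rebuilds a one-model version of the collection, and the filename **MyTableOnly.pdf** is a hypothetical name chosen for illustration.

```stata
webuse nhanes2l, clear
collect clear
collect create MyModels

* Collect one model so that the sketch runs on its own
collect _r_b _r_se,          ///
        name(MyModels)       ///
        tag(model[(1)])      ///
      : logistic highbp c.age i.sex

* Define a layout; collect export writes the current layout
collect layout (colname#result) (model), name(MyModels)

* Export only the table (hypothetical filename)
collect export MyTableOnly.pdf, name(MyModels) replace
```

Use **putpdf** when the table is part of a larger document, as in this post, and consider **collect export** when the table is the whole document.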

The resulting Adobe PDF document looks like the image below.

__Conclusion__

In this post, we learned a new strategy to create tables using only the **collect** suite of commands. We used **collect get** to collect results from Stata commands, and we used **collect layout** to specify the layout of our table. We learned how to name our collections and store the results from commands to specific levels of dimensions.

You have probably noticed that we have used the same set of **collect style** commands in these blog posts. We could continue to copy and paste them into our future do-files, but there is an easier way to reuse **collect style** commands. In my next post, I will show you how to use **collect style save** and **collect style use** to save styles and reuse them with other tables. And I will show you how to use **collect label save** and **collect label use** to save labels for the levels of dimensions.