Customizable tables in Stata 17, part 4: Table of statistical tests

In my last post, I showed you how to use the new and improved table command with the statistic() option to create a classic table 1. In this post, I want to show you how to use the command() option to create a table of statistical tests. Our goal is to create the table in the Microsoft Word document below.

[Figure: the finished table of t test results in a Microsoft Word document]

Create the basic table

Let’s begin by typing webuse nhanes2l to open the NHANES dataset, and let’s type describe to examine some of the variables.

. webuse nhanes2l
(Second National Health and Nutrition Examination Survey)

. describe highbp age hgb hct iron albumin vitaminc zinc copper lead     
>          height weight bmi bpsystol bpdiast tcresult tgresult hdresult

Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
highbp          byte    %8.0g               * High blood pressure
age             byte    %9.0g                 Age (years)
hgb             float   %9.0g                 Hemoglobin (g/dL)
hct             float   %9.0g                 Hematocrit (%)
iron            int     %9.0g                 Serum iron (mcg/dL)
albumin         float   %9.0g                 Serum albumin (g/dL)
vitaminc        float   %9.0g                 Serum vitamin C (mg/dL)
zinc            int     %9.0g                 Serum zinc (mcg/dL)
copper          int     %9.0g                 Serum copper (mcg/dL)
lead            byte    %9.0g                 Lead (mcg/dL)
height          float   %9.0g                 Height (cm)
weight          float   %9.0g                 Weight (kg)
bmi             float   %9.0g                 Body mass index (BMI)
bpsystol        int     %9.0g                 Systolic blood pressure
bpdiast         int     %9.0g                 Diastolic blood pressure
tcresult        int     %9.0g                 Serum cholesterol (mg/dL)
tgresult        int     %9.0g                 Serum triglycerides (mg/dL)
hdresult        int     %9.0g                 High density lipids (mg/dL)

The dataset includes an indicator for high blood pressure (highbp), age, and many lab measurements. We would like to test the null hypothesis that the average age and lab measurements are the same in the groups with and without hypertension. We can do this with Stata’s ttest command. Let’s use ttest to test the null hypothesis that the average age is the same in the two groups.

. ttest age, by(highbp)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. err.   Std. dev.   [95% conf. interval]
---------+--------------------------------------------------------------------
       0 |   5,975    42.16502    .2169725    16.77157    41.73968    42.59037
       1 |   4,376    54.97281    .2253767    14.90897    54.53095    55.41466
---------+--------------------------------------------------------------------
Combined |  10,351    47.57965    .1692044    17.21483    47.24798    47.91133
---------+--------------------------------------------------------------------
    diff |           -12.80779    .3185604               -13.43223   -12.18335
------------------------------------------------------------------------------
    diff = mean(0) - mean(1)                                      t = -40.2052
H0: diff = 0                                     Degrees of freedom =    10349

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

The output displays many statistics, including the two-sided p-value for the t test. Some of these statistics are temporarily left in memory and we can view them by typing return list.

. return list

scalars:
              r(level) =  95
                 r(sd) =  17.21482923023818
               r(sd_2) =  14.9089715191102
               r(sd_1) =  16.77156676799842
                 r(se) =  .3185603831285
                r(p_u) =  1
                r(p_l) =  0
                  r(p) =  0
                  r(t) =  -40.20520433030012
               r(df_t) =  10349
               r(mu_2) =  54.97280621572212
                r(N_2) =  4376
               r(mu_1) =  42.16502092050209
                r(N_1) =  5975
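Because these are ordinary scalars, we can use them in expressions, which is what the custom columns later in this post rely on. Here is a minimal sketch for a do-file; the scalar names mean_diff and p_two are my own illustrative choices, not something ttest creates.

```stata
* Sketch: copy a few r() results from ttest into named scalars
webuse nhanes2l, clear
quietly ttest age, by(highbp)

* Save the values right away; a later r-class command would replace r()
scalar mean_diff = r(mu_2) - r(mu_1)
scalar p_two     = r(p)

display "Hypertensive minus normotensive: " %6.2f mean_diff
display "Two-sided p-value:               " %6.4f p_two
```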

We can use the command() option in table to run a command, such as ttest, and put the results in a table. The example below shows the basic syntax for creating a table of results from ttest. The row dimension is command, the column dimension is result, and we place our ttest command in the command() option.

. table (command) (result),                       
>       command(ttest age, by(highbp))
(output omitted)

I omitted the output because table tried to include all the results in the table and the output doesn’t fit on the screen. Let’s be more specific in command() about the statistics we wish to include in our table. Let’s add a column named Normotensive for the mean age of people without hypertension, which is stored in the scalar r(mu_1) by ttest. We can also add a column named Hypertensive for the mean age of people with hypertension, which is stored in the scalar r(mu_2), a column named Diff for the difference between the group means, and a column named pvalue that displays the p-value stored in r(p).

. table (command) (result),                       
>       command(Normotensive = r(mu_1)            
>               Hypertensive = r(mu_2)            
>               Diff         = (r(mu_2)-r(mu_1))  
>               pvalue       =  r(p)              
>               : ttest age, by(highbp))

------------------------------------------------------------------------
                      |  Normotensive   Hypertensive       Diff   pvalue
----------------------+-------------------------------------------------
ttest age, by(highbp) |      42.16502       54.97281   12.80779        0
------------------------------------------------------------------------

The output displays the ttest command in the first column, followed by the means, difference, and p-value that we specified in our table command. I would like to replace the command name in the table with the label Age (years). Recall that dimensions have levels and levels can have labels. Let’s type collect label list command, all to view the levels and labels of the dimension command.

. collect label list command, all

  Collection: Table
   Dimension: command
       Label: Command option index
Level labels:
           1  ttest age, by(highbp)

The output tells us that the dimension command has one level, 1, which is labeled ttest age, by(highbp). We can change the label for level 1 using collect label levels. Then we can type collect preview to view our updated table.

. collect label levels command 1 "Age (years)", modify

. collect preview

--------------------------------------------------------------
            |  Normotensive   Hypertensive       Diff   pvalue
------------+-------------------------------------------------
Age (years) |      42.16502       54.97281   12.80779        0
--------------------------------------------------------------
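If you are curious about what else is stored in the collection, the collect suite can list it. The sketch below assumes the table above is still the current collection; what it prints depends on the table you built, so I have not shown any output.

```stata
* Sketch: inspect the current collection created by table
collect dims                      // list the dimensions in the collection
collect levelsof result           // list the levels of the dimension result
collect label list result, all    // show the labels attached to those levels
```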

Next we can use collect style cell to change the number of decimals displayed in each cell and remove the right border from the first column.

. collect style cell result[Normotensive Hypertensive Diff], 
>         nformat(%6.1f)

. collect style cell result[pvalue], nformat(%6.4f)

. collect style cell border_block, border(right, pattern(nil))

. collect preview

--------------------------------------------------------
             Normotensive   Hypertensive   Diff   pvalue
--------------------------------------------------------
Age (years)          42.2           55.0   12.8   0.0000
--------------------------------------------------------

We did it! Our table looks good, but we have only included a test for one variable. Let’s add more.

Create a larger table

The obvious way to add more tests to our table would be to add more command() options. That would work, as we can see below.

. table (command) (result),                                   
>       command(Normotensive = r(mu_1) Hypertensive = r(mu_2) 
>               Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)        
>               : ttest age, by(highbp))                      
>       command(Normotensive = r(mu_1) Hypertensive = r(mu_2) 
>               Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)        
>               : ttest tcresult, by(highbp))                 
>       command(Normotensive = r(mu_1) Hypertensive = r(mu_2) 
>               Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)        
>               : ttest tgresult, by(highbp))                 
>       command(Normotensive = r(mu_1) Hypertensive = r(mu_2) 
>               Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)        
>               : ttest hdresult, by(highbp))                 
>       nformat(%6.2f  Normotensive Hypertensive Diff)        
>       nformat(%6.4f  pvalue)

--------------------------------------------------------------------------
                           |  Normotensive   Hypertensive    Diff   pvalue
---------------------------+----------------------------------------------
ttest age, by(highbp)      |         42.17          54.97   12.81   0.0000
ttest tcresult, by(highbp) |        208.73         229.88   21.15   0.0000
ttest tgresult, by(highbp) |        129.23         166.04   36.81   0.0000
ttest hdresult, by(highbp) |         49.94          49.22   -0.73   0.0195
--------------------------------------------------------------------------

But our table command is growing faster than our table. Fortunately, we can use a little programming trick to make our task easier and our code neater. Notice that most of the code in our command() options is identical. We can store that code in a local macro and use the local macro in our command() options. I have stored the column definitions in a local macro named myresults.

. local myresults "Normotensive = r(mu_1) Hypertensive = r(mu_2) Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)"

. display "`myresults'"
Normotensive = r(mu_1) Hypertensive = r(mu_2) Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)

Now we can replace the lengthy column definitions with the local macro `myresults' in our command() options.

. table (command) (result),                                   
>       command(`myresults' : ttest age,      by(highbp))     
>       command(`myresults' : ttest tcresult, by(highbp))     
>       command(`myresults' : ttest tgresult, by(highbp))     
>       command(`myresults' : ttest hdresult, by(highbp))     
>       nformat(%6.2f  Normotensive Hypertensive Diff)        
>       nformat(%6.4f  pvalue)

--------------------------------------------------------------------------
                           |  Normotensive   Hypertensive    Diff   pvalue
---------------------------+----------------------------------------------
ttest age, by(highbp)      |         42.17          54.97   12.81   0.0000
ttest tcresult, by(highbp) |        208.73         229.88   21.15   0.0000
ttest tgresult, by(highbp) |        129.23         166.04   36.81   0.0000
ttest hdresult, by(highbp) |         49.94          49.22   -0.73   0.0195
--------------------------------------------------------------------------

Let’s add the rest of the lab variables to our table using our clever new trick.

. table (command) (result),                                
>       command(`myresults' : ttest age,      by(highbp))  
>       command(`myresults' : ttest hgb,      by(highbp))  
>       command(`myresults' : ttest hct,      by(highbp))  
>       command(`myresults' : ttest iron,     by(highbp))  
>       command(`myresults' : ttest albumin,  by(highbp))  
>       command(`myresults' : ttest vitaminc, by(highbp))  
>       command(`myresults' : ttest zinc,     by(highbp))  
>       command(`myresults' : ttest copper,   by(highbp))  
>       command(`myresults' : ttest lead,     by(highbp))  
>       command(`myresults' : ttest height,   by(highbp))  
>       command(`myresults' : ttest weight,   by(highbp))  
>       command(`myresults' : ttest bmi,      by(highbp))  
>       command(`myresults' : ttest bpsystol, by(highbp))  
>       command(`myresults' : ttest bpdiast,  by(highbp))  
>       command(`myresults' : ttest tcresult, by(highbp))  
>       command(`myresults' : ttest tgresult, by(highbp))  
>       command(`myresults' : ttest hdresult, by(highbp))

--------------------------------------------------------------------------------
                           |  Normotensive   Hypertensive        Diff     pvalue
---------------------------+----------------------------------------------------
ttest age, by(highbp)      |      42.16502       54.97281    12.80779          0
ttest height, by(highbp)   |      167.7243       167.5506   -.1736495   .3661002
ttest weight, by(highbp)   |      68.26626       76.85565    8.589386   9.1e-181
ttest bmi, by(highbp)      |      24.20231       27.36081    3.158506   4.5e-241
ttest bpsystol, by(highbp) |       116.485       150.5388    34.05383          0
ttest bpdiast, by(highbp)  |      74.17222       92.01394    17.84172          0
ttest tcresult, by(highbp) |      208.7272       229.8798     21.1526   4.3e-105
ttest tgresult, by(highbp) |      129.2284       166.0427     36.8143   7.01e-41
ttest hdresult, by(highbp) |      49.94449       49.21784   -.7266526   .0194611
ttest hgb, by(highbp)      |      14.14038       14.42436    .2839752   4.99e-25
ttest hct, by(highbp)      |      41.65235       42.44271    .7903588   2.16e-27
ttest iron, by(highbp)     |       101.842       96.17436   -5.667648   5.70e-17
ttest albumin, by(highbp)  |      4.680295       4.654088   -.0262068   .0000896
ttest vitaminc, by(highbp) |      1.048238       1.016469   -.0317686   .0070212
ttest zinc, by(highbp)     |      87.06462       85.74782   -1.316802   .0000162
ttest copper, by(highbp)   |      125.0756       126.3356    1.259952   .0673572
ttest lead, by(highbp)     |      13.87513       14.93369    1.058555   2.36e-09
--------------------------------------------------------------------------------
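Seventeen nearly identical command() options are still a lot of typing. One way to shorten the code further is to build the options in a loop. This is only a sketch for a do-file, not something from the steps above: the macro names labvars and cmds are hypothetical, and it relies on Stata expanding the macro before table parses its options.

```stata
* Sketch: build one command() option per variable with a loop
webuse nhanes2l, clear

local myresults "Normotensive = r(mu_1) Hypertensive = r(mu_2) Diff = (r(mu_2)-r(mu_1)) pvalue = r(p)"

* Same variables, in the same order, as the command() options typed by hand
local labvars age hgb hct iron albumin vitaminc zinc copper lead ///
    height weight bmi bpsystol bpdiast tcresult tgresult hdresult

* Start with an empty macro and append one command() option per variable
local cmds
foreach v of local labvars {
    local cmds `cmds' command(`myresults' : ttest `v', by(highbp))
}

* The macro expands to the full list of command() options
table (command) (result), `cmds'
```

Keeping the variable list in one macro also means the order of the tests is defined in a single place.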

Next we can change the labels of the levels of the dimension command. Let’s begin by listing the labels for each level.

. collect label list command, all

  Collection: Table
   Dimension: command
       Label: Command option index
Level labels:
           1  ttest age, by(highbp)
          10  ttest height, by(highbp)
          11  ttest weight, by(highbp)
          12  ttest bmi, by(highbp)
          13  ttest bpsystol, by(highbp)
          14  ttest bpdiast, by(highbp)
          15  ttest tcresult, by(highbp)
          16  ttest tgresult, by(highbp)
          17  ttest hdresult, by(highbp)
           2  ttest hgb, by(highbp)
           3  ttest hct, by(highbp)
           4  ttest iron, by(highbp)
           5  ttest albumin, by(highbp)
           6  ttest vitaminc, by(highbp)
           7  ttest zinc, by(highbp)
           8  ttest copper, by(highbp)
           9  ttest lead, by(highbp)

Notice that the levels are sorted as strings rather than numbers, so levels 10 through 17 are listed before level 2. This is because levels can be strings or numbers. We can see the variable names associated with each level, and we can relabel them using collect label levels.

. collect label levels command 1  "Age (years)"                 
>                              10 "Height (cm)"                 
>                              11 "Weight (kg)"                 
>                              12 "Body Mass Index"             
>                              13 "Systolic Blood Pressure"     
>                              14 "Diastolic Blood Pressure"    
>                              15 "Serum cholesterol (mg/dL)"   
>                              16 "Serum triglycerides (mg/dL)" 
>                              17 "High density lipids (mg/dL)" 
>                              2  "Hemoglobin (g/dL)"           
>                              3  "Hematocrit (%)"              
>                              4  "Serum iron (mcg/dL)"         
>                              5  "Serum albumin (g/dL)"        
>                              6  "Serum vitamin C (mg/dL)"     
>                              7  "Serum zinc (mcg/dL)"         
>                              8  "Serum copper (mcg/dL)"       
>                              9 "Lead (mcg/dL)"                
>                              , modify

. collect preview

---------------------------------------------------------------------------------
                            |  Normotensive   Hypertensive        Diff     pvalue
----------------------------+----------------------------------------------------
Age (years)                 |      42.16502       54.97281    12.80779          0
Height (cm)                 |      167.7243       167.5506   -.1736495   .3661002
Weight (kg)                 |      68.26626       76.85565    8.589386   9.1e-181
Body Mass Index             |      24.20231       27.36081    3.158506   4.5e-241
Systolic Blood Pressure     |       116.485       150.5388    34.05383          0
Diastolic Blood Pressure    |      74.17222       92.01394    17.84172          0
Serum cholesterol (mg/dL)   |      208.7272       229.8798     21.1526   4.3e-105
Serum triglycerides (mg/dL) |      129.2284       166.0427     36.8143   7.01e-41
High density lipids (mg/dL) |      49.94449       49.21784   -.7266526   .0194611
Hemoglobin (g/dL)           |      14.14038       14.42436    .2839752   4.99e-25
Hematocrit (%)              |      41.65235       42.44271    .7903588   2.16e-27
Serum iron (mcg/dL)         |       101.842       96.17436   -5.667648   5.70e-17
Serum albumin (g/dL)        |      4.680295       4.654088   -.0262068   .0000896
Serum vitamin C (mg/dL)     |      1.048238       1.016469   -.0317686   .0070212
Serum zinc (mcg/dL)         |      87.06462       85.74782   -1.316802   .0000162
Serum copper (mcg/dL)       |      125.0756       126.3356    1.259952   .0673572
Lead (mcg/dL)               |      13.87513       14.93369    1.058555   2.36e-09
---------------------------------------------------------------------------------
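If you would rather not type the labels by hand, a loop can copy each variable’s label from the dataset instead. This is a sketch under two assumptions: the 17-row table above is the current collection, and the levels of command are numbered in the order the command() options were typed, which is what the listing above suggests. The macro names labvars, i, and lbl are illustrative.

```stata
* Sketch: relabel the levels of command with the variable labels
* Variables must be listed in the same order as the command() options
local labvars age hgb hct iron albumin vitaminc zinc copper lead ///
    height weight bmi bpsystol bpdiast tcresult tgresult hdresult

local i = 1
foreach v of local labvars {
    local lbl : variable label `v'          // fetch the label stored in the data
    collect label levels command `i' "`lbl'", modify
    local ++i                               // move to the next level
}

collect preview
```

The labels come straight from the dataset, so a few will differ slightly from the hand-typed versions; for example, bmi is labeled Body mass index (BMI) in the data.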

Finally, let’s format the numbers using collect style cell and remove the right border from the first column.

. collect style cell result[Normotensive Hypertensive Diff], nformat(%8.2f)

. collect style cell result[pvalue], nformat(%6.4f)

. collect style cell border_block, border(right, pattern(nil))

. collect preview

-------------------------------------------------------------------------
                             Normotensive   Hypertensive    Diff   pvalue
-------------------------------------------------------------------------
Age (years)                         42.17          54.97   12.81   0.0000
Height (cm)                        167.72         167.55   -0.17   0.3661
Weight (kg)                         68.27          76.86    8.59   0.0000
Body Mass Index                     24.20          27.36    3.16   0.0000
Systolic Blood Pressure            116.49         150.54   34.05   0.0000
Diastolic Blood Pressure            74.17          92.01   17.84   0.0000
Serum cholesterol (mg/dL)          208.73         229.88   21.15   0.0000
Serum triglycerides (mg/dL)        129.23         166.04   36.81   0.0000
High density lipids (mg/dL)         49.94          49.22   -0.73   0.0195
Hemoglobin (g/dL)                   14.14          14.42    0.28   0.0000
Hematocrit (%)                      41.65          42.44    0.79   0.0000
Serum iron (mcg/dL)                101.84          96.17   -5.67   0.0000
Serum albumin (g/dL)                 4.68           4.65   -0.03   0.0001
Serum vitamin C (mg/dL)              1.05           1.02   -0.03   0.0070
Serum zinc (mcg/dL)                 87.06          85.75   -1.32   0.0000
Serum copper (mcg/dL)              125.08         126.34    1.26   0.0674
Lead (mcg/dL)                       13.88          14.93    1.06   0.0000
-------------------------------------------------------------------------

Export the table to Microsoft Word

Once we’re happy with the layout of our table, we can export it to many different file formats. I’m going to use putdocx, collect style putdocx, and putdocx collect to export our table to a Microsoft Word document. Many of you will notice that the commands below are almost identical to the commands in my previous posts about tables.

. putdocx clear

. putdocx begin

. putdocx paragraph, style(Title)

. putdocx text ("Hypertension in the United States")

. putdocx paragraph, style(Heading1)

. putdocx text ("The National Health and Nutrition Examination Survey (NHANES)")

. putdocx paragraph

. putdocx text ("Hypertension is a major cause of morbidity and mortality in ")

. putdocx text ("the United States.  This report will explore the predictors ")

. putdocx text ("of hypertension using the NHANES dataset.")

. collect style putdocx, layout(autofitcontents)               
>         title("Table 2: Comparison of demographic, anthropometric, and lab results by Hypertension Status")

. putdocx collect
(collection Table posted to putdocx)

. putdocx save MyTable2.docx, replace
successfully replaced "MyTable2.docx"

[Figure: the resulting Microsoft Word document, MyTable2.docx]
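If you only need the table itself rather than a full report, collect export can write the current collection directly to a file, and it picks the format from the file extension. The sketch below assumes the formatted table is still the current collection; the file names are placeholders of my own.

```stata
* Sketch: export the current collection without building a putdocx document
collect export MyTable2.html, replace    // HTML file
collect export MyTable2.xlsx, replace    // Excel workbook
```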

Conclusion

In this post, we learned how to use the command() option with the table command to create a table of statistical tests. The steps are simple: run the command of interest, type return list to view the statistics left in memory, create your table using the row dimension command and the column dimension result, and place your command in the command() option. You may wish to specify custom columns for the statistics in your table, and we learned how to use local macros to simplify that task.

I will show you how to use the command() option to create a table of regression coefficients in my next post.