## Creating Excel tables with putexcel, part 2: Macro, picture, matrix, and formula expressions

In my last post, I showed how to use **putexcel** to write simple expressions to Microsoft Excel and format the resulting text and cells. Today, I want to show you how to write more complex expressions such as macros, graphs, and matrices. I will even show you how to write formulas to Excel to create calculated cells. These are important steps toward our goal of automating the creation of reports in Excel.

Before we begin the examples, I want to point out that I am going to create a single Excel file called **putexcel2.xlsx** and write each example to a separate sheet within that file. So the **putexcel set** commands will look like this:

putexcel set putexcel2.xlsx, sheet(example1) replace putexcel set putexcel2.xlsx, sheet(example2) modify ... putexcel set putexcel2.xlsx, sheet(example8) modify

I could create a separate Excel file for each example, but I would prefer to have a single file with multiple sheets. You may wish to use a similar approach when creating large reports.

You can download the dataset used in the examples by typing **webuse nhanes2**. I am using a subset of the data and have relabeled some of the variables so don’t be concerned if your numbers don’t match mine exactly.

**Example 1: Writing returned results to Excel**

Many Stata commands return scalars, macros, and matrices. We can write each of these to an Excel table using **putexcel**. For example, I might wish to write the mean of **age** to Excel. I can type **return list** after **summarize age** to see a list of returned results. The mean is stored in the scalar **r(mean)**.

. summarize age Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------- age | 1,266 48.44076 16.98858 20 74 . return list scalars: r(N) = 1266 r(sum_w) = 1266 r(mean) = 48.44075829383886 r(Var) = 288.6119026656426 r(sd) = 16.98858153777539 r(min) = 20 r(max) = 74 r(sum) = 61326

I begin by setting the destination to the sheet “example1” in the file putexcel2.xlxs. I can then write the expression “Mean Age = ” to cell A1 and the expression **`r(mean)’** to cell B1. Note that **r(mean)** is enclosed in single quotes. This tells Stata that I want to write the value of **r(mean)** to cell B2.

. putexcel set putexcel2.xlsx, sheet(example1) replace Note: file will be replaced when the first putexcel command is issued . putexcel A1 = "Mean Age = " file putexcel2.xlsx saved . putexcel B1 = `r(mean)' file putexcel2.xlsx saved

I can open the resulting Excel file and see that the mean was successfully written to cell B2.

**Example 2: Formatting numbers with Excel formats**

I can specify the display format of a number using Excel formats in the **nformat()** option. For example, I could display the mean with three decimal places using the option **nformat(“#.###”)**.

. putexcel set putexcel2.xlsx, sheet(example2) modify . putexcel A1 = "Mean Age = " file putexcel2.xlsx saved . putexcel B1 = `r(mean)', nformat("#.###") file putexcel2.xlsx saved

You can view a complete description of Excel’s formatting options in the Options section of [P] **putexcel advanced**.

**Example 3: Formatting numbers with Stata formats**

Stata includes many shortcut format codes that can be used with **nformat()**. For example, I could display the mean with two decimal places using the option **number_d2**.

. putexcel set putexcel2.xlsx, sheet(example3) modify . putexcel A1 = "Mean Age = " file putexcel2.xlsx saved . putexcel B1 = `r(mean)', nformat(number_d2) file putexcel2.xlsx saved

You can view a complete list of codes for numeric formats in the Appendix of [P] **putexcel**.

**Example 4: Formatting numbers with the string() function**

I could also report the mean and standard deviation in a single cell. This can be accomplished in two steps. First, I store **r(mean)** and **r(sd)** to the local macros **meanage** and **sdage**, respectively. The **string()** function allows me to specify one decimal place. Second, I create a local macro named **meansd** that combines **meanage** and **sdage** into a single expression. Note that I must use double quotes in the command **putexcel B1 = “`meansd'”** because **meansd** is a string.

. // put the mean of age in a cell using the string() function . putexcel set putexcel2.xlsx, sheet(example4) modify . putexcel A1 = "Mean (SD) = " file putexcel2.xlsx saved . local meanage = string(`r(mean)',"%9.1f") . local sdage = string(`r(sd)',"%9.1f") . local meansd = "`meanage' (`sdage')" . putexcel B1 = "`meansd'" file putexcel2.xlsx saved

**Example 5: Adding a graph to an Excel file**

I might wish to include a histogram of **age** in my Excel file. First, I create the histogram and use **graph export** to save the graph to a .png file.

. histogram age (bin=31, start=20, width=1.7419355)

. graph export age.png, replace (file age.png written in PNG format)

Then, I can put the graph in the spreadsheet using the expression **picture(age.png)**

. putexcel set putexcel2.xlsx, sheet(example5) modify . putexcel A1 = picture(age.png) file putexcel2.xlsx saved

**Example 6: Writing a matrix to Excel**

Some Stata commands return matrices. For example, I can use **tabstat** to calculate descriptive statistics for a list of variables. The **save** option tells **tabstat** to save the results to a matrix.

. tabstat age height weight sbp dbp, stat(n mean sd min max) save stats | age height weight sbp dbp ---------+-------------------------------------------------- N | 1266 1266 1266 1267 1267 mean | 48.44076 167.1308 72.17252 130.6843 80.54854 sd | 16.98858 9.61487 16.28185 24.34302 13.99187 min | 20 144.199 39.12 65 35 max | 74 193.398 175.88 254 150 ------------------------------------------------------------

When I type **return list**, I see that **tabstat** returned the matrix **r(StatTotal)**.

. return list matrices: r(StatTotal) : 5 x 5 . matlist r(StatTotal) | age height weight sbp dbp -------------+------------------------------------------------------- N | 1266 1266 1266 1267 1267 mean | 48.44076 167.1308 72.17252 130.6843 80.54854 sd | 16.98858 9.61487 16.28185 24.34302 13.99187 min | 20 144.199 39.12 65 35 max | 74 193.398 175.88 254 150

I would prefer to create an Excel table that looks like the output from **summarize** with variables in rows and statistics in columns. So I create a matrix called **results**, which is equal to the transpose of **r(StatTotal)**.

. matrix results = r(StatTotal)' . matlist results | N mean sd min max -------------+------------------------------------------------------- age | 1266 48.44076 16.98858 20 74 height | 1266 167.1308 9.61487 144.199 193.398 weight | 1266 72.17252 16.28185 39.12 175.88 sbp | 1267 130.6843 24.34302 65 254 dbp | 1267 80.54854 13.99187 35 150

Then, I write the matrix **results** to Excel using the expression **matrix(results)**. I could use the expression **matrix(r(StatTotal)’)** rather than creating a new matrix, but I wanted to show you the transposed matrix before writing it to Excel. The upper-left corner of the matrix will be placed in cell A1 of the Excel table. The **names** option tells **putexcel** to write the row and column names to Excel along with the matrix. The **nformat(number_d2)** option tells **putexcel** to display the matrix with two decimal places.

. putexcel set putexcel2.xlsx, sheet(example6) modify . putexcel A1 = matrix(results), names nformat(number_d2) file putexcel2.xlsx saved

Next, I would like to format the Excel table to make it look more like a table of results rather than a matrix. I can change the format of a cell without writing anything to the cell. I can even format a range of cells using the syntax *ul:br*, where *ul* is the upper-left corner of the cell range and *br* is the bottom-right cell.

The sample sizes for each variable displayed in cells B2:B6 do not need two decimal places. I can display cells B2:B6 with no decimals using the **nformat(number)** option. The option **overwritefmt** tells **putexcel** to overwrite the existing cell format.

. putexcel B2:B6, nformat(number) overwritefmt file putexcel2.xlsx saved

Next, I format the cells A1:A6. The **right** option aligns the text to the right side of the cells, and the **border(right)** option adds a border to the right side of the cells.

. putexcel A1:A6, right border(right) overwritefmt file putexcel2.xlsx saved

I format cells A1:F1 similarly. The **hcenter** option centers the labels horizontally, and the **border(bottom)** option adds a border to the bottom of cells A1:F1.

. putexcel A1:F1, hcenter bold border(bottom) overwritefmt file putexcel2.xlsx saved

Finally, I display the numbers in cells B2:F1 using a bold font to make the table look similar to the output of **summarize**.

. putexcel B2:F6, bold file putexcel2.xlsx saved

The formatted Excel table looks like this:

**Example 7: Writing regression coefficients to Excel**

Most Stata regression commands return the table of coefficients in a matrix named **r(table)**. For example, I could fit the following linear regression model with **regress**.

. regress sbp age sex Source | SS df MS Number of obs = 1,266 -------------+---------------------------------- F(2, 1263) = 236.23 Model | 204030.521 2 102015.261 Prob > F = 0.0000 Residual | 545432.459 1,263 431.854678 R-squared = 0.2722 -------------+---------------------------------- Adj R-squared = 0.2711 Total | 749462.98 1,265 592.460854 Root MSE = 20.781 ------------------------------------------------------------------------------ sbp | Coef. Std. Err. t P>|t| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | .7328247 .0343946 21.31 0.000 .6653479 .8003014 sex | 5.292591 1.170872 4.52 0.000 2.995522 7.58966 _cons | 92.6976 1.852992 50.03 0.000 89.06232 96.33289 ------------------------------------------------------------------------------

and type **matlist r(table)** to see the coefficient matrix.

. matlist r(table) | age sex _cons -------------+--------------------------------- b | .7328247 5.292591 92.6976 se | .0343946 1.170872 1.852992 t | 21.30641 4.520211 50.02589 pvalue | 2.65e-86 6.76e-06 6.9e-302 ll | .6653479 2.995522 89.06232 ul | .8003014 7.58966 96.33289 df | 1263 1263 1263 crit | 1.961844 1.961844 1.961844 eform | 0 0 0

**r(table)** contains the rows **df**, **crit**, and **eform**, which are not displayed in the regression output. I would like to reproduce the table of coefficients in my Excel table, so I store **r(table)** to the matrix **results**, extract the first six rows of **results**, and then transpose **results**.

. matrix results = r(table) . matrix results = results[1..6,1...]' . matlist results | b se t pvalue ll ul -------------+------------------------------------------------------------------ age | .7328247 .0343946 21.30641 2.65e-86 .6653479 .8003014 sex | 5.292591 1.170872 4.520211 6.76e-06 2.995522 7.58966 _cons | 92.6976 1.852992 50.02589 6.9e-302 89.06232 96.33289

Now, I can write **results** to my Excel file.

. putexcel set putexcel2.xlsx, sheet(example8) modify . putexcel A1 = matrix(results), names nformat(number_d2) hcenter file putexcel2.xlsx saved

A few adjustments to the font and cell formats make my matrix look more like a table of coefficients.

. putexcel B2:G4, bold overwritefmt file putexcel2.xlsx saved . putexcel A1:A4, right border(right) overwritefmt file putexcel2.xlsx saved . putexcel A1:G1, hcenter border(bottom) overwritefmt file putexcel2.xlsx saved

**Example 8: Writing cross-tabulations to Excel**

We can also write the results from **tabulate** to Excel using matrices. The **matcell()** option stores the cell counts from **tabulate** in a matrix. For example, I can store the results of the **tabulate** command below in the matrix **cellcounts**.

. tabulate sex race, matcell(cellcounts) | Race Sex | Black Other White | Total -----------+---------------------------------+---------- Female | 101 12 563 | 676 Male | 75 10 506 | 591 -----------+---------------------------------+---------- Total | 176 22 1,069 | 1,267 . matlist cellcounts | c1 c2 c3 -------------+--------------------------------- r1 | 101 12 563 r2 | 75 10 506

I can rename the rows of **cellcounts** with the value labels from **sex** using the following steps. First, **sex** is stored as a numeric variable, so I can use **decode** to create a string variable called **sex_s**. I could omit this step if **sex** was stored as a string variable. Next, I use **levelsof** to store the levels of **sex_s** to the local macro **sexlabels**. Then, I use **matrix rownames** to label the rows of **cellcounts** with the labels stored **sexlabels**.

. decode sex, generate(sex_s) . levelsof sex_s, local(sexlabels) `"Female"' `"Male"' . matrix rownames cellcounts = `sexlabels' . matlist cellcounts | c1 c2 c3 -------------+--------------------------------- Female | 101 12 563 Male | 75 10 506

I can use the same steps to rename the columns of **cellcounts** with the value labels of **race**.

. decode race, generate(race_s) . levelsof race_s, local(racelabels) `"Black"' `"Other"' `"White"' . matrix colnames cellcounts = `racelabels' . matlist cellcounts | Black Other White -------------+--------------------------------- Female | 101 12 563 Male | 75 10 506

I can then write **cellcounts** to Excel just like I did in the last two examples.

. putexcel set putexcel2.xlsx, sheet(example9) modify . putexcel A1 = matrix(cellcounts), names hcenter file putexcel2.xlsx saved

This approach worked, but there are no row and column totals in my Excel table. One way to add them is to put calculated cells in the table using **formula()** expressions. For example, the first line of **putexcel** below puts the Excel function SUM(B2:D2) in cell E2. This calculates the total for the first row of the table. The **putexcel** command below puts formulas in the table to calculate the row, column, and overall totals.

. putexcel E2 = formula(=SUM(B2:D2)) /// > E3 = formula(=SUM(B3:D3)) /// > B4 = formula(=SUM(B2:B3)) /// > C4 = formula(=SUM(C2:C3)) /// > D4 = formula(=SUM(D2:D3)) /// > E4 = formula(=SUM(B2:D3)), hcenter file putexcel2.xlsx saved

I can visually differentiate the row and column totals from the cells by adding labels and borders. I can also display the numbers with a bold font.

. putexcel A4 = "Total" E1 = "Total", hcenter file putexcel2.xlsx saved . putexcel A1:A4 D1:D4, border(right) overwritefmt file putexcel2.xlsx saved . putexcel A1:E1 A3:E3, border(bottom) overwritefmt file putexcel2.xlsx saved . putexcel B2:E4, bold file putexcel2.xlsx saved

The Excel table looks similar to the output from **tabulate**.

Re-creating Stata output in Excel tables is easy with **putexcel** using returned scalars, macros, and matrices. Examples 1-7 generalize well for arbitrary variables, but I hardcoded the row and column totals for the variables **sex** and **race** in example 8.

Next time, I will show you how to generalize these examples for arbitrary variables and put more eleborate expressions in the table cells.

Pingback: The Stata Blog » Creating Excel tables with putexcel part 3: Writing custom reports for arbitrary variables()