Posts Tagged ‘Excel’

Retaining an Excel cell’s format when using putexcel

In a previous blog entry, I talked about the new Stata 13 command putexcel and how we could use putexcel with a Stata command’s stored results to create tables in an Excel file.

After the entry was posted, a few users pointed out two features they wanted added to putexcel:

  1. Retain a cell’s format after writing numeric data to it.
  2. Allow putexcel to format a cell.

In Stata 13.1, we added the new option keepcellformat to putexcel. This option retains a cell’s format after writing numeric data to it. keepcellformat is useful for people who want to automate the updating of a report or paper.

To review, the basic syntax of putexcel is as follows:

putexcel excel_cell=(expression) … using filename[, options]

If you are working with matrices, the syntax is

putexcel excel_cell=matrix(expression) … using filename[, options]
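
As a small illustration of the first form, the sketch below writes a text label and a stored scalar into neighboring cells. It assumes the Stata 13 syntax shown above. The file name rho_example is made up for this example, and r(rho) is the correlation that correlate leaves behind in its stored results.

```stata
sysuse auto, clear
correlate foreign mpg

// text goes in quotes inside the parentheses; numeric expressions do not
putexcel A1=("Correlation") B1=(r(rho)) using rho_example, replace
```

The replace option overwrites rho_example.xlsx if it already exists, so the example can be rerun safely.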

In the previous blog post, we exported a simple table created by the correlate command by using the commands below.

. sysuse auto
(1978 Automobile Data)

. correlate foreign mpg
(obs=74)

             |  foreign      mpg
-------------+------------------
     foreign |   1.0000
         mpg |   0.3934   1.0000

. putexcel A1=matrix(r(C), names) using corr

These commands created the file corr.xlsx, which contained the table below in the first worksheet.

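[Image results0: the unformatted correlation table in corr.xlsx]

As you can see, this table is not formatted. So, I formatted the table by hand in Excel so that the correlations were rounded to two decimal places and the column and row headers were bold with a blue background.

[Image results1: the same table after formatting by hand]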

putexcel’s default behavior is to remove the formatting of cells. Thus, if we change the correlated variables from foreign and mpg to foreign and weight by using the commands below, the new correlations shown in Excel will revert to the default format:

. sysuse auto, clear
(1978 Automobile Data)

. correlate foreign weight
(obs=74)

             |  foreign   weight
-------------+------------------
     foreign |   1.0000
      weight |  -0.5928   1.0000

. putexcel A1=matrix(r(C), names) using corr, modify

[Image results2: the new correlations in corr.xlsx, shown in the default format]

As of Stata 13.1, you can now use the keepcellformat option to preserve a numeric cell’s format when writing to it. For example, the command

. putexcel A1=matrix(r(C), names) using corr, modify keepcellformat

will produce

[Image results3: the new correlations in corr.xlsx with the cell formatting retained]

Let’s look at a real-world problem to see how the keepcellformat option can help us. Suppose we need to export the following tabulate table to a report we wrote in Word.

. webuse auto2, clear
(1978 Automobile Data)

. label variable rep78 "Repair Record"

. tabulate rep78

     Repair |
     Record |      Freq.     Percent        Cum.
------------+-----------------------------------
       Poor |          2        2.90        2.90
       Fair |          8       11.59       14.49
    Average |         30       43.48       57.97
       Good |         18       26.09       84.06
  Excellent |         11       15.94      100.00
------------+-----------------------------------
      Total |         69      100.00

In the previous putexcel blog post, I mentioned my user-written command tab2xl, which exports a one-way tabulation to an Excel file. I have since updated the command so that it uses the new keepcellformat option to preserve cell formatting. You can download the updated tab2xl command by typing the following:

. net install http://www.stata.com/users/kcrow/tab2xl, replace

Using this command, I can now export my tabulate table to Excel by typing

. tab2xl rep78 using tables, row(1) col(1)

Once the table is in Excel, I format it by hand so that it looks like this:

[Image results4: the tabulate table in Excel after formatting by hand]

I then link this Excel table to a Word document. When you link an Excel table to a Word document, it

  1. preserves the formatting of the table and
  2. automatically updates the Word document when you update the Excel table.

It is fairly easy to link an Excel table to a Word document or PowerPoint presentation. In Excel/Word 2010, you would do as follows:

  1. Highlight the table/data in Excel.
  2. On the Home tab, click on the Copy button.
  3. Open the Word document and scroll to where you want the table pasted.
  4. On the Home tab of Word, click on the Paste button.
  5. Select Link & Keep Source Formatting from the Paste icon menu.

My report now looks like this:

[Image report1: the Word report containing the linked table]

With the Excel table linked into Word, any time we update our Excel table using putexcel, we also update our table in Word.

Suppose that after a few weeks, we get more repair record data. We now need to update our report, and our new tabulate table looks like this:

. tabulate rep78

     Repair |
     Record |      Freq.     Percent        Cum.
------------+-----------------------------------
       Poor |          4        2.90        2.90
       Fair |          8        5.80        8.70
    Average |         60       43.48       52.17
       Good |         44       31.88       84.06
  Excellent |         22       15.94      100.00
------------+-----------------------------------
      Total |        138      100.00

To update the report, we simply rerun tabulate and then tab2xl, which reissues the putexcel commands for us.

. tabulate rep78
. tab2xl rep78 using tables, row(1) col(1)

The linked Word report will automatically reflect the changes:

[Image report2: the Word report showing the updated table]

Categories: Programming

Export tables to Excel

There is a new command in Stata 13, putexcel, that allows you to easily export matrices, expressions, and stored results to an Excel file. Combining putexcel with a Stata command’s stored results allows you to create the table displayed in your Stata Results window in an Excel file.

A stored result is simply a scalar, macro, or matrix stored in memory after you run a Stata command. The two main types of stored results are e-class (for estimation commands) and r-class (for general commands). You can list a command’s stored results after it has been run by typing ereturn list (for estimation commands) and return list (for general commands). Let’s try a simple example by loading the auto dataset and running correlate on the variables foreign and mpg.

. sysuse auto
(1978 Automobile Data)

. correlate foreign mpg
(obs=74)

             |  foreign      mpg
-------------+------------------
     foreign |   1.0000
         mpg |   0.3934   1.0000

Because correlate is not an estimation command, use the return list command to see its stored results.

. return list

scalars:
                  r(N) =  74
                r(rho) =  .3933974152205484

matrices:
                  r(C) :  2 x 2

Now we can use putexcel to export these results to Excel. The basic syntax of putexcel is

putexcel excel_cell=(expression) … using filename [, options]

If you are working with matrices, the syntax is

putexcel excel_cell=matrix(expression) … using filename [, options]

It is easy to build the above syntax in the putexcel dialog. There is a helpful video on YouTube about the dialog here. Let’s list the matrix r(C) to see what it contains.

. matrix list r(C)

symmetric r(C)[2,2]
           foreign        mpg
foreign          1
    mpg  .39339742          1

To re-create the table in Excel, we need to export the matrix r(C) with the matrix row and column names. The command to type in your Stata Command window is

putexcel A1=matrix(r(C), names) using corr

Note that to export the matrix row and column names, we used the names option after we specified the matrix r(C). When I open the file corr.xlsx in Excel, the table below is displayed.

[Image results0: the correlation table in corr.xlsx]

Next, let’s try a more involved example. Load the auto dataset, and run a tabulation on the variable foreign. Because tabulate is not an estimation command, use the return list command to see its stored results.

. sysuse auto
(1978 Automobile Data)

. tabulate foreign

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. return list

scalars:
                  r(N) =  74
                  r(r) =  2

tabulate is different from most commands in Stata in that it does not automatically save all the results we need into the stored results (we will use scalar r(N)). We need to use the matcell() and matrow() options of tabulate to save the results produced by the command into two Stata matrices.

. tabulate foreign, matcell(freq) matrow(names)

   Car type |      Freq.     Percent        Cum.
------------+-----------------------------------
   Domestic |         52       70.27       70.27
    Foreign |         22       29.73      100.00
------------+-----------------------------------
      Total |         74      100.00

. matrix list freq

freq[2,1]
    c1
r1  52
r2  22

. matrix list names

names[2,1]
    c1
r1   0
r2   1

The putexcel commands used to create a basic tabulation table starting at cell A1 (column 1, row 1) are

putexcel A1=("Car type") B1=("Freq.") C1=("Percent") using results, replace
putexcel A2=matrix(names) B2=matrix(freq) C2=matrix(freq/r(N)) using results, modify

Below is the table produced in Excel by these commands.

[Image results1: the basic tabulation table in Excel]

Again, this is a basic tabulation table. You probably noticed that we did not have the Cum. column or the Total row in the exported table. Also, our Car type column contains the numeric values (0,1), not the value labels (Domestic, Foreign) of the variable foreign, and our Percent column is not formatted correctly. To get the exact table displayed in the Results window into an Excel file takes a little programming. With a few functions and a forvalues loop, we can easily export any table produced by running the tabulate command on a numeric variable.

There are two extended macro functions, label and display, that can help us. The label function can extract the value labels for each variable, and the display function can correctly format numbers for our numeric columns. Last, we use forvalues to loop over the rows of the returned matrices to produce our final tables. Our do-file to produce the tabulate table in Excel looks like

sysuse auto, clear
// save the frequencies and the category values in two matrices
tabulate foreign, matcell(freq) matrow(names)

// write the header row
putexcel A1=("Car type") B1=("Freq.") C1=("Percent") D1=("Cum.") using results, replace

local rows = rowsof(names)
local row = 2
local cum_percent = 0

forvalues i = 1/`rows' {

        // look up the value label for this category
        local val = names[`i',1]
        local val_lab : label (foreign) `val'

        local freq_val = freq[`i',1]

        // percentage, formatted with two decimals
        local percent_val = `freq_val'/`r(N)'*100
        local percent_val : display %9.2f `percent_val'

        // running total of the formatted percentages
        local cum_percent : display %9.2f (`cum_percent' + `percent_val')

        putexcel A`row'=("`val_lab'") B`row'=(`freq_val') C`row'=(`percent_val') ///
                D`row'=(`cum_percent') using results, modify
        local row = `row' + 1
}

// write the total row
putexcel A`row'=("Total") B`row'=(r(N)) C`row'=(100.00) using results, modify

The above commands produce this table in Excel:

[Image results2: the finished tabulation table in Excel]
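
One caveat with the loop above is that it adds up percentages that have already been rounded to two decimals, so the last cumulative value can land slightly off 100.00. For example, three equal categories give 33.33 + 33.33 + 33.33 = 99.99. A possible variant, sketched here with display instead of putexcel so that it runs on its own, accumulates the frequencies and formats only at the end. The local names N and cum_freq are choices made for this illustration.

```stata
sysuse auto, clear
tabulate rep78, matcell(freq) matrow(names)

local N = r(N)          // copy r(N) before other commands can overwrite it
local cum_freq = 0

forvalues i = 1/`=rowsof(names)' {
        // accumulate counts, then convert to a percentage and format
        local cum_freq = `cum_freq' + freq[`i',1]
        local cum_percent : display %9.2f 100*`cum_freq'/`N'
        display "row `i': cumulative percent = `cum_percent'"
}
```

Because the last iteration divides the full count by itself, the final row is exactly 100.00 regardless of how the individual percentages round.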

The solution above works well for this one table, but what if we need to export the tabulation table for 100 variables to the same Excel spreadsheet? It would be very tedious to run the same do-file 100 times, each time changing the cell and row numbers. Instead, we could turn our do-file into a Stata command (ado-file) called tab2xl. The syntax for our new command could be

tab2xl varname using filename, row(rownumber) col(colnumber) [replace sheet(name)]

The pseudocode of our program (file tab2xl.ado) looks like

program tab2xl
  /* parse command syntax */

  /* tabulate varname */

  /* get column letters based on starting column number passed in */

  /* write header row to filename in starting row number passed in */

  /* loop over rows of returned matrix and calculate/write values to filename */

  /* write total row to filename */
end
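
The column-letter step in the pseudocode is the least obvious one, because putexcel addresses cells by letter (A1, B2) while the command takes a column number. The following is one way that conversion might be written. colletters is a hypothetical helper named for this sketch and is not taken from the distributed tab2xl.ado, which may handle the step differently.

```stata
capture program drop colletters
program colletters, rclass
        version 13
        args n
        confirm integer number `n'

        // Excel columns are a base-26 system with no zero digit:
        // peel off one letter per pass, rightmost letter first
        local name ""
        while `n' > 0 {
                local rem = mod(`n' - 1, 26)
                local name = char(65 + `rem') + "`name'"
                local n = floor((`n' - 1)/26)
        }
        return local letters "`name'"
end

colletters 28
display "`r(letters)'"
```

Column 28 comes out as AB, so a table starting in column 28, row 1 would begin at cell AB1.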

If you would like to download a working version of our tab2xl command, type

net install http://www.stata.com/users/kcrow/tab2xl

in Stata.
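
Once tab2xl is installed, the many-variables case mentioned above might be handled with a loop along these lines. This is only a sketch: the file name tables_all and the spacing rule are assumptions, and it presumes that each exported table occupies one header row, one row per category, and one total row, as in the do-file earlier.

```stata
sysuse auto, clear

local row = 1
foreach v of varlist foreign rep78 {
        tab2xl `v' using tables_all, row(`row') col(1)

        // r(r) is the number of categories; add header, total, and a blank row
        quietly tabulate `v'
        local row = `row' + r(r) + 3
}
```

Stacking the tables in one column keeps the bookkeeping to a single counter. Placing them side by side would instead mean advancing col() by the table width.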

Categories: Programming

Using import excel with real world data

Stata 12’s new import excel command can help you easily import real-world Excel files into Stata. Excel files often contain header and footer information in the first few and last few rows of a sheet, and you may not want that information loaded. Also, the column labels used in the sheet may be invalid Stata variable names and therefore cannot be used as they are. Both of these issues can be easily solved using import excel.

Let’s start by looking at an Excel spreadsheet, metro_gdp.xls, downloaded from the Bureau of Economic Analysis website.

[Image: the metro_gdp.xls spreadsheet open in Microsoft Excel]

As you can see, the first five rows of the Excel file contain a description of the data, and rows 374 through 381 contain footer notes. We don’t want to load these rows into Stata. import excel has a cellrange() option that can help us avoid unwanted information being loaded.
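
If you are not sure how far the sheet extends, import excel can report that before anything is loaded. The sketch below uses the describe option, which lists each worksheet in the file together with its cell range. It assumes metro_gdp.xls is in the current working directory.

```stata
// list the worksheets in the file and the range of cells each one uses
import excel using metro_gdp.xls, describe
```

The reported range covers every used cell, header and footer rows included, so it gives only the outer bounds. The rows to skip still have to be found by looking at the sheet.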

With cellrange(), you specify the upper left cell and the lower right cell (using standard Excel notation) of the area of data you want loaded. In the file metro_gdp.xls, we want all the data from column A row 6 (upper left cell) to column L row 373 (lower right cell) loaded into Stata. To do this, we type

. import excel metro_gdp.xls, cellrange(A6:L373) clear

In Stata, we open the Data Editor to inspect the loaded data.

[Image: the Stata Data Editor showing the imported data, all loaded as strings]

The first row of the data we loaded contained column labels. Because of these labels, import excel loaded all the data as strings. import excel again has an easy fix. We need to specify the firstrow option to tell import excel that the first row of data contains the variable names.

. import excel metro_gdp.xls, cellrange(A6:L373) firstrow clear

We again open the Data Editor to inspect the data.

[Image: the Stata Data Editor showing the data imported with the firstrow option]

The data are now in the correct format, but we are missing the year column labels. Stata does not accept numeric variable names, so import excel has to use the Excel column name (C, D, …) for the variable names instead of 2001, 2002, …. The simple solution is to rename the column headers in Excel to something like y2001, y2002, etc., before loading. You can also use Stata to rename the column headers. import excel saves the values in the first row of data as variable labels so that the information is not lost. If we describe the data, we will see all the column labels from the Excel file saved as variable labels.

. describe

Contains data
  obs:           367
 vars:            12
 size:        37,067
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
Fips            str5   %9s                    Fips
Area            str56  %56s                   Area
C               long   %10.0g                 2001
D               long   %10.0g                 2002
E               long   %10.0g                 2003
F               long   %10.0g                 2004
G               long   %10.0g                 2005
H               long   %10.0g                 2006
I               long   %10.0g                 2007
J               long   %10.0g                 2008
K               long   %10.0g                 2009
L               long   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

We want to grab the variable label for each variable by using the extended macro function :variable label varname, create a valid lowercase variable name from that label by using the strtoname() and lower() functions, and rename the variable to the new name by using rename. We can do this with a foreach loop.

foreach var of varlist _all {
        local label : variable label `var'
        local new_name = lower(strtoname("`label'"))
        rename `var' `new_name'
}

Now when we describe our data, they look like this:

. describe

Contains data
  obs:           367
 vars:            12
 size:        37,067                          
-------------------------------------------------------------------------------
              storage  display     value      
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
fips            str5   %9s                    Fips
area            str56  %56s                   Area
_2001           long   %10.0g                 2001
_2002           long   %10.0g                 2002
_2003           long   %10.0g                 2003
_2004           long   %10.0g                 2004
_2005           long   %10.0g                 2005
_2006           long   %10.0g                 2006
_2007           long   %10.0g                 2007
_2008           long   %10.0g                 2008
_2009           long   %10.0g                 2009
_2010           long   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved
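
The renaming loop shown earlier assumes that every variable has a label and that no two labels map to the same name. For messier sheets, a more defensive variant might look like the sketch below. The tiny dataset is made up so that the example runs on its own: C carries the label 2001, and D has no label at all. Note that it starts with clear, so try it separately from the metro_gdp data.

```stata
clear
set obs 1
generate C = 1
generate D = 2
label variable C "2001"

foreach var of varlist _all {
        local label : variable label `var'
        if "`label'" == "" continue              // no label, keep the current name

        local new_name = lower(strtoname("`label'"))

        // rename only when the new name is not already taken
        capture confirm new variable `new_name'
        if _rc == 0 rename `var' `new_name'
}
```

Here C becomes _2001, because strtoname() prefixes an underscore to a name that would otherwise start with a digit, while D is left alone.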

One last thing we might want to do is to rename the year variables from _20## to y20##, which we can easily accomplish with rename:

. rename (_*) (y*)

. describe

Contains data
  obs:           367
 vars:            12
 size:        37,067                          
-------------------------------------------------------------------------------
              storage  display     value      
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
fips            str5   %9s                    Fips
area            str56  %56s                   Area
y2001           long   %10.0g                 2001
y2002           long   %10.0g                 2002
y2003           long   %10.0g                 2003
y2004           long   %10.0g                 2004
y2005           long   %10.0g                 2005
y2006           long   %10.0g                 2006
y2007           long   %10.0g                 2007
y2008           long   %10.0g                 2008
y2009           long   %10.0g                 2009
y2010           long   %10.0g                 2010
-------------------------------------------------------------------------------
Sorted by:  
     Note:  dataset has changed since last saved

Categories: Data Management