Home > Data Management > A tour of datetime in Stata

A tour of datetime in Stata

Converting a string date

Stata has a wide array of tools to work with dates. You can have dates in years, months, or even milliseconds. In this post, I will provide a brief tour of working with dates that will help you get started using all of Stata’s tools.

When you load a dataset, you will notice that every variable has a display format. For date variables, the display format is %td for daily dates, %tm for monthly dates, etc. Let’s load the wpi1 dataset as an example.

. webuse wpi1

. describe

Contains data from http://www.stata-press.com/data/r14/wpi1.dta
  obs:           124                          
 vars:             3                          28 Nov 2014 10:31
 size:         1,240                          
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
wpi             float   %9.0g                 U.S. Wholesale Price Index
t               int     %tq                   quarterly date
ln_wpi          float   %9.0g                 
-------------------------------------------------------------------------------
Sorted by: t

The display format for the variable t is %tq which indicates a quarterly date. At this point, we can use tsset to declare the data as time series and continue our statistical endeavors. In reality, however, date variables are stored in string or numeric formats, which are not directly interpretable by Stata. Below, we use functions for translating dates stored in string and numeric formats into a Stata date format. For details about converting dates from other software, see Using dates and times from other software

The two most widely used functions for translating string dates are date() and clock(). date() converts daily dates into the number of days elapsed since 01 Jan 1960. Similarly, clock() converts dates with time stamps into the number of milliseconds elapsed since 01 January 1960 00:00:00.000. For dates before 01 Jan 1960, Stata assigns negative numbers as the number of elapsed days. For example, 31 Dec 1959 is -1 days elapsed since 01 Jan 1960.

All date functions takes two arguments: the name of the string variable and a string specifying the order of the date and time components called mask. Let’s look at an example.

. clear

. input str18 mydate

                 mydate
  1. "20151001"
  2. "15-10-01"
  3. "102015"
  4. "01Oct2015 20:10"
  5. "14:10:05"
  6. end

I created a string variable mydate with five different types of dates as observations. The first observation “20151001” is ordered as year, month, and day, and the corresponding mask is “YMD”. I can translate this string date using date(mydate,”YMD”) and store the translated date in the variable newdate.

. quietly generate double newdate = date(mydate,"YMD") in 1

I recommend using the storage type double for newdate to minimize loss of precision. This is particularly important for clock time dates, as we will see later. Let’s list what Stata stored in newdate.

. list mydate newdate in 1

     +--------------------+
     |   mydate   newdate |
     |--------------------|
  1. | 20151001     20362 |
     +--------------------+

The numeric value of 01 October 2015 is equivalent to 20,362 days elapsed since 01 January 1960. At this point, we have successfully translated a string date into a numeric date. We can resume our analysis with newdate as our date variable. In a later section, I will change the display format using format to make it readable.

The next observation, “15-10-01”, is ordered the same way as the first except for the missing century and the presence of hyphens. To translate dates with a two-digit year, we need to inform Stata what century the year component refers to. In this case, we can enter the century by specifying “20YMD” as the mask. We can obtain the corresponding numeric date by typing

. quietly replace newdate = date(mydate,"20YMD") in 2

Note that we did not acknowledge the hyphens while converting the date above because Stata ignores any punctuation that exists in string dates. The third observation, “102015”, refers to a month and year. Because the day component does not exist, we only specify “MY” as the mask.

. quietly replace newdate = date(mydate,"MY") in 3

. list mydate newdate in 3

     +------------------+
     | mydate   newdate |
     |------------------|
  3. | 102015     20362 |
     +------------------+

Although we did not specify the day component, the date() function with the “MY” mask still converted the string date as the number of days elapsed. This is because Stata assumes a default value of 1 for the day component, so we inadvertently translated 01 Oct 2015 when in fact the date existed as only Oct 2015 in out data.

The fourth observation, “01Oct2015 20:10:40”, has an hour and a minute time stamp. In this case we use the clock() function instead of date(). The corresponding mask is “DMYhm”.

. quietly replace newdate = clock(mydate,"DMYhm") in 4

. list mydate newdate in 4

     +-----------------------------+
     |          mydate     newdate |
     |-----------------------------|
  4. | 01Oct2015 20:10   1.759e+12 |
     +-----------------------------+

The numeric value in newdate refers to a number with a magnitude \(10^{12}\) milliseconds elapsed since 01 Jan 1960 00:00:00.000.

We can also ignore certain components in the string by using the # symbol for those components. For example, we can specify “DMYh#” as the mask to ignore the minute component. However, if we omit the # symbol and just use “DMYh” instead, we will obtain a missing value. Stata will not assume a default value of 00 for the minute component as it did earlier for the date. The reason is that Stata always expects a mask for the full string in the observation.

Finally, the last observation, “14:10:05”, is in hour, minute, and second form. As you may have guessed, the corresponding mask is simply “hms”. We can convert it to a numeric value using clock(mydate,”hms”).

I have only used five variations of string dates in this post. There are many more Stata functions that fit almost any need. Also, there are numerous translation functions such as weekly(), monthly(), etc., that I have not mentioned.

Displaying dates

We prefer working with readable dates instead of elapsed dates and times. Once we have converted to a numeric date, we can simply use format to change the display format. Let’s look at an example.

. clear

. input str18 mydate

                 mydate
  1. "01Jan2015"
  2. "02Jan2015"
  3. "03Jan2015"
  4. "04Jan2015"
  5. "05Jan2015"
  6. end

The variable mydate contains daily dates with day, month, and year components. As you may have guessed, we use the date() function with a mask “DMY” to translate mydate into a numeric date.

. generate double newdate = date(mydate,"DMY")

. list

     +---------------------+
     |    mydate   newdate |
     |---------------------|
  1. | 01Jan2015     20089 |
  2. | 02Jan2015     20090 |
  3. | 03Jan2015     20091 |
  4. | 04Jan2015     20092 |
  5. | 05Jan2015     20093 |
     +---------------------+

I stored the numeric dates in the variable newdate. The observations are stored as the number of elapsed days. To change the display format, we use the format command.

. format %td newdate

. list

     +-----------------------+
     |    mydate     newdate |
     |-----------------------|
  1. | 01Jan2015   01jan2015 |
  2. | 02Jan2015   02jan2015 |
  3. | 03Jan2015   03jan2015 |
  4. | 04Jan2015   04jan2015 |
  5. | 05Jan2015   05jan2015 |
     +-----------------------+

%td refers to the format for displaying daily dates. We can now tsset our data for time-series or panel-data analysis.

Building dates from numeric components

Sometimes, dates exist as individual numeric components. In Stata, we can combine these individual components to obtain the desired date. Consider the following dataset as an example:

. clear

. input month day year hour minute

         month        day       year       hour     minute
  1. 01 10 2015 10 05
  2. 02 10 2015 05 10
  3. 03 15 2015 20 15
  4. 04 20 2015 12 20
  5. 05 05 2015 02 25
  6. end

I created five variables, namely month, day, year, hour, and minute. Let’s begin with combining the individual month, day, and year components in the variable newdate. As with translating string dates into numeric using functions, Stata provides a different set of functions for combining individual numeric date components. For example, we use the function mdy() to combine the month, day, and year components.

. generate double newdate = mdy(month,day,year)

. format %td newdate

. list newdate

     +-----------+
     |   newdate |
     |-----------|
  1. | 10jan2015 |
  2. | 10feb2015 |
  3. | 15mar2015 |
  4. | 20apr2015 |
  5. | 05may2015 |
     +-----------+

The arguments for mdy() are individual month, day, and year components. Suppose we want to add the hours and minutes to the existing date variable newdate. We use the function dhms(newdate,hour,minute,1), which takes date, hour, minute, and second components as arguments. Because seconds do not exist in the data, we add a 1 to denote the default value for the seconds component in dhms().

. quietly replace newdate = dhms(newdate,hour,minute,1)

. format %tc newdate

. list newdate

     +--------------------+
     |            newdate |
     |--------------------|
  1. | 10jan2015 10:05:01 |
  2. | 10feb2015 05:10:01 |
  3. | 15mar2015 20:15:01 |
  4. | 20apr2015 12:20:01 |
  5. | 05may2015 02:25:01 |
     +--------------------+

We used %tc because we want to display a datetime format instead of a date format. As a final example, I will create a monthly date variable by combining the year and month using the ym() function and use %tm as a display format for monthly dates.

. quietly replace newdate = ym(year,month)

. format %tm newdate

. list newdate

     +---------+
     | newdate |
     |---------|
  1. |  2015m1 |
  2. |  2015m2 |
  3. |  2015m3 |
  4. |  2015m4 |
  5. |  2015m5 |
     +---------+

There are other useful functions for combining individual components that I have not discussed here and you can read about them in the the Stata manuals. But the basic idea remains the same. Once you know the individual date and time components, you can pick any of the datetime functions to combine them.

Summary

We began our tour by noting that date variables have a specific display format such as %td for daily dates. However, date variables in raw data are often stored as strings. We converted such string dates to numeric dates using date() and clock() functions. Stata stores numeric dates as the number of elapsed days since 01 Jan 1960 for date() and the number of elapsed milliseconds since 01 Jan 1960 00:00:00:000 for clock(). We obtained a readable date by using the format command with %td for daily dates, %tc for datetime, and %tm for monthly. Finally, we also combined individual numeric date and time components to form the desired date variable.