Using dates and times from other software
Most software stores dates and times numerically, as durations from some sentinel date, but they differ on the sentinel date and on the units in which the duration is stored. Stata stores dates as the number of days since 01jan1960, and datetimes as the number of milliseconds since 01jan1960 00:00:00.000. January 3, 2011 is stored as 18,630, and 2pm on January 3 is stored as 1,609,682,400,000. Other packages use different choices for bases and units.
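As a quick check of those two encodings, here is a minimal sketch using Stata’s td() and tc() functions, both of which appear later in this post. The display format is only my choice for readability.

```stata
* Dates are days since 01jan1960; datetimes are milliseconds since
* 01jan1960 00:00:00.000.
display td(03jan2011)
display %18.0fc tc(03jan2011 14:00)

* The two encodings differ only in units: 86,400,000 ms per day,
* plus 14 hours of 3,600,000 ms each.
assert td(03jan2011) == 18630
assert tc(03jan2011 14:00) == td(03jan2011)*86400000 + 14*3600000
```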
You will sometimes need to process data in Stata that were imported from other software, and you will end up with a numeric variable recording a date or datetime in the other software’s encoding. It is usually possible to adjust the numeric date or datetime values to the sentinel date and units that Stata uses. Below are conversion rules for SAS, SPSS, R, Excel, and Open Office.
SAS
SAS stores dates as the number of days since 01jan1960, the same as Stata:
. gen statadate = sasdate
. format statadate %td
SAS stores datetimes as the number of seconds since 01jan1960 00:00:00, assuming 86,400 seconds/day. Thus, all that’s necessary is to multiply SAS datetimes by 1,000 and attach a %tc format to the result,
. gen double statatime = (sastime*1000)
. format statatime %tc
It is important that variables containing SAS datetimes, such as sastime above, be imported as doubles into Stata.
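To see why the storage type matters, here is a small sketch with one made-up observation. The value and the variable name sasfloat are my own illustrative assumptions. A float cannot hold a ten-digit count of seconds exactly, so the float copy drifts away from the true value.

```stata
clear
set obs 1

* Hypothetical SAS datetime: 03jan2011 14:00:00 as seconds since 01jan1960.
gen double sastime  = 1609682400
gen float  sasfloat = sastime          // deliberately the wrong storage type

gen double statatime = sastime*1000
format statatime %tc
list

* The double converts exactly; the float copy is off by 32 seconds.
assert statatime == tc(03jan2011 14:00)
assert sasfloat - sastime == 32
```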
SPSS
SPSS stores both dates and datetimes as the number of seconds since 14oct1582 00:00:00, assuming 86,400 seconds/day. To convert SPSS datetimes to Stata datetimes, type
. gen double statatime = (spsstime*1000) + tc(14oct1582 00:00)
. format statatime %tc
Multiplying by 1,000 converts from seconds to milliseconds. Adding tc(14oct1582 00:00) accounts for the differing bases.
Function tc() returns the specified datetime as a Stata datetime, which is to say, the number of milliseconds between the specified datetime and 01jan1960 00:00:00.000. We need to add the difference between SPSS’s base and Stata’s base, which is tc(14oct1582 00:00) – tc(01jan1960 00:00), but tc(01jan1960) is definitionally 0, so that just leaves tc(14oct1582 00:00). tc(14oct1582), for your information, is -11,903,760,000,000.
SPSS dates are the same as SPSS datetimes, so to convert an SPSS date to a Stata date, we could type,
. gen double statatime = (spssdate*1000) + tc(14oct1582 00:00)
. gen statadate = dofc(statatime)
. format statadate %td
. drop statatime
Function dofc() converts a Stata datetime to a Stata date. We can combine the above into,
. gen statadate = dofc((spssdate*1000) + tc(14oct1582 00:00))
. format statadate %td
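Here is a minimal worked example of the SPSS rules, using a single hypothetical value that I computed for 03jan2011 14:00:00. It is a sketch to check the arithmetic, not output from a real SPSS file.

```stata
* The offset between the two bases, in milliseconds.
display %20.0fc tc(14oct1582 00:00)
assert tc(14oct1582 00:00) == -11903760000000

clear
set obs 1

* Hypothetical SPSS value: seconds since 14oct1582 00:00:00.
gen double spsstime = 13513442400

gen double statatime = (spsstime*1000) + tc(14oct1582 00:00)
format statatime %tc
gen long statadate = dofc(statatime)
format statadate %td
list

assert statatime == tc(03jan2011 14:00)
assert statadate == td(03jan2011)
```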
R
R stores dates as days since 01jan1970. To convert to a Stata date,
. gen statadate = rdate + td(01jan1970)
. format statadate %td
Stata uses 01jan1960 as the base, R uses 01jan1970, so all you have to do is add the number of days between 01jan1960 and 01jan1970, which is what td(01jan1970) returns.
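A short sketch of the offset between the two bases, with one hypothetical R date value that I chose to represent 03jan2011:

```stata
* td(01jan1970) is the number of days from Stata's base to R's base.
assert td(01jan1970) == 3653

clear
set obs 1
gen long rdate = 14977                      // hypothetical: days since 01jan1970
gen long statadate = rdate + td(01jan1970)
format statadate %td
list

assert statadate == td(03jan2011)
```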
R stores datetimes as the number of UTC-adjusted seconds since 01jan1970 00:00:00. UTC stands for Coordinated Universal Time. Rather than assuming 86,400 seconds/day, UTC allows some days to have 86,401 seconds. Leap seconds are sometimes inserted into UTC days to keep the clock coordinated with the Earth’s rotation. Stata’s datetime %tC format is UTC time, which is to say, it accounts for these leap seconds. Thus, to convert R datetimes to Stata, you multiply by 1,000 to convert seconds to milliseconds and add the difference between the bases:
. gen double statatime = (rtime*1000) + tC(01jan1970 00:00)
. format statatime %tC
Note the use of Stata’s tC() function rather than tc() to obtain the number of milliseconds between the differing bases. tc() returns the number of milliseconds since 01jan1960 00:00:00 assuming 86,400 seconds/day. tC() returns the number of milliseconds adjusted for leap seconds. In this case, it would not make a difference if we mistakenly typed tc() rather than tC() because no leap seconds were inserted between 1960 and 1970. Had the base year been 1980, however, the use of tC() would have been important. Nine extra seconds were inserted between 01jan1970 and 01jan1980!
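You can confirm the difference between the two functions directly. This is a minimal sketch; it assumes your Stata’s leap-second table covers the 1970s, which should be true of any version that has %tC.

```stata
* No leap seconds were inserted before 1972, so both functions agree
* on the 1970 base.
assert tC(01jan1970 00:00) == tc(01jan1970 00:00)

* By 01jan1980, nine leap seconds had been inserted: 9,000 ms.
display tC(01jan1980 00:00) - tc(01jan1980 00:00)
assert tC(01jan1980 00:00) - tc(01jan1980 00:00) == 9000
```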
In many cases you may prefer using a time variable that ignores leap seconds. In that case, you can type
. gen double statatime = cofC((rtime*1000) + tC(01jan1970 00:00))
. format statatime %tc
Excel
Excel has used different date systems for different operating systems. Excel for Windows used the “1900 Date System”. Excel for Mac used the “1904 Date System”. More recently, Microsoft has been standardizing on the 1900 Date System.
If you have an Excel for Windows workbook, it is likely to be using 1900.
If you have an Excel for Mac workbook, it is likely to be using 1904, unless it came from a Windows workbook originally.
Anyway, both Excels can use either encoding. See http://support.microsoft.com/kb/214330 for more information and for instructions on converting your workbook between date systems.
In any case, you are unlikely to encounter Excel numerically coded dates. If you cut-and-paste the spreadsheet into Stata’s Data editor, dates and datetimes paste as strings in human-readable form. If you use a conversion package, most know to convert the date for you.
Excel, 1900 date system
For dates on or after 01mar1900, Excel 1900 Date System stores dates as days since 30dec1899. To convert to a Stata date,
. gen statadate = exceldate + td(30dec1899)
. format statadate %td
Excel can store dates between 01jan1900 and 28feb1900, too, but the formula above will not handle those two months. See http://www.cpearson.com/excel/datetime.htm for more information.
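The reason, as I understand it, is that Excel’s 1900 Date System counts a 29feb1900 that never existed, so serial numbers before 01mar1900 are shifted by one day relative to the real calendar. The serial numbers below are my own illustrative values, not taken from a workbook.

```stata
* Excel serial 61 is 01mar1900, and the formula agrees.
assert 61 + td(30dec1899) == td(01mar1900)

* Excel serial 59 is 28feb1900, but the formula lands one day early.
display %td 59 + td(30dec1899)
assert 59 + td(30dec1899) == td(27feb1900)
```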
For datetimes on or after 01mar1900 00:00:00, Excel 1900 Date System stores datetimes as days plus fraction of day since 30dec1899 00:00:00. To convert with a one-second resolution to a Stata datetime,
. gen double statatime = round((exceltime+td(30dec1899))*86400)*1000
. format statatime %tc
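Here is a worked sketch of both 1900-system conversions with one hypothetical serial value, 40546 plus 14/24 of a day, which I chose to represent 03jan2011 14:00. Using floor() to strip the time of day before converting to a date is my own addition.

```stata
clear
set obs 1

* Hypothetical Excel serial: days plus fraction of day since 30dec1899.
gen double exceltime = 40546 + 14/24

* Date part only: drop the fraction, then shift the base.
gen long statadate = floor(exceltime) + td(30dec1899)
format statadate %td

* Datetime: shift the base, convert days to seconds, round, then to ms.
gen double statatime = round((exceltime + td(30dec1899))*86400)*1000
format statatime %tc
list

assert statadate == td(03jan2011)
assert statatime == tc(03jan2011 14:00)
```

The round() matters because 14/24 has no exact binary representation; without it the result could be a fraction of a second off.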
Excel, 1904 date system
For dates on or after 01jan1904, Excel 1904 Date System stores dates as days since 01jan1904. To convert to a Stata date,
. gen statadate = exceldate + td(01jan1904)
. format statadate %td
For datetimes on or after 01jan1904 00:00:00, Excel 1904 Date System stores datetimes as days plus fraction of day since 01jan1904 00:00:00. To convert with a one-second resolution to a Stata datetime,
. gen double statatime = round((exceltime+td(01jan1904))*86400)*1000
. format statatime %tc
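The two Excel bases are 1,462 days apart, so the same moment has a different serial number in each system. A small sketch, again with a hypothetical serial value of my own choosing:

```stata
* Distance between the 1900-system base and the 1904-system base.
assert td(01jan1904) - td(30dec1899) == 1462

clear
set obs 1

* Hypothetical 1904-system serial for 03jan2011 14:00 (40546 - 1462 = 39084).
gen double exceltime = 39084 + 14/24
gen double statatime = round((exceltime + td(01jan1904))*86400)*1000
format statatime %tc
list

assert statatime == tc(03jan2011 14:00)
```

If converted dates come out about four years off, the workbook is probably using the other date system.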
Open Office
Open Office uses the Excel, 1900 Date System.
Why Stata has two datetime encodings
We have just seen that most packages assume 86,400 seconds/day, but that one instead uses UTC time, in which days have 86,400 or 86,401 seconds, depending. Stata provides both datetime encodings, called %tc and %tC. That turned out to be convenient in translating times from other packages. Stata will even let you switch from one to the other using the cofC() and Cofc() functions, so you should be asking, which should I use?
Stata’s %tc format assumes that there are 24*60*60*1,000 ms per day — 86,400 seconds per day — just as an atomic clock does. Atomic clocks count oscillations between the nucleus and electrons of an atom and thus provide a measurement of the real passage of time.
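A minimal sketch of that assumption. The 31dec2008 example is my own choice of a day on which a leap second was inserted.

```stata
* One %tc day is always 86,400,000 ms.
display %12.0fc 24*60*60*1000
assert tc(02jan1960 00:00) == 24*60*60*1000

* That holds even for a day on which a leap second was inserted.
assert tc(01jan2009 00:00) - tc(31dec2008 00:00) == 24*60*60*1000
```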
Time of day measurements have historically been based on astronomical observation, which is a fancy way of saying, based on looking at the sun. The sun should be at its highest point at noon, right? So however you kept track of time (falling grains of sand or a wound-up spring), you periodically reset your clock and then went about your business. In olden times it was understood that the 60 seconds per minute, 60 minutes per hour, and 24 hours per day were theoretical goals that no mechanical device could reproduce accurately. These days, we have more accurate definitions for measuring time. A second is 9,192,631,770 periods of the radiation corresponding to the transition between two levels of the ground state of caesium 133. Obviously we have better equipment than the ancients, so problem solved, right? Wrong. There are two problems. The formal definition of a second is just a little too short to match the length of a day, and the Earth’s rotation is slowing down.
As a result, since 1972 leap seconds have been added to atomic clocks once or twice a year to keep time measurements in synchronization with the earth’s rotation. Unlike leap years, however, there is no formula for predicting when leap seconds will occur. The Earth may be on average slowing down, but there is a large random component to that. As a result, leap seconds are determined by committee and announced 6 months before they are inserted. Leap seconds are added, if necessary, on the end of the day on June 30 and December 31 of the year. The inserted times are designated as 23:59:60.
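In Stata’s %tC encoding you can see an inserted second by measuring the length of a day. This sketch uses 31dec2008, a day that ended with a leap second; the choice of date is mine.

```stata
* An ordinary UTC day: 86,400 seconds.
assert tC(01jan2008 00:00) - tC(31dec2007 00:00) == 86400*1000

* A day ending in 23:59:60: 86,401 seconds.
display (tC(01jan2009 00:00) - tC(31dec2008 00:00))/1000
assert tC(01jan2009 00:00) - tC(31dec2008 00:00) == 86401*1000
```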
Unadjusted atomic clocks may accurately mark the passage of real time, but you need to understand that leap seconds are every bit as real as every other second of the year. Once a leap second is inserted, it ticks just like any other second and real things can happen during that tick.
You may have heard of terms such as GMT and UTC.
GMT is the old Greenwich Mean Time and is based on astronomical observation. GMT has been supplanted by UTC.
UTC stands for coordinated universal time and is measured by atomic clocks, occasionally corrected for leap seconds. UTC is derived from two other times, UT1 and TAI. UT1 is the mean solar time, with which UTC is kept in sync by the occasional addition of a leap second. TAI is the atomically measured pure time. TAI was set to GMT plus 10 seconds in 1958 and has been running unadjusted since then. Update 07 Jan 2010: TAI is a statistical combination of various atomic chronometers and even it has not ticked uniformly over its history; see http://www.ucolick.org/~sla/leapsecs/timescales.html and especially http://www.ucolick.org/~sla/leapsecs/dutc.html#TAI. (Thanks to Steve Allen of the UCO/Lick Observatory for correcting my understanding and for the reference.)
UNK is StataCorp’s term for the time standard most people use. UNK stands for unknowing. UNK is based on a recent time observation, probably UTC, and then just assuming that there are 86,400 seconds per day after that.
The UNK standard is adequate for many purposes, and in such cases, you will want to use %tc rather than the leap second-adjusted %tC encoding. If you are using computer-timestamped data, however, you need to find out whether the timestamping system accounted for leap-second adjustments. Problems can arise even if you do not care about losing or gaining a second here and there.
For instance, you may import timestamp values from other systems recorded in the number of milliseconds that have passed since some agreed upon date. If you choose the wrong encoding scheme, if you choose %tc when you should choose %tC, or vice versa, more recent times will be off by 24 seconds.
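The size of that error is the number of leap seconds inserted up to the time in question. A minimal sketch for the start of 2011; later dates show a larger gap if your copy of Stata knows about later leap seconds.

```stata
* Gap between the two encodings, in seconds, at 01jan2011 00:00:00.
display (tC(01jan2011 00:00) - tc(01jan2011 00:00))/1000
assert tC(01jan2011 00:00) - tc(01jan2011 00:00) == 24*1000
```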
To avoid such problems, you may decide to import and export data by using Human Readable Forms (HRF) such as “Fri Aug 18 14:05:36 CDT 2006”. This method has advantages, but for %tC (UTC) encoding, times such as 23:59:60 are possible. Some software will refuse to decode such times.
Stata refuses to decode 23:59:60 in the %tc encoding (function clock()) and accepts it with %tC (function Clock()). When %tC function Clock() sees a time with a 60th second, Clock() verifies that the time corresponds to an official leap second. Thus, when translating from printable forms, try assuming %tc and check the result for missing values. If there are none, you can assume your use of %tc is valid. If there are missing values and they are due to leap seconds and not some other error, you must use %tC function Clock() to translate from HRF. After that, if you still want to work in %tc units, use function cofC() to translate %tC values into %tc.
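Here is a small sketch of that behavior. The strings and the "DMYhms" mask are my own illustrative choices; 31dec2008 ended with an official leap second and 31dec2007 did not.

```stata
* clock() works in %tc, where a 60th second cannot exist: result is missing.
display clock("31dec2008 23:59:60", "DMYhms")
assert missing(clock("31dec2008 23:59:60", "DMYhms"))

* Clock() works in %tC and accepts an official leap second.
display %tC Clock("31dec2008 23:59:60", "DMYhms")
assert !missing(Clock("31dec2008 23:59:60", "DMYhms"))

* A 60th second on a day without a leap second is rejected.
assert missing(Clock("31dec2007 23:59:60", "DMYhms"))

* cofC() translates a %tC value to its %tc equivalent.
assert cofC(tC(01jan2011 00:00)) == tc(01jan2011 00:00)
```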
If precision matters, the best way to process %tC data is simply to treat them that way. The inconvenience is that you cannot assume that there are 86,400 seconds per day. To obtain the duration between dates, you must subtract the two time values involved. The other difficulty has to do with dealing with dates in the future. Under the %tC (UTC) encoding, there is no set value for any date more than 6 months in the future.
Advice
Stata provides two datetime encodings:
- %tC, also known as UTC, which accounts for leap seconds, and
- %tc, which ignores them (it assumes 86,400 seconds/day).
Systems vary in how they treat time variables. My advice is,
- If you obtain data from a system that accounts for leap seconds, import using Stata’s %tC.
  - If you later need to export data to a system that does not account for leap seconds, use Stata’s cofC() function to translate time values before exporting.
  - If you intend to tsset the time variable and the analysis will be at the second level or finer, just tsset the %tC variable, specifying the appropriate delta() if necessary, for example, delta(1000) for seconds.
  - If you intend to tsset the time variable and the analysis will be at coarser than the second level (minute, hour, etc.), create a %tc variable from the %tC variable (generate double tctime = cofC(tCtime)) and tsset that, specifying the appropriate delta() if necessary. You must do that because, in a %tC variable, there are not necessarily 60 seconds in a minute; some minutes have 61 seconds.
- If you obtain data from a system that ignores leap seconds, use Stata’s %tc.
  - If you later need to export data to a system that does account for leap seconds, use Stata’s Cofc() function to translate time values.
  - If you intend to tsset the time variable, just tsset it, specifying the appropriate delta().
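The minute-level case can be sketched as follows. The three timestamps are hypothetical values of my own, chosen to straddle the leap second at the end of 31dec2008, and delta(60000) is one minute expressed in milliseconds.

```stata
clear
set obs 3

* Hypothetical minute-by-minute %tC timestamps across a leap second.
gen double tCtime = tC(31dec2008 23:58:00) in 1
replace tCtime = tC(31dec2008 23:59:00) in 2
replace tCtime = tC(01jan2009 00:00:00) in 3
format tCtime %tC

* In %tC the last minute of 2008 is 61 seconds long.
assert tCtime[3] - tCtime[2] == 61000

* Translate to %tc, where every minute is 60,000 ms, and tsset that.
gen double tctime = cofC(tCtime)
format tctime %tc
assert tctime[3] - tctime[2] == 60000
tsset tctime, delta(60000)
```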
Some users prefer always to use Stata’s %tc because those values are a little easier to work with. You can do that if
- you do not mind having up to 1 second of error and
- you do not import or export numerical values (clock ticks) from other systems that are using leap seconds, because then there could be nearly 30 seconds of accumulated error.
There are two things to remember if you use %tC variables:
- The number of seconds between two dates is a function of when the dates occurred. Five days from one date is not simply a matter of adding 5*24*60*60*1,000 ms. You might need to add another 1,000 ms. Three hundred and sixty-five days from now might require adding 1,000 or 2,000 ms. The longer the span, the more you might have to add. The best way to add durations to %tC variables is to extract the components, add to them, and then reconstruct from the numerical components.
- You cannot accurately predict datetimes more than six months into the future. We do not know what the %tC value will be of 25dec2026 00:00:00 because every year along the way, the International Earth Rotation and Reference Systems Service (IERS) will twice announce whether there will be the insertion of a leap second.
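To illustrate the first point, here is a sketch of what goes wrong when you add five days’ worth of milliseconds to a %tC value. The alternative shown is not the component-by-component approach described above; it is a shortcut of my own that translates to %tc with cofC(), adds the duration there, and translates back with Cofc(). The scalar names are hypothetical.

```stata
scalar start = tC(30dec2008 12:00)

* Naive: the leap second at the end of 31dec2008 absorbs one of our seconds.
scalar naive = start + 5*24*60*60*1000
display %tC naive
assert naive == tC(04jan2009 11:59:59)

* Shortcut: do the arithmetic in %tc, then return to %tC.
scalar viatc = Cofc(cofC(start) + 5*24*60*60*1000)
display %tC viatc
assert viatc == tC(04jan2009 12:00:00)
```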
You can help alleviate these inconveniences. Face west and throw rocks. The benefit will be only transitory if the rocks land back on Earth, so you need to throw them really hard. I know what you’re thinking, but this does not need to be a coordinated effort.