Saturday, 21 September 2013

A few gotchas with R date-time classes

Date and time handling is essential to many modelling and analysis exercises, in R and other languages used for scientific computing. Over the past few months I tackled the mapping of date-time concepts between R and the .NET framework as part of the work on the rClr package. A few weeks ago Mollie Taylor posted on Date formats in R, which I found an interesting read, as I always have to remind myself of date-time formats when I need them. I thought I'd share what I learned about date-time handling in R in light of its mapping to .NET (Date-time mapping).

Date-time handling essentials

R has several classes that represent date-time concepts: Date, POSIXct and POSIXlt, each with generic as.* functions to convert back and forth between them (Ripley and Hornik, 2001). Date effectively has a precision limited to one day, whereas POSIXt objects (POSIXct and POSIXlt) resolve down to a second. Importantly, POSIXt objects always have time zones attached to them, implicitly or explicitly; check out ?Sys.timezone for details.
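As a minimal sketch of the three core classes and the as.* conversions between them (the dates and variable names are only illustrative, not taken from rClr):

```r
# Date: a calendar day, with no time of day and no time zone
d <- as.Date("2013-09-21")

# POSIXct: an instant stored as seconds since 1970-01-01 00:00:00 UTC.
# The time zone is passed explicitly rather than left to the machine default.
ct <- as.POSIXct("2013-09-21 14:30:00", tz = "UTC")

# POSIXlt: the same instant broken down into fields (hour, mday, ...)
lt <- as.POSIXlt(ct)
lt$hour          # hour of day in the object's time zone
lt$year + 1900   # years are stored as an offset from 1900

# The generic as.* functions convert between the classes
d_from_ct  <- as.Date(ct)       # drops the time of day
ct_from_lt <- as.POSIXct(lt)    # back to seconds since the epoch
ct_from_d  <- as.POSIXct(d)     # a Date is taken as midnight UTC

attr(ct, "tzone")               # the time zone carried by the object
```

POSIXct is usually the convenient one for storage and arithmetic, and POSIXlt for picking out individual fields.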

Important contributions have been made to date-time handling in R with the lubridate package (Grolemund and Wickham, 2011). It is very tempting to use lubridate classes, but because of the level of generality rClr aims for, it really needs to map to the core R date-time classes.

.NET has the types DateTime, DateTimeOffset, and TimeZoneInfo to deal with most date-time operations. A crucial difference with R is that DateTime purposely does not include time zone information, although it can be tagged as a UTC or Local date-time. Overall its behaviour is less machine-dependent than R's, though not entirely.

Daylight Saving Time. Whew! Where do I start? You'd think it is bad enough when it makes you miss a breakfast catch-up with friends, but it gets much worse when dealing with it in software. Leap seconds are lurking too, but thankfully I do not think I needed to worry about them for R/.NET interop.

R date-time gotchas

Here are a few things I noticed when setting up unit tests for rClr. When converting dates and times from UTC to local time, you want to be careful about which time zone you use; in particular, avoid relying on Sys.timezone without arguments, since what it returns depends on the machine.
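A small sketch of the safer route, naming the time zone explicitly. The zone "Australia/Sydney" and the time stamp are assumptions picked for illustration; substitute your own.

```r
# An instant defined unambiguously in UTC
utc <- as.POSIXct("2013-09-21 02:00:00", tz = "UTC")

# Display it in a named (Olson/IANA) time zone: reproducible on any machine
# that has the time zone database available
local_txt <- format(utc, "%Y-%m-%d %H:%M", tz = "Australia/Sydney")

# Changing the tzone attribute changes how the instant is printed,
# not the instant itself
local <- utc
attr(local, "tzone") <- "Australia/Sydney"
same_instant <- as.numeric(local) == as.numeric(utc)

# Machine-dependent: the value differs between systems and can be NA,
# so it is a poor choice for the tz argument in tests or shared code
tz_here <- Sys.timezone()
```

Sydney is on standard time (UTC+10) in September, so local_txt reads 12:00 on the same day.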

Of course, daylight saving time comes with a few gotchas of its own. Be careful when calculating time spans in time zones affected by DST: a calendar day that spans a transition is typically 23 or 25 hours long, not 24.
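To illustrate, here is a sketch using the start of daylight saving in Sydney on 6 October 2013 (the zone and dates are my choice for the example; any DST transition behaves the same way):

```r
tz <- "Australia/Sydney"

# Local midnight on either side of the day the clocks go forward
start <- as.POSIXct("2013-10-06 00:00:00", tz = tz)
end   <- as.POSIXct("2013-10-07 00:00:00", tz = tz)

# One calendar day, but only 23 hours of elapsed time
span_local <- as.numeric(difftime(end, start, units = "hours"))

# The same wall-clock stamps interpreted as UTC are 24 hours apart
start_utc <- as.POSIXct("2013-10-06 00:00:00", tz = "UTC")
end_utc   <- as.POSIXct("2013-10-07 00:00:00", tz = "UTC")
span_utc  <- as.numeric(difftime(end_utc, start_utc, units = "hours"))

# Date objects have no time of day, so the difference is exactly one day
span_days <- as.numeric(as.Date("2013-10-07") - as.Date("2013-10-06"))
```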

If you create time stamps to use as time series indexes, you have to choose between round time stamps and constant time intervals when the POSIXt objects are in a time zone affected by DST: you cannot get both. 'Date' objects in R work around the issue for a daily time step, but if you need sub-daily time steps you have to think about it more carefully.
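The trade-off can be sketched with seq, which offers both behaviours for POSIXt objects: by = "DSTday" keeps the same clock time each day, while by = "day" steps by a fixed 86400 seconds. The zone and start date are again assumptions chosen to straddle the Sydney transition of 6 October 2013.

```r
tz <- "Australia/Sydney"
start <- as.POSIXct("2013-10-05 00:00:00", tz = tz)

# Round stamps: local midnight every day, but uneven intervals
round_stamps <- seq(start, by = "DSTday", length.out = 4)
round_clock  <- format(round_stamps, "%H:%M", tz = tz)
round_hours  <- diff(as.numeric(round_stamps)) / 3600

# Constant intervals: always 24 hours, but the clock time drifts
fixed_steps <- seq(start, by = "day", length.out = 4)
fixed_clock <- format(fixed_steps, "%H:%M", tz = tz)
fixed_hours <- diff(as.numeric(fixed_steps)) / 3600

# Daily Date indexes avoid the question altogether
daily <- seq(as.Date("2013-10-05"), by = "day", length.out = 4)
```

Here round_hours is 24, 23, 24, whereas fixed_clock moves from 00:00 to 01:00 after the transition.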


Conclusion

I highlighted in this post only a few gotchas: be assured there are more peculiarities and oddities both in R and .NET date-time handling (not to mention the COM stuff, *shiver*). A few take-home messages to avoid the main traps:
  • Use lubridate. I could not in rClr, but you probably should.
  • Use UTC as an explicit time zone in your time stamps, if you can.
  • Prefer, by a long shot, ISO 8601 date-time formats such as '2011-02-23 23:50:53', in R, Excel or anything else. Using it in your data and software will very likely save you a lot of grief down the track.
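The last two points combine naturally. A minimal sketch, assuming the data can be kept in UTC (the "T" separator and trailing "Z" are one common ISO 8601 spelling, chosen here for illustration):

```r
# A time stamp with an explicit UTC time zone
stamp <- as.POSIXct("2011-02-23 23:50:53", tz = "UTC")

# Write it out in an ISO 8601 form; the trailing Z marks UTC
txt <- format(stamp, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# Read it back with the same format and time zone
back <- as.POSIXct(txt, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# The round trip preserves the instant
round_trip_ok <- as.numeric(back) == as.numeric(stamp)
```

Because the year comes first and every field has a fixed width, such strings also sort chronologically as plain text.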

References

Brian D. Ripley and Kurt Hornik (2001), Date-Time Classes, in R News, http://www.r-project.org/doc/Rnews/Rnews_2001-2.pdf
Garrett Grolemund and Hadley Wickham (2011), Dates and Times Made Easy with lubridate, Journal of Statistical Software, April 2011, Volume 40, Issue 3. http://www.jstatsoft.org/v40/i03/paper
Choosing Between DateTime, DateTimeOffset, and TimeZoneInfo http://msdn.microsoft.com/en-us/library/bb384267.aspx