Thursday 26 March 2015

rClr 0.7-4 released

Version 0.7-4 of rClr (source code mirrored on GitHub), a package to access arbitrary .NET code seamlessly and in-process, has been released.

This is a maintenance release with an important fix to memory management. In some circumstances, passing data from R to .NET leaked memory. The leak is not noticeable in many cases, but users should still upgrade.

A list of salient features in the 0.7-x series can be found in a blog post on the previous release.


Roadmap

Notwithstanding the ever-present need for documentation, the next feature I'd like to add, and which would lead to a 0.8-x series, is wrapping .NET classes with R reference classes. It is already quite functional on a branch, with automatic R reference class code generation, but the R code for reference classes can lead to unacceptably long runtimes even with inheritance trees that are not very complicated.

I have since looked at the package R6 as a candidate to handle R classes with reference semantics for rClr. It would be nice to present this at the useR! conference, but I don't think I can make it, "funding-wise".

The C/C++ layer of rClr could do with re-engineering towards more C++. Also, some of the data conversion features are handled in C# now, so if anything there would be a reduction in code size.

Acknowledgements

My thanks to Justin Hughes for reporting the problem and providing code to help diagnose and reproduce the issue.


Tuesday 27 January 2015

rClr 0.7-3 released

Version 0.7-3 of rClr (source code also on GitHub), a package to access arbitrary .NET code seamlessly in-process, has just been released. The package was first introduced a year and a half ago in another blog post, and you can skim through that prior post to get an overview of the capabilities. This post will summarise newer features.



A lot has happened technologically to enable this release. The library R.NET has evolved and been through several rounds of fixes, runtime performance improvements, and refactoring. It is now the preferred subsystem in rClr to handle data conversion, and is active by default.

An overview of the improvements since the release series 0.5.x is:
  • Running unit tests on a Linux/Mono stack is at feature parity with Windows/Microsoft .NET. A few edge cases with date and time at daylight saving transitions are the only known limitations.
  • The conversion of data between R and .NET is increasingly handled via C# code rather than C++. While transparent to an R user, this is important because the same C# code can run on Mono and MS.NET, whereas C++ code is trickier and usually differs between these two runtimes. Further data conversion features will be much, much easier to implement in C#.
  • Complex numbers are supported
  • Matrix conversions
  • It is possible to transparently convert .NET dictionaries into R named lists
  • Many difficulties with the discovery/loading of the native R shared libraries have been alleviated.

Roadmap

Some developments are reshaping the landscape of the .NET world. Most of the .NET stack is now being genuinely open sourced. Closer to the R community is the recent announcement of Revolution Analytics joining Microsoft. It is likely that a more open sourced .NET stack will greatly facilitate new features for rClr. I have no insight as to Revolution Analytics, but hope rClr has a role to play and will seek collaboration or advice.

Aside from this changing broader context, some things to tackle are:
  • Documentation, documentation, documentation...
  • Facilities to more easily convert data between .NET DataTable and R data frames
  • Supporting Generic .NET classes
  • Wrapping .NET classes and interfaces with R Reference Classes. This feature has been explored and is already quite functional on a branch, with automatic R reference class code generation, but the R code for reference classes can lead to unacceptably long runtimes even with inheritance trees that are not very complicated.
  • Submission to CRAN?

Related work

A few packages using rClr are publicly accessible, and may be of interest if you want to build your own package with dependencies on rClr.




Saturday 21 September 2013

A few gotchas with R date-time classes

Date and time handling is essential to many modelling and analysis exercises, in R and other languages used for scientific computing. Over the past few months I tackled the mapping of date-time concepts between R and the .NET framework as part of the work on the rClr package. A few weeks ago Mollie Taylor posted on Date formats in R, which I found an interesting read, as I always have to remind myself of date-time formats when I need them. I thought I'd share what I learned about date-time handling in R in light of its mapping to .NET (Date-time mapping).

Date-time handling essentials

R has several classes representing date-time concepts: Date, POSIXct and POSIXlt, each with generic 'as' functions (as.Date, as.POSIXct, as.POSIXlt) to convert back and forth between them (Ripley and Hornik, 2001). Date effectively has a precision limited to one day, whereas POSIXt objects are down to a second. Importantly, POSIXt objects always have time zones attached to them, implicitly or explicitly; check out ?Sys.timezone for details.
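As a minimal sketch of these classes and conversions (the dates are arbitrary, and UTC is set explicitly so the result does not depend on the machine's settings):

```r
# A calendar date, stored as a number of days since 1970-01-01
d <- as.Date("2013-09-21")

# A date-time, stored as a number of seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("2013-09-21 14:30:00", tz = "UTC")

# The same instant, broken down into fields (year, month, hour, ...)
lt <- as.POSIXlt(ct)

print(class(d))
print(class(ct))
print(lt$hour)

# Converting back and forth with the generic as.* functions
print(as.Date(ct))
print(format(as.POSIXct(d), "%Y-%m-%d %H:%M:%S", tz = "UTC"))
```

Converting the Date back to POSIXct gives midnight UTC, which shows the one-day precision of Date.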

Important contributions have been made to date-time handling in R with the lubridate package (Grolemund and Wickham, 2011). It is very tempting to use lubridate classes, but because of the level of generality rClr aims for, it really needs to map to the core R date-time classes.

.NET has the types DateTime, DateTimeOffset, and TimeZoneInfo to deal with most date-time operations. A crucial difference with R is that DateTime purposely does not include time zone information, although it can be tagged as a UTC or Local date-time. Its system is overall less machine-dependent than R's, though not totally.

Daylight Saving Time. Whew! Where do I start? You'd think it is bad enough to miss a breakfast catch-up with friends, but it gets much worse when dealing with it in software. Leap seconds are lurking too, but thankfully I think I did not need to worry about them for R/.NET interop.

R date-time gotchas

Here are a few things I noticed when setting up unit tests for rClr. When converting dates and times from UTC to local time you want to be careful which time zone you use; in particular, avoid Sys.timezone without arguments.
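A minimal sketch of a conversion with an explicitly named time zone follows. The instant and the zone "Australia/Sydney" are illustrative assumptions, not taken from the rClr unit tests.

```r
# An instant defined unambiguously in UTC
utc_time <- as.POSIXct("2013-06-25 12:00:00", tz = "UTC")

# Display it in a named time zone rather than relying on the session default.
# "Australia/Sydney" is only an illustrative choice.
local_text <- format(utc_time, "%Y-%m-%d %H:%M", tz = "Australia/Sydney")
print(local_text)

# Changing the display zone does not change the underlying instant
local_time <- utc_time
attr(local_time, "tzone") <- "Australia/Sydney"
print(as.numeric(local_time) == as.numeric(utc_time))
```

Naming the zone in the call keeps the result the same whatever the locale of the machine running the code.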

Of course daylight saving time has to have a few gotchas; be careful of its effect when calculating time spans in time zones affected by DST.
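To illustrate the time-span gotcha, here is a small sketch. It assumes the "Australia/Sydney" zone, where clocks went forward by one hour on 2013-10-06, and a system time zone database that includes that rule.

```r
# Illustrative zone: clocks went forward at 02:00 on 2013-10-06
tz <- "Australia/Sydney"

# Midnight to midnight in local time, across the DST transition
day_start <- as.POSIXct("2013-10-06 00:00:00", tz = tz)
day_end   <- as.POSIXct("2013-10-07 00:00:00", tz = tz)
local_span <- difftime(day_end, day_start, units = "hours")
print(local_span)

# The same calendar dates in UTC, which has no DST
utc_span <- difftime(as.POSIXct("2013-10-07 00:00:00", tz = "UTC"),
                     as.POSIXct("2013-10-06 00:00:00", tz = "UTC"),
                     units = "hours")
print(utc_span)
```

The local "day" lasts 23 hours while the UTC day lasts 24, so code that assumes 86400 seconds per day will be off by an hour around transitions.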

If you create time stamps to use as time series indexes, you have to choose between round stamps and regular time intervals when using POSIXt objects in a time zone affected by DST: you cannot get both. 'Date' objects in R would work around the issue for a daily time step, but if you need sub-daily steps you need to think about it more carefully.
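The trade-off can be sketched with seq, whose "DSTday" step keeps the same clock time each day while "day" adds a fixed 86400 seconds. The zone and dates are illustrative assumptions.

```r
# Illustrative zone: clocks went forward at 02:00 on 2013-10-06
start <- as.POSIXct("2013-10-05 00:00:00", tz = "Australia/Sydney")

# Round stamps: same clock time each day, so intervals are uneven
round_stamps <- seq(start, by = "DSTday", length.out = 3)
print(format(round_stamps, "%Y-%m-%d %H:%M"))
print(as.numeric(diff(round_stamps), units = "hours"))

# Regular intervals: fixed 24 hour steps, so stamps stop being round
even_steps <- seq(start, by = "day", length.out = 3)
print(format(even_steps, "%Y-%m-%d %H:%M"))
print(as.numeric(diff(even_steps), units = "hours"))
```

With the round stamps the second interval is 23 hours; with the regular steps the third stamp falls at 01:00 local time.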


Conclusion

I highlighted in this post only a few gotchas: be assured there are more peculiarities and oddities both in R and .NET date-time handling (not to mention the COM stuff, *shiver*). A few take-home messages to avoid the main traps:
  • Use lubridate. I could not in rClr, but you probably should.
  • Use UTC as an explicit time zone in your data time stamps, if you can
  • Prefer, by a long shot, ISO 8601 date-time formats such as '2011-02-23 23:50:53', in R, Excel or anything else. Using it in your data and software will very likely save you a lot of grief down the track.
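As a small sketch of the last two points, the time stamp from the example above can be written and read back in UTC with an explicit ISO 8601 pattern. The "T" separator and "Z" suffix are one common ISO 8601 variant, chosen here for illustration.

```r
# A time stamp with an explicit UTC time zone
stamp <- as.POSIXct("2011-02-23 23:50:53", tz = "UTC")

# Write it out in an ISO 8601 form; "Z" marks UTC
iso_text <- format(stamp, "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(iso_text)

# Read it back with the same pattern and time zone
parsed <- as.POSIXct(iso_text, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(parsed == stamp)
```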

References

Brian D. Ripley and Kurt Hornik (2001), Date-Time Classes, R News. http://www.r-project.org/doc/Rnews/Rnews_2001-2.pdf
Garrett Grolemund and Hadley Wickham (2011), Dates and Times Made Easy with lubridate, Journal of Statistical Software, April 2011, Volume 40, Issue 3. http://www.jstatsoft.org/v40/i03/paper
Choosing Between DateTime, DateTimeOffset, and TimeZoneInfo http://msdn.microsoft.com/en-us/library/bb384267.aspx

Tuesday 25 June 2013

rClr: low level access to .NET from R

rClr is a package to access arbitrary .NET code seamlessly. The "CLR" part of the package name stands for Common Language Runtime. C# and R being languages I regularly use, I have felt the need for better interoperability between them for a few years. What started as a weekend investigation out of curiosity grew into rClr. There have already been a few rounds of beta releases and it is quite functional running on Windows with the Microsoft .NET Framework, hence this post. I have used it regularly for my work over the past 9 months. Running on other operating systems with the Mono CLR is also supported and is almost at feature parity. After a bit more testing a tarball will be available.

A new beta version of the Windows binary package is currently available at rClr on Codeplex, alongside the source code under LGPL 2.1. While it is likely to work as is on many Windows boxes, you may need to install the latest Microsoft Visual C++ runtime. Instructions on how to do this are on the web site.

A quick tour with some sample code, starting with a customary "Hello world" with a bit of GUI for good measure.

Some of the package functions help to discover the content of loaded assemblies (i.e. .NET dynamic libraries), to reduce the need to go back to the source code.

A "complex" .NET object is essentially an external pointer (a structure similar to that in rJava).

The package is designed to allow access to existing .NET code without modification to that code (well, for code well designed for access anyway). rClr is also designed to be as intuitive as possible for users accustomed to R programming idioms. A corollary of that design is that data types are converted to their natural representation in each runtime whenever this is possible without ambiguity. The table below lists the conversions for the most commonly used one-dimensional vectors. This is not an exhaustive list of supported conversions.


mode       type       class      length  clrType
character  character  character  3       System.String[]
numeric    integer    integer    3       System.Int32[]
numeric    double     numeric    3       System.Double[]
logical    logical    logical    3       System.Boolean[]
numeric    double     Date       3       System.DateTime[]
numeric    double     POSIXct    3       System.DateTime[]
character  character  character  1       System.String
numeric    integer    integer    1       System.Int32
numeric    double     numeric    1       System.Double
logical    logical    logical    1       System.Boolean
numeric    double     Date       1       System.DateTime
numeric    double     POSIXct    1       System.DateTime
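The R side of this table (mode, type, class and length) can be reproduced with base R alone, as in the sketch below. The sample values and the helper name describe are illustrative assumptions. The clrType column is left out because it needs rClr and a .NET runtime.

```r
# One sample vector of length 3 per row of the table (values are arbitrary)
samples <- list(
  character = c("a", "b", "c"),
  integer   = 1:3,
  numeric   = c(1.5, 2.5, 3.5),
  logical   = c(TRUE, FALSE, TRUE),
  Date      = as.Date("2013-06-25") + 0:2,
  POSIXct   = as.POSIXct("2013-06-25 00:00:00", tz = "UTC") + 0:2
)

# Hypothetical helper: summarise how R sees a vector
describe <- function(x) {
  data.frame(mode = mode(x), type = typeof(x), class = class(x)[1],
             length = length(x), stringsAsFactors = FALSE)
}

r_side <- do.call(rbind, lapply(samples, describe))
print(r_side)
```

Date and POSIXct are both stored as doubles, which is why the class, and not the storage type, has to drive the mapping to System.DateTime.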

I've used rClr to access environmental time stepping models written in C#, to combine them with the statistical and visualisation strengths of R. One of the tutorials on the web site is a self-contained simplified use case.




Roadmap

I am presenting at the useR! conference in a couple of weeks. It is my first attendance, and I am really looking forward to meeting a new crowd.
A few wrinkles need ironing out for a first stable release of course, notably for running on *nix and MacOS (I "only" develop and test on a Debian box). Trailblazing testers and contributors are very welcome. The build process is inherently more complicated than that of your average package, but this is alleviated with configure scripts. You can post questions and discussions through the web site.
Submission to CRAN is probably the next big item on the list, in preference to more features. While Codeplex is fine for my codebase management needs, it is not a typical go-to place for R users.

Acknowledgements

I gratefully acknowledge Kosei Abe for the nicely crafted R.NET library that is in places reused in the rClr package. R.NET is primarily designed for .NET developers to access the R engine, but I envisage a growing role for it in rClr.
The package rJava by Simon Urbanek and other contributors was also a natural source of insight in my early investigations on how to tackle in-process interop of R and .NET.
Simon Knapp a few years ago presented a neat way to mix in-process R code with .NET via Python for .NET, and this led to the idea of the rClr package.