Welcome to the SQL-based analysis for large surveys project!

Analysing large complex surveys, such as the American Community Survey, using R to generate SQL code for MonetDB.

This project used to have two packages. It now has just one: sqlsurvey, for analysis of large surveys. You also need MonetDB.R, which replaces my JDBC-based interface. There are other MonetDB-to-R interfaces, but they won't work with the sqlsurvey package, because we've had to extend the R-DBI interface to handle concurrency problems caused by garbage collection.

Both packages require MonetDB, so installation is more complicated than just installing an R package. Under Windows it is important to use a 64-bit version of MonetDB to allow creation of large databases.

Examples (small enough to play with): Useful notes: Supported analyses:

None of the analyses will modify any existing database table, and the R survey design objects behave as if they are passed by value, like ordinary R objects. Temporary tables are automatically dropped when the R objects referring to them are garbage-collected. The basic design ideas for the package were described in a presentation at UseR 2007, but were not developed further because of lack of demand. The American Community Survey and some medical-record surveys such as the Nationwide Inpatient Sample do represent a real need, so the project has been restarted. MonetDB turns out to be much faster than SQLite for this sort of analysis, and interactive analysis of millions of records on an ordinary desktop is quite feasible.

The project summary page is here.