DATA MINING
Desktop Survival Guide by Graham Williams
Memory Management
Large datasets often present challenges for R on memory-limited machines. Even if you can load a large dataset, processing and modelling it may fail with an error saying that memory could not be allocated (R reports this as "cannot allocate vector of size ...").
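Loading can succeed while modelling fails because modelling functions often build copies or derived matrices of the data, so peak memory use can be several times the size of the dataset. A rough size estimate helps anticipate this. The sketch below assumes purely numeric data stored as doubles (8 bytes each); the row and column counts are hypothetical.

```r
# A double-precision value takes 8 bytes, so an n x p numeric matrix
# needs roughly n * p * 8 bytes (plus a small fixed header).
n <- 1e6   # hypothetical number of rows
p <- 10    # hypothetical number of numeric columns
est_mb <- n * p * 8 / 1024^2
cat(sprintf("Estimated size: %.1f Mb\n", est_mb))

# Compare with what R actually allocates for a smaller matrix.
x <- matrix(0, nrow = 1e5, ncol = p)
print(object.size(x), units = "Mb")

# Remove the object and let the garbage collector reclaim its memory.
rm(x)
invisible(gc())
```

Doubling or tripling such an estimate gives a crude allowance for the intermediate copies a model may create.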
To make the most of R on large datasets, run a 64-bit build of R on a 64-bit operating system (e.g., Debian GNU/Linux) on 64-bit hardware (e.g., AMD64) with plenty of RAM (e.g., 16GB). A 32-bit process can address at most 4GB, however much RAM is installed. Such capable machines are now quite affordable.
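To confirm that a running R session can actually use a 64-bit address space, a minimal check using only base R is sketched below. It relies on the pointer size reported in `.Machine`; the exact architecture strings printed will vary by platform.

```r
# Pointers are 8 bytes in a 64-bit build of R and 4 bytes in a 32-bit build.
is_64bit <- .Machine$sizeof.pointer == 8
cat("64-bit build of R:", is_64bit, "\n")

# The architecture R was built for (e.g. "x86_64" on AMD64 hardware)
# and the machine type reported by the operating system.
cat("R architecture:  ", R.version$arch, "\n")
cat("Machine hardware:", Sys.info()[["machine"]], "\n")

# Memory currently in use by R (the "(Mb)" columns are in megabytes).
print(gc())
```

If `is_64bit` is `FALSE` on 64-bit hardware, installing a 64-bit build of R is usually the first fix to try.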
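Selecting and subsetting the data in a database (e.g., through the RODBC package) or by other means (e.g., using Python) before it is loaded into R will generally be faster, and will use less memory, than loading the whole dataset and subsetting it within R.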
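With RODBC the idea is to put the selection into the SQL query passed to `sqlQuery()`, so that only the needed rows and columns ever reach R. Because that requires a configured ODBC data source, the sketch below instead illustrates the same principle in base R with a CSV file: `read.csv()` is told to skip unwanted columns via `colClasses = "NULL"`. The file, its columns and the filter are hypothetical, and unlike a SQL `WHERE` clause, the row filter here still runs after loading.

```r
# Create a small example file standing in for a large dataset on disk.
path <- tempfile(fileext = ".csv")
full <- data.frame(id = 1:1000,
                   region = rep(c("north", "south"), 500),
                   amount = seq(0.5, 500, by = 0.5),
                   notes = "unused text")
write.csv(full, path, row.names = FALSE)
rm(full)  # only needed to build the example file

# Column classes in file order; "NULL" skips the notes column entirely,
# so it never occupies memory in R.
cols <- c("integer", "character", "numeric", "NULL")
wanted <- read.csv(path, colClasses = cols)

# Row selection happens after loading here; in a database the
# WHERE clause would do this before any data reaches R.
wanted <- wanted[wanted$region == "north", ]
str(wanted)
```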