We've currently gone to 2e9 rows (the 32-bit index limit) with 9 columns (~100GB). See the benchmarks page on the wiki.
Ideally, the comparison would cover all available tools that are either developed specifically for large in-memory data manipulation or that handle data at these sizes much better than base R. Base R should also be included, typically as the control.
The benchmarks should highlight not just run time (speed) but also memory usage. Features such as sorting/ordering by reference and sub-assignment by reference should, at this data size, show clear gains in both speed and memory.
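As an illustration of the kind of by-reference operations such a benchmark would exercise, here is a minimal R sketch using data.table. The row count is scaled down to 1e6 (not the 2e9 of the actual benchmark) so it runs quickly; the column names and sizes are illustrative only.

```r
library(data.table)

set.seed(1)
n  <- 1e6  # scaled-down stand-in for the 2e9-row benchmark
dt <- data.table(id = sample(1e4L, n, replace = TRUE),
                 x  = rnorm(n))

# Order by reference: the table is sorted in place, no copy is made
setorder(dt, id)

# Sub-assignment by reference: updates the x column in place
dt[id == 1L, x := 0]

# Contrast with the base-R copy-on-modify equivalent, which
# allocates a new column vector on assignment
df <- as.data.frame(dt)
system.time(df$x[df$id == 1L] <- 0)  # copies
system.time(dt[id == 1L, x := 0])    # in place
```

Measuring peak memory alongside `system.time()` (e.g. via `gc()` deltas or an external process monitor) would surface the copy-on-modify cost that elapsed time alone can understate.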