We've currently gone to 2e9 rows (the 32-bit index limit) with 9 columns (~100GB). See the benchmarks page on the wiki.
Ideally, the comparison would cover all available tools that are either developed specifically for large in-memory data manipulation or that handle data at these sizes much better than base R. Base R should also be included, typically as the control.
The benchmarks should highlight not just run time (speed) but also memory usage. Features such as sorting/ordering by reference and sub-assignment by reference should, at this data size, show clear gains in both speed and memory.
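As an illustration of the kind of by-reference operations such a benchmark would exercise, here is a minimal R sketch using data.table. The row count is scaled down to 1e6 (not the 2e9 of the actual benchmark) so it runs quickly; the column names and sizes are illustrative only.

```r
library(data.table)

set.seed(1)
n  <- 1e6  # scaled-down stand-in for the 2e9-row benchmark
dt <- data.table(id = sample(1e4L, n, replace = TRUE),
                 x  = rnorm(n))

# Order by reference: the table is sorted in place, no copy is made
setorder(dt, id)

# Sub-assignment by reference: updates the x column in place
dt[id == 1L, x := 0]

# Contrast with the base-R copy-on-modify equivalent, which
# allocates a new column vector on assignment
df <- as.data.frame(dt)
system.time(df$x[df$id == 1L] <- 0)  # copies
system.time(dt[id == 1L, x := 0])    # in place
```

Measuring peak memory alongside `system.time()` (e.g. via `gc()` deltas or an external process monitor) would surface the copy-on-modify cost that elapsed time alone can understate.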