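Currently, groupBy v2 opens all spill files at the same time when merging. For queries with large result sets, this can lead to an extreme number of open files, which risks hitting the process's file descriptor limit. For cases like this we should do multi-level merging: merge the spill files in bounded batches into intermediate files, then merge those, rather than opening them all at once.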
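To make the idea concrete, here is a minimal sketch of multi-level merging in Python. It is not the groupBy v2 implementation: it assumes each spill file is a sorted text file with one newline-terminated record per line, and the names `merge_group`, `multi_level_merge`, and `max_open` are hypothetical. It also only merge-sorts the records; a real groupBy merge would additionally combine rows that share a grouping key, and would clean up intermediate files.

```python
import heapq
import os
import tempfile
from contextlib import ExitStack


def merge_group(paths, out_path):
    """Merge a small group of sorted files into one sorted file."""
    with ExitStack() as stack:
        # Only the files in this group are open at the same time.
        inputs = [stack.enter_context(open(p, "r")) for p in paths]
        out = stack.enter_context(open(out_path, "w"))
        # heapq.merge lazily merges already-sorted iterables.
        out.writelines(heapq.merge(*inputs))


def multi_level_merge(spill_paths, max_open, tmp_dir):
    """Merge sorted spill files, reading at most max_open files at once.

    Returns the path of the single fully merged file.
    """
    if max_open < 2:
        raise ValueError("max_open must be at least 2")
    if not spill_paths:
        raise ValueError("no spill files to merge")

    current = list(spill_paths)
    # Each pass shrinks the file count by roughly a factor of max_open.
    while len(current) > 1:
        next_level = []
        for i in range(0, len(current), max_open):
            group = current[i:i + max_open]
            if len(group) == 1:
                # A lone leftover file is carried to the next level as-is.
                next_level.append(group[0])
                continue
            fd, out_path = tempfile.mkstemp(dir=tmp_dir, suffix=".merge")
            os.close(fd)
            merge_group(group, out_path)
            next_level.append(out_path)
        current = next_level
    return current[0]
```

With N spill files and a limit of `max_open`, this takes about log base `max_open` of N passes, trading extra disk I/O for a bounded number of open files. A limit high enough to cover typical queries in a single pass would keep the common case as fast as it is today.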