Hadoop indexing reducers sort their data by timestamp. That probably helps somewhat, but it does less than it could to minimize the amount of data spilled to disk. For data that is not already rolled up, we'd spill less and merge faster if we sorted by the entire rollup key (truncated timestamp + dimensions), since rows that roll up together would then arrive next to each other.
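To illustrate the intuition, here is a toy sketch in Python. It is not the actual reducer code: the row layout `(timestamp_ms, dims, metric)`, the hourly granularity, and the helpers `rollup_key` and `streaming_rollup` are all hypothetical. It models a reducer that can only combine adjacent rows sharing a rollup key, and compares how many partially aggregated rows come out under each sort order.

```python
from itertools import groupby

HOUR_MS = 3_600_000  # hypothetical rollup granularity (one hour)


def rollup_key(row, granularity_ms=HOUR_MS):
    """Return (truncated timestamp, dimensions) for a row."""
    timestamp_ms, dims, _metric = row
    return (timestamp_ms - timestamp_ms % granularity_ms, dims)


def streaming_rollup(sorted_rows, granularity_ms=HOUR_MS):
    """Combine only *adjacent* rows that share a rollup key, summing the metric.

    Rows with the same key that are not adjacent stay separate, which stands in
    for extra rows that would have to be buffered, spilled, and merged later.
    """
    out = []
    for key, group in groupby(sorted_rows, key=lambda r: rollup_key(r, granularity_ms)):
        out.append((key, sum(metric for _, _, metric in group)))
    return out


# Four rows in the same hour, alternating between two dimension values.
rows = [
    (0, ("a",), 1),
    (1000, ("b",), 1),
    (2000, ("a",), 1),
    (3000, ("b",), 1),
]

by_timestamp = sorted(rows, key=lambda r: r[0])
by_rollup_key = sorted(rows, key=rollup_key)

print(len(streaming_rollup(by_timestamp)))   # 4: nothing adjacent can be combined
print(len(streaming_rollup(by_rollup_key)))  # 2: one row per rollup key
```

In this model, sorting by timestamp alone leaves matching rows interleaved, while sorting by the full rollup key lets each key collapse to a single row in one pass. How much this helps in practice depends on the data and on how the reducer actually buffers rows.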