If you give the indexer dimensions in some arbitrary order, the IncrementalIndex will sort its rows based on the values of dimensions in that order. Then, when the IndexMerger merges multiple indexes, it uses a MergeIterable that assumes rows are sorted on values of dimensions in lexicographic order instead of the order the underlying segments actually use. This causes the rowboats to arrive out of order, so rows with identical dimension values are not always adjacent and do not get combined properly. The resulting segments have more rows than they should.
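To make the failure mode concrete, here is a minimal sketch in Python rather than the actual Druid code. Everything in it is a stand-in: `build_segment` plays the role of the IncrementalIndex, `merge_and_rollup` plays the role of the IndexMerger, `heapq.merge` substitutes for the MergeIterable, and the `count` metric and the dimensions `a` and `b` are invented for illustration.

```python
import heapq
from itertools import groupby


def row_key(dim_order):
    """Build a sort key that compares rows on dimensions in the given order."""
    return lambda row: tuple(row[d] for d in dim_order)


def build_segment(rows, dim_order):
    """Hypothetical stand-in for an incremental index: rows sorted by the supplied order."""
    return sorted(rows, key=row_key(dim_order))


def merge_and_rollup(segments, assumed_order):
    """Hypothetical stand-in for the merger: k-way merge, then combine adjacent equal rows."""
    key = row_key(assumed_order)
    # heapq.merge only yields sorted output if every input is already sorted by `key`.
    merged = heapq.merge(*segments, key=key)
    return [
        {**dict(zip(assumed_order, k)), "count": sum(r["count"] for r in group)}
        for k, group in groupby(merged, key=key)
    ]


rows = [{"a": 1, "b": 2, "count": 1}, {"a": 2, "b": 1, "count": 1}]
indexer_order = ["b", "a"]  # arbitrary order, as if taken from the indexer config
segments = [build_segment(rows, indexer_order) for _ in range(2)]

mismatched = merge_and_rollup(segments, ["a", "b"])  # merger assumes lexicographic order
matched = merge_and_rollup(segments, indexer_order)  # merger uses the segments' real order

print(len(mismatched), len(matched))
```

With the mismatched comparator the merge emits the equal rows non-adjacently, so the rollup produces four rows where two are expected. Using the order the segments were actually sorted in gives two rows, each with a count of 2.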
Workarounds include:
- Supply dimensions in lexicographic order in the indexer config.
- Avoid indexing with dimensionless schemas.
I can think of two possible fixes:
- IndexMerger could try to use arbitrary dimension orders from the underlying segment when possible. I think this is not possible in general while both getting correct rollup and avoiding re-sorting rows, since the underlying segments could use different row comparators. But it will be possible in many situations, and where it is not, the IndexMerger could fall back to re-sorting the rows.
- IncrementalIndex could sort rows based on dimensions in lexicographic order instead of arbitrary order. This makes the IndexMerger's job easier but it means we can't control the row comparator, which could be useful if we ever decide to mess with it in order to do something like optimize size of the inverted indices.
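The first option could look roughly like the sketch below. This is only an illustration in Python, not a proposed patch: `merge_segments`, the `(dim_order, rows)` segment representation, and the `count` metric are all hypothetical, and it assumes every segment carries the same set of dimensions. When all segments share one dimension order, the merge reuses it with no re-sort. Otherwise it falls back to re-sorting every segment into lexicographic order before merging.

```python
import heapq
from itertools import groupby


def row_key(dim_order):
    """Build a sort key that compares rows on dimensions in the given order."""
    return lambda row: tuple(row[d] for d in dim_order)


def rollup(sorted_rows, dim_order):
    """Combine adjacent rows that share the same dimension values."""
    return [
        {**dict(zip(dim_order, k)), "count": sum(r["count"] for r in group)}
        for k, group in groupby(sorted_rows, key=row_key(dim_order))
    ]


def merge_segments(segments):
    """segments: list of (dim_order, rows) pairs, rows already sorted by dim_order."""
    orders = {tuple(order) for order, _ in segments}
    if len(orders) == 1:
        # Every segment agrees on the order: reuse it, no re-sort needed.
        order = list(next(iter(orders)))
        streams = [rows for _, rows in segments]
    else:
        # Comparators differ: fall back to re-sorting into lexicographic order.
        order = sorted(set().union(*orders))
        streams = [sorted(rows, key=row_key(order)) for _, rows in segments]
    return order, rollup(heapq.merge(*streams, key=row_key(order)), order)


rows = [{"a": 1, "b": 2, "count": 1}, {"a": 2, "b": 1, "count": 1}]
seg_ba = (["b", "a"], sorted(rows, key=row_key(["b", "a"])))
seg_ab = (["a", "b"], sorted(rows, key=row_key(["a", "b"])))

same_order, same_rows = merge_segments([seg_ba, seg_ba])
mixed_order, mixed_rows = merge_segments([seg_ba, seg_ab])
```

In both cases the result has two rows with a count of 2 each. The difference is only whether the rows had to be re-sorted to get there.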