fix bug in nested v4 format merger from refactoring #14053
clintropolis merged 3 commits into apache:master
Conversation
if (!(format instanceof NestedCommonFormatColumn.Format)
    && !(format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)) {
nit: !(format instanceof NestedCommonFormatColumn.Format || format instanceof NestedDataComplexTypeSerde.NestedColumnFormatV4)
is perhaps a bit easier to read, since it makes the intent of the check clearer.
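For anyone comparing the two spellings: they are equivalent by De Morgan's law, so the suggestion is purely about readability. Below is a minimal sketch that checks this. The `ColumnFormat`, `CommonFormat`, `V4Format`, and `OtherFormat` types are hypothetical stand-ins for the Druid classes named above, not the real ones, and the method names are made up for illustration.

```java
public class FormatCheckSketch {
    // Hypothetical stand-ins; the real check uses NestedCommonFormatColumn.Format
    // and NestedDataComplexTypeSerde.NestedColumnFormatV4.
    interface ColumnFormat {}
    static final class CommonFormat implements ColumnFormat {}
    static final class V4Format implements ColumnFormat {}
    static final class OtherFormat implements ColumnFormat {}

    // Shape of the condition in the diff: "not A and not B".
    static boolean rejectsOriginal(ColumnFormat format) {
        return !(format instanceof CommonFormat)
               && !(format instanceof V4Format);
    }

    // Shape of the condition suggested in review: "not (A or B)".
    static boolean rejectsSuggested(ColumnFormat format) {
        return !(format instanceof CommonFormat || format instanceof V4Format);
    }

    public static void main(String[] args) {
        // null is included because instanceof is false for null in both forms.
        ColumnFormat[] samples = {new CommonFormat(), new V4Format(), new OtherFormat(), null};
        for (ColumnFormat f : samples) {
            if (rejectsOriginal(f) != rejectsSuggested(f)) {
                throw new AssertionError("forms disagree for " + f);
            }
        }
        System.out.println("both forms agree on all samples");
    }
}
```

Both forms accept only the two known formats and reject everything else, including `null`.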
    )
  )
);
// still merge it since that follows the normal path of persist then merge
nit: "still merge" implies that this comment is referring to a change, and a new reader is not going to know what that change is. It looks like you are exercising what happens when it reads back over the segment to persist a new one? Perhaps change this to
Do a merge, which will do yet another persist and load again to validate that the behavior of writing and then
reading still does good things
This is gonna have performance implications for test run times too, I fear. But if we only ever do this once for each data set that we are indexing, it shouldn't be too expensive...
> This is gonna have performance implications for test run times too, I fear. But if we only ever do this once for each data set that we are indexing, it shouldn't be too expensive...

Yeah, I was concerned about that, but it looks like it hasn't made the (already terrible) processing test times any worse. I think it's worth it for the extra coverage: it ensures both indexable adapters are exercised when building test data segments, and it more closely matches current ingest task behavior.
Description
Fixes a regression when ingesting 'v4' nested format columns, caused by shuffling some code around while refactoring during review of #14014. I realized that I forgot to switch some of the tests back to using the v4 format, so I've swapped the 'tsv' format tests back to using 'json' instead of 'auto' to ingest the test data (there are no arrays in that data, so there is no difference in functional behavior between v4 and the new common format).
This PR has: