longer compatibility window for nested column format v4 #14955
changes:
* add back nested column v4 serializers
* 'json' schema by default still uses the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property which can be specified to override format versions on native ingest jobs
* add system config to specify default column format stuff, 'druid.indexing.formats', and property 'druid.indexing.formats.nestedColumnFormatVersion' to specify system level preferred nested column format for friendly rolling upgrades from versions which do not support the newer 'nested common format' used by 'auto'
private static class OnlyPositionalReadsTypeStrategy<T> implements TypeStrategy<T>
Check notice (Code scanning / CodeQL): Unused classes and interfaces
import javax.annotation.Nullable;
import java.util.Objects;

public class NestedDataColumnSchema extends DimensionSchema
can you add some javadocs here? This is where things kinda fork off between v4 and auto.
need to fix the func tests.
if (formatVersion < 4 || formatVersion > 5) {
  throw DruidException.forPersona(DruidException.Persona.USER)
                      .ofCategory(DruidException.Category.INVALID_INPUT)
                      .build("Unsupported nested column format version[%s]", formatVersion);
Nit - can you add in the error message that supported values are 4 and 5?
I think I'm going to skip this for now to avoid churning through CI again. It also seems like we should find some programmatic way to list which versions are supported, so the error messaging doesn't become stale if we add another version. The versions are also still kind of confusing, so I'm not sure how I feel about exposing them too much: 4 makes sense, but 5 is a virtual version, since 5 is really version 0 of the new nested common format introduced by 'auto'.
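One way the "programmatic" listing mentioned in this thread could look is sketched below. This is only an illustration, not code from the PR: the class NestedFormatVersions and its members are hypothetical, and a plain IllegalArgumentException stands in for the DruidException builder so the sketch stays self-contained. The idea is that the check and the message both read from the same list, so adding a version updates both.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Druid: keeps the supported versions in one place
// so the validation check and the error message cannot drift apart.
final class NestedFormatVersions
{
  // Versions named in the review thread: 4 (v4 format) and 5 (virtual version for
  // the 'nested common format' used by 'auto').
  static final List<Integer> SUPPORTED = List.of(4, 5);

  // Returns the version unchanged if it is supported, otherwise throws.
  static int validate(int formatVersion)
  {
    if (!SUPPORTED.contains(formatVersion)) {
      String supported = SUPPORTED.stream()
                                  .map(String::valueOf)
                                  .collect(Collectors.joining(", "));
      // Assumption: the real code would build a DruidException here instead.
      throw new IllegalArgumentException(
          String.format(
              "Unsupported nested column format version[%s], supported versions are [%s]",
              formatVersion,
              supported
          )
      );
    }
    return formatVersion;
  }
}
```

Calling NestedFormatVersions.validate(3) in this sketch fails with a message that lists both supported versions, while validate(4) and validate(5) return their argument.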
Description
Follow up to #14456, which in retrospect was a bit overly aggressive in removing the ability to serialize v4 of the nested column format in favor of always using the latest and greatest. This PR adds back the serializers, as well as introduces a system config to allow for more flexibility in what versions can be upgraded from.
changes:
I named DefaultColumnFormatConfig a bit generically because I was imagining putting other stuff on here in the future, like perhaps a way to control the default MultiValueHandling for classic multi-value string columns, or any other column format related stuff we might want to define defaults for that isn't really appropriate to define on the IndexSpec (which I still also intend to make a system level default for in the near future). I did consider adding this option to the IndexSpec instead, but it made sense to me to split them up since I didn't really want to wire the rest of the IndexSpec options up to things yet on a per column basis (that would be pretty cool though).
Release note
Add a system-level runtime.properties option for default column format settings, under the prefix 'druid.indexing.formats'. The property 'druid.indexing.formats.nestedColumnFormatVersion' specifies the preferred nested column format, for friendly rolling upgrades from Druid 25 to Druid 28.
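To make the interaction between the per-column 'formatVersion' and the system-level property concrete, here is a minimal sketch of how an effective version might be chosen. It is not Druid's actual wiring: the class NestedFormatResolver, its methods, and the precedence shown (column override first, then the system property, then the newer format as virtual version 5) are assumptions drawn from the description above. Only the property name comes from this PR.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not Druid's implementation.
final class NestedFormatResolver
{
  // Property name taken from the release note.
  static final String PROPERTY = "druid.indexing.formats.nestedColumnFormatVersion";

  // Assumption: with nothing configured, the newer format (virtual version 5) is used.
  static final int NEWEST = 5;

  // Picks the column-level override if present, else the system default, else NEWEST.
  static int resolve(Integer columnFormatVersion, Properties runtimeProperties)
  {
    if (columnFormatVersion != null) {
      return columnFormatVersion;
    }
    String systemDefault = runtimeProperties.getProperty(PROPERTY);
    return systemDefault != null ? Integer.parseInt(systemDefault.trim()) : NEWEST;
  }

  // Parses runtime.properties-style text, e.g. a single "key=value" line.
  static Properties parse(String text)
  {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text));
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return props;
  }
}
```

Under these assumptions, a cluster in the middle of a rolling upgrade would set druid.indexing.formats.nestedColumnFormatVersion=4 in runtime.properties, and columns without an explicit 'formatVersion' would resolve to 4 until the property is removed.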
This PR has: