
longer compatibility window for nested column format v4 #14955

Merged
clintropolis merged 9 commits into apache:master from clintropolis:longer-json-backwards-compatibility-window
Sep 12, 2023

@clintropolis (Member) commented Sep 8, 2023

Description

Follow-up to #14456, which in retrospect was overly aggressive in removing the ability to serialize v4 of the nested column format in favor of always using the latest and greatest. This PR adds back the v4 serializers and introduces a system config that allows more flexibility in which versions can be upgraded from.

Changes:

  • add back the nested column v4 serializers
  • the 'json' schema still uses, by default, the newer 'nested common format' used by 'auto', but now has an optional 'formatVersion' property that can be specified to override the format version on native ingestion jobs
  • add a system config for default column format settings, 'druid.indexing.formats', with the property 'druid.indexing.formats.nestedColumnFormatVersion' to specify the system-level preferred nested column format, for friendly rolling upgrades from versions that do not support the newer 'nested common format' used by 'auto'
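To make the precedence between the two settings concrete, here is a minimal Java sketch. It assumes that an explicit per-column 'formatVersion' wins over the system-level 'druid.indexing.formats.nestedColumnFormatVersion', and that the newest format (5, the 'nested common format') is used when neither is set. `NestedFormatVersionResolver` and `resolve` are hypothetical names, not Druid's implementation, and a plain `IllegalArgumentException` stands in for `DruidException`.

```java
/**
 * Hypothetical sketch (not Druid code): resolve the nested column format version
 * to write, assuming column-level override > system default > newest format.
 */
final class NestedFormatVersionResolver
{
  // Assumption: 4 is the legacy nested format, 5 is the newer 'nested common format'.
  static final int LEGACY_V4 = 4;
  static final int NESTED_COMMON_FORMAT = 5;

  private NestedFormatVersionResolver()
  {
  }

  /**
   * @param columnFormatVersion 'formatVersion' from the 'json' column schema, or null if unset
   * @param systemFormatVersion system-level default from runtime.properties, or null if unset
   * @return the format version to write
   */
  static int resolve(Integer columnFormatVersion, Integer systemFormatVersion)
  {
    final int version;
    if (columnFormatVersion != null) {
      version = columnFormatVersion;      // explicit per-column override wins
    } else if (systemFormatVersion != null) {
      version = systemFormatVersion;      // then the operator-configured default
    } else {
      version = NESTED_COMMON_FORMAT;     // otherwise the newest format, as 'auto' uses
    }
    if (version < LEGACY_V4 || version > NESTED_COMMON_FORMAT) {
      throw new IllegalArgumentException("Unsupported nested column format version[" + version + "]");
    }
    return version;
  }
}
```

For example, an operator partway through a rolling upgrade could set the system default to 4, while an individual ingestion spec still opts into 5 via 'formatVersion'.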

I named DefaultColumnFormatConfig somewhat generically because I imagine putting other settings on it in the future, such as a way to control the default MultiValueHandling for classic multi-value string columns, or any other column-format-related defaults that aren't really appropriate to define on the IndexSpec (which I also still intend to make configurable as a system-level default in the near future). I did consider adding this option to the IndexSpec instead, but it made sense to me to split them up, since I didn't want to wire the rest of the IndexSpec options up on a per-column basis yet (that would be pretty cool, though).

Release note

Add a system-level runtime.properties option for specifying default column format settings, with the prefix 'druid.indexing.formats', and the property 'druid.indexing.formats.nestedColumnFormatVersion' to specify the preferred nested column format for friendly rolling upgrades from Druid 25 to Druid 28.
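As an illustration of what reading this option could look like, the following sketch parses the property from runtime.properties-style text with `java.util.Properties`. It is a plain-Java stand-in under stated assumptions: `ColumnFormatDefaults` is a hypothetical name, it accepts only 4 and 5, and it does not reflect how Druid actually binds configuration under the 'druid.indexing.formats' prefix.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/**
 * Hypothetical plain-Java stand-in for a config object read from the
 * 'druid.indexing.formats' prefix; not Druid's real config binding.
 */
final class ColumnFormatDefaults
{
  static final String PREFIX = "druid.indexing.formats";
  static final String NESTED_VERSION_KEY = PREFIX + ".nestedColumnFormatVersion";

  // Assumption: null means "not configured", so callers fall back to the newest format.
  private final Integer nestedColumnFormatVersion;

  private ColumnFormatDefaults(Integer nestedColumnFormatVersion)
  {
    this.nestedColumnFormatVersion = nestedColumnFormatVersion;
  }

  static ColumnFormatDefaults fromProperties(Properties props)
  {
    String raw = props.getProperty(NESTED_VERSION_KEY);
    if (raw == null || raw.isBlank()) {
      return new ColumnFormatDefaults(null);
    }
    // Non-numeric values throw NumberFormatException (an IllegalArgumentException).
    int version = Integer.parseInt(raw.trim());
    if (version != 4 && version != 5) {
      throw new IllegalArgumentException(NESTED_VERSION_KEY + " must be 4 or 5, got " + version);
    }
    return new ColumnFormatDefaults(version);
  }

  static ColumnFormatDefaults fromRuntimeProperties(String text)
  {
    Properties props = new Properties();
    try {
      props.load(new StringReader(text)); // same key=value syntax as runtime.properties
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return fromProperties(props);
  }

  Integer getNestedColumnFormatVersion()
  {
    return nestedColumnFormatVersion;
  }
}
```

Presumably, during a rolling upgrade from Druid 25 an operator would set `druid.indexing.formats.nestedColumnFormatVersion=4` so that newly written segments stay readable by processes that have not been upgraded yet, and remove the setting once the upgrade completes.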


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.


private static class OnlyPositionalReadsTypeStrategy<T> implements TypeStrategy<T>

Code scanning / CodeQL notice: Unused classes and interfaces

Unused class: OnlyPositionalReadsTypeStrategy is not referenced within this codebase. If not used as an external API it should be removed.

import javax.annotation.Nullable;
import java.util.Objects;

public class NestedDataColumnSchema extends DimensionSchema
Contributor

Can you add some Javadocs here? This is where things fork off between v4 and 'auto'.
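One possible shape for the requested Javadocs is sketched below: a single documented method where a resolved format version maps either to the legacy v4 layout or to the 'nested common format' shared with 'auto'. `NestedSchemaForkSketch` and its `Layout` enum are hypothetical names for illustration only, not Druid classes.

```java
/**
 * Hypothetical sketch of where a nested column schema forks on format version.
 * Names are illustrative only; they are not Druid classes.
 */
final class NestedSchemaForkSketch
{
  /** Which on-disk layout a 'json' column is written with. */
  enum Layout
  {
    /** Legacy nested column format v4, readable by older Druid versions. */
    NESTED_V4,
    /** The 'nested common format' that 'auto' columns also write. */
    NESTED_COMMON_FORMAT
  }

  private NestedSchemaForkSketch()
  {
  }

  /**
   * Maps a resolved format version to a layout. Version 4 keeps the legacy
   * serializer; version 5 writes the same format as 'auto'.
   *
   * @throws IllegalArgumentException for any other version
   */
  static Layout layoutFor(int formatVersion)
  {
    switch (formatVersion) {
      case 4:
        return Layout.NESTED_V4;
      case 5:
        return Layout.NESTED_COMMON_FORMAT;
      default:
        throw new IllegalArgumentException("Unsupported nested column format version[" + formatVersion + "]");
    }
  }
}
```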

@pranavbhole (Contributor)

Need to fix the func tests:
Run 4: DumpSegmentTest.testDumpNestedColumnPath:196->createSegments:250 » ValueInstantiation Cannot construct instance of org.apache.druid.segment.NestedDataColumnSchema, problem: No injectable id with value 'org.apache.druid.segment.DefaultColumnFormatConfig' found (for property '')

if (formatVersion < 4 || formatVersion > 5) {
  throw DruidException.forPersona(DruidException.Persona.USER)
                      .ofCategory(DruidException.Category.INVALID_INPUT)
                      .build("Unsupported nested column format version[%s]", formatVersion);
}
Contributor

Nit: can you add to the error message that the supported values are 4 and 5?

Member Author

I think I'm going to skip this for now to avoid churning through CI again. It also seems like we should find some programmatic way to list the supported versions, so the error message doesn't become stale if we add another version. The versions are also still somewhat confusing, so I'm not sure how I feel about exposing them too much: 4 makes sense, but 5 is a virtual version, since it is really version 0 of the new nested common format introduced by 'auto'.
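A minimal sketch of the programmatic approach mentioned above, assuming the supported versions live in one enum so the error message is built from it rather than hard-coded. `NestedFormatVersion` is a hypothetical name, the constant comments simply restate the explanation (4 is the legacy format, 5 is a virtual version for version 0 of the 'nested common format'), and `IllegalArgumentException` stands in for `DruidException`.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch: derive the "supported versions" text from a single
 * source of truth so the error message cannot go stale when a version is added.
 */
enum NestedFormatVersion
{
  // Legacy nested column format v4.
  V4(4),
  // Virtual version: really version 0 of the 'nested common format' introduced by 'auto'.
  V5(5);

  final int publicVersion;

  NestedFormatVersion(int publicVersion)
  {
    this.publicVersion = publicVersion;
  }

  /** Renders every declared version, e.g. "[4, 5]" for the constants above. */
  static String supportedVersions()
  {
    return Arrays.stream(values())
                 .map(v -> String.valueOf(v.publicVersion))
                 .collect(Collectors.joining(", ", "[", "]"));
  }

  static NestedFormatVersion fromPublicVersion(int version)
  {
    for (NestedFormatVersion v : values()) {
      if (v.publicVersion == version) {
        return v;
      }
    }
    throw new IllegalArgumentException(
        "Unsupported nested column format version[" + version + "], supported versions are " + supportedVersions()
    );
  }
}
```

With this design, adding a version means adding one enum constant, and the list in the error message updates automatically.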

@clintropolis merged commit 891f0a3 into apache:master on Sep 12, 2023
@clintropolis deleted the longer-json-backwards-compatibility-window branch on September 12, 2023 21:07
@LakshSingla added this to the 28.0 milestone on Oct 12, 2023

5 participants