Skip to content

Add a StrictTypeIdResolver which would throw an exception if type is not in the predefined json subtype list. Use it in DimensionSchema.#18565

Merged
cecemei merged 8 commits intoapache:masterfrom
cecemei:serde
Oct 7, 2025

Conversation

@cecemei
Copy link
Copy Markdown
Contributor

@cecemei cecemei commented Sep 23, 2025

Description

Add a StrictTypeIdResolver which would throw an exception if type is not in the predefined json subtype list. Use it in DimensionSchema.


Key changed/added classes in this PR
  • StrictTypeIdResolver
  • DimensionSchema
  • DimensionSchemaTest

Release note:

At ingestion time, dimension schemas are now strictly validated against allowed types. Previously an invalid type would fall back to string dimension, now such values are rejected. Users must specify type as one of the allowed types. Omitting type still defaults to string, preserving backward compatibility.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@cecemei cecemei changed the title strict-type-id Add a StrictTypeIdResolver which would throw an exception if type is not in the predefined json subtype list. Use it in DimensionSchema. Sep 24, 2025
@cecemei cecemei marked this pull request as ready for review September 24, 2025 16:05
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds strict type validation for JSON deserialization of DimensionSchema by introducing a custom StrictTypeIdResolver that throws exceptions for unknown types while preserving default behavior for missing types.

  • Implemented StrictTypeIdResolver to enforce strict validation of JSON type discriminators
  • Modified DimensionSchema to use the new strict type resolver instead of default Jackson behavior
  • Added comprehensive tests for both invalid type handling and default type fallback scenarios

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File Description
StrictTypeIdResolver.java New custom type resolver that validates type IDs against registered subtypes
DimensionSchema.java Updated to use StrictTypeIdResolver with enhanced documentation
DimensionSchemaTest.java Added tests for strict type validation and default type behavior
StringDimensionSchemaTest.java Removed explicit type from test JSON to verify default behavior

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +41 to +42
private Map<String, Class<?>> typeMap;
private JavaType baseType;
Copy link

Copilot AI Sep 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typeMap field is not thread-safe and could cause issues in concurrent environments. Consider using ConcurrentHashMap or making the field volatile with proper synchronization in ensureTypeMap().

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, do we actually need to look into thread safety requirements here? I'm unfamiliar with this exact area but this comment plus a quick query of an ai agent does make me wonder if thread safety is necessary.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually rewrite the pr to use a builder class, PTAL!

Comment thread processing/src/main/java/org/apache/druid/jackson/StrictTypeIdResolver.java Outdated
capistrant
capistrant previously approved these changes Sep 25, 2025
Copy link
Copy Markdown
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one nit for you to consider if you you want to change

requested copilot review as well. I don't think either is legit blocker

Comment thread processing/src/main/java/org/apache/druid/jackson/StrictTypeIdResolver.java Outdated
@capistrant capistrant self-requested a review September 25, 2025 18:08
@capistrant capistrant dismissed their stale review September 25, 2025 18:11

I want us to double check on any thread safety requirements with the new custom @JsonTypeIdResolver

@abhishekrb19
Copy link
Copy Markdown
Contributor

Would this break existing specs that were persisted with an invalid type specified in the dimension schema? Or would those specs already default to string, so it wouldn’t be an issue? (I'm not sure when the jackson default will apply)

@cecemei cecemei requested a review from Copilot October 3, 2025 21:47
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


StrictTypeIdResolver()
{
// Required default constructor for Jackson, the instance is never used
Copy link

Copilot AI Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment states 'the instance is never used' but this may be misleading. Jackson may instantiate this constructor for reflection purposes. Consider clarifying that this constructor creates an unusable instance that should not be used for actual resolution.

Suggested change
// Required default constructor for Jackson, the instance is never used
// Required default constructor for Jackson's reflective instantiation.
// This creates an unusable instance that should not be used for actual resolution.

Copilot uses AI. Check for mistakes.
@cecemei
Copy link
Copy Markdown
Contributor Author

cecemei commented Oct 3, 2025

Would this break existing specs that were persisted with an invalid type specified in the dimension schema? Or would those specs already default to string, so it wouldn’t be an issue? (I'm not sure when the jackson default will apply)

If the type is not set, it'd default to string, but if it's set a value that's not specified in JsonSubTypes, it'd throw an error, but we can add those to map to string too. Is there any existing specs that uses invalid types?

Copy link
Copy Markdown
Contributor

@capistrant capistrant left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cecemei this is all just ingestion spec related, correct? So if users had been submitting specs in the past with invalid types, it would quietly default to string. But with your new behavior they will get an error? So to a user, this appears to be a breaking change, but really we are just fixing undesirable behavior? I think a release note section in the PR description should be created to specify how exactly the behavior is changing and what a user should do about it to fix (point them to the valid types?).

…nsionSchemaTest.java

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@cecemei
Copy link
Copy Markdown
Contributor Author

cecemei commented Oct 6, 2025

@cecemei this is all just ingestion spec related, correct? So if users had been submitting specs in the past with invalid types, it would quietly default to string. But with your new behavior they will get an error? So to a user, this appears to be a breaking change, but really we are just fixing undesirable behavior? I think a release note section in the PR description should be created to specify how exactly the behavior is changing and what a user should do about it to fix (point them to the valid types?).

True, added a release entry.

@cecemei cecemei merged commit 94a1045 into apache:master Oct 7, 2025
109 of 110 checks passed
@cecemei cecemei added this to the 35.0.0 milestone Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants