Fix node discovery to ignore unknown DruidServices by jihoonson · Pull Request #12157 · apache/druid

jihoonson · 2022-01-15T03:06:06Z

Description

This PR fixes a bug that DiscoveryDruidNode cannot be created while deserializing it from a JSON when there is an unknown DruidService. This bug can be seen when you do rolling update. Suppose you have a custom MyDruidService that is added in a new version, v2. While you are upgrading your cluster from v1 to v2, other nodes who watch the node that is upgraded first and announces MyDruidService will not understand what MyDruidService is until they are upgraded to v2. In this case, this bug currently can fail node discovery. This PR fixes the bug by ignoring unknown DruidServices during deserialization of DiscoveryDruidNode.

Key changed/added classes in this PR

DiscoveryDruidNode

This PR has:

been self-reviewed.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
been tested in a test Druid cluster.

clintropolis

👍 after CI

cryptoe

LGTM!!

…druid-service

FrankChen021 · 2022-01-16T06:13:43Z

 public class DiscoveryDruidNode
 {
+  private static final Logger LOG = new Logger(DiscoveryDruidNode.class);
+  private static final TypeReference<Map<String, Object>> RAW_DRUID_SERVICE_TYPE =


I don't see any reference to this static variable, do I miss something?

Good catch! It is a left over that I forgot to delete after I cleaned up my PR. I will delete it.

FrankChen021 · 2022-01-16T06:13:47Z

+  )
+  {
+    Map<String, DruidService> services = new HashMap<>();
+    if (rawServices != null && !rawServices.isEmpty()) {


nit: can use org.apache.druid.utils.CollectionUtils.isNullOrEmpty to check the rawServices

Oh, that method cannot be used as Map is not a Collection.

Oh, right. Maybe we can add some overridden methods to this util class in the future.

asdf2014

👍 Overall LGTM. It seems that after the previous PR was merged, the forbiddenapis plugin checks fine now, but there are some other errors reported during the CI process that don't seem to be related to the forbiddenapis plugin check, as follows:

Max number of retries[240] exceeded for Task[waiting for SQL metadata refresh]
Background operation retry gave up ConnectionLossException: KeeperErrorCode = ConnectionLoss
AssertionError: lists don't have the same size expected [2] but found [0]
RuntimeException: org.apache.druid.java.util.common.ISE: one or more queries failed

jihoonson · 2022-01-17T06:02:33Z

@clintropolis, @cryptoe, @FrankChen021, @asdf2014 thank you for the review. The change in this PR revealed a bug in DataNodeService that it has duplicate "type" properties in its serialized JSON, one for the subtype key of DruidService and another for serverType in DataNodeService. It seems that things happened to be working before this change as the subtype key is always written first in the serialized JSON and Jackson always picks up the first "type" property as the subtype key for deserialization. However, this doesn't work anymore because DiscoveryDruidNode now deserializes the JSON to a Map first and then converts it to a DruidService to ignore unknown service types. Since the map does not allow duplicate keys, the second "type" overwrites the first "type" during deserialization which caused integration test failures. This bug is a bit tricky to fix since we cannot just rename the "type" property of "DataNodeService". It will break rolling upgrade if we do that. To not break the rolling upgrade, we should ensure that the serialized JSON of DataNodeService has both duplicate "type" properties, but the "type" for the subtype key always appears first in the JSON. To do this, I added a custom serializer (DruidServiceSerializer) and a custom deserialization logic in DiscoveryDruidNode that uses StringObjectPairList as an intermediary form during deserialization. DruidServiceSerializer might be not required as Jackson seems to always write the subtype key first during serialization, but I added it anyway just in case where it behaves differently in some edge case. Please see the Javadoc of DataNodeService for more details and let me know what you think.

FrankChen021 · 2022-01-17T06:32:03Z

@jihoonson I see. The JsonTypeInfo annotation defined on the DruidService class tells the jackson to add a type property automatically during serialization. And this type property conflicts with the type field defined in DataNodeService.

I don't have a better suggestion considering the backward compatibility.

I'm wondering if there's any way that we can check or avoid such conflict.
How about we can add constraint on the property name of JsonTypeInfo annotation, for example it should be _type? Such name won't conflict with user defined field name because this naming style is not allowed in users' code.

jihoonson · 2022-01-17T06:47:14Z

Yeah, I think it makes sense to use a special name as the subtype key such as "_type" or "@type" in the future. We cannot easily modify existing ones unfortunately though since it will break rolling upgrade due to the same reason as why I could not just get rid of the "type" from DataNodeService.

jihoonson · 2022-01-19T01:26:56Z

@clintropolis thank you for the review. @cryptoe @FrankChen021 @asdf2014 do you have more comments?

jihoonson · 2022-01-19T06:08:55Z

@FrankChen021 thank you!

Fix node discovery to ignore unknown DruidServices

178781b

jihoonson added the Bug label Jan 15, 2022

ignore all runtime exceptions

78ebd0f

clintropolis approved these changes Jan 15, 2022

View reviewed changes

fix test

891201a

cryptoe approved these changes Jan 15, 2022

View reviewed changes

jihoonson added 2 commits January 14, 2022 21:38

Merge remote-tracking branch 'origin/master' into ignore-unparseable-…

81d08af

…druid-service

Merge remote-tracking branch 'origin/master' into ignore-unparseable-…

aba606c

…druid-service

FrankChen021 reviewed Jan 16, 2022

View reviewed changes

asdf2014 reviewed Jan 16, 2022

View reviewed changes

jihoonson added 2 commits January 16, 2022 17:52

add custom deserializer

67dcd8d

custom serializer

5a7719f

log host for unparseable druidService

f60cca7

clintropolis approved these changes Jan 18, 2022

View reviewed changes

FrankChen021 approved these changes Jan 19, 2022

View reviewed changes

jihoonson merged commit cc2ffc6 into apache:master Jan 19, 2022

abhishekagarwal87 added this to the 0.23.0 milestone May 11, 2022

clintropolis mentioned this pull request Jan 28, 2026

remove custom json serde for DataNodeService #18961

Merged

Conversation

jihoonson commented Jan 15, 2022

Description

Key changed/added classes in this PR

Uh oh!

clintropolis left a comment

Choose a reason for hiding this comment

Uh oh!

cryptoe left a comment

Choose a reason for hiding this comment

Uh oh!

FrankChen021 Jan 16, 2022

Choose a reason for hiding this comment

Uh oh!

jihoonson Jan 16, 2022

Choose a reason for hiding this comment

Uh oh!

FrankChen021 Jan 16, 2022

Choose a reason for hiding this comment

Uh oh!

jihoonson Jan 17, 2022

Choose a reason for hiding this comment

Uh oh!

FrankChen021 Jan 17, 2022

Choose a reason for hiding this comment

Uh oh!

asdf2014 left a comment

Choose a reason for hiding this comment

Uh oh!

jihoonson commented Jan 17, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

FrankChen021 commented Jan 17, 2022

Uh oh!

jihoonson commented Jan 17, 2022

Uh oh!

jihoonson commented Jan 19, 2022

Uh oh!

jihoonson commented Jan 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

jihoonson commented Jan 17, 2022 •

edited

Loading