Skip to content

remove serialization of DruidServer#2665

Merged
xvrl merged 1 commit intoapache:masterfrom
jisookim0513:remove-druid-server-serialization
Mar 25, 2016
Merged

remove serialization of DruidServer#2665
xvrl merged 1 commit intoapache:masterfrom
jisookim0513:remove-druid-server-serialization

Conversation

@jisookim0513
Copy link
Copy Markdown
Contributor

Under the current implementation, when DruidServer can be serialized, and when it is, the serialized object contains every field except for name, dataSources, and metadata. This can be confusing and misleading since deserialized DruidServer has information of all the segments it has but no information of dataSource. In this case, adding a data segment to the server for adding dataSource will fail since it already has the segment on segments map. I am not sure why segments gets serialized, and since serialized object doesn't seem to be used anywhere (only DruidServerMetadata is serialized and used), I suggest getting rid of serialization of DruidServer to avoid further confusion.

I originally tried serializing and deserializing DruidServer in order to construct my own TimelineServerView based on the timeline I got from BrokerServerView, and this custom TimelineServerView couldn't return the correct timeline because DruidServer's dataSources was null while it had segments populated. In addition, the server didn't have name (name == null) and there was no method to set name.

The correct way of reconstructing DruidServer was to serialize and deserialize DruidServer's DruidServerMetadata and use that to construct a new DruidServer and populate it by calling addDataSegment for all data segments. By doing this, I was able to get the DruidServer with all fields being correct and the correct VersionedIntervalTimeline.

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 15, 2016

👍

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 15, 2016

Is the serialization used by any of the coordinator or broker jersey informational resources?

@jisookim0513
Copy link
Copy Markdown
Contributor Author

I don't think so. ContainerCache of CuratorInventoryManager uses deserialization of DruidServer when CHILD_ADDED or CHILD_UPDATED happens in its childrenCache, which watches containerPath. CuratorInventoryManager is created in ServerInventoryView with containerPath to be announcementsPath. However, it seems like the only class that writes to the path is DataSegmentAnnouncer, and in AbstractDataSegmentAnnouncer, the data that is being deserialized in CuratorInventoryManager is serialized from DruidServerMetadata, not DruidServer.

from AbstractDataSegmentAnnouncer:
private final DruidServerMetadata server;
from AbstractDataSegmentAnnouncer.start():
announcer.announce(path, jsonMapper.writeValueAsBytes(server), false);

Please correct me if I am wrong or missing something.

@jisookim0513
Copy link
Copy Markdown
Contributor Author

Actually I found a usage of it on ServersResource, as it returns server information from getServer and getServerSegment. So I guess the serialization of segments is fro getServerSegment. What is the purpose of this? And if this is needed, wouldn't it be better to have more complete serialization for DruidServer?

@guobingkun
Copy link
Copy Markdown
Contributor

FWIW, DruidServerMetadata should be the source of truth of a Druid node's metadata, if you want to get basic metadata of a Druid node, use DruidServerMetadata. DruidServer actually just means a Druid node that can load segments, i.e. Historical or Realtime. It is constructed from serialized DruidServerMetadata, but it stores more information about the segments it is serving. So technically it should be a subclass of DruidServerMetadata (I am working on refactoring this in #2242).

If Druid never uses serialized DruidServer, I am 👍

@guobingkun
Copy link
Copy Markdown
Contributor

It's also fine to leave them as is. I am fine with either.

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 16, 2016

@guobingkun @jisookim0513 what this tells me is that
a) we need tests for the getServer and getServerSegments methods in ServersResource
b) ServersResource can probably be updated to not rely on serialization of DruidServer (it already does it custom serialization in some places anyways)

I think it's still useful to remove the serialization of DruidServer just because it is confusing and also handles a lot of logic for the server views.

@guobingkun I am not sure about making DruidServer a subclass of DruidServerMetadata unless there really is a need to. If I remember correctly, the main reason we split off the metadata in the first place was to reduce heap usage on the coordinator.

@jisookim0513 jisookim0513 force-pushed the remove-druid-server-serialization branch from a1f0588 to 4ce536a Compare March 17, 2016 22:01
@jisookim0513
Copy link
Copy Markdown
Contributor Author

@xvrl added a custom serialization for full DruidServer in ServersResource. Here is an example of full DruidServer that contains segments information:

{"host":"localhost:8083","tier":"_default_tier","type":"historical","priority":0,"currSize":18722,"maxSize":10000000000,"segments":{"wikipedia_2016-03-17T21:15:00.000Z_2016-03-17T21:20:00.000Z_2016-03-17T21:15:00.000Z":{"dataSource":"wikipedia","interval":"2016-03-17T21:15:00.000Z/2016-03-17T21:20:00.000Z","version":"2016-03-17T21:15:00.000Z","loadSpec":{"type":"local","path":"/tmp/druid/localStorage/wikipedia/2016-03-17T21:15:00.000Z_2016-03-17T21:20:00.000Z/2016-03-17T21:15:00.000Z/0/index.zip"},"dimensions":"page,language,user,unpatrolled,newPage,robot,anonymous,namespace,continent,country,region,city","metrics":"count,added,deleted,delta","shardSpec":{"type":"none"},"binaryVersion":9,"size":18722,"identifier":"wikipedia_2016-03-17T21:15:00.000Z_2016-03-17T21:20:00.000Z_2016-03-17T21:15:00.000Z"}}}

@jisookim0513
Copy link
Copy Markdown
Contributor Author

@xvrl @gianm I added tests for ServersResource, mainly for serialization of DruidServer. I created the test based on the previous version (with a serialization of DruidServer) so the customized serialized outputs of getServer() and getClusterServers() matches with a serialization of DruidServer which is now removed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this can be Map<String, Object>

@xvrl
Copy link
Copy Markdown
Member

xvrl commented Mar 25, 2016

👍 once you squash your commits

@jisookim0513 jisookim0513 force-pushed the remove-druid-server-serialization branch from bd3d14a to 0d3c5a3 Compare March 25, 2016 19:27
@jisookim0513
Copy link
Copy Markdown
Contributor Author

@xvrl squashed

@gianm
Copy link
Copy Markdown
Contributor

gianm commented Mar 25, 2016

👍

@xvrl xvrl merged commit 01f3221 into apache:master Mar 25, 2016
@xvrl xvrl added this to the 0.9.1 milestone Apr 12, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants