I have 4 realtime nodes, each in a separate Kafka consumer group.
I am using Druid 0.6.160.
Two of those nodes (1 & 2) were already running.
I started node 3 last night.
Node 4 was started today at 14:06 PST (the machines' clocks are all set to UTC).
After starting node 4, I shut down nodes 1 & 2. (Judging by CPU usage, node 2 seemed to be serving all requests.)
Note that at least 2 realtime nodes were active at all times, so there should be no data loss whatsoever.
Now when I query the last 30 or 60 minutes of data, sometimes I see all of the data, but sometimes there is a gap between 14:00 and 14:08.
Here is a video showing the issue:
http://screencast.com/t/cs0osqHJ2
Is this because the broker sometimes queries node 4, which doesn't have all the data because it was started at 14:06 (and then took a couple of minutes to initialize)?
Either way, this looks like a bug.
Portions of my realtime spec file:
  "config": {
    "maxRowsInMemory": 50000,
    "intermediatePersistPeriod": "PT10m"
  },
  "firehose": {
    "type": "kafka-0.8",
    "consumerProps": {
      "zookeeper.connect": "xx:2181,xx:2181,xx2181",
      "zookeeper.connection.timeout.ms": "15000",
      "zookeeper.session.timeout.ms": "15000",
      "zookeeper.synctime.ms": "5000",
      "group.id": "druid-dripstat",
      "fetch.size": "1048586",
      "auto.offset.reset": "largest",
      "auto.commit.enable": "false"
    },
    "feed": "dripstat",
    "parser": {
      "timestampSpec": {
        "column": "timestamp"
      },
      "data": {
        "format": "json"
      }
    }
  },
  "plumber": {
    "type": "realtime",
    "windowPeriod": "PT10m",
    "segmentGranularity": "hour",
    "basePersistDirectory": "/data/realtime/basePersist"
  }
}
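
To make my hypothesis concrete, here is a toy Python model of what I think might be happening. It is not Druid code: the node names, the "broker picks one realtime node per query" behavior, and the 14:08 ready time for node 4 are all my assumptions. The one grounded piece is that with "auto.offset.reset": "largest", a node in a brand-new consumer group starts from the newest Kafka messages, so it would hold nothing from before it came up.

```python
import random

# Toy model only, not Druid internals.
# Minute of the 14:00 hour from which each node is ASSUMED to hold data:
# node3 ran through the whole hour; node4 started at 14:06 and is assumed
# to have begun ingesting around 14:08.
FIRST_MINUTE_WITH_DATA = {"node3": 0, "node4": 8}

QUERY_MINUTES = range(0, 30)  # hypothetical query window 14:00-14:29


def query_node(node, minutes=QUERY_MINUTES):
    """Return the minutes in the window for which this node has data."""
    start = FIRST_MINUTE_WITH_DATA[node]
    return [m for m in minutes if m >= start]


def broker_query(rng):
    """ASSUMPTION: the broker answers from one realtime node chosen at random."""
    node = rng.choice(sorted(FIRST_MINUTE_WITH_DATA))
    return node, query_node(node)


def missing_minutes(result, minutes=QUERY_MINUTES):
    """Minutes of the query window that are absent from the result."""
    return [m for m in minutes if m not in result]


if __name__ == "__main__":
    rng = random.Random()
    for _ in range(5):
        node, result = broker_query(rng)
        print(node, "missing minutes:", missing_minutes(result))
```

If this model is right, repeated identical queries would alternate between a complete result (answered by node 3) and one missing 14:00 to 14:08 (answered by node 4), which matches what the video shows.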