start/stop of individual clustered realtime nodes causes data loss #1202

@pdeva

I have 4 realtime nodes, each in a separate Kafka consumer group.
I am using Druid 0.6.160.

Two of those nodes (1 and 2) were already running.
I started node 3 last night.
Node 4 was started today at 14:06 PST (the machines' clocks are all set to UTC).

After starting node 4, I shut down nodes 1 and 2. (Judging by CPU usage, node 2 seemed to be serving all requests.)

Note that at least 2 realtime nodes were active at all times, so there should be no data loss whatsoever.

Now when I query the last 30 or 60 minutes of data, I sometimes see all of it, but sometimes there is a gap between 14:00 and 14:08.

Here is a video showing the issue:
http://screencast.com/t/cs0osqHJ2

Is this because the broker sometimes queries node 4, which doesn't have all the data because it was started at 14:06 (and then took a couple of minutes to initialize)?
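
To make that hypothesis concrete, here is a minimal sketch in Python. It is not Druid code. It assumes that a node in a fresh consumer group with "auto.offset.reset": "largest" only ingests events that arrive after it is ready, and that the broker answers the 14:00 hour from a single node. The function name, the calendar date, the start time used for node 3, and the 14:08 ready time for node 4 are all placeholders for illustration.

```python
from datetime import datetime


def missing_interval(ingest_start, segment_start, segment_end):
    """Return the (start, end) part of a segment that a node never ingested.

    Returns None when the node was already consuming at segment_start.
    """
    if ingest_start <= segment_start:
        return None
    return (segment_start, min(ingest_start, segment_end))


# Hourly segment covering the gap; the date itself is an arbitrary placeholder.
segment_start = datetime(2015, 1, 1, 14, 0)
segment_end = datetime(2015, 1, 1, 15, 0)

# Assumed times at which each surviving node began consuming from Kafka.
nodes = {
    "node 3": datetime(2014, 12, 31, 22, 0),  # "last night" (placeholder time)
    "node 4": datetime(2015, 1, 1, 14, 8),    # started 14:06, ready ~14:08
}

for name, ingest_start in nodes.items():
    gap = missing_interval(ingest_start, segment_start, segment_end)
    if gap is None:
        print(f"{name}: complete for the 14:00 hour")
    else:
        print(f"{name}: missing {gap[0]:%H:%M} to {gap[1]:%H:%M}")
```

If this model is right, a query answered by node 3 would return the full hour, while a query answered by node 4 would show exactly the 14:00 to 14:08 gap, which matches the intermittent behavior in the video.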

Either way, this looks like a bug.

Portions of my realtime spec file:

    "config": {
      "maxRowsInMemory": 50000,
      "intermediatePersistPeriod": "PT10m"
    },
    "firehose": {
      "type": "kafka-0.8",
      "consumerProps": {
        "zookeeper.connect": "xx:2181,xx:2181,xx:2181",
        "zookeeper.connection.timeout.ms": "15000",
        "zookeeper.session.timeout.ms": "15000",
        "zookeeper.synctime.ms": "5000",
        "group.id": "druid-dripstat",
        "fetch.size": "1048586",
        "auto.offset.reset": "largest",
        "auto.commit.enable": "false"
      },
      "feed": "dripstat",
      "parser": {
        "timestampSpec": {
          "column": "timestamp"
        },
        "data": {
          "format": "json"
        }
      }
    },
    "plumber": {
      "type": "realtime",
      "windowPeriod": "PT10m",
      "segmentGranularity": "hour",
      "basePersistDirectory": "/data/realtime/basePersist"
    }
  }
