Loss of data on realtime node reboot #2676

@msspadoni

Description

Opening this issue was suggested by the Druid team following the Druid Users Google Group post at
https://groups.google.com/forum/#!topic/druid-user/DRwyOuFHuYM
NOTE: when uploading the zip file I got the message "Unfortunately, we don’t support that file type. Choose Files Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, or PDF.", so I changed the file extension to .xlsx, but it is actually a .zip file.
NOTE: I am using Druid version 0.8.2.
The contents related to the issue follow.

It should be easy to reproduce the problem with the files I attach.
I hope they are not too cumbersome to use.
This is the content of the unittest.zip file:

./unittest/runtime.properties
./unittest/common.runtime.properties
./unittest/realtime-run.sh
./unittest/kafka.soj.spec
./unittest/toload.template
./unittest/ut_inject.sh
./unittest/crontab.settings
./unittest/config.yaml.9099

The ut_inject.sh script is a simple shell script that, using the toload.template file, produces three files:

the first, in which TIMESTAMP is replaced by the current time
the second, in which TIMESTAMP is replaced by the current time minus five minutes
the third, in which TIMESTAMP is replaced by the current time minus ten minutes

All three files are then fed to a Kafka topic using the kafkacat utility (https://github.com/edenhill/kafkacat).
The process is repeated every five minutes, as can be seen from the crontab.settings file.
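For orientation, the substitution step described above can be sketched as follows. This is a hypothetical reconstruction: the template contents, the JSON shape, and the output file name are assumptions of mine, and the attached ut_inject.sh is the authoritative version.

```shell
# Hypothetical one-line template standing in for toload.template
# (the real template's contents are not shown in this issue).
TEMPLATE='{"timestamp":"TIMESTAMP","metric":1}'

OUT=$(mktemp)        # stand-in for the files the real script writes
NOW=$(date +%s)      # current time in epoch seconds

# Produce three events: now, now - 5 min, now - 10 min.
for OFFSET in 0 300 600; do
  TS=$((NOW - OFFSET))
  echo "$TEMPLATE" | sed "s/TIMESTAMP/$TS/" >> "$OUT"
done

# The real script then pipes each generated file to the Kafka topic
# via kafkacat, along the lines of:
#   kafkacat -P -b "$KAFKA_BROKER_HOST:$KAFKA_BROKER_PORT" \
#            -t "$KAFKA_TOPIC" < "$OUT"
```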

To use the above files you should/could:

edit the crontab.settings file to adjust the directories as convenient
edit the ut_inject.sh script to:
    adjust the directories (the SPLDIR and TMPDIR variables) as convenient
    set the Kafka broker endpoint by modifying the variables KAFKA_BROKER_HOST and KAFKA_BROKER_PORT
    change the Kafka topic name, if you want, by modifying the variable KAFKA_TOPIC
edit the realtime node Kafka firehose specification file (kafka.soj.spec) by modifying:
    the "dataSource" value, if you want;
    the consumer property "zookeeper.connect" endpoints;
    the consumer property "group.id" consumer group identifier, if you want;
    the firehose "feed" (if you changed the name of the Kafka topic);
    the tuning config "basePersistDirectory" directory.
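To show where the fields listed above live, a Druid 0.8.x realtime spec with a Kafka firehose has roughly the following shape. Every value below (data source name, hosts, group id, topic, path) is a placeholder I made up; the attached kafka.soj.spec is the authoritative version.

```json
[{
  "dataSchema": {
    "dataSource": "soj"
  },
  "ioConfig": {
    "type": "realtime",
    "firehose": {
      "type": "kafka-0.8",
      "consumerProps": {
        "zookeeper.connect": "zkhost1:2181,zkhost2:2181",
        "group.id": "druid-soj-group"
      },
      "feed": "soj"
    }
  },
  "tuningConfig": {
    "type": "realtime",
    "basePersistDirectory": "/tmp/druid/basePersist"
  }
}]
```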

The realtime-run.sh script shows the way in which we started the Druid realtime node, whilst the runtime.properties file
contains its properties. Also attached are the properties common to all Druid nodes (common.runtime.properties).

The config.yaml.9099 file is the configuration file used by our installation of Pivot.

I modified the crontab file of my testing machine with the crontab.settings contents at about 13:00 CET.
After the 13:50 run I verified that the consumer offset lag had become zero (see the attached kafkaConsumerOffset utility snapshot),
then at 13:53:48 I killed the realtime node with:

kill -15 pid

At 13:54:24 I started the realtime node again (see the file realtime-run.sh for the parameters).

Some time later, running Pivot, I could again see the loss of data (see the attached Pivot snapshot).
Hoping this will be useful, best,
Marco

Attachments:
kafkaconsumeroffset
pivot-2016-02-24_144739
unittest.xlsx
