Opening this issue was suggested by Druid people following Druid Users Google Group post at
https://groups.google.com/forum/#!topic/druid-user/DRwyOuFHuYM
BEWARE: when uploading the zip file I got the message "Unfortunately, we don’t support that file type. Choose Files Try again with a PNG, GIF, JPG, DOCX, PPTX, XLSX, TXT, or PDF.", so I changed the file extension to .xlsx, but it is actually a .zip file.
BEWARE: I am using Druid version 0.8.2.
Contents related to the issue follow:
It should be easy to reproduce the problem with the attached files.
I hope they will not be too cumbersome to use.
This is the content of the unittest.zip file:
./unittest/runtime.properties
./unittest/common.runtime.properties
./unittest/realtime-run.sh
./unittest/kafka.soj.spec
./unittest/toload.template
./unittest/ut_inject.sh
./unittest/crontab.settings
./unittest/config.yaml.9099
The ut_inject.sh script is a simple shell script that, using the toload.template file, produces three files:
the first, in which TIMESTAMP is substituted with the current time
the second, in which TIMESTAMP is substituted with the current time minus five minutes
the third, in which TIMESTAMP is substituted with the current time minus ten minutes
All three files are then fed to a Kafka topic by calling the kafkacat utility (https://github.com/edenhill/kafkacat).
The process is repeated every five minutes, as can be seen from the crontab.settings file.
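For reference, a minimal sketch of what such an injection script could look like (the actual ut_inject.sh is in the attachment; the template contents and paths below are placeholders, and GNU date is assumed):

```shell
#!/bin/sh
# Illustrative sketch only -- the real ut_inject.sh is in unittest.zip.
TMPDIR=/tmp/unittest            # working directory (placeholder path)
KAFKA_BROKER_HOST=localhost     # adjust to your broker
KAFKA_BROKER_PORT=9092
KAFKA_TOPIC=soj                 # placeholder topic name

mkdir -p "$TMPDIR"
# Stand-in for toload.template: one event line with a TIMESTAMP placeholder.
printf '{"ts":"TIMESTAMP","metric":1}\n' > "$TMPDIR/toload.template"

# Produce three files: now, now minus 5 minutes, now minus 10 minutes.
# (date -d "-N seconds" is GNU date syntax.)
for back in 0 300 600; do
  ts=$(date -u -d "-${back} seconds" +%Y-%m-%dT%H:%M:%SZ)
  sed "s/TIMESTAMP/$ts/" "$TMPDIR/toload.template" > "$TMPDIR/toload.$back"
done

# Feed all three files to the Kafka topic (skipped here if kafkacat is absent).
if command -v kafkacat >/dev/null 2>&1; then
  cat "$TMPDIR"/toload.0 "$TMPDIR"/toload.300 "$TMPDIR"/toload.600 |
    kafkacat -P -b "$KAFKA_BROKER_HOST:$KAFKA_BROKER_PORT" -t "$KAFKA_TOPIC"
fi
```

The crontab.settings file presumably schedules such a script every five minutes, with an entry along the lines of `*/5 * * * * /path/to/ut_inject.sh`; check the attachment for the exact line.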
To use the above files you should/could:
edit the crontab.settings file to fix the directories as convenient
edit the ut_inject.sh script to:
fix the directories (the SPLDIR and TMPDIR variables) as convenient
fix the Kafka broker endpoint by modifying the KAFKA_BROKER_HOST and KAFKA_BROKER_PORT variables
fix the Kafka topic name, if you want, by modifying the KAFKA_TOPIC variable
edit the realtime node Kafka firehose specification file (kafka.soj.spec) by modifying:
the "dataSource" value, if you want;
the "zookeeper.connect" endpoints in the consumer properties;
the "group.id" consumer group identifier in the consumer properties, if you want;
the firehose "feed" value (if you changed the name of the Kafka topic);
the "basePersistDirectory" directory in the tuning config.
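For orientation, the fields listed above sit roughly as follows in a Druid 0.8 realtime ingestion spec (all values here are placeholders, not the contents of the attached kafka.soj.spec):

```json
{
  "dataSchema": {
    "dataSource": "soj"
  },
  "ioConfig": {
    "type": "realtime",
    "firehose": {
      "type": "kafka-0.8",
      "consumerProps": {
        "zookeeper.connect": "zk1:2181,zk2:2181",
        "group.id": "druid-soj"
      },
      "feed": "soj"
    }
  },
  "tuningConfig": {
    "type": "realtime",
    "basePersistDirectory": "/var/druid/persist"
  }
}
```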
The realtime-run.sh script shows how we start the Druid realtime node, while the runtime.properties file
contains its properties. Also attached are the properties common to all Druid nodes (common.runtime.properties).
The config.yaml.9099 file is the configuration file used by our installation of Pivot.
I updated the crontab of my testing machine with the crontab.settings contents at about 13:00 CET.
After the 13:50 execution I verified that the consumer offset lag had dropped to zero (see the attached kafkaConsumerOffset utility snapshot),
then at 13:53:48 I killed the realtime node with:
kill -15 pid
At 13:54:24 I restarted the realtime node (see the file realtime-run.sh for the parameters).
Some time later, querying with Pivot, I can again see the loss of data (see the attached Pivot snapshot).
Hoping this is useful, best regards,
Marco


unittest.xlsx