Description
Hi all,
I'm using the Kafka Indexing Service (KIS) to ingest data, and I've noticed that in some cases I get rows with dimension combinations that don't make sense. More specifically, I have a 'video playback type' dimension, which can be live or vod, and a multi-value dimension 'position', which holds timestamps for live playback only.
I've verified multiple times that I always send an empty array ([]) for position when the playback type is vod. Yet if I filter on playback type = vod and split by live position, I still get some values.
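For anyone trying to reproduce this outside Swiv, the same check can be expressed as a native Druid groupBy query: filter on playbackType = vod and group by the multi-value position dimension. The sketch below only builds the query and scans the results. The datasource name and interval are hypothetical placeholders, and it assumes the dimension values are exactly `playbackType`, `vod` and `position`. The query JSON would be POSTed to the broker's `/druid/v2` endpoint. With correct data, every vod row should fall into the single null/empty position group.

```python
import json
from typing import Dict, List


def build_vod_position_query(datasource: str, interval: str) -> Dict:
    """Native Druid groupBy: vod rows, split by the multi-value 'position'."""
    return {
        "queryType": "groupBy",
        "dataSource": datasource,        # hypothetical datasource name
        "granularity": "all",
        "intervals": [interval],         # e.g. "2017-06-01/2017-06-02"
        # Keep only rows whose playback type is vod.
        "filter": {"type": "selector", "dimension": "playbackType", "value": "vod"},
        # Group by position; empty arrays end up in the null group.
        "dimensions": ["position"],
        "aggregations": [{"type": "count", "name": "rows"}],
    }


def unexpected_positions(results: List[Dict]) -> List[str]:
    """Return position values found among vod rows (expected: none)."""
    return [
        row["event"]["position"]
        for row in results
        if row["event"].get("position") not in (None, "")
    ]


if __name__ == "__main__":
    query = build_vod_position_query("video_events", "2017-06-01/2017-06-02")
    print(json.dumps(query, indent=2))
```

Running the same query against a KIS-built interval and a Hadoop-reindexed interval makes the difference easy to compare side by side.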
If I click one of these rows in Swiv and select 'view raw data', it shows the problem too: it returns a row with playbackType=vod and some value for position. Grepping for the original row that was ingested into Druid shows that it did not have any position set.
Most of the time I use the segments generated by KIS, only merging the different shards into one. In some rare cases, when there is a problem, I reindex the events using Hadoop. I found that segments created by KIS show this problem while segments created by Hadoop do not: filtering by vod for the same time range and splitting by position returns nothing, as expected.
Has anyone run into a similar problem? I'm using Druid 0.10.0; maybe this was fixed in a later version?
By the way, it's possible that the problem is limited to empty arrays. My guess is that when KIS receives an empty array for a dimension, it ignores it and reuses the value from the previously ingested row (but that theory is hard for me to prove...).
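One way to put the carry-over theory to a rough test is to predict which position values it would leak into vod rows and compare them with the values the query actually returns. The sketch below is a toy model of the hypothesis only, not Druid's ingestion code. It assumes you can read the raw events of one Kafka partition in offset order, and the helper names are hypothetical. A match is suggestive rather than proof, since KIS tasks read partitions independently and rows are rolled up.

```python
from typing import Dict, Iterable, List, Optional

Event = Dict[str, object]


def predicted_leaks(events: List[Event]) -> List[Optional[List[str]]]:
    """For each vod event, the position a carry-over bug would attach.

    Hypothesis being modelled (not confirmed Druid behaviour): an empty
    'position' array is ignored and the last non-empty value is reused.
    Returns None for a vod event with no earlier non-empty position.
    """
    last_position: Optional[List[str]] = None
    leaks: List[Optional[List[str]]] = []
    for event in events:  # events must be in ingestion (offset) order
        position = event.get("position") or []
        if position:
            last_position = list(position)
        if event.get("playbackType") == "vod":
            leaks.append(last_position)
    return leaks


def consistent_with_carry_over(events: List[Event], observed: Iterable[str]) -> bool:
    """True if every position seen on vod rows could have leaked forward."""
    predicted = {value for leak in predicted_leaks(events) if leak for value in leak}
    return set(observed) <= predicted


if __name__ == "__main__":
    stream = [
        {"playbackType": "live", "position": ["1496275200"]},
        {"playbackType": "vod", "position": []},
        {"playbackType": "live", "position": ["1496275260", "1496275320"]},
        {"playbackType": "vod", "position": []},
    ]
    print(predicted_leaks(stream))
```

If the unexpected values from the vod query consistently appear among the predicted leaks, that would support the theory; if they don't, the cause is likely elsewhere.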
Thanks
Eran