KIS - wrong values in multi-value dimension  #6542

@erankor

Hi all,

I'm using the Kafka Indexing Service (KIS) to ingest data, and I noticed that in some cases I get rows with dimension combinations that do not make sense. More specifically, I have a 'video playback type' dimension, which can be live / vod, and a multi-value dimension 'position', which holds timestamps for live playback only.
I verified multiple times that I always send an empty array ([]) for position when the playback type is vod, and yet, if I filter for playback type = vod and split by position, I get some values.
If I click one of these rows and select 'view raw data' in Swiv, it shows the same problem: a row with playbackType=vod and some value for position. Grepping for the original row that was ingested into Druid shows that it did not have any position set.
Most of the time I use the segments generated by KIS, only merging the different shards into one. However, in rare cases when something goes wrong, I reindex the events using Hadoop. I found that segments created by KIS show this problem, while segments created by Hadoop do not: filtering by vod for those timestamps and splitting by position returns nothing, as expected.
I was wondering whether anyone has run into this problem. I'm using Druid 0.10.0; maybe it was fixed in a later version?
Btw, it is possible that the problem is limited to empty arrays. I thought that maybe when KIS gets an empty array for a dimension, it ignores it and uses the value from the previously ingested row (but that theory is a bit hard for me to prove...).
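
For anyone who wants to repeat the check on the source events, it can be sketched roughly as below. This is only a minimal sketch, not the script actually used: it assumes the events are newline-delimited JSON with fields named `playbackType` and `position`, and the function name `find_vod_rows_with_position` is made up for illustration.

```python
import json


def find_vod_rows_with_position(lines):
    """Return parsed events that are vod but still carry a position value.

    Assumption: each line is one JSON event with 'playbackType' and
    'position' fields (hypothetical layout, adjust to the real schema).
    """
    bad = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # Treat a missing or null position the same as an empty array.
        position = event.get("position") or []
        if event.get("playbackType") == "vod" and len(position) > 0:
            bad.append(event)
    return bad


# Made-up sample events, for illustration only.
sample = [
    '{"playbackType": "vod", "position": []}',
    '{"playbackType": "live", "position": ["1541000000"]}',
    '{"playbackType": "vod", "position": ["1541000060"]}',
]

if __name__ == "__main__":
    for event in find_vod_rows_with_position(sample):
        print(event)
```

If this returns nothing for the raw events in a given time range while the KIS segment for the same range still shows vod rows with a position, the mismatch is introduced at or after ingestion rather than by the producer.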

Thanks

Eran
