Modify batch index task naming to accommodate simultaneous tasks #8612
jihoonson merged 7 commits into apache:master
gianm
left a comment
This isn't enough randomness; RandomIdUtils.getRandomId() offers 32 bits of randomness, but if that's the only thing other than datasource we use for generating a task id, it means that after just a few thousand tasks, odds are pretty good we'll see a collision. It would show up as a task that fails with a duplicate-task-id error. I'd suggest including the current time as well, which would make this a non-issue.
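To put rough numbers on the collision concern, here is a minimal sketch of the standard birthday-bound approximation, P ≈ 1 - exp(-n(n-1)/(2N)) with N = 2^bits. It assumes ids are drawn uniformly and independently from a 32-bit space; the class and method names are hypothetical and not part of Druid. Under that assumption the chance of at least one collision is a little over 1% at 10,000 ids and reaches roughly even odds at around 77,000 ids.

```java
final class CollisionEstimate {
    /**
     * Approximate probability that at least two of n ids collide when each id
     * is drawn uniformly at random from a space of 2^bits values.
     * Hypothetical helper for illustration; not a Druid API.
     */
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -((double) n * (double) (n - 1)) / (2.0 * space);
        // 1 - exp(x), computed via expm1 for accuracy when x is close to zero.
        return -Math.expm1(exponent);
    }

    public static void main(String[] args) {
        // Print the estimate for a few task counts in a 32-bit id space.
        for (long n : new long[] {1_000L, 10_000L, 77_163L, 100_000L}) {
            System.out.printf("n=%d p=%.4f%n", n, collisionProbability(n, 32));
        }
    }
}
```

Adding an independent component such as a millisecond timestamp means two ids can only collide if they are also generated in the same millisecond, which is why the collision count that matters becomes the number of tasks created at the same instant rather than the total number of tasks.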
Sorry for the delay. I will review this PR in a couple of days.
jihoonson
left a comment
I lied. I just read this PR. Looks like it still has the limited randomness issue. How about using UUIDUtils.generateUuid()?
@jihoonson Thanks for reviewing. Apart from the random string, the timestamp is also part of the task name. Wouldn't that be sufficient to avoid task name collisions?
jihoonson
left a comment
Oh, I missed it. Thanks!
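The naming scheme discussed in this thread (task type, datasource, a random string, then the timestamp) can be sketched as follows. This is an illustration only: `TaskIdSketch`, `randomSuffix`, and `makeTaskId` are hypothetical names, the eight-lowercase-letter suffix simply mirrors the shape of the `wjkrtyiz` example in the PR description, and the PR's actual implementation may draw the suffix and format the timestamp differently.

```java
import java.time.Instant;
import java.util.Random;

final class TaskIdSketch {
    /** Hypothetical helper: eight random lowercase letters. */
    static String randomSuffix(Random random) {
        StringBuilder sb = new StringBuilder(8);
        for (int i = 0; i < 8; i++) {
            sb.append((char) ('a' + random.nextInt(26)));
        }
        return sb.toString();
    }

    /** Hypothetical helper: joins type, datasource, suffix, and ISO-8601 timestamp with underscores. */
    static String makeTaskId(String type, String dataSource, String suffix, Instant now) {
        return String.join("_", type, dataSource, suffix, now.toString());
    }

    public static void main(String[] args) {
        // Two ids built for the same datasource at the same instant still differ
        // unless their random suffixes happen to match.
        Instant now = Instant.now();
        Random random = new Random();
        System.out.println(makeTaskId("index_hadoop", "datasource", randomSuffix(random), now));
        System.out.println(makeTaskId("index_hadoop", "datasource", randomSuffix(random), now));
    }
}
```

Passing the suffix and the timestamp in as arguments keeps `makeTaskId` deterministic, which makes the format easy to check against a known example.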
Fixes #8494.
Description
Modifies the index task name generation logic as follows:
Before:
index_hadoop_datasource_2019-09-27T21:11:29.162Z
After:
index_hadoop_datasource_wjkrtyiz_2019-09-27T21:11:29.162Z
This PR has: