Skip to content

Modify batch index task naming to accomodate simultaneous tasks #8612

Merged
jihoonson merged 7 commits intoapache:masterfrom
a2l007:index_naming
Nov 18, 2019
Merged

Modify batch index task naming to accomodate simultaneous tasks #8612
jihoonson merged 7 commits intoapache:masterfrom
a2l007:index_naming

Conversation

@a2l007
Copy link
Copy Markdown
Contributor

@a2l007 a2l007 commented Sep 30, 2019

Fixes #8494 .

Description

Modifies the index task name generation logic to as follows:
Before: index_hadoop_datasource_2019-09-27T21:11:29.162Z
After: index_hadoop_datasource_wjkrtyiz_2019-09-27T21:11:29.162Z


This PR has:

  • been self-reviewed.
  • been tested in a test Druid cluster.

@a2l007 a2l007 changed the title Index naming Modify batch index task naming to accomodate simultaneous tasks Sep 30, 2019
Copy link
Copy Markdown
Contributor

@gianm gianm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't enough randomness; RandomIdUtils.getRandomId() offers 32 bits of randomness, but if that's the only thing other than datasource we use for generating a task id, it means that after just a few thousand tasks, odds are pretty good we'll see a collision. It would show up as a task that fails with a duplicate-task-id error. I'd suggest including the current time as well, which would make this a non-issue.

@jihoonson
Copy link
Copy Markdown
Contributor

@a2l007 thank you for raising this PR! Have you had a chance to check @gianm's comment?

@a2l007
Copy link
Copy Markdown
Contributor Author

a2l007 commented Oct 4, 2019

Sorry for the delay.
@gianm Thanks, that makes sense to me. I'm made the corresponding changes.

@jihoonson
Copy link
Copy Markdown
Contributor

Sorry for the delay. I will review this PR in a couple of days.

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I lied. I just read this PR. Looks like it still has the limited randomness issue. How about using UUIDUtils.generateUuid()?

@a2l007
Copy link
Copy Markdown
Contributor Author

a2l007 commented Nov 18, 2019

@jihoonson Thanks for reviewing. Apart from the random string, the timestamp is also part of the task name. Wouldn't that be sufficient to avoid task name collisions?

Copy link
Copy Markdown
Contributor

@jihoonson jihoonson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed it. Thanks!

@jihoonson jihoonson merged commit 8515a03 into apache:master Nov 18, 2019
@jon-wei jon-wei added this to the 0.17.0 milestone Dec 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix batch index task naming convention to accommodate simultaneous tasks

4 participants