
Reduce default max # of subTasks to 1 for native parallel task#7181

Merged
fjy merged 7 commits into apache:master from jihoonson:reduce-max-num-subtasks on Mar 6, 2019

Conversation

@jihoonson (Contributor) commented Mar 2, 2019:

The unlimited default for maxNumSubTasks makes it easy to spawn a huge number of sub tasks. This PR reduces the default value to 1 so that people think about how many tasks they want. When it's 1, the supervisor task prints a warning and does the ingestion job on its own instead of spawning sub tasks.

I also added a 'Capacity Planning' section to give a brief idea of how to set maxNumSubTasks properly.
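For illustration, a minimal sketch of where this setting lives in an `index_parallel` tuningConfig (the value 4 is hypothetical; with the new default, omitting it makes the supervisor run the job sequentially on its own):

```json
{
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumSubTasks": 4
  }
}
```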

@jihoonson added this to the 0.14.0 milestone Mar 2, 2019
Comment thread on docs/content/ingestion/native_tasks.md (outdated)
|pushTimeout|Milliseconds to wait for pushing segments. It must be >= 0, where 0 means to wait forever.|0|no|
|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](#segmentWriteOutMediumFactory).|Not specified, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used|no|
|maxNumSubTasks|Maximum number of tasks which can be run at the same time.|Integer.MAX_VALUE|no|
|maxNumSubTasks|Maximum number of tasks which can be run at the same time. The supervisor task would spawn sub tasks up to `maxNumSubTasks` regardless of the reamining task slots. If this value is set to too large, too many sub tasks can be created which might block other ingestion.|2|no|
Contributor:
remaining*

Contributor Author (@jihoonson):
Thanks. I'll fix this and add a new section to emphasize the importance of setting this value properly.

Contributor Author (@jihoonson):
Added Capacity Planning section.
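As a rough sizing illustration (hypothetical numbers): on a cluster with 10 task slots, setting `maxNumSubTasks` near 10 lets a single parallel job occupy every slot and block other ingestion, because the supervisor spawns sub tasks regardless of the remaining slots; a smaller value such as 4 leaves headroom for other tasks.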

Contributor:
Since the value is so critical and tough to set automatically, how about setting it to 1 by default and logging a warning in the task log when it's not raised higher? That way users are more likely to notice it if they didn't read the documentation.

Contributor Author (@jihoonson) commented Mar 4, 2019:
I thought about it, but set it to 2 because I'm thinking of making the supervisor task do the ingestion job on its own instead of spawning sub tasks when this is 1.

What do you think about a more aggressive approach, like making this option mandatory and throwing an error if it's missing?

Contributor:
> I thought about it, but set it to 2 because I'm thinking of making the supervisor task do the ingestion job on its own instead of spawning sub tasks when this is 1.

If this is the case, I think 1 is an even better default.

> What do you think about a more aggressive approach, like making this option mandatory and throwing an error if it's missing?

I think there should be some kind of default (we want to minimize the # of mandatory parameters), and at least "1" plus a warning makes it fairly clear what's going on if you forgot to set it.

Contributor:
Default of 1 with a warning message that tells the user to set "maxNumSubTasks" if they want parallelism sounds good to me.

Contributor Author (@jihoonson):
Changed the default to 1 and added a warning. I also changed the default behavior to sequential mode to emphasize the importance of this setting.

@jihoonson changed the title from "Reduce # of max subTasks to 2" to "Reduce default max # of subTasks to 2 for native parallel task" Mar 4, 2019
@clintropolis (Member) left a comment:
Also will need to update PR title and description if default is now 1

Comment thread on docs/content/ingestion/native_tasks.md (outdated)
|pushTimeout|Milliseconds to wait for pushing segments. It must be >= 0, where 0 means to wait forever.|0|no|
|segmentWriteOutMediumFactory|Segment write-out medium to use when creating segments. See [SegmentWriteOutMediumFactory](#segmentWriteOutMediumFactory).|Not specified, the value from `druid.peon.defaultSegmentWriteOutMediumFactory.type` is used|no|
|maxNumSubTasks|Maximum number of tasks which can be run at the same time.|Integer.MAX_VALUE|no|
|maxNumSubTasks|Maximum number of tasks which can be run at the same time. The supervisor task would spawn worker tasks up to `maxNumSubTasks` regardless of the available task slots. If this value is set too large, too many worker tasks may be created, which might block other ingestion. Check [Capacity Planning](#capacity-planning) for more details.|2|no|
Member:
Is default 1 now?

Contributor Author (@jihoonson):
Ah sorry. Fixed.

@jihoonson changed the title from "Reduce default max # of subTasks to 2 for native parallel task" to "Reduce default max # of subTasks to 1 for native parallel task" Mar 5, 2019
Contributor Author (@jihoonson):
I fixed a wrong if clause in ParallelIndexSupervisorTask.run() and added a unit test.

Contributor Author (@jihoonson):
Fixed the failing IT test.

@fjy merged commit e48a9c1 into apache:master Mar 6, 2019
jon-wei pushed a commit to jon-wei/druid that referenced this pull request Mar 6, 2019
…e#7181)

* Reduce # of max subTasks to 2

* fix typo and add more doc

* add more doc and link

* change default and add warning

* fix doc

* add test

* fix it test
jon-wei added a commit that referenced this pull request Mar 7, 2019
#7200)

* Reduce # of max subTasks to 2

* fix typo and add more doc

* add more doc and link

* change default and add warning

* fix doc

* add test

* fix it test
@glasser (Contributor) commented Mar 7, 2019:

The concern here is that batch ingestion tasks will get in the way of other ingestion tasks. But isn't that what the task priority system solves?

Contributor Author (@jihoonson):
Would you please elaborate a bit more on your concern? Currently, task priority is really just a lock priority. I think we should improve it so that higher-priority tasks can run sooner.

@glasser (Contributor) commented Mar 7, 2019:

Ah, my assumption about how priority works was that if there were more tasks scheduled on the cluster than available capacity, higher-priority tasks would take precedence (and maybe even kill existing lower-priority tasks?). Sounds like I'm wrong?

@gianm (Contributor) commented Mar 8, 2019:

Right now, priority doesn't affect scheduling, only the ability of already-scheduled tasks to acquire locks.
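For reference, that lock priority is supplied through the task context; a minimal task-level sketch (the value 75 is hypothetical):

```json
{
  "context": {
    "priority": 75
  }
}
```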

@glasser (Contributor) commented Mar 8, 2019:

That's good to know, and this change makes lots of sense given that!

@glasser (Contributor) commented Mar 8, 2019:

@jihoonson I just rebased #7048 on top of this and noticed that ITParallelIndexTest no longer actually runs in parallel. I assume that is not intended; I'll send off a PR.

Contributor Author (@jihoonson):
@glasser OMG, it's my bad. That test should be run in parallel. Would you please fix it by adding maxNumSubTasks to the ingestion spec if possible?

@glasser (Contributor) commented Mar 8, 2019:
Yep, #7211.

gianm pushed a commit that referenced this pull request Mar 9, 2019
…7211)

* integration-tests: make ITParallelIndexTest still work in parallel

Follow-up to #7181, which made the default behavior for index_parallel tasks
non-parallel.

* Validate that parallel index subtasks were run