Fix ingestion task failure when no input split to process#11553
Fix ingestion task failure when no input split to process#11553maytasm merged 3 commits intoapache:masterfrom
Conversation
|
This sounds like an interesting edge case. I think we should add an integration test for this. EDIT: Wow auto-correct on my phone - fixed my comment |
| if (taskMonitor != null) { | ||
| taskMonitor.collectReport(report); | ||
| } |
There was a problem hiding this comment.
collectReport should be called only when there is a subtask sending its report. Since TaskMonitor is responsible for spawning subtasks, it seems quite strange if taskMonitor is null when this method is called. How about adding a precondition that explodes when it's null?
There was a problem hiding this comment.
I just saw that the taskMonitor was accessed without a null check so I added the null check just in case. The bug mentioned in this ticket can be fix with just the change in getReports(). Given the context from your comment, maybe it is better to leave it as is and adds a comment mentioning something like... collectReport is only called when there is a subtask sending its report. Since TaskMonitor is responsible for spawning subtasks, the taskMonitor cannot be null
There was a problem hiding this comment.
The comment sounds good. I think having a null check is also a good convention because someone like me could break the contract around taskMonitor unintentionally in the future. Perhaps we can add assert taskMonitor != null;. This will be ran only in tests, but not in production.
There was a problem hiding this comment.
Is there a way to have static analysis flag all these uses? I've tried using combinations of @Nullable(doesn't realize that once it's set, it's good for the rest of the method) and @MonotonicNonNull but haven't been able to get intelliJ to flag these issues for me locally yet. I figured I would just leave a comment here and maybe you will have better luck
There was a problem hiding this comment.
You can set the severity of the "Nullability problems" intellij inspection to error instead of warning. I think we haven't done it because there could be lots of places where the inspection will complain about nulls. It could be worth doing it at some point though.
| public Map<String, SubTaskReportType> getReports() | ||
| { | ||
| return taskMonitor.getReports(); | ||
| return taskMonitor == null ? Collections.emptyMap() : taskMonitor.getReports(); |
To name a few cases:
|
Added integration test |
The first 2 cases sound like a user doing something they shouldn't be doing - and the re-index is a no-op. Maybe we should surface this as a typo to the end user? Can you provide an example of the 3rd case? I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening. |
For the 3rd case, imagine you have a hash partitioning with the following spec: The partial_segment_merge phase tasks will have no input split to process as the interval in the ioConfig does not match with intervals in granularitySpec |
The cases I thought of are due to user error in their ingestionSpec. I am not sure if there are valid cases where we can run into no input split to process situation though. |
jihoonson
left a comment
There was a problem hiding this comment.
LGTM
I'm wondering if all these cases are situations where the end user should be warned of something "unexpected" happening.
I think this is likely because of the user error in their ingestion spec, and thus it makes sense to bubble up the user error to their end. The thing is how. To me, from the Druid perspective, the most reasonable way is ending the task with the SUCCEEDED state (because the task technically succeeded with no error) and showing the ingestion metrics, such as # of rows ingested, in the task report, so that users can notice the strangeness of their ingestion job from those metrics. We could probably improve our UI to make this more noticeable.
|
Merging this in to fix regression. Improvement for logging and UI can be done separately |
Fix ingestion task failure when no input split to process
Description
This PR fix a regression introduced by #11189
The regression causes ingestion task in parallel mode to fail when there is no input split to process (previously the task would succeed).
This is because of a null pointer exception occurred in getReports() at
druid/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
Line 296 in 2df4214
druid/indexing-service/src/main/java/org/apache/druid/indexing/common/task/batch/parallel/ParallelIndexPhaseRunner.java
Line 111 in 2df4214
that taskMonitor is initialized in line 121, but that does not always happen. If the method returned in line 116, which is when there is no input source to split, then taskMonitor stays null.
This PR has: