Fix compaction tasks reports getting overwritten#15981
Conversation
ed286b1 to
f9a736a
Compare
georgew5656
left a comment
There was a problem hiding this comment.
this looks good to me, imo we don't need the task id in the keys of the reports though,
i think ingestionStatsAndErrors_0 ingestionStatsAndErrors_1 (similar to what is done by query_controller tasks) or maybe the interval that is being compacted instead would be good.
|
@georgew5656 Modified to |
Is the current format documented? If so, we should consider options that preserve compatibility with previous documentation. If not, changing the format is fair game. Although, if you are depending on the specific output format, you might want to add documentation, as otherwise it might be changed later in a way you don't expect. |
|
I tested this change a little bit and found that if you run a compact task with no additional subtasks - the report is an empty json object. Before this change the report would have some details. I think we should fix it so that the task report details are preserved for a compact task with no sub-tasks, even if we change the format of the report. +1 for gian's suggestion of documenting the report so that other users don't break it once we've decided on a particular format. I think it would be a good idea to also add a test of some sort that validates the format of the report. There are a few integration tests for compaction - perhaps one of those would be a good place to add validation for the format of the task report. |
suneet-s
left a comment
There was a problem hiding this comment.
+1 from me on the approach. Thanks for the integration test and docs @adithyachakilam!
There was a problem hiding this comment.
Sorry, @adithyachakilam , I couldn't get to reviewing this PR sooner.
I have left some comments for better code readability. There is also a minor concern regarding holding too many reports and causing an OOM.
Since this PR has already been merged, the comments can be addressed in a follow-up PR.
| log.info("Generated [%d] compaction task specs", totalNumSpecs); | ||
|
|
||
| int failCnt = 0; | ||
| Map<String, TaskReport> completionReports = new HashMap<>(); |
There was a problem hiding this comment.
If the compaction is being run on several intervals (not very likely but still a possibility), can holding all the task reports in memory potentially cause an OOM exception? Currently, most of the task reports contain only ingestStatsAndErrors but they may contain other stuff in the future.
In the future, we should consider writing out the sub-reports in a streaming fashion alongwith the required changes to the TaskReportFileWriter API.
For now, we should add a guardrail here so that we don't try to hold too many reports in memory and fail with an OOM.
There was a problem hiding this comment.
what do you think is a good size to hold then ?
Description
A single compaction task could be splitted the into multiple index tasks based on the interval given in spec. In such cases, all the index tasks are run with same id and the task completion report is getting over written. This PR skips writing the report for each parallel index task and instead writes on the compaction task completion.
With this change instead of overwriting the reports file, we make multiple entries and it looks something like:
Key changed/added classes in this PR
CompactionTaskParallelIndexSupervisorTaskThis PR has: