Fix compaction tasks reports getting overwritten #15981

Merged
georgew5656 merged 11 commits into apache:master from ac9817:adithyachakilam/fix-compaction-task-reports-getting-overwritten
Mar 4, 2024
@ac9817
Contributor

@ac9817 ac9817 commented Feb 27, 2024

Description

A single compaction task can be split into multiple index tasks based on the intervals given in the spec. In such cases, all the index tasks run with the same ID, so the task completion report gets overwritten. This PR skips writing the report for each parallel index task and instead writes it once, when the compaction task completes.

With this change, instead of overwriting the report file, we write one entry per index task. The result looks something like:

{
  "ingestionStatsAndErrors_2": {
    "type": "ingestionStatsAndErrors",
    "taskId": "compact_test_klkggepm_2024-02-28T17:27:17.327Z",
    "payload": {
      "ingestionState": "COMPLETED",
      "unparseableEvents": {
        "buildSegments": []
      },
      "rowStats": {
        "totals": {
          "buildSegments": {
            "processed": 3,
            "processedBytes": 1500,
            "processedWithError": 0,
            "thrownAway": 0,
            "unparseable": 0
          }
        }
      },
      "errorMsg": null,
      "segmentAvailabilityConfirmed": false,
      "segmentAvailabilityWaitTimeMs": 0,
      "recordsProcessed": {}
    }
  },
  "ingestionStatsAndErrors_1": {
    "type": "ingestionStatsAndErrors",
    "taskId": "compact_test_klkggepm_2024-02-28T17:27:17.327Z",
    "payload": {
      "ingestionState": "COMPLETED",
      "unparseableEvents": {
        "buildSegments": []
      },
      "rowStats": {
        "totals": {
          "buildSegments": {
            "processed": 3,
            "processedBytes": 1500,
            "processedWithError": 0,
            "thrownAway": 0,
            "unparseable": 0
          }
        }
      },
      "errorMsg": null,
      "segmentAvailabilityConfirmed": false,
      "segmentAvailabilityWaitTimeMs": 0,
      "recordsProcessed": {}
    }
  },
  "ingestionStatsAndErrors_0": {
    "type": "ingestionStatsAndErrors",
    "taskId": "compact_test_klkggepm_2024-02-28T17:27:17.327Z",
    "payload": {
      "ingestionState": "COMPLETED",
      "unparseableEvents": {
        "buildSegments": []
      },
      "rowStats": {
        "totals": {
          "buildSegments": {
            "processed": 3,
            "processedBytes": 1500,
            "processedWithError": 0,
            "thrownAway": 0,
            "unparseable": 0
          }
        }
      },
      "errorMsg": null,
      "segmentAvailabilityConfirmed": false,
      "segmentAvailabilityWaitTimeMs": 0,
      "recordsProcessed": {}
    }
  }
}
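
The key layout above can be illustrated with a minimal sketch. This is not the code from this PR: the class `IndexedReportKeys`, its `merge` method, and the use of plain `Object` values in place of Druid's `TaskReport` are all hypothetical, chosen only to show how suffixing each key with the index of its sub-task keeps the entries from colliding.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of Druid.
public class IndexedReportKeys
{
  // Merges the reports of each sub-task into one map, suffixing every key
  // with the sub-task's position so that same-named reports never collide.
  public static Map<String, Object> merge(List<Map<String, Object>> subTaskReports)
  {
    Map<String, Object> merged = new LinkedHashMap<>();
    for (int i = 0; i < subTaskReports.size(); i++) {
      for (Map.Entry<String, Object> entry : subTaskReports.get(i).entrySet()) {
        merged.put(entry.getKey() + "_" + i, entry.getValue());
      }
    }
    return merged;
  }

  public static void main(String[] args)
  {
    // Plain strings stand in for the real report payloads.
    Map<String, Object> first = Map.of("ingestionStatsAndErrors", "report for first interval");
    Map<String, Object> second = Map.of("ingestionStatsAndErrors", "report for second interval");

    // prints [ingestionStatsAndErrors_0, ingestionStatsAndErrors_1]
    System.out.println(merge(List.of(first, second)).keySet());
  }
}
```

Without the suffix, both sub-tasks would write under the same `ingestionStatsAndErrors` key and the second would replace the first, which is the overwrite this PR addresses.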

Key changed/added classes in this PR
  • CompactionTask
  • ParallelIndexSupervisorTask

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@ac9817 ac9817 force-pushed the adithyachakilam/fix-compaction-task-reports-getting-overwritten branch from ed286b1 to f9a736a Compare February 27, 2024 19:05
Contributor

@georgew5656 georgew5656 left a comment

This looks good to me. IMO we don't need the task ID in the keys of the reports, though.

I think ingestionStatsAndErrors_0, ingestionStatsAndErrors_1 (similar to what is done by query_controller tasks), or maybe the interval that is being compacted, would be good instead.

@ac9817
Contributor Author

ac9817 commented Feb 28, 2024

@georgew5656 Modified to ingestionStatsAndErrors_0, ingestionStatsAndErrors_1

@ac9817 ac9817 requested a review from georgew5656 February 28, 2024 17:36
@arunramani
Contributor

@cryptoe / @kfaraz this PR changes the structure of the task report for compaction. Are task reports considered part of a Druid API contract? In other words, are "breaking" changes okay for this?

@ac9817 ac9817 requested a review from georgew5656 February 29, 2024 19:46
@gianm
Contributor

gianm commented Mar 1, 2024

> @cryptoe / @kfaraz this PR changes the structure of the task report for compaction. Are task reports considered part of a Druid API contract? In other words, are "breaking" changes okay for this?

Is the current format documented? If so, we should consider options that preserve compatibility with previous documentation.

If not, changing the format is fair game. Although, if you are depending on the specific output format, you might want to add documentation, as otherwise it might be changed later in a way you don't expect.

@suneet-s
Contributor

suneet-s commented Mar 1, 2024

I tested this change a little bit and found that if you run a compact task with no additional subtasks, the report is an empty JSON object. Before this change, the report would have some details.

I think we should fix it so that the task report details are preserved for a compact task with no sub-tasks, even if we change the format of the report.

+1 for gian's suggestion of documenting the report so that other users don't break it once we've decided on a particular format. I think it would be a good idea to also add a test of some sort that validates the format of the report. There are a few integration tests for compaction - perhaps one of those would be a good place to add validation for the format of the task report.
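
A format check of the kind suggested above could look roughly like the following sketch. It is not the integration test that was added in this PR. The class `ReportFormatCheck` and its `isValid` method are hypothetical, and the sketch assumes that every top-level key follows the `ingestionStatsAndErrors_<n>` pattern shown in the PR description.

```java
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical illustration only; not part of Druid or its test suite.
public class ReportFormatCheck
{
  // Assumed key format, taken from the example report in the PR description.
  private static final Pattern KEY_PATTERN = Pattern.compile("ingestionStatsAndErrors_\\d+");

  // Returns true when the report is non-empty and every top-level key
  // matches the assumed pattern.
  public static boolean isValid(Map<String, ?> report)
  {
    if (report.isEmpty()) {
      // An empty report is the regression described above, so reject it.
      return false;
    }
    for (String key : report.keySet()) {
      if (!KEY_PATTERN.matcher(key).matches()) {
        return false;
      }
    }
    return true;
  }
}
```

Rejecting the empty map is what would catch the case of a compact task with no sub-tasks producing an empty JSON object.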

Comment thread docs/ingestion/tasks.md Outdated
Comment thread docs/ingestion/tasks.md Outdated
@georgew5656 georgew5656 self-requested a review March 1, 2024 20:52
Contributor

@suneet-s suneet-s left a comment

+1 from me on the approach. Thanks for the integration test and docs @adithyachakilam!

@georgew5656 georgew5656 merged commit ec52f68 into apache:master Mar 4, 2024
Contributor

@kfaraz kfaraz left a comment

Sorry, @adithyachakilam, I couldn't get to reviewing this PR sooner.

I have left some comments for better code readability. There is also a minor concern regarding holding too many reports and causing an OOM.
Since this PR has already been merged, the comments can be addressed in a follow-up PR.

log.info("Generated [%d] compaction task specs", totalNumSpecs);

int failCnt = 0;
Map<String, TaskReport> completionReports = new HashMap<>();
Contributor

If the compaction is being run on several intervals (not very likely but still a possibility), can holding all the task reports in memory potentially cause an OOM exception? Currently, most of the task reports contain only ingestionStatsAndErrors, but they may contain other stuff in the future.

In the future, we should consider writing out the sub-reports in a streaming fashion, along with the required changes to the TaskReportFileWriter API.
For now, we should add a guardrail here so that we don't try to hold too many reports in memory and fail with an OOM.
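
One possible shape for such a guardrail is sketched below. It is only an illustration of the idea and not what the follow-up PR implements: the class `BoundedReportHolder`, the count-based limit, and the choice to drop (rather than fail on) excess reports are all assumptions, and plain `Object` values stand in for `TaskReport`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Druid.
public class BoundedReportHolder
{
  private final int maxReports;
  private final Map<String, Object> reports = new HashMap<>();
  private int droppedCount = 0;

  public BoundedReportHolder(int maxReports)
  {
    if (maxReports < 0) {
      throw new IllegalArgumentException("maxReports must be non-negative");
    }
    this.maxReports = maxReports;
  }

  // Stores the report unless the holder is full. Returns false and counts
  // the report as dropped when the limit has been reached.
  public boolean add(String key, Object report)
  {
    if (reports.size() >= maxReports && !reports.containsKey(key)) {
      droppedCount++;
      return false;
    }
    reports.put(key, report);
    return true;
  }

  public Map<String, Object> getReports()
  {
    return reports;
  }

  public int getDroppedCount()
  {
    return droppedCount;
  }
}
```

A count limit is the simplest bound to reason about, but it says nothing about the size of each report. A byte-based limit or the streaming writer mentioned above would address that more directly.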

Contributor Author

What do you think is a good size to hold, then?

@ac9817
Contributor Author

ac9817 commented Mar 5, 2024

@kfaraz Trying to address them here: #16042

georgew5656 pushed a commit that referenced this pull request Mar 6, 2024
…6042)

* initial commit

* comments

* typo

* comments

* comments

* remove var

* initialize global var early

* remove new line

* small test fix

* same fix another test
@adarshsanjeev adarshsanjeev added this to the 30.0.0 milestone May 6, 2024
7 participants