
Add limit to task payload size #16512

Merged
georgew5656 merged 11 commits into apache:master from georgew5656:addTaskPayloadLimit
May 31, 2024

Conversation

@georgew5656
Contributor

Currently it's possible to submit a really big task payload and have it OOM the overlord if the request fails and logs the whole thing in SQLMetadataStorageActionHandler.insertEntryWithHandle.

If the request happens to be larger than max_allowed_packet for a mysql metadata store it will always fail. Since really large task payloads seem to cause overlord instability in general I think it makes sense to limit the size of task payloads at the task queue level.

Description

Add a new config that sets a limit for task payload sizes, and throw an exception if the limit is exceeded.

The default limit of 60 MB is based on the 64 MB default value of max_allowed_packet in MySQL 8+.

I would ideally like to use the HTTP request Content-Length header to calculate the size of the task payload rather than re-serializing it in memory, but we also call taskQueue.add directly from supervisors, so that would bypass the check. If others think this is acceptable and it would be better to check Content-Length in OverlordResource, I am fine with changing this logic.
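The check described above can be sketched roughly as follows. This is a hedged illustration, not the actual Druid code: the class and method names (PayloadLimitSketch, validateTaskPayloadSize) are invented for this sketch, and the real check lives in TaskQueue behind the druid.indexer.queue.maxTaskPayloadSize config.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the guardrail described above: measure the serialized
// task payload and reject it before it ever reaches the metadata store.
// All names here are invented for illustration, not the merged implementation.
public class PayloadLimitSketch
{
  // Mirrors the PR's default of 60 MB, chosen to stay under MySQL's
  // 64 MB default max_allowed_packet.
  static final long DEFAULT_MAX_PAYLOAD_BYTES = 60L * 1024 * 1024;

  /** Throws if the serialized task payload exceeds maxBytes. */
  static void validateTaskPayloadSize(String taskId, String serializedTask, long maxBytes)
  {
    long size = serializedTask.getBytes(StandardCharsets.UTF_8).length;
    if (size > maxBytes) {
      throw new IllegalArgumentException(
          "Task[" + taskId + "] has payload size[" + size + "] exceeding the limit[" + maxBytes + "]."
          + " Reduce the payload size or increase the configured limit."
      );
    }
  }
}
```

Measuring the re-serialized payload (rather than trusting Content-Length) is what lets the same check cover tasks submitted directly via taskQueue.add from supervisors.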

Release note

  • Added a new config, druid.indexer.queue.maxTaskPayloadSize, as a guardrail against submitting tasks with overly large payloads
Key changed/added classes in this PR
  • TaskQueue

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@kfaraz
Contributor

kfaraz commented May 29, 2024

@georgew5656 , thanks for the changes!

It makes sense to fail early on large payloads with a clear error message, but I have some concerns.

  • Such tasks already fail on insertion into the metadata store. Do you think the current error message in this case is not clear enough? Instead of a new check/config, should we try to improve the existing error message by maybe catching it and throwing a cleaner DruidException?
  • Or do you feel that there are other challenges with large task payloads too (say, Overlord facing a memory crunch)?
  • One problem with introducing this new config with a default of 64MiB is that some clusters would already be using large task payloads and would have set their max_allowed_packet to large enough values that allow such payloads. And if we go with a default that is large enough to accommodate all clusters, then we would probably defeat the purpose of the config itself.
  • There is also the MSQ case to consider where the user just submits a SQL and the task payload is actually generated by the system. How does the user control the task payload size in that case? Maybe we don't fail in the case of msq task types?

One compromise could be to keep the check and just warn about large task payloads for the time being, with a log message and maybe also an alert event.

Let me know what you think!

@georgew5656
Contributor Author

georgew5656 commented May 29, 2024


the direct problem seems to be that the error message thrown when the write to the DB fails is a little TOO descriptive in this case (it tries to log the whole payload, and this can easily cause memory issues).

i don't think the config totally loses its purpose with a higher default, since you can still choose to lower it to limit the blast radius of large payloads.

maybe it would make sense to just truncate that error message and throw a warn in OverlordResource when a task payload is very large?

I personally don't think a task payload above a certain size makes sense. for msq, would it really try to generate that large of a payload? 60 MB is huge for metadata
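The truncation idea floated in this comment could look something like the minimal sketch below. The class and helper names are invented for illustration; this is not the actual Druid code.

```java
// Hypothetical sketch of truncating a giant payload before logging it, so a
// failed metadata-store insert does not dump tens of MB of JSON into the logs.
// Names are invented for illustration, not the merged implementation.
public class LogTruncationSketch
{
  /** Returns payload unchanged if short enough, else a truncated copy with a marker. */
  static String truncateForLog(String payload, int maxChars)
  {
    if (payload.length() <= maxChars) {
      return payload;
    }
    return payload.substring(0, maxChars)
        + "... [truncated " + (payload.length() - maxChars) + " more characters]";
  }
}
```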

@kfaraz
Contributor

kfaraz commented May 29, 2024

maybe it would make sense to just truncate that error message and throw a warn in OverlordResource when a task payload is very large?

Yes, we should definitely do these two items even if we decide not to do the rest.

i don't think the config totally loses its purpose with a higher default, since you can still choose to lower it to limit the blast radius of large payloads.

True, but I don't think most cluster operators would think of updating this config when they encounter a problem. They would be more likely to increase the max_allowed_packet unless we update the error message thrown for max_allowed_packet to suggest using the new config.

I personally don't think a task payload above a certain size makes sense. for msq, would it really try to generate that large of a payload? 60 MB is huge for metadata

I agree that 60MiB is large enough, but I do recall some cases where users had to resort to increasing max_allowed_packet. For example:

Could not send query: query size is >= to max_allowed_packet (67108864)

In conclusion, we could do the following:

  • Truncate the error message thrown when we exceed max_allowed_packet and include info about using the new config.
  • Log a warning when task payload crosses say 80% of max limit.
  • Log a stronger warning and raise an alert when task payload crosses 100% of max limit, maybe also mentioning that such task payloads will be rejected in future Druid releases.
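The tiered check proposed above could be sketched as follows. This is a hedged illustration under the thresholds kfaraz suggests (warn at 80%, alert at 100%); the names are invented and this is not the merged implementation.

```java
// Hypothetical sketch of the tiered guardrail proposed above: OK below 80%
// of the limit, WARN between 80% and 100%, ALERT above the limit.
// Names and structure are invented for illustration.
public class PayloadGuardrailSketch
{
  enum Verdict { OK, WARN, ALERT }

  static Verdict check(long payloadBytes, long maxBytes)
  {
    if (payloadBytes > maxBytes) {
      return Verdict.ALERT;                // stronger warning plus an alert event
    }
    if (payloadBytes * 10 > maxBytes * 8) { // crossed 80% of the limit
      return Verdict.WARN;                 // plain warning in the logs
    }
    return Verdict.OK;
  }
}
```

Keeping the check advisory (logging and alerting rather than rejecting) preserves existing clusters that rely on large payloads, while still surfacing the problem to operators.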

@georgew5656 georgew5656 force-pushed the addTaskPayloadLimit branch from 6c2211b to 1a3f95c on May 29, 2024 17:13
Comment on lines +519 to +525
log.warn("Received a task payload > [%d] with id [%s]. and datasource [%s]" +
" There may be downstream issues caused by managing this large payload." +
"Increase druid.indexer.queue.maxTaskPayloadSize to ignore this warning.",
config.getMaxTaskPayloadSize(),
task.getId(),
task.getDataSource()
);
Contributor


Suggested change
log.warn("Received a task payload > [%d] with id [%s]. and datasource [%s]" +
" There may be downstream issues caused by managing this large payload." +
"Increase druid.indexer.queue.maxTaskPayloadSize to ignore this warning.",
config.getMaxTaskPayloadSize(),
task.getId(),
task.getDataSource()
);
log.warn(
    "Task[%s] of datasource[%s] has payload size[%d] larger than the recommended maximum[%d]." +
    " Large task payloads may cause stability issues in the Overlord and may fail while persisting to the metadata store." +
    " Use smaller task payloads or increase 'druid.indexer.queue.maxTaskPayloadSize' to suppress this warning.",
    task.getId(), task.getDataSource(), payload.length(), config.getMaxTaskPayloadSize()
);

@georgew5656 georgew5656 requested a review from kfaraz May 30, 2024 19:21
georgew5656 and others added 3 commits May 31, 2024 06:16
Contributor

@kfaraz kfaraz left a comment


Minor nitpicks, otherwise looks good.
Please update the release note in the PR description mentioning the usage of the new config.

} else if (payload.length() > TASK_SIZE_WARNING_THRESHOLD) {
log.warn(
"Task[%s] of datasource[%s] has payload size[%d] larger than the recommended maximum[%d]. " +
"Large task payloads may cause stability issues in the Overlord and may fail while persisting to the metadata store." +
Copy link
Copy Markdown
Contributor


Typo: missing space

Suggested change
"Large task payloads may cause stability issues in the Overlord and may fail while persisting to the metadata store." +
"Large task payloads may cause stability issues in the Overlord and may fail while persisting to the metadata store. " +

georgew5656 and others added 2 commits May 31, 2024 08:19
Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
…ord/TaskQueue.java

Co-authored-by: Kashif Faraz <kashif.faraz@gmail.com>
@georgew5656 georgew5656 merged commit 0936798 into apache:master May 31, 2024
@georgew5656
Contributor Author

The IntelliJ inspection failures look unrelated to me.

@kfaraz kfaraz added this to the 31.0.0 milestone Oct 4, 2024
