[Backport] Dart: Smoother handling of stage early exit (#17228) (#17069)#17256

Merged
kfaraz merged 2 commits into apache:31.0.0 from kfaraz:backport_17069
Oct 5, 2024
kfaraz commented Oct 5, 2024

Backport the following patches:
#17069
#17228

gianm added 2 commits October 5, 2024 09:23
…apache#17069)

* MSQ: Properly report errors that occur when starting up RunWorkOrder.

In apache#17046, an exception thrown by RunWorkOrder#startAsync would be ignored
and replaced with a generic CanceledFault. This patch fixes it by retaining
the original error.

Stages can be instructed to exit before they finish, especially when a
downstream stage includes a "LIMIT". This patch has improvements related
to early-exiting stages.

Bug fix:

- WorkerStageKernel: Don't allow fail() to set an exception if the stage is
  already in a terminal state (FINISHED or FAILED). If fail() is called while
  in a terminal state, log the exception, then throw it away. If it's a
  cancellation exception, don't even log it. This fixes a bug where a stage
  that exited early could transition to FINISHED and then to FAILED, causing
  the overall query to fail.
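
The terminal-state guard described above can be sketched roughly as follows. This is a minimal illustration, not Druid's actual WorkerStageKernel: the class `StageKernelSketch`, its `Phase` enum, and the use of `java.util.concurrent.CancellationException` as a stand-in for a cancellation error are all assumptions made for the example.

```java
import java.util.concurrent.CancellationException;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical, simplified stand-in for a worker stage kernel (not Druid's class).
class StageKernelSketch {
    enum Phase {
        NEW, RUNNING, FINISHED, FAILED;

        boolean isTerminal() {
            return this == FINISHED || this == FAILED;
        }
    }

    private static final Logger LOG = Logger.getLogger(StageKernelSketch.class.getName());

    private Phase phase = Phase.NEW;
    private Throwable exception;

    void start() {
        if (!phase.isTerminal()) {
            phase = Phase.RUNNING;
        }
    }

    void finish() {
        if (!phase.isTerminal()) {
            phase = Phase.FINISHED;
        }
    }

    void fail(Throwable t) {
        if (phase.isTerminal()) {
            // Already FINISHED or FAILED: never overwrite the outcome.
            // Cancellation errors (assumed here to be CancellationException) are dropped silently;
            // anything else is logged and then discarded.
            if (!(t instanceof CancellationException)) {
                LOG.log(Level.WARNING, "Ignoring failure reported in terminal phase " + phase, t);
            }
            return;
        }
        exception = t;
        phase = Phase.FAILED;
    }

    Phase getPhase() {
        return phase;
    }

    Throwable getException() {
        return exception;
    }
}
```

With a guard like this, a stage that exits early and reaches FINISHED stays FINISHED even if a late failure is reported afterwards.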

Performance:

- DartWorkerManager previously sent stopWorker commands to workers
  even when "interrupt" was false. Now it only sends those commands when
  "interrupt" is true. The method javadoc already claimed this is what the
  method did, but the implementation did not match the javadoc. This reduces
  the number of RPCs by 1 per worker per query.
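
As a rough sketch of the behavior described above, the stop path can gate the per-worker RPC on the "interrupt" flag. The names `WorkerManagerSketch` and `WorkerClient` are hypothetical and do not reflect the real DartWorkerManager API; only the `stopWorker` command name and the "interrupt" flag come from the description.

```java
import java.util.List;

// Hypothetical, simplified stand-in for a worker manager (not Druid's class).
class WorkerManagerSketch {
    // Assumed RPC abstraction: one call per worker.
    interface WorkerClient {
        void stopWorker(String workerId);
    }

    private final List<String> workerIds;
    private final WorkerClient client;
    private boolean stopped;

    WorkerManagerSketch(List<String> workerIds, WorkerClient client) {
        this.workerIds = workerIds;
        this.client = client;
    }

    void stop(boolean interrupt) {
        if (interrupt) {
            // Only interrupting stops need to reach out to the workers.
            for (String workerId : workerIds) {
                client.stopWorker(workerId);
            }
        }
        // Non-interrupting stops only release local state, so no RPC is sent.
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }
}
```

In this sketch, `stop(false)` issues no RPCs at all, which corresponds to the saving of one RPC per worker per query mentioned above.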

Quieter logging:

- In ReadableByteChunksFrameChannel, skip logging exception from setError if
  the channel has been closed. Channels are closed when readers are done with
  them, so at that point, we wouldn't be interested in the errors.

- In RunWorkOrder, skip calling notifyListener on failure of the main work,
  in the case when stop() has already been called. The stop() method will
  set its own error using CanceledFault. This enables callers to detect
  when a stage was canceled vs. failed for some other reason.

- In WorkerStageKernel, skip logging cancellation errors in fail(). This is
  made possible by the previous change in RunWorkOrder.
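
The RunWorkOrder change above can be illustrated with a small sketch in which a failure of the main work is not forwarded once stop() has been called, so the listener only sees the cancellation error. The class `WorkRunnerSketch`, its `Listener` interface, and the use of `CancellationException` in place of CanceledFault are assumptions for illustration; the sketch also ignores the thread-safety details a real implementation would need.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical, simplified stand-in for a work-order runner (not Druid's class).
class WorkRunnerSketch {
    // Assumed listener abstraction that receives the error for a failed run.
    interface Listener {
        void onFailure(Throwable t);
    }

    private final Listener listener;
    private final AtomicBoolean stopCalled = new AtomicBoolean(false);

    WorkRunnerSketch(Listener listener) {
        this.listener = listener;
    }

    void stop() {
        // The first stop() reports its own cancellation error; repeated calls do nothing.
        if (stopCalled.compareAndSet(false, true)) {
            listener.onFailure(new CancellationException("Canceled"));
        }
    }

    void onMainWorkFailure(Throwable t) {
        if (stopCalled.get()) {
            // stop() already reported the cancellation; do not overwrite it.
            return;
        }
        listener.onFailure(t);
    }
}
```

Because the listener receives either the cancellation error or the original failure, but never both, a caller can tell a canceled stage from one that failed for another reason and can skip logging in the canceled case.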

kfaraz added the Backport label Oct 5, 2024
kfaraz added this to the 31.0.0 milestone Oct 5, 2024
github-actions (bot) added the "Area - Batch Ingestion" and "Area - MSQ" (For multi stage queries - https://github.com/apache/druid/issues/12262) labels Oct 5, 2024
kfaraz merged commit f27a1dc into apache:31.0.0 Oct 5, 2024
kfaraz deleted the backport_17069 branch October 5, 2024 11:59