KAFKA-16448: Unify error-callback exception handling#16745
KAFKA-16448: Unify error-callback exception handling#16745mjsax merged 5 commits intoapache:trunkfrom
Conversation
| ", processorNodeId='" + processorNodeId + '\'' + | ||
| ", taskId=" + taskId + | ||
| '}'; | ||
| } |
There was a problem hiding this comment.
Omitting headers on purpose, as we should not log user-data.
There was a problem hiding this comment.
I know it's late in the game to ask this, but what about a timestamp? Just a thought, I don't have a strong opinion here.
There was a problem hiding this comment.
Good question. Sounds like a miss... \cc @sebastienviale @loicgreffier
There was a problem hiding this comment.
Side comment: I would not block this PR on this question, but do a follow up one, if we want to add ts.
There was a problem hiding this comment.
Yep I agree it shouldn't block, adding a timestamp in a follow on PR is fine.
| globalProcessorContext.metrics() | ||
| ) | ||
| ), | ||
| null |
There was a problem hiding this comment.
Side cleanup as discussed on some other PR.
| // Rethrow exceptions that should not be handled here | ||
| throw e; | ||
| } catch (final RuntimeException e) { | ||
| } catch (final RuntimeException processingException) { |
There was a problem hiding this comment.
side cleanup: use proper variable names
| keySerializer, | ||
| exception); | ||
| } catch (final Exception exception) { | ||
| } catch (final RuntimeException serializationException) { |
There was a problem hiding this comment.
side cleanup: use proper variable names
|
|
||
| private void recordSendError(final String topic, | ||
| final Exception exception, | ||
| final Exception productionException, |
There was a problem hiding this comment.
side cleanup: use proper variable names
| final ConsumerRecord<byte[], byte[]> rawRecord, | ||
| final Logger log, | ||
| final Sensor droppedRecordsSensor) { | ||
| handleDeserializationFailure(deserializationExceptionHandler, processorContext, deserializationException, rawRecord, log, droppedRecordsSensor, null); |
There was a problem hiding this comment.
Side cleanup as discussed on some other PR
|
|
||
| try { | ||
| maybeMeasureLatency(() -> punctuator.punctuate(timestamp), time, punctuateLatencySensor); | ||
| } catch (final TimeoutException timeoutException) { |
There was a problem hiding this comment.
Totally unrelated (might actually to good to put into different PR, but want to avoid that I forget it).
Seems we forgot to handle this case entirely... we might call context.forward() inside of punctuate() and thus need to handle the same error cases as for node.process()...
There was a problem hiding this comment.
Why a TimeoutException here? Could a punctuate potentially throw it?
There was a problem hiding this comment.
Seems we forgot to handle this case entirely... we might call context.forward() inside of punctuate() and thus need to handle the same error cases as for node.process()...
We could end up in a sink node, and thus in RecordCollectorImpl / producer.send() which could throw a TimeoutException.
| log.error("Fatal when handling serialization exception", e); | ||
| recordSendError(topic, e, null, context, processorNodeId); | ||
| return; | ||
| throw new FailedProcessingException("Fatal user code error in production error callback", fatalUserException); |
There was a problem hiding this comment.
fix: inside send() is seem incorrect to call recordSendError() which we only use in from the producer.send(..., Calback) which is called async. Here inside RecordCollector.send() (before we hit producer.send()) can should rather throw directly
| originalException); | ||
| } | ||
|
|
||
| private void handleException(final String errorMessage, final Throwable originalException) { |
There was a problem hiding this comment.
this is an attempt to preserve the error message we create when a handler throws by itself.
| private void handleException(final Throwable originalException) { | ||
| handleException( | ||
| String.format( | ||
| "Exception caught in process. taskId=%s, processor=%s, topic=%s, partition=%d, offset=%d, stacktrace=%s", |
There was a problem hiding this comment.
given what we pass in originalException into `StreamsException, it seems not necessary to include the stracktrace in the error message.
There was a problem hiding this comment.
this seems to break two unit tests in ProcessingExceptionHandlerIntegrationTest while testing exception.getMessage
There was a problem hiding this comment.
Thanks! -- I did only run a few test locally, not the whole suite. Will take a look.
|
thanks for the PR, you're just ahead of me |
|
I just notice a nit in ProductionExceptionHandler should be
|
|
A very little typo |
|
Updates this PR, and also extended it to align all 5 cases we have:
We now handle both error cases "handler throws exception" and "handler returns null" (this one I just realized while working on this PR and just added it). |
|
|
||
| @Test | ||
| public void shouldStopProcessingWhenFatalUserExceptionInFailProcessingExceptionHandler() { | ||
| public void shouldStopProcessingWhenProcessingExceptionHandlerReturnsNull() { |
There was a problem hiding this comment.
Re-purposed this test case for NULL case.
It seem redundant to test "throw exception" twice as originally in the code (one for FAIL and once for CONTINUE), because if we throw, there is no response we return at all...
| public static class FailProcessingExceptionHandlerMockTest implements ProcessingExceptionHandler { | ||
| @Override | ||
| public ProcessingExceptionHandler.ProcessingHandlerResponse handle(final ErrorHandlerContext context, final Record<?, ?> record, final Exception exception) { | ||
| if (((String) record.key()).contains("FATAL")) { |
There was a problem hiding this comment.
Not needed any longer -- we just need one handler that can throw an exception
| taskId, | ||
| getExceptionalStreamsProducerOnSend(exception), | ||
| new ProductionExceptionHandlerMock(ProductionExceptionHandler.ProductionExceptionHandlerResponse.CONTINUE), | ||
| new ProductionExceptionHandlerMock(Optional.of(ProductionExceptionHandlerResponse.CONTINUE)), |
There was a problem hiding this comment.
Want to test 4 cases, so added Optional to make this possible.
| assertEquals(rawRecord.timestamp(), record.timestamp()); | ||
| assertEquals(TimestampType.CREATE_TIME, record.timestampType()); | ||
| assertEquals(rawRecord.headers(), record.headers()); | ||
| try (final Metrics metrics = new Metrics()) { |
There was a problem hiding this comment.
Side cleanup. Metrics must be close to avoid resource leaks. (A few more times below.)
|
|
||
| @Test | ||
| public void shouldPunctuateNotHandleFailProcessingExceptionAndThrowStreamsException() { | ||
| public void punctuateShouldNotHandleFailProcessingExceptionAndThrowStreamsException() { |
There was a problem hiding this comment.
renaming to make it readable :) (same below)
Plus some formatting changes to make the code readable
| LogAndContinueProcessingExceptionHandler.class.getName() | ||
| )); | ||
|
|
||
| assertDoesNotThrow(() -> |
There was a problem hiding this comment.
No need for this... if it would throw, the test fails anyway.
| if (uncaughtException != null || !e.getMessage().contains("Injected test exception")) { | ||
| if (uncaughtException != null || | ||
| !(e instanceof StreamsException) || | ||
| !e.getCause().getMessage().equals("Injected test exception.")) { |
There was a problem hiding this comment.
This change is required only, because the original test conditions was not "correct" anyway. The outer exception was already a StreamsException wrapping the RuntimeException the test throws. Testing the output exception for type RuntimeException does not make any sense anyway, as all non-checked exception are RuntimeException and the check passes...
The only actual change we get in this PR, is that the error message of the outer StreamsException changes -- originally, it was not set explicitly, and thus was set as "RuntimeException: Injected test exception" -- thus, even if we did check the "wrong/outer" error message the test passed. This PR now sets the StreamsException error message explicitly (what I don't consider a backward compatibility issue), but to this check, we need to check the error message of the wrapped exception (bonus, we can test for equals instead of contains).
| !(e.getMessage().contains("test exception"))) { | ||
| if (++exceptionCount > 2 || | ||
| !(e instanceof StreamsException) || | ||
| !(e.getCause().getMessage().endsWith(" test exception."))) { |
There was a problem hiding this comment.
As above. However, we cannot use equals as the use different error messages for different situation when we throw.
| ", processorNodeId='" + processorNodeId + '\'' + | ||
| ", taskId=" + taskId + | ||
| '}'; | ||
| } |
There was a problem hiding this comment.
I know it's late in the game to ask this, but what about a timestamp? Just a thought, I don't have a strong opinion here.
|
|
||
| try { | ||
| maybeMeasureLatency(() -> punctuator.punctuate(timestamp), time, punctuateLatencySensor); | ||
| } catch (final TimeoutException timeoutException) { |
There was a problem hiding this comment.
Why a TimeoutException here? Could a punctuate potentially throw it?
| final Exception exception) { | ||
| assertInputs(context, exception); | ||
| return response; | ||
| if (response == null) { |
There was a problem hiding this comment.
I don't think the response can be null should this be response.isEmpty?
There was a problem hiding this comment.
null is on purpose. It's how we setup the code (it's not something the handler return).
In the end response is "what should the handler return in this test run", and I miss use it to say "handler should throw an exception". I could also add a new boolean flag shouldThrowException instead to not miss use response for this case.
Will update the PR.
| assertInstanceOf(RuntimeException.class, exception); | ||
| assertEquals("KABOOM!", exception.getMessage()); | ||
| return response; | ||
| if (response == null) { |
Follow up code cleanup for KIP-1033. This PR unifies the handling of both error cases for exception handlers: - handler throws an exception - handler returns null The unification happens for all 5 handler cases: - deserialzation - production / serialization - production / send - processing - punctuation Reviewers: Sebastien Viale <sebastien.viale@michelin.com>, Loic Greffier <loic.greffier@michelin.com>, Bill Bejeck <bill@confluent.io>
|
Merged to |
@sebastienviale @loicgreffier -- as discussed on other PR review, I think we should apply some cleanup and unification for the different handler. Would love to get your feedback.
\cc @cadonna @lucasbru @bbejeck