Fix streaming ingestion fails if it encounters empty rows (Regression) #10962
maytasm merged 2 commits into apache:master
@Override
protected List<InputRow> parseInputRows(String intermediateRow) throws IOException, ParseException
{
  List<InputRow> inputRows;
  inputRows = FluentIterable.from(() -> delegate)
      .transform(jsonNode -> MapInputRowParser.parse(inputRowSchema, flattener.flatten(jsonNode)))
      .toList();
nit: this formatting seems off (intellij moves it in line with the .from when i pull your branch locally)
  //throw unknown exception
  throw e;
}
if (CollectionUtils.isEmpty(inputRows)) {
CI is failing asking for
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>3.2.2</version>
</dependency>
to be added to the pom of druid-core. Could instead use Druid's org.apache.druid.utils.CollectionUtils.isNullOrEmpty, which does the same check.
I'm curious why we don't skip the empty rows instead of throwing a
Is it possible to add an integration test for this? Maybe update I think a test like this can help flush out whether similar issues exist with other data formats... To be fair, I don't know why a null event would ever end up in a stream
This PR simply fixes the regression caused by #10383 and is not intended to change or redesign any behavior. The behavior for streaming ingestion before #10383 was that it threw
I think it is possible to add this to the integration test, but it is not easy. For testing the JSON (this bug), I don't think the integration test would provide any additional coverage compared to
This seems like a reasonable approach. Can you create a GitHub issue for this - maybe label this with
Done. #10971
Description
This is a regression in the JSON inputFormat when used with streaming ingestion, introduced by #10383.
A streaming ingestion task consistently fails when it tries to parse an empty row, even though maxParseExceptions has not been reached. This is because parsing an empty row does not result in a ParseException but in a java.util.NoSuchElementException instead. Hence, the task fails and cannot move past the empty row even with maxParseExceptions set.
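The failure mode described above can be illustrated with a small standalone sketch. It does not use Druid's classes: `EmptyRowSketch`, its nested `ParseException`, `parseInputRows`, and `ingest` are hypothetical stand-ins, and the line-splitting "parser" is an assumption made only to keep the example self-contained. The point it tries to show is that an input producing no rows should surface as a parse exception, which the caller can count against a `maxParseExceptions`-style budget, rather than as an unrelated runtime exception that kills the task.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyRowSketch {
    /** Hypothetical stand-in for a parse exception; not Druid's actual class. */
    static class ParseException extends RuntimeException {
        ParseException(String message) {
            super(message);
        }
    }

    /**
     * Hypothetical parser: every non-blank line of the intermediate row becomes one row.
     * An input that yields no rows is reported as a ParseException instead of being
     * left for a later step to trip over.
     */
    static List<String> parseInputRows(String intermediateRow) {
        List<String> rows = new ArrayList<>();
        for (String line : intermediateRow.split("\n")) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) {
                rows.add(trimmed);
            }
        }
        if (rows.isEmpty()) {
            throw new ParseException("Unable to parse empty row");
        }
        return rows;
    }

    /**
     * Returns the number of rows parsed. Up to maxParseExceptions unparseable
     * records are tolerated; one more failure rethrows and ends the run.
     */
    static int ingest(List<String> records, int maxParseExceptions) {
        int parsed = 0;
        int failures = 0;
        for (String record : records) {
            try {
                parsed += parseInputRows(record).size();
            } catch (ParseException e) {
                failures++;
                if (failures > maxParseExceptions) {
                    throw e;
                }
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("{\"a\":1}", "", "{\"a\":2}");
        // One empty record is tolerated because the budget allows one parse failure.
        System.out.println(ingest(records, 1));
    }
}
```

In this sketch the empty record counts as one parse failure and ingestion continues with the next record. Had `parseInputRows` returned an empty list and the caller read its first element, the resulting exception would bypass the failure budget entirely, which is the kind of behavior the description attributes to the regression.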
This PR has: