
Kafka ingestion fails to parse multiple-line messages in 0.19 #10259

@FrankChen021


Affected Version

  • 0.17
  • 0.18
  • 0.19

Description

There's a topic in our Kafka cluster that contains messages in pretty-printed JSON format, as shown below. The latest release, 0.19, fails to parse these messages as JSON objects, while 0.16 parses them correctly.

JSON example

{
        "byteCount":0,
        "partition":0,
        "recordAge":0,
        "recordCount":0,
        "replicationLatency":0,
        "targetCluster":"dst",
        "timestamp":1597045440490,
        "topic":"test"
}

Screenshots from 0.16 and 0.19 (images not preserved).

After changing the Input Format from the default (Regex) to JSON, the following error appears.

(Screenshot of the error; image not preserved.)

Reason

After comparing the code between 0.16 and 0.19, I found that the problem is caused by JsonReader, which was introduced in 0.17 by #8823.

The new JsonReader inherits from TextReader, which uses a LineIterator to split the input and return the text LINE BY LINE instead of as a whole.

As a result, multi-line JSON text is handed to the JSON parser one line at a time. No single line (for example, the opening brace on its own) is a complete JSON object, so parsing fails.
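To make the failure mode concrete, here is a minimal, self-contained sketch. It does not use Druid or Commons IO: `String.lines()` (Java 11+) stands in for LineIterator, and `looksLikeWholeObject` is a hypothetical, deliberately crude check (not a JSON parser) that only tests whether a row starts with `{` and ends with `}`. The message is a shortened version of the example above.

```java
import java.util.List;
import java.util.stream.Collectors;

class LineSplitDemo {
    // Shortened version of the pretty-printed message from this issue.
    static final String MESSAGE = "{\n"
            + "        \"byteCount\":0,\n"
            + "        \"topic\":\"test\"\n"
            + "}";

    // Splits the payload into one row per line, the way a line-based reader does.
    // String.lines() is a stand-in for Commons IO LineIterator here.
    static List<String> splitIntoRows(String input) {
        return input.lines().collect(Collectors.toList());
    }

    // Crude heuristic, NOT a JSON parser: a standalone JSON object must at least
    // start with '{' and end with '}' once surrounding whitespace is trimmed.
    static boolean looksLikeWholeObject(String row) {
        String t = row.trim();
        return t.startsWith("{") && t.endsWith("}");
    }

    public static void main(String[] args) {
        for (String row : splitIntoRows(MESSAGE)) {
            // Every row prints false: no single line is a complete object.
            System.out.println(looksLikeWholeObject(row) + "  <- " + row);
        }
        // The unsplit payload passes the same check.
        System.out.println(looksLikeWholeObject(MESSAGE) + "  <- whole message");
    }
}
```

Each of the four rows fails the check, while the whole message passes it, which mirrors why the line-based reader cannot parse these messages.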

final LineIterator delegate = new LineIterator(

How to fix

One possible fix: JsonReader could override the intermediateRowIterator method defined in TextReader so that it returns an iterator over a single string containing the whole input.
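A rough sketch of that override idea follows. The class names `LineBasedReaderSketch` and `WholePayloadJsonReaderSketch` are hypothetical stand-ins, not Druid's real TextReader/JsonReader; the real classes read from an input entity and return a closeable iterator, so their signatures differ. The sketch only shows the shape of the change: the base class yields one row per line, and the subclass overrides `intermediateRowIterator` to yield the entire payload as one row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class OverrideDemo {
    // Drains an iterator into a list so the produced rows can be inspected.
    static List<String> rows(Iterator<String> it) {
        List<String> out = new ArrayList<>();
        it.forEachRemaining(out::add);
        return out;
    }

    public static void main(String[] args) {
        String message = "{\n  \"topic\":\"test\"\n}";
        // Line-based behaviour: three rows, none of them a complete object.
        System.out.println(rows(new LineBasedReaderSketch(message).intermediateRowIterator()).size());
        // Overridden behaviour: a single row holding the whole object.
        System.out.println(rows(new WholePayloadJsonReaderSketch(message).intermediateRowIterator()).size());
    }
}

// Hypothetical stand-in for TextReader's default behaviour (not Druid's real
// class or signature): one intermediate row per line of the payload.
class LineBasedReaderSketch {
    protected final String payload;

    LineBasedReaderSketch(String payload) {
        this.payload = payload;
    }

    Iterator<String> intermediateRowIterator() {
        return payload.lines().iterator();
    }
}

// Hypothetical stand-in for JsonReader with the proposed override: the whole
// payload is returned as a single intermediate row, so a pretty-printed JSON
// object reaches the parser in one piece.
class WholePayloadJsonReaderSketch extends LineBasedReaderSketch {
    WholePayloadJsonReaderSketch(String payload) {
        super(payload);
    }

    @Override
    Iterator<String> intermediateRowIterator() {
        return Collections.singletonList(payload).iterator();
    }
}
```

One trade-off to keep in mind: a line-based iterator also lets a single message carry several newline-delimited JSON objects, and a single-row override like this would hand all of them to the parser as one string.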

@jihoonson, please take a look at this bug when it's convenient :)
