Skip to content

[Bug]: Go SDK not faithful to Runner's coder spec #36387

@shunping

Description

@shunping

What happened?

Two of the GroupIntoBatches tests are excluded in the prism Java VR test suite.

  • 'org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testInStreamingMode'
  • 'org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testBufferingTimerInFixedWindow'

The stacktrace is shown below:

bundle inst009 stage-001 failed:org.apache.beam.sdk.coders.CoderException: java.io.EOFException: reached end of stream after reading 7 bytes; 69 bytes expected
        at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:104)
        at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:37)
        at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:84)
        at org.apache.beam.sdk.coders.KvCoder.decode(KvCoder.java:37)
        at org.apache.beam.sdk.values.WindowedValues$FullWindowedValueCoder.decode(WindowedValues.java:847)
        at org.apache.beam.sdk.values.WindowedValues$FullWindowedValueCoder.decode(WindowedValues.java:838)
        at org.apache.beam.sdk.values.WindowedValues$FullWindowedValueCoder.decode(WindowedValues.java:784)
        at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.multiplexElements(BeamFnDataInboundObserver.java:232)
        at org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.awaitCompletion(BeamFnDataInboundObserver.java:186)
        at org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:542)
        at org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:150)
        at org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:115)
        at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
        at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
        at org.apache.beam.sdk.util.UnboundedScheduledExecutorService$ScheduledFutureTask.run(UnboundedScheduledExecutorService.java:163)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:833)
    Caused by: java.io.EOFException: reached end of stream after reading 7 bytes; 69 bytes expected
        at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.io.ByteStreams.readFully(ByteStreams.java:802)
        at org.apache.beam.vendor.guava.v32_1_2_jre.com.google.common.io.ByteStreams.readFully(ByteStreams.java:784)
        at org.apache.beam.sdk.coders.StringUtf8Coder.readString(StringUtf8Coder.java:60)
        at org.apache.beam.sdk.coders.StringUtf8Coder.decode(StringUtf8Coder.java:100)
        ... 17 more
        at org.apache.beam.runners.prism.TestPrismRunner.run(TestPrismRunner.java:72)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:325)
        at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:442)
        at org.apache.beam.sdk.testing.TestPipeline.run(TestPipeline.java:381)
        at org.apache.beam.sdk.transforms.GroupIntoBatchesTest.testInStreamingMode(GroupIntoBatchesTest.java:438)

The encoded bytes for the first element in the TestStream is
{Encoded:[3 107 101 121 69 105 110 115 116 101 105 110]}, which is actually "0x3 key Einstein", but somehow it miss the length before the second string.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner

Metadata

Metadata

Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions