-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Description
What happened?
After #27851, user code that depends on versions newer than Avro 1.8.2 are having problems running on Dataflow.
For example in https://github.com/GoogleCloudPlatform/DataflowTemplates, where we moved on to Avro 1.11.3, there were incompatibility errors:
Caused by: java.io.InvalidClassException: org.apache.avro.specific.SpecificRecordBase; local class incompatible: stream classdesc serialVersionUID = -1463700717714793795, local class serialVersionUID = 189988654766568477
and
Caused by: java.lang.NoSuchMethodError: 'boolean org.apache.avro.generic.GenericRecord.hasField(java.lang.String)' com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.getMetadataIsDeleted(FormatDatastreamRecordToJson.java:258) com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.apply(FormatDatastreamRecordToJson.java:123) com.google.cloud.teleport.v2.transforms.FormatDatastreamRecordToJson.apply(FormatDatastreamRecordToJson.java:51) org.apache.beam.sdk.extensions.avro.io.AvroSource$AvroBlock.readNextRecord(AvroSource.java:610)
The root cause is that Avro classes are now being shipped along with the /opt/apache/beam/jars/beam-sdks-java-harness.jar, which wasn't the case before.
Tried to relocate in #29407 but got some test failures.
Next step is marking Avro as provided in that JAR, since it's apparently not used.
Issue Priority
Priority: 1 (data loss / total loss of function)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner