[Bug]: Reading from BigQuery provides inconsistent schemas #28151

@robertwb

What happened?

When doing a BigQuery Read like

p | beam.io.ReadFromBigQuery(
    table='apache-beam-testing:beam_bigquery_io_test.taxi_small',
    output_type='BEAM_ROW')

the TIMESTAMP fields are converted to fields of schema type Field{name=event_timestamp, description=, type=LOGICAL_TYPE<beam:logical_type:micros_instant:v1>, options={{}}} whereas in Java they are converted into (incompatible) fields of schema type Field{name=event_timestamp, description=, type=DATETIME, options={{}}}.

The Python behavior is probably the incorrect one here. In addition, elements of this type cannot be written to another BigQuery table; attempting to do so fails with:

  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/io/gcp/bigquery_file_loads.py", line 261, in process
    writer.write(row)
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 1432, in write
    return self._file_handle.write(self._coder.encode(row) + b'\n')
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 1379, in encode
    return json.dumps(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/__init__.py", line 234, in dumps
    return cls(
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/robertwb/Work/beam/incubator-beam/sdks/python/apache_beam/io/gcp/bigquery_tools.py", line 152, in default_encoder
    raise TypeError(
TypeError: Object of type 'Timestamp' is not JSON serializable [while running 'WriteToBigQueryHandlingErrors/WriteToBigQuery/BigQueryBatchFileLoads/ParDo(WriteRecordsToFile)/ParDo(WriteRecordsToFile)']
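
As a possible stopgap until the schema mismatch is resolved, the timestamp values could be turned into strings before the rows reach the JSON encoder. The following is only a minimal sketch, not a fix in Beam itself. `FakeTimestamp` and `to_json_safe` are hypothetical names introduced for illustration. The sketch assumes the value exposes its time as an integer `micros` attribute (microseconds since the Unix epoch) and that the destination TIMESTAMP column accepts an ISO 8601 string.

```python
import json
from datetime import datetime, timedelta, timezone

_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


class FakeTimestamp:
    """Hypothetical stand-in for Beam's Timestamp, so the sketch runs without Beam."""

    def __init__(self, micros):
        self.micros = micros  # microseconds since the Unix epoch


def to_json_safe(row):
    """Return a copy of a dict row with timestamp-like values as ISO 8601 strings.

    Assumption: any value with a `micros` attribute is a timestamp.
    """
    out = {}
    for key, value in row.items():
        if hasattr(value, "micros"):
            # Integer microsecond arithmetic avoids float rounding.
            moment = _EPOCH + timedelta(microseconds=value.micros)
            out[key] = moment.isoformat()
        else:
            out[key] = value
    return out


row = {"ride_id": "abc", "event_timestamp": FakeTimestamp(1_600_000_000_123_456)}
encoded = json.dumps(to_json_safe(row))  # no TypeError once converted
```

In a pipeline this would presumably run as a `beam.Map(to_json_safe)` step before the write, after converting each row element to a dict. It sidesteps the serialization error only; the Python and Java schemas still disagree.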

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
