Open
Labels: bug (Something isn't working)
Description
Describe the bug
Executing a Substrait plan generated from a COUNT(X) query fails with 'pyarrow.lib.ArrowInvalid: Schema at index 0 was different'. The two schemas in the error differ only in nullability (int64 vs. int64 not null).
To Reproduce
import json
import pyarrow as pa
import substrait.gen.proto.plan_pb2 as plan_pb2
from datafusion import SessionContext
from datafusion import substrait as ss
from google.protobuf.json_format import Parse
from substrait.gen.proto.plan_pb2 import Plan
from google.protobuf.json_format import MessageToJson
ctx = SessionContext()
tables = pa.RecordBatch.from_arrays(
    [
        pa.array([1, 2, 3, -4, 5, -6, 7, 8, 9, None]),
    ],
    names=["a"],
)
ctx.register_record_batches("t", [[tables]])
sql_query = "SELECT COUNT(a) FROM 't'"
substrait_proto = plan_pb2.Plan()
substrait_plan = ss.substrait.serde.serialize_to_plan(sql_query, ctx)
substrait_plan_bytes = substrait_plan.encode()
substrait_proto.ParseFromString(substrait_plan_bytes)
substrait_query = MessageToJson(substrait_proto)
substrait_json = json.loads(substrait_query)
plan_proto = Parse(json.dumps(substrait_json), Plan())
plan_bytes = plan_proto.SerializeToString()
substrait_plan = ss.substrait.serde.deserialize_bytes(plan_bytes)
logical_plan = ss.substrait.consumer.from_substrait_plan(ctx, substrait_plan)
df_result = ctx.create_dataframe_from_logical_plan(logical_plan)
df_result.to_arrow_table()
Error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyarrow/table.pxi", line 3950, in pyarrow.lib.Table.from_batches
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Schema at index 0 was different:
COUNT(t.a): int64
vs
COUNT(t.a): int64 not null
Expected behavior
No response
Additional context
No response