Description
Describe the bug, including details regarding any error messages, version, and platform.
I'm trying to read a Parquet file with pandas using the 'pyarrow' engine, but reading it fails.
Stack trace:
```
  File "<stdin>", line 1, in <module>
  File "/home/bama/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/parquet.py", line 501, in read_parquet
    return impl.read(
  File "/home/bama/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pandas/io/parquet.py", line 249, in read
    result = self.api.parquet.read_table(
  File "/home/bama/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2956, in read_table
    dataset = _ParquetDatasetV2(
  File "/home/bama/.pyenv/versions/3.10.4/lib/python3.10/site-packages/pyarrow/parquet/core.py", line 2496, in __init__
    [fragment], schema=schema or fragment.physical_schema,
  File "pyarrow/_dataset.pyx", line 1358, in pyarrow._dataset.Fragment.physical_schema.__get__
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
OSError: Could not open Parquet input source '<Buffer>': Logical type Null can not be applied to group node
```
Here is the schema of the Parquet file I'm trying to read, taken from the key-value metadata in the file footer:
```
org.apache.spark.version: 2.4.7
org.apache.spark.sql.parquet.row.metadata:
{
  "type": "struct",
  "fields": [
    {"name": "id", "type": "string", "nullable": true, "metadata": {}},
    {"name": "uid", "type": "string", "nullable": true, "metadata": {}},
    {"name": "params",
     "type": {"type": "map", "keyType": "string",
              "valueType": {"type": "array", "elementType": "string", "containsNull": true},
              "valueContainsNull": true},
     "nullable": true, "metadata": {}},
    {"name": "utc_date", "type": "timestamp", "nullable": true, "metadata": {}},
    {"name": "host", "type": "string", "nullable": true, "metadata": {}},
    {"name": "customer_id", "type": "string", "nullable": true, "metadata": {}}
  ]
}
Writer: parquet-mr version 1.10.99.7.1.7.0-550 (build 27a2f693f9b09573ead42e85bee2a649ac904119)
```
Reading the same file with fastparquet, on the other hand, works without any problems.
pandas version: 1.5.0
pyarrow version: 14.0.1
Component(s)
Python