[BUG] Valid column characters fail on to_arrow() or to_pandas() ArrowInvalid: No match for FieldRef.Name #584

@gwindes

Apache Iceberg version

0.6.0 (latest release)

Please describe the bug 🐞

Related to: #81

Platform: MacOS M1
Python v3.12

pyiceberg 0.6.0
pyarrow 15.0.2
pandas 2.2.1

I believe this is a bug, but I may be misunderstanding how pyiceberg and pyarrow work with Iceberg tables, so I may be doing something wrong. However, when I sanitize the column name before writing the data, removing the characters : . - /, I'm able to query the table just fine.
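For reference, the pre-write sanitizing described above could look roughly like the sketch below. The helper names `strip_special` and `rename_columns` are hypothetical, and replacing each special character with an underscore is an assumption (the report only says the characters are removed). The columns are modeled as a plain dict so the sketch runs without pyarrow or pyiceberg.

```python
import re

# The four characters the report mentions stripping before the write.
_SPECIAL = re.compile(r"[:.\-/]")


def strip_special(name: str, replacement: str = "_") -> str:
    """Replace ':', '.', '-' and '/' in a column name (hypothetical helper)."""
    return _SPECIAL.sub(replacement, name)


def rename_columns(columns: dict) -> dict:
    """Return a copy of `columns` keyed by the stripped names.

    Raises ValueError if two original names collapse to the same new name,
    since silently merging columns would lose data.
    """
    renamed = {}
    for name, values in columns.items():
        new_name = strip_special(name)
        if new_name in renamed:
            raise ValueError(f"column name collision after stripping: {new_name!r}")
        renamed[new_name] = values
    return renamed


data = {"TEST:A1B2.RAW.ABC-GG-1-A": [1.0, 2.0, 3.0]}
clean = rename_columns(data)
```

The obvious cost of this workaround is that the original URI-style names are lost, so a separate mapping back to them would have to be kept somewhere.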

My understanding is that TEST:A1B2.RAW.ABC-GG-1-A is a valid Iceberg column name, with the caveat that it is NOT a nested field (which I don't need). I'm able to write the data to the Iceberg table, and the metadata JSON shows the full column name TEST:A1B2.RAW.ABC-GG-1-A.

It appears to fail only when I want to read the data. I'm following the basic getting-started guide in the pyiceberg docs:

table = catalog.load_table("A1B2.A1-301")

# neither table scan works (throws the same error):
df_pyarrow = table.scan().to_arrow()
df_panda = table.scan().to_pandas()

I created a sample project that reproduces the problem. It fails with the error pyarrow.lib.ArrowInvalid: No match for FieldRef.Name(TEST_x3AA1B2_x2ERAW_x2EABC_x2DGG_x2D1_x2DA) in TEST:A1B2.RAW.ABC-GG-1-A: double.
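The name inside FieldRef.Name(...) looks like the original column name with each special character replaced by `_x` followed by its uppercase hexadecimal code point (`:` is 0x3A, `.` is 0x2E, `-` is 0x2D). The sketch below reproduces that mapping to show how the two names in the error message relate. The function `escape_name` is hypothetical and inferred only from the error text; it is not taken from the pyiceberg source and may not match its rules for other inputs.

```python
def escape_name(name: str) -> str:
    """Escape every character that is not an ASCII letter, digit, or underscore.

    Assumption inferred from the error message: each such character becomes
    '_x' followed by its code point in uppercase hex.
    """
    parts = []
    for ch in name:
        if ch == "_" or (ch.isascii() and ch.isalnum()):
            parts.append(ch)
        else:
            parts.append("_x" + format(ord(ch), "X"))
    return "".join(parts)


original = "TEST:A1B2.RAW.ABC-GG-1-A"
escaped = escape_name(original)
print(escaped)
```

If this reading is right, the scan looks up the escaped name while the data files carry the original name, which would explain why the lookup finds no match.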

Also, to clarify, these column names do need to stay in this format, because the format has a very specific use case within our hardware environments. We try to follow a URI-style naming scheme for our columns and sensors.

[Image: the metadata stores the channel name as expected.]
