Refactor Arrow schema conversion #117

Merged
Fokko merged 3 commits into apache:main from Fokko:fd-refactor-schame-convertion
Nov 3, 2023
Fokko commented Nov 2, 2023

We wrapped a schema in a schema.

@Fokko Fokko marked this pull request as ready for review November 2, 2023 12:16
@Fokko Fokko changed the title Refactor schema conversion Refactor Arrow schema conversion Nov 2, 2023

@bitsondatadev bitsondatadev left a comment

Nits

Comment thread tests/io/test_pyarrow.py Outdated
 @pytest.fixture
 def file_int(schema_int: Schema, tmpdir: str) -> str:
-    pyarrow_schema = pa.schema(schema_to_pyarrow(schema_int), metadata={"iceberg.schema": schema_int.model_dump_json()})
+    pyarrow_schema = schema_to_pyarrow(schema_int, metadata={ICEBERG_SCHEMA: bytes(schema_int.model_dump_json(), 'utf-8')})
Contributor

Should this string be using a constant in a lib somewhere? Or we could at least create an encodings class that centralizes all the schema stuff (e.g. creates a constant for 'utf-8', hides ICEBERG_SCHEMA, and exposes some cleaner methods that hide the bytes conversion, etc.).

WDYT?

Contributor Author

Sounds good, I've introduced a utf8 constant 👍
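
For readers following the suggestion above, here is a minimal sketch of what centralizing the encoding and the metadata key could look like. The thread only shows the name `ICEBERG_SCHEMA` and mentions "a utf8 constant"; the constant values and the helper `iceberg_schema_metadata` below are assumptions for illustration, not the code that was merged.

```python
from typing import Dict

# Assumed values: the thread names ICEBERG_SCHEMA and a utf8 constant,
# but does not show how they are defined.
UTF8 = "utf-8"
ICEBERG_SCHEMA = b"iceberg.schema"


def iceberg_schema_metadata(schema_json: str) -> Dict[bytes, bytes]:
    """Hypothetical helper: hide the str -> bytes conversion in one place."""
    return {ICEBERG_SCHEMA: schema_json.encode(UTF8)}


# Callers pass the JSON string and never touch the encoding directly.
metadata = iceberg_schema_metadata('{"type": "struct", "fields": []}')
```

With a helper like this, the test fixture would only need to pass `schema_int.model_dump_json()` and would not repeat the encoding name.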

Comment thread pyiceberg/io/pyarrow.py
-def schema_to_pyarrow(schema: Union[Schema, IcebergType]) -> pa.schema:
-    return visit(schema, _ConvertToArrowSchema())
+def schema_to_pyarrow(schema: Union[Schema, IcebergType], metadata: Dict[bytes, bytes] = EMPTY_DICT) -> pa.schema:
+    return visit(schema, _ConvertToArrowSchema(metadata))
Contributor

What is the visit() behavior with an empty dict?

Contributor Author

Sometimes we use the visitor only to convert types. In that case we don't need to set any metadata, so defaulting to an empty dict makes things easier and less verbose.
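
The default-argument pattern described here can be sketched without PyArrow. Everything in the block is a stand-in: `ConvertSketch` and `to_schema_sketch` are hypothetical names, and `EMPTY_DICT` is assumed to be an immutable empty mapping, so sharing it as a default is safe. The thread does not show how the real `EMPTY_DICT` is defined.

```python
from types import MappingProxyType
from typing import Any, Dict, Mapping, Tuple

# Assumption: an immutable empty mapping, so the shared default cannot be mutated.
EMPTY_DICT: Mapping[bytes, bytes] = MappingProxyType({})


class ConvertSketch:
    """Hypothetical stand-in for a visitor that carries optional metadata."""

    def __init__(self, metadata: Mapping[bytes, bytes] = EMPTY_DICT) -> None:
        self._metadata = metadata

    def schema(self, field_names: Tuple[str, ...]) -> Dict[str, Any]:
        # Metadata is attached only at the schema level; it may be empty.
        return {"fields": list(field_names), "metadata": dict(self._metadata)}


def to_schema_sketch(
    field_names: Tuple[str, ...], metadata: Mapping[bytes, bytes] = EMPTY_DICT
) -> Dict[str, Any]:
    return ConvertSketch(metadata).schema(field_names)


# Type-only conversion: no metadata argument needed.
plain = to_schema_sketch(("id", "name"))
# Conversion that also embeds metadata.
tagged = to_schema_sketch(("id",), {b"iceberg.schema": b"{}"})
```

Callers that only convert types omit the argument, while callers that need embedded metadata pass it explicitly.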


@bitsondatadev bitsondatadev left a comment

LGTM!

@Fokko Fokko merged commit 9189cb3 into apache:main Nov 3, 2023
@Fokko Fokko deleted the fd-refactor-schame-convertion branch November 3, 2023 15:52
Fokko commented Nov 3, 2023

Thanks @bitsondatadev and @amogh-jahagirdar for the review 🥳

3 participants