Conversation
We wrapped a schema in a schema.
| @pytest.fixture | ||
| def file_int(schema_int: Schema, tmpdir: str) -> str: | ||
| pyarrow_schema = pa.schema(schema_to_pyarrow(schema_int), metadata={"iceberg.schema": schema_int.model_dump_json()}) | ||
| pyarrow_schema = schema_to_pyarrow(schema_int, metadata={ICEBERG_SCHEMA: bytes(schema_int.model_dump_json(), 'utf-8')}) |
There was a problem hiding this comment.
Should this string be using a constant in a lib somewhere? Or at we could least create an encodings class that centralizes all the schema stuff (e.g. create a constant for 'utf-8', hides ICEBERG_SCHEMA and expose some cleaner methods that hides the bytes conversion, etc...
WDYT?
There was a problem hiding this comment.
Sounds good, I've introduced a utf8 constant 👍
| def schema_to_pyarrow(schema: Union[Schema, IcebergType]) -> pa.schema: | ||
| return visit(schema, _ConvertToArrowSchema()) | ||
| def schema_to_pyarrow(schema: Union[Schema, IcebergType], metadata: Dict[bytes, bytes] = EMPTY_DICT) -> pa.schema: | ||
| return visit(schema, _ConvertToArrowSchema(metadata)) |
There was a problem hiding this comment.
What is the visit() behavior with an empty dict?
There was a problem hiding this comment.
Sometime we use the visitor to convert types, then we don't need to set any metadata so then a default with an empty dict makes things easier and less verbose.
|
Thanks @bitsondatadev and @amogh-jahagirdar for the review 🥳 |
We wrapped a schema in a schema.