Skip to content

Conversation

@westonpace
Copy link
Member

@westonpace westonpace commented Jul 13, 2021

  • Modified IpcWriteOptions so that it can take an instance of pyarrow.Codec instead of a str
  • Modified pyarrow.Codec to expose compression_level
  • Added helper methods to report the minimum/maximum/default values for the different codecs

@github-actions
Copy link

@westonpace westonpace changed the title ARROW-13091: [Python] Added compression_level to IpcWriteOptions ARROW-13091: [Python] Add compression_level argument to IpcWriteOptions constructor Jul 13, 2021
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

table = pa.Table.from_arrays([values], names=["values"])

options = pa.ipc.IpcWriteOptions(compression='zstd', compression_level=1)
writer = pa.ipc.RecordBatchFileWriter(sink, table.schema, options=options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really a major thing, but given that the RecordBatchFileWriter can act as a context manager and sometimes people look at the tests to learn how to use things, it might be a good idea to write this in the form of

with pa.ipc.RecordBatchFileWriter(sink, table.schema, options=options) as writer:
   writer.write_table(table)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Switched.

@kszucs
Copy link
Member

kszucs commented Jul 15, 2021

@westonpace We already have a Codec python wrapper exposed, could we add an optional level argument to its constructor and reuse that from the IpcWriteOptions?

@westonpace westonpace force-pushed the feature/ARROW-13091--python-add-compression_level-argument-to-ipcwrit branch from c6a019c to 898cf7c Compare July 15, 2021 21:21
@westonpace
Copy link
Member Author

@kszucs Done. I also added some helper methods to make working with compression_level a little easier. It ended up being a bit more involved than I expected.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Two small comments.

@westonpace westonpace force-pushed the feature/ARROW-13091--python-add-compression_level-argument-to-ipcwrit branch from d623496 to 82472b4 Compare July 16, 2021 21:01
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for creating the pytest marks as well.

@westonpace
Copy link
Member Author

I think this is ready for merge. @kszucs Any last thoughts?

self.c_options.codec = shared_ptr[CCodec](GetResultValue(
CCodec.Create(_ensure_compression(value))).release())
elif isinstance(value, Codec):
self.c_options.codec = (<Codec>value).wrapped
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit, but we could use .unwrap() here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There already was an unwrap() but it returned CCodec* and not shared_ptr<CCoded>. I think I'd need to investigate where the current unwrap() was used.

@pytest.mark.pandas
@pytest.mark.lz4
@pytest.mark.snappy
@pytest.mark.zstd
Copy link
Member

@kszucs kszucs Jul 19, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we expand this as:

pytest.mark.parametrize("codec", [
    pytest.param("lz4", marks=pytest.mark.lz4),
    pytest.param("snappy", marks=pytest.mark.snappy),
    pytest.param("zstd", marks=pytest.mark.zstd),
])

By letting pytest to iterate through the cases rather than doing it ourselves?

This would enable us to run this case just for the available plugins.

Copy link
Member

@kszucs kszucs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Weston! Just a few nits related to test parametrization.

Possibly it would be better to create global compression related fixtures in a follow-up, so we can include this in the release.

Created one: https://issues.apache.org/jira/browse/ARROW-13380

@kszucs kszucs closed this in c9b9fa4 Jul 19, 2021
@lidavidm
Copy link
Member

@github-actions crossbow submit homebrew-r-autobrew

@github-actions
Copy link

Revision: 91a6ea4

Submitted crossbow builds: ursacomputing/crossbow @ actions-611

Task Status
homebrew-r-autobrew Github Actions

@lidavidm
Copy link
Member

(I accidentally submitted master to Crossbow in an unrelated build and noticed a new failure that looked related to here, want to confirm that)

@lidavidm
Copy link
Member

See ARROW-13384 as followup

@kszucs
Copy link
Member

kszucs commented Jul 19, 2021

Seems like we have more version issues in our crossbow builds: #10659 (comment)

@westonpace westonpace deleted the feature/ARROW-13091--python-add-compression_level-argument-to-ipcwrit branch January 6, 2022 08:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants