ARROW-6474: [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable #5396
cc @BryanCutler @pitrou if you have any nits about this
BryanCutler
left a comment
LGTM, thanks @wesm ! I can work on updating future versions of Spark to not require this.
Is it necessary for the env var to have PYARROW in the name?
Moving onto the next word in the var name ;) "legacy" is always subjective/relative. We're always writing tomorrow's legacy code. Is there a more explicit word we can use, in case we have to feature-flag another "legacy" feature some time? Not trying to overthink this, just don't want us to underthink it either.
We could call it "ARROW_PRE_0_15_IPC_FORMAT". I used PYARROW to indicate that the effect of the variable is limited in scope to the Python library.
WFM
Codecov Report
@@ Coverage Diff @@
## master #5396 +/- ##
===========================================
- Coverage 88.58% 65.58% -23.01%
===========================================
Files 950 499 -451
Lines 126213 69129 -57084
Branches 1495 0 -1495
===========================================
- Hits 111808 45339 -66469
- Misses 14040 23790 +9750
+ Partials 365 0 -365
python/pyarrow/ipc.py
Outdated
| @@ -101,9 +105,21 @@ class RecordBatchFileWriter(lib._RecordBatchFileWriter): | |||
| Either a file path, or a writable file object | |||
| schema : pyarrow.Schema | |||
| The Arrow schema for data to be written to the file | |||
| use_legacy_format : boolean, default None | |||
| If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1 | |||
| If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1 | |
| If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT=1 |
python/pyarrow/ipc.py
Outdated
| @@ -70,9 +70,13 @@ class RecordBatchStreamWriter(lib._RecordBatchStreamWriter): | |||
| Either a file path, or a writable file object | |||
| schema : pyarrow.Schema | |||
| The Arrow schema for data to be written to the file | |||
| use_legacy_format : boolean, default None | |||
| If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1 | |||
| If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1 | |
| If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT=1 |
… default value by environment variable
I think I fixed the Sphinx warning. I'm going to let CI run fully just in case...
It feels gross to alter behavior with environment variables, but this is probably the least invasive way to enable Apache Spark users to upgrade to pyarrow 0.15.0 or beyond if they are frozen on an older Spark release.
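For readers following the thread, here is a minimal sketch of how a `use_legacy_format=None` default could be resolved against the environment variable. It is not the code from this PR: `resolve_use_legacy_format` and its `environ` parameter are hypothetical names, the variable name is the renamed `ARROW_PRE_0_15_IPC_FORMAT` proposed above, and the sketch assumes that the new format is used when neither the argument nor the variable is set (the quoted docstring wording reads the other way) and that only the value `1` enables the old format.

```python
import os

# Variable name as proposed in the review thread above.
ENV_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def resolve_use_legacy_format(use_legacy_format=None, environ=None):
    """Hypothetical helper: decide whether to write the pre-0.15 IPC format.

    An explicit True/False argument always wins. Only when the argument is
    None does the environment variable supply the default.
    """
    if use_legacy_format is not None:
        return bool(use_legacy_format)
    # Allow a mapping to be passed in so the logic can be tested without
    # touching the real process environment.
    env = os.environ if environ is None else environ
    # Assumption: only the exact string "1" switches the legacy format on;
    # an unset variable or any other value keeps the new format.
    return env.get(ENV_VAR, "0") == "1"


if __name__ == "__main__":
    print(resolve_use_legacy_format())
```

Under these assumptions, a user frozen on an older Spark release would export `ARROW_PRE_0_15_IPC_FORMAT=1` in the environment of the Python workers and leave application code unchanged, while callers that pass the argument explicitly are unaffected by the variable.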