ARROW-6474: [Python] Add option to use legacy / pre-0.15 IPC message format and to set the default using PYARROW_LEGACY_IPC_FORMAT environment variable #5396

wesm · 2019-09-16T21:25:00Z

It feels gross to alter behavior with environment variables, but this is probably the least invasive way to let Apache Spark users upgrade to pyarrow 0.15.0 or later if they are frozen on an older Spark release.

wesm · 2019-09-16T21:25:24Z

cc @BryanCutler @pitrou if you have any nits about this

BryanCutler

LGTM, thanks @wesm ! I can work on updating future versions of Spark to not require this.

nealrichardson · 2019-09-16T21:52:53Z

Is it necessary for the env var to have PYARROW in the name (PYARROW_LEGACY_IPC_FORMAT)? Since we should probably do the same in R (https://issues.apache.org/jira/browse/ARROW-6539), would ARROW_LEGACY_IPC_FORMAT be better?

Moving on to the next word in the var name ;) "legacy" is always subjective/relative. We're always writing tomorrow's legacy code. Is there a more explicit word we can use, in case we have to feature-flag another "legacy" feature some time? Not trying to overthink this, just don't want us to underthink it either.

wesm · 2019-09-17T01:34:12Z

We could call it "ARROW_PRE_0_15_IPC_FORMAT". I used PYARROW to indicate that the effect of the variable is limited in scope to the Python library.

nealrichardson · 2019-09-17T02:53:13Z

WFM

codecov-io · 2019-09-17T04:35:31Z

Codecov Report

Merging #5396 into master will decrease coverage by 23%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           master    #5396       +/-   ##
===========================================
- Coverage   88.58%   65.58%   -23.01%     
===========================================
  Files         950      499      -451     
  Lines      126213    69129    -57084     
  Branches     1495        0     -1495     
===========================================
- Hits       111808    45339    -66469     
- Misses      14040    23790     +9750     
+ Partials      365        0      -365

Impacted Files	Coverage Δ
python/pyarrow/ipc.pxi	`81.56% <100%> (+0.52%)`	⬆️
python/pyarrow/tests/test_ipc.py	`99.07% <100%> (+0.03%)`	⬆️
python/pyarrow/ipc.py	`100% <100%> (ø)`	⬆️
cpp/src/arrow/util/memory.h	`0% <0%> (-100%)`	⬇️
cpp/src/gandiva/date_utils.h	`0% <0%> (-100%)`	⬇️
cpp/src/arrow/util/memory.cc	`0% <0%> (-100%)`	⬇️
cpp/src/gandiva/decimal_type_util.h	`0% <0%> (-100%)`	⬇️
cpp/src/arrow/filesystem/util_internal.cc	`0% <0%> (-100%)`	⬇️
cpp/src/arrow/compute/logical_type.h	`0% <0%> (-100%)`	⬇️
cpp/src/parquet/hasher.h	`0% <0%> (-100%)`	⬇️
... and 700 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3bf4d80...5300e3a.

romainfrancois · 2019-09-18T13:04:47Z

python/pyarrow/ipc.py

@@ -101,9 +105,21 @@ class RecordBatchFileWriter(lib._RecordBatchFileWriter):
        Either a file path, or a writable file object
    schema : pyarrow.Schema
        The Arrow schema for data to be written to the file
+    use_legacy_format : boolean, default None
+        If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1


Suggested change

If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1

If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT=1

romainfrancois · 2019-09-18T13:04:57Z

python/pyarrow/ipc.py

@@ -70,9 +70,13 @@ class RecordBatchStreamWriter(lib._RecordBatchStreamWriter):
        Either a file path, or a writable file object
    schema : pyarrow.Schema
        The Arrow schema for data to be written to the file
+    use_legacy_format : boolean, default None
+        If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1


Suggested change

If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1

If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT=1
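
The default resolution these docstrings describe can be sketched in plain Python. This is a minimal illustration, not the code from this PR: the helper name `resolve_use_legacy_format` and its `environ` parameter are hypothetical. It assumes the behavior implied by the PR title, namely that an explicit `use_legacy_format` argument always wins, and that when the argument is `None` the legacy format is chosen only if `ARROW_PRE_0_15_IPC_FORMAT=1` is set. Treating only the string "1" as enabling is also an assumption.

```python
import os

# Name agreed on in the review thread (renamed from PYARROW_LEGACY_IPC_FORMAT).
ENV_VAR = "ARROW_PRE_0_15_IPC_FORMAT"


def resolve_use_legacy_format(use_legacy_format=None, environ=None):
    """Hypothetical helper: decide whether to write the pre-0.15 IPC format.

    An explicit True/False from the caller takes precedence. Only when the
    caller passes None does the environment variable supply the default.
    """
    if environ is None:
        environ = os.environ
    if use_legacy_format is not None:
        return bool(use_legacy_format)
    # Assumption: the variable enables the legacy format only when set to "1".
    return environ.get(ENV_VAR, "0") == "1"


# Unset variable, no explicit argument: new (0.15+) format.
print(resolve_use_legacy_format(None, environ={}))
# Variable set to 1, no explicit argument: legacy format.
print(resolve_use_legacy_format(None, environ={ENV_VAR: "1"}))
# An explicit argument overrides the environment.
print(resolve_use_legacy_format(False, environ={ENV_VAR: "1"}))
```

Passing the environment as a parameter keeps the sketch easy to test; a real writer would more likely read `os.environ` directly at construction time.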

wesm · 2019-09-18T18:03:24Z

Think I fixed the sphinx warning. I'm going to let CI fully run just in case...

wesm · 2019-09-18T18:47:17Z

Travis CI: https://travis-ci.org/wesm/arrow/builds/586658822

BryanCutler approved these changes Sep 16, 2019

wesm force-pushed the ARROW-6474 branch from 5300e3a to e569436 Compare September 17, 2019 23:55

romainfrancois reviewed Sep 18, 2019

wesm added 3 commits September 18, 2019 10:06

Add option to use legacy / pre-0.15 IPC message format and to set the…

e8cde58

… default value by environment variable

Rename environment variable per comments

0bb54d4

Synchronize docsstrings

a1768bd

wesm force-pushed the ARROW-6474 branch from e569436 to a1768bd Compare September 18, 2019 15:06

Fix sphinx warning

52a966d

wesm force-pushed the ARROW-6474 branch from 76140a6 to 52a966d Compare September 18, 2019 18:02

wesm closed this in 176adf5 Sep 18, 2019

wesm deleted the ARROW-6474 branch September 18, 2019 18:48

asfimport mentioned this pull request Sep 7, 2020

[Python] Provide mechanism for python to write out old format #22843

Closed
