Skip to content

Conversation

@wesm
Copy link
Member

@wesm wesm commented Sep 16, 2019

It feels gross to alter behavior with environment variables but this is probably the least invasive way to enable Apache Spark users to upgrade to pyarrow 0.15.0 or beyond if they are frozen on an older Spark release

@wesm
Copy link
Member Author

wesm commented Sep 16, 2019

cc @BryanCutler @pitrou if you have any nits about this

Copy link
Member

@BryanCutler BryanCutler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @wesm ! I can work on updating future versions of Spark to not require this.

@nealrichardson
Copy link
Member

Is it necessary for the env var to have PYARROW in the name (PYARROW_LEGACY_IPC_FORMAT)? Since we should probably do the same in R (https://issues.apache.org/jira/browse/ARROW-6539), would ARROW_LEGACY_IPC_FORMAT be better?

Moving onto the next word in the var name ;) "legacy" is always subjective/relative. We're always writing tomorrow's legacy code. Is there a more explicit word we can use, in case we have to feature-flag another "legacy" feature some time? Not trying to overthink this, just don't want us to underthink either.

@wesm
Copy link
Member Author

wesm commented Sep 17, 2019

We could call it "ARROW_PRE_0_15_IPC_FORMAT". I used PYARROW to indicate that the effect of the variable is scope limited to the Python library

@nealrichardson
Copy link
Member

WFM

@codecov-io
Copy link

Codecov Report

Merging #5396 into master will decrease coverage by 23%.
The diff coverage is 100%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #5396       +/-   ##
===========================================
- Coverage   88.58%   65.58%   -23.01%     
===========================================
  Files         950      499      -451     
  Lines      126213    69129    -57084     
  Branches     1495        0     -1495     
===========================================
- Hits       111808    45339    -66469     
- Misses      14040    23790     +9750     
+ Partials      365        0      -365
Impacted Files Coverage Δ
python/pyarrow/ipc.pxi 81.56% <100%> (+0.52%) ⬆️
python/pyarrow/tests/test_ipc.py 99.07% <100%> (+0.03%) ⬆️
python/pyarrow/ipc.py 100% <100%> (ø) ⬆️
cpp/src/arrow/util/memory.h 0% <0%> (-100%) ⬇️
cpp/src/gandiva/date_utils.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/util/memory.cc 0% <0%> (-100%) ⬇️
cpp/src/gandiva/decimal_type_util.h 0% <0%> (-100%) ⬇️
cpp/src/arrow/filesystem/util_internal.cc 0% <0%> (-100%) ⬇️
cpp/src/arrow/compute/logical_type.h 0% <0%> (-100%) ⬇️
cpp/src/parquet/hasher.h 0% <0%> (-100%) ⬇️
... and 700 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3bf4d80...5300e3a. Read the comment docs.

@@ -101,9 +105,21 @@ class RecordBatchFileWriter(lib._RecordBatchFileWriter):
Either a file path, or a writable file object
schema : pyarrow.Schema
The Arrow schema for data to be written to the file
use_legacy_format : boolean, default None
If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1
If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT =1

@@ -70,9 +70,13 @@ class RecordBatchStreamWriter(lib._RecordBatchStreamWriter):
Either a file path, or a writable file object
schema : pyarrow.Schema
The Arrow schema for data to be written to the file
use_legacy_format : boolean, default None
If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If None, use True unless overridden by PYARROW_LEGACY_IPC_FORMAT=1
If None, use True unless overridden by ARROW_PRE_0_15_IPC_FORMAT =1

@wesm
Copy link
Member Author

wesm commented Sep 18, 2019

Think I fixed the sphinx warning. I'm going to let CI fully run just in case...

@wesm
Copy link
Member Author

wesm commented Sep 18, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants