@MartinNowak
Contributor

@MartinNowak MartinNowak commented Jul 2, 2025

Rationale for this change

Expose remaining csv write options to pyarrow.

What changes are included in this PR?

Adding eol and null_string to csv.WriteOptions.

Are these changes tested?

Yes, testing of setters included.

Are there any user-facing changes?

@github-actions

github-actions bot commented Jul 2, 2025

⚠️ GitHub issue #34577 has been automatically assigned in GitHub to PR creator.

@AlenkaF
Member

AlenkaF commented Jul 7, 2025

Thank you for submitting a PR!
There are some fixes needed for the CI to pass:

  • Linter error:
    python/pyarrow/tests/test_csv.py:363:89: E501 line too long (119 > 88 characters)
  • Python builds error:
    pyarrow/_csv.pyx:1431:34: Object of type 'CCSVWriteOptions &' has no attribute 'eol'
    -> CCSVWriteOptions will need to be updated in the libarrow.pxd

@MartinNowak MartinNowak force-pushed the fix-34577-expand-csv-write-options branch 2 times, most recently from 34366df to dd3ae8f Compare August 4, 2025 11:45
@MartinNowak MartinNowak changed the title GH-34577 [Python] Expose eol and null_value csv WriteOptions GH-34577 [Python] Expose eol and null_string csv WriteOptions Aug 4, 2025
@MartinNowak MartinNowak force-pushed the fix-34577-expand-csv-write-options branch 2 times, most recently from 58c1bd0 to 6907545 Compare August 7, 2025 14:37
@AlenkaF
Member

AlenkaF commented Aug 11, 2025

The failures are connected, see:

______________________________ test_write_options ______________________________

    def test_write_options():
        cls = WriteOptions
        opts = cls()
    
>       check_options_class(
            cls, include_header=[True, False], delimiter=[',', '\t', '|'],
            eol=['\n', '\r\n'], null_string=['', 'NA'],
            quoting_style=['needed', 'none', 'all_valid'])

opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_csv.py:362: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
opt/conda/envs/arrow/lib/python3.9/site-packages/pyarrow/tests/test_csv.py:111: in check_options_class
    opts = cls(**non_defaults)
pyarrow/_csv.pyx:1388: in pyarrow._csv.WriteOptions.__init__
    ???
pyarrow/_csv.pyx:1446: in pyarrow._csv.WriteOptions.null_string.__set__
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   TypeError: expected bytes, NoneType found
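
For context, the failing frame (`opts = cls(**non_defaults)`) is the constructor step of the `check_options_class` test helper. Below is a simplified sketch of what such a helper does, reconstructed from the call and traceback above. `FakeWriteOptions` is a hypothetical pure-Python stand-in so the sketch runs without pyarrow; the real helper in test_csv.py and the real Cython class may differ in detail.

```python
class FakeWriteOptions:
    """Hypothetical stand-in for pyarrow.csv.WriteOptions (illustration only)."""

    def __init__(self, include_header=None, *, delimiter=None, eol=None,
                 null_string=None):
        # Defaults first, then apply only the arguments that were given.
        self.include_header = True
        self.delimiter = ","
        self.eol = "\n"
        self.null_string = ""
        if include_header is not None:
            self.include_header = include_header
        if delimiter is not None:
            self.delimiter = delimiter
        if eol is not None:
            self.eol = eol
        if null_string is not None:
            self.null_string = null_string


def check_options_class(cls, **attr_values):
    """Simplified helper: values[0] is the expected default for each option."""
    opts = cls()
    for name, values in attr_values.items():
        assert getattr(opts, name) == values[0], f"bad default for {name}"
        for value in values:
            setattr(opts, name, value)
            assert getattr(opts, name) == value, f"failed setting {name}"

    # Constructor step: pass the second value of every option as a keyword.
    non_defaults = {name: values[1] for name, values in attr_values.items()}
    opts = cls(**non_defaults)
    for name, value in non_defaults.items():
        assert getattr(opts, name) == value
    return opts


opts = check_options_class(
    FakeWriteOptions, include_header=[True, False], delimiter=[",", "\t", "|"],
    eol=["\n", "\r\n"], null_string=["", "NA"])
```

The first value in each list is treated as the expected default, and the second value is the one passed to the constructor, which is the step that raised the TypeError in the run above.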

@MartinNowak MartinNowak force-pushed the fix-34577-expand-csv-write-options branch from 6907545 to f95d305 Compare August 15, 2025 06:58
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Aug 15, 2025
Contributor Author

@MartinNowak MartinNowak left a comment

Was on vacation and it took a bit longer to get back to this. Hope it runs through now 🤞 @AlenkaF.

Contributor Author

> ???
> E TypeError: expected bytes, NoneType found

It found this glorious typo 🤦. Only have semi-working syntax highlighting and didn't have enough time to figure out how to run the tests locally :/.

@AlenkaF
Member

AlenkaF commented Aug 15, 2025

The tests are passing 👍
But the docstrings need an update:

pyarrow._csv.WriteOptions
-> pyarrow._csv.WriteOptions(include_header=None, *, batch_size=None, delimiter=None, eol=None, null_string=None, quoting_style=None)
PR01: Parameters {'delimiter', 'null_string', 'eol', 'batch_size', 'quoting_style'} not documented
PR04: Parameter "")" has no type

@MartinNowak MartinNowak force-pushed the fix-34577-expand-csv-write-options branch from f95d305 to 08fa441 Compare August 21, 2025 09:14
Contributor Author

@MartinNowak MartinNowak Aug 21, 2025

> The tests are passing 👍
> But the docstrings need an update:

I needed to escape the backslash so that it renders in the final docs instead of breaking the docstring.
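
A minimal sketch of the escaping issue, assuming a regular (non-raw) Python docstring: an unescaped `\n` becomes a real newline and splits the parameter line, leaving a stray `")` fragment, which would be consistent with the PR04 message above. The class name `WriteOptionsDocSketch` and the parameter descriptions are illustrative only, not the wording used in the PR.

```python
# A real newline inside the text splits the parameter line in two.
broken = 'eol : str, optional (default "\n")'
# Escaping the backslash keeps the line intact; it renders as \n in the docs.
escaped = 'eol : str, optional (default "\\n")'

print(broken.splitlines())
print(escaped.splitlines())


class WriteOptionsDocSketch:
    """
    Options for writing CSV files (illustrative docstring only).

    Parameters
    ----------
    include_header : bool, optional (default True)
        Whether to write an initial header line with column names.
    delimiter : 1-character string, optional (default ",")
        The character delimiting individual cells in the CSV data.
    eol : str, optional (default "\\n")
        The end-of-line sequence written after each row.
    null_string : str, optional (default "")
        The string written for null values.
    """
```

A raw docstring (`r"""..."""`) would be another way to keep the backslash literal; the thread does not say which form the PR uses.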

@MartinNowak
Contributor Author

The macOS failures seem to be caused by apache/orc#2357 and were not present in the previous run at f95d305, @AlenkaF.

Member

@raulcd raulcd left a comment

The macOS failures are fixed on main, can you rebase so they are fixed on CI, please?

@raulcd
Member

raulcd commented Aug 26, 2025

@github-actions crossbow submit -g python

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Aug 26, 2025
@github-actions

Revision: 08fa4412afc3bbb2e1599a4ad9557fe8a0f3a075

Submitted crossbow builds: ursacomputing/crossbow @ actions-c05e9be928

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-1.26 GitHub Actions
test-conda-python-3.11-pandas-latest-numpy-latest GitHub Actions
test-conda-python-3.11-pandas-nightly-numpy-nightly GitHub Actions
test-conda-python-3.11-pandas-upstream_devel-numpy-nightly GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.12-cpython-debug GitHub Actions
test-conda-python-3.13 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-1.1.3-numpy-1.19.5 GitHub Actions
test-conda-python-emscripten GitHub Actions
test-cuda-python-ubuntu-22.04-cuda-11.7.1 GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-42-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions
test-ubuntu-22.04-python-313-freethreading GitHub Actions
test-ubuntu-24.04-python-3 GitHub Actions

@MartinNowak MartinNowak force-pushed the fix-34577-expand-csv-write-options branch from 08fa441 to 220c3fc Compare August 27, 2025 06:09
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Aug 27, 2025
Member

@raulcd raulcd left a comment

Thanks for the PR, and sorry it took me a while to review.
This seems to only test that the options class can be constructed; it does not test that the options take effect. In test_csv.py we have other tests that do exercise the options, such as:

def test_write_read_round_trip():

or
def test_write_quoting_style():

Can we test the options are indeed working as expected when writing the CSV?
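
One possible shape for such a test, as a rough sketch only. It assumes a pyarrow build that already includes the `eol` and `null_string` options from this PR; `check_csv_bytes` and `write_and_check_eol_null_string` are hypothetical names, and the checks are deliberately loose (line endings and null cells) rather than a byte-for-byte comparison.

```python
import io


def check_csv_bytes(data, *, eol, null_string, n_lines):
    """Check that CSV bytes use `eol` throughout and contain `null_string`."""
    assert data.endswith(eol), "output does not end with the requested eol"
    lines = data.split(eol)
    # Splitting on eol leaves one empty trailing element after the last line.
    assert lines[-1] == b"" and len(lines) - 1 == n_lines
    for line in lines[:-1]:
        assert b"\r" not in line and b"\n" not in line, "stray line ending"
    # Skip the header line and look for the null marker among the data cells.
    cells = [cell for line in lines[1:-1] for cell in line.split(b",")]
    assert null_string in cells, "null_string not found in any data cell"


def write_and_check_eol_null_string():
    # Imported lazily so the checker above stays usable without pyarrow.
    import pyarrow as pa
    from pyarrow import csv

    table = pa.table({"a": [1, None, 3], "b": [4, 5, None]})
    # Assumption: eol and null_string are accepted once this PR is applied.
    opts = csv.WriteOptions(eol="\r\n", null_string="NA")
    buf = io.BytesIO()
    csv.write_csv(table, buf, opts)
    check_csv_bytes(buf.getvalue(), eol=b"\r\n", null_string=b"NA", n_lines=4)
```

Integer columns are used so that quoting does not interfere with the cell comparison; a round-trip through `read_csv` with matching read options would be a natural follow-up check.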

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Aug 27, 2025
