ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet #10074

lidavidm · 2021-04-16T20:13:32Z

This allows using the new `pre_buffer` option without going through the datasets API, for example when reading a single file. A simple benchmark is in the JIRA. This helps close the gap between PyArrow and fsspec read performance, as fsspec performs readahead by default.

github-actions · 2021-04-16T20:13:56Z

https://issues.apache.org/jira/browse/ARROW-12428

lidavidm · 2021-04-16T20:35:23Z

The pre-buffer/Parquet logic has a bug with empty files, which I will need to fix.

jorisvandenbossche · 2021-04-19T10:12:46Z

Thanks @lidavidm! (I know it goes against the point of an RC, but it would be nice if we could still get this into 4.0.)

lidavidm · 2021-05-04T17:25:38Z

I've rebased this, and tests pass. (Sorry for the long delay here.)

lidavidm added the Component: Python label Apr 16, 2021

lidavidm changed the title ~~ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet~~ ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet Apr 16, 2021

github-actions bot added Component: C++ Component: Parquet labels Apr 16, 2021

lidavidm added Component: Parquet and removed Component: Parquet labels Apr 16, 2021

jorisvandenbossche reviewed Apr 19, 2021

python/pyarrow/parquet.py (outdated, resolved)

python/pyarrow/parquet.py (outdated, resolved)

lidavidm force-pushed the arrow-12428 branch from 2bd2754 to c8e4542 Compare April 26, 2021 18:15

lidavidm added 2 commits May 4, 2021 10:47

ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet

556e3ca

ARROW-12428: [Python] Tweak pre_buffer docstring

f0c8c61

lidavidm force-pushed the arrow-12428 branch from c8e4542 to f0c8c61 Compare May 4, 2021 14:56

lidavidm changed the title ~~ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet~~ ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet May 4, 2021

jorisvandenbossche reviewed May 4, 2021

python/pyarrow/parquet.py (outdated, resolved)

Update python/pyarrow/parquet.py

6eced09

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>

jorisvandenbossche reviewed May 4, 2021

python/pyarrow/parquet.py (outdated, resolved)

Update python/pyarrow/parquet.py

cdc3029

jorisvandenbossche approved these changes May 4, 2021

jorisvandenbossche closed this in 3a9aea3 May 5, 2021

ian-r-rose mentioned this pull request Apr 19, 2022

Use pre_buffer=True for "pyarrow" parquet engine dask/dask#8952

Merged

3 tasks

asfimport mentioned this pull request May 5, 2021

[Python] pyarrow.parquet.read_* should use pre_buffer=True #28218

Closed

jorisvandenbossche mentioned this pull request Sep 25, 2023

[Python][Dataset][Parquet] Enable Pre-Buffering by default for Parquet s3 datasets #36765

Closed
