
@lidavidm
Member

This exposes the new pre_buffer option in pyarrow.parquet, so it can be used without going through the datasets API, for example when reading a single file. A simple benchmark is in the JIRA issue. This helps close the gap between PyArrow and fsspec read performance, since fsspec performs readahead by default.


@lidavidm
Member Author

The pre-buffer/Parquet logic has a bug with empty files, which I will need to fix.

@lidavidm lidavidm changed the title ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet Apr 16, 2021
@jorisvandenbossche
Member

Thanks @lidavidm! (I know it goes against the point of an RC, but it would be nice if we could still get this into 4.0.)

@lidavidm lidavidm changed the title ARROW-12428: [C++][Python] Expose pre_buffer in pyarrow.parquet ARROW-12428: [Python] Expose pre_buffer in pyarrow.parquet May 4, 2021
@lidavidm
Member Author

lidavidm commented May 4, 2021

I've rebased this, and tests pass. (Sorry for the long delay here.)

Co-authored-by: Joris Van den Bossche <jorisvandenbossche@gmail.com>