Description
What happens?
Thanks for the great work on this project! I noticed an issue with Apache Arrow exports from DuckDB queries in v1.4.0 that changes previously observed behavior: calling .arrow() on a DuckDB result now returns a pyarrow.lib.RecordBatchReader instead of a pyarrow.lib.Table. This contradicts the documentation, which presents the RecordBatchReader as the alternative rather than the default: https://duckdb.org/docs/stable/guides/python/export_arrow.html.
If the intent is instead to prefer streaming/batched processing, I suggest updating all relevant documentation pages to reflect that.
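In the meantime, here is a workaround sketch. It assumes the new behavior is intentional and that the object returned by .arrow() in v1.4.0 is a standard pyarrow RecordBatchReader; the isinstance check keeps it compatible with older versions that still return a Table.

import duckdb
import pyarrow as pa

with duckdb.connect() as ddb:
    result = ddb.execute("SELECT 1, 2, 3;").arrow()
    # In v1.4.0 this appears to be a RecordBatchReader; read_all() drains it
    # into an in-memory pyarrow Table, restoring the pre-1.4.0 return type.
    table = result.read_all() if isinstance(result, pa.RecordBatchReader) else result

print(type(table))  # <class 'pyarrow.lib.Table'>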
To Reproduce
Please see the following gist for an example of output: https://gist.github.com/d33bs/d49279c46142f6375b3d4f070ddd6e0f
The following code should return a pyarrow.lib.Table, but under v1.4.0 it returns a pyarrow.lib.RecordBatchReader instead:
import duckdb
with duckdb.connect() as ddb:
    result = ddb.execute("SELECT 1,2,3;").arrow()
type(result)
OS:
macOS
DuckDB Package Version:
1.4.0
Python Version:
3.12
Full Name:
Dave Bunten
Affiliation:
University of Colorado Anschutz Medical Campus
What is the latest build you tested with? If possible, we recommend testing with the latest nightly build.
I have tested with a nightly build
Did you include all relevant data sets for reproducing the issue?
Not applicable - the reproduction does not require a data set
Did you include all code required to reproduce the issue?
- Yes, I have
Did you include all relevant configuration to reproduce the issue?
- Yes, I have