Open
Labels: bug (Something isn't working)
Description
Describe the bug
This bug was originally filed against the Python bindings: apache/datafusion-python#157

Sorting a Parquet file and writing the result back to Parquet in Google Colab fails with an out-of-memory error.
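To Reproduce

```python
# Colab shell command: download the lineitem Parquet file used in this report
!curl -L 'https://drive.google.com/uc?export=download&id=18gv0Yd_a-Zc7CSolol8qeYVAAzSthnSN&confirm=t' > lineitem.parquet

from datafusion import SessionContext

ctx = SessionContext()
# Register the downloaded file as a table named "lineitem"
ctx.register_parquet('lineitem', 'lineitem.parquet')
df = ctx.sql("select * from lineitem order by l_shipdate")
# Writing the result executes the sorted query; this is where Colab runs out of memory
df.write_parquet("lineitem_Datafusion.parquet")
```

Expected behavior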
I expected the query to complete using only the available memory.
For comparison, here is a Colab notebook running the same workload with Polars and DuckDB:
https://colab.research.google.com/drive/1pfAPpIG7jpvGB_aHj-PXX66vRaRT0xlj#scrollTo=O8-lyg1y6RT2
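The expected behavior amounts to a bounded-memory (external) sort: sort fixed-size chunks, spill each sorted run to disk, then stream a k-way merge of the runs. The following is a minimal pure-Python sketch of that general technique, included only to illustrate what "use only the available memory" means for an `ORDER BY`. It is not DataFusion's sort implementation, and `external_sort` and its `max_rows_in_memory` budget (counted in rows rather than bytes) are hypothetical names chosen for this example.

```python
import heapq
import os
import pickle
import tempfile
from typing import Any, Callable, Iterable, Iterator, List, Optional


def _write_run(sorted_rows: List[Any], directory: str) -> str:
    """Spill one sorted run to a temporary file and return its path."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "wb") as f:
        for row in sorted_rows:
            pickle.dump(row, f)
    return path


def _read_run(path: str) -> Iterator[Any]:
    """Stream rows back from a spilled run, one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(
    rows: Iterable[Any],
    max_rows_in_memory: int,
    key: Optional[Callable[[Any], Any]] = None,
) -> Iterator[Any]:
    """Hypothetical helper: sort `rows`, buffering at most `max_rows_in_memory` rows."""
    if max_rows_in_memory < 1:
        raise ValueError("max_rows_in_memory must be >= 1")
    with tempfile.TemporaryDirectory() as tmp:
        run_paths: List[str] = []
        buffer: List[Any] = []
        for row in rows:
            buffer.append(row)
            if len(buffer) >= max_rows_in_memory:
                # Buffer is full: sort it and spill it to disk as one run
                buffer.sort(key=key)
                run_paths.append(_write_run(buffer, tmp))
                buffer = []
        if buffer:
            buffer.sort(key=key)
            run_paths.append(_write_run(buffer, tmp))
        # k-way merge of the spilled runs; heapq.merge keeps about one row per run in memory
        yield from heapq.merge(*(_read_run(p) for p in run_paths), key=key)
```

For example, `list(external_sort([5, 3, 9, 1], max_rows_in_memory=2))` produces `[1, 3, 5, 9]` after spilling two runs. Peak memory is bounded by the buffer plus roughly one row per run during the merge; a real engine would budget bytes rather than rows and merge in multiple passes when there are too many runs to open at once.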