Skip to content

Conversation

@kszucs
Copy link
Member

@kszucs kszucs commented Jun 8, 2021

Which issue does this PR close?

Depends on #493

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

name: &str,
path: &str,
has_header: bool,
delimiter: &[u8],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should have a Schema argument exposed as well, but I noticed that FFI hasn't been implemented for Schema and DataType objects in arrow-rs. We should probably expose all of the ArrowSchema based structs there first, then convert pyarrow objects using the C interface rather than calling out to python functions (like the datatype python bindings are currently implemented).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree. I only learnt that the Schema and DataType have a c data interface recently. This likely requires some refactoring on the arrow-rs, as it assumes that metadata do not require a specific in-memory alignment, and yet the c data interface makes such requirement.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to accept pyarrow.Schema objects once apache/arrow-rs#439 gets merged.

@kszucs kszucs changed the title [Python] Expose ExecutionContext.register_csv Expose ExecutionContext.register_csv to the python bindings Jun 9, 2021
@codecov-commenter
Copy link

Codecov Report

Merging #524 (7438b88) into master (d5bca0e) will decrease coverage by 0.03%.
The diff coverage is 0.00%.

❗ Current head 7438b88 differs from pull request most recent head 0412a5b. Consider uploading reports for the commit 0412a5b to get more accurate results
Impacted file tree graph

@@            Coverage Diff             @@
##           master     #524      +/-   ##
==========================================
- Coverage   76.03%   76.00%   -0.04%     
==========================================
  Files         157      157              
  Lines       26990    27001      +11     
==========================================
  Hits        20521    20521              
- Misses       6469     6480      +11     
Impacted Files Coverage Δ
python/src/context.rs 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d5bca0e...0412a5b. Read the comment docs.

@kszucs kszucs requested a review from jorgecarleitao June 10, 2021 16:16
@alamb alamb added the python label Jun 11, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one ready to go @jorgecarleitao ?

@alamb
Copy link
Contributor

alamb commented Jun 23, 2021

ping @jorgecarleitao

@alamb alamb added the stale-pr label Jul 13, 2021
@alamb
Copy link
Contributor

alamb commented Jul 13, 2021

@jorgecarleitao / @kszucs -- what is the plan for this PR?

@kszucs
Copy link
Member Author

kszucs commented Jul 13, 2021

Since apache/arrow-rs#439 has been merged I can expose the schema argument as well, though we can defer that to a follow-up PR too.

@alamb
Copy link
Contributor

alamb commented Jul 13, 2021

Cool -- I am just trying to shepherd PRs that look like they got stale

@alamb
Copy link
Contributor

alamb commented Aug 2, 2021

The test appear to be failing due to #818

})
}

pub fn to_rust_schema(ob: &PyAny) -> PyResult<Schema> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from https://github.com/apache/arrow-rs/blob/master/arrow-pyarrow-integration-testing/src/lib.rs#L136

Eventually we could add an optional module to arrow-rs where we implement the PyO3 conversion traits for arrow-rs <-> pyarrow interoperability for easier downstream integration.

@kszucs
Copy link
Member Author

kszucs commented Aug 5, 2021

@alamb this should be good to go, though we should revisit the FFI bindings in arrow-rs and a potential arrow-rs <-> pyarrow bridge implemented in arrow-rs in the future.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kszucs - I reviewed the test carefully and skimmed the code. Looks great to me. Thank you so much

for table in ["csv", "csv1", "csv2"]:
result = ctx.sql(f"SELECT COUNT(int) FROM {table}").collect()
result = pa.Table.from_batches(result)
assert result.to_pydict() == {"COUNT(int)": [4]}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@alamb alamb merged commit 5a7bbcc into apache:master Aug 6, 2021
@houqp houqp added the enhancement New feature or request label Aug 7, 2021
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants