-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10111: [Rust] Added new crate with code that consumes C Data interface #8287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for opening a pull request! Could you open an issue for this pull request on JIRA? Then could you also rename pull request title in the following format? See also: |
|
As a general design suggestion, on the C++ side we have separate structs (classes) for the import (consumer) side and the export (producer) side. Just 2 cents, though. |
Can you explain the rational for the two structs? Is it due to different ownership rules? The design so far (for consumption) in this PR:
I am still a bit lost as to who owns what: when we pass the pointer from C to Rust, does C assume that it should not free the resources if it goes out of scope? I.e. what is the contract between the consumer and the producer with respect to whom should free memory (rust's by calling the "free")? Or is the contract: both check for pointer nullability of the |
|
Having In addition to those two low-level structs, you need specific structs for importing and exporting. When exporting, you need an allocated struct that gets stored in the You can see the C++ array release callback here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/c/bridge.cc#L482-L502 delete reinterpret_cast<ExportedArrayPrivateData*>(array->private_data);This is taking the When importing, you need a struct that contains the If you choose to manage ownership through buffers, since an array will have several buffers you probably want your importing struct to reference-count the You can see the exact C++ equivalent here: https://github.com/apache/arrow/blob/master/cpp/src/arrow/c/bridge.cc#L1144-L1177
Please tell me if that's clear, or don't hesitate to ask more questions :-) |
|
To set expectations right, IMO this is a very difficult task. IMO rhere are at the moment 3 issues: buffer slices aka buffer
|
Yes, it does. I think you're overestimating the difficulty here, it's actually quite simple. In C++, Buffers (and slices, which are Buffers) are reference-counted (using the shared_ptr class). To take your example:
And anyway, Rust shouldn't worry about what happens on the C++ side. You're getting a
Ah, that may be a problem. But you can start with non-dictionary arrays. I'd also recommend to start small (only primitive types, for example), check that everything works (especially lifetime handling), then implement more types.
You don't have to. |
|
In Rust, I see that I would suggest you start with that: implement only exporting, start with primitive types. Exercise using Python tests to import the data (e.g. For importing, it seems |
|
(perhaps there's also a Rust developer that's more familiar with C++ and can help you?) |
The existing implementation was not useful to support FFI as it did not specify how to release memory.
|
I have been heavily working in this problem based on your ideas, @pitrou on a separate branch, and I think I need some input. That code is still a mess, as I am still in design/experimentation phase. What it can do so far:
Step 2 causes a double free and crashes when Python releases the resource. I know why and I am working on it. While working on it, I found the catch, which I would welcome very much your input. Currently, in Rust, two distinct arrays can share a buffer via an (atomically counted) shared pointer, Say we have two arrays In this direction, exporting an array (without children for for now) is equivalent to increase the ref count by 1, and releasing the exported array is equivalent to decrease it by 1. Specifically, exporting an array consists of
This is artificially stating that our struct now also shares read ownership over that data. Because the refcount was increased by 1, rust won't free the resources automatically. Releasing an Array consists of:
Does this make any sense? Btw, is this what it is meant in this section of the C Data interface wrt to |
|
You're doing it wrong. I suggest again that you try to follow how C++ does things, otherwise you'll get lost. For example, your So, what needs to happen is you have something like this (not necessarily working, but you get the idea): struct ExportedArrayData {
buffers: Vec<Buffer> buffers,
// other stuff perhaps...
};Then your
And again, I suggest you tackle importing and exporting separately. |
|
Sorry, I think that did not make myself very clear, and/or that the code is still not very well documented, but I have been working towards exactly that. I was just a bit unsure about what you store in the private data, to help the release. That cleared it up. |
|
Hey @jorgecarleitao, I'll only be able to look at this either over the weekend or during the coming week |
|
No need, @nevi-me I am still working on this on another place. I will close it as this won't fly. I will PR separately. |
|
No worries, I'll still have a look at this so I can see the approach that you're taking |
|
Prob. the best place to see that is in this PR on my repo. It has the latest version. |
This PR: