-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-10149: [Rust] Improved support for externally owned memory regions #8316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
rust/arrow/src/buffer.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was not being used, and thus I dropped it.
rust/arrow/src/bytes.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps using a closure that mutates a captured variable?
Apparently you need to use either FnMut or a Cell:
https://stackoverflow.com/questions/38677736/passing-a-closure-that-modifies-its-environment-to-a-function-in-rust
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. I had to introduce a Mutex, because a FnMut is now mutable, while the data itself is not. I am not very happy with this, but it makes sense as we cannot assume that the C data interface is thread-safe, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I reverted this. I do not think that that Fn should be mutable, as it is just performing an FFI call, over which Rust does not need to know about mutability. I am still trying to test it, but I think that something like Arc<dyn Fn(&mut Bytes)> is a better signature,.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't you use a Cell?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let b = Cell::new(false);
let dealloc = Arc::new(|bytes: &mut Bytes| {
*b.get_mut() = true;
assert_eq!(bytes.as_slice(), &b"hello"[1..4]);
});does not compile because it requires moving b to inside the closure: if we move b to inside the closure (using move), the closure is no longer Fn, but FnMut. If the closure is FnMut, we can no longer wrap it inside an Arc, as Arc is immutable. To make it immutable, we need to wrap it around a Mutex.
The difference between the code we are going here and the example in SO is that we have an immutable function, as the underlying resource that this function acts upon is outside rust. I.e. from rust's perspective, the function is Fn.
One way to test this would be to make the closure to write something to a file, and verify that that was written. I.e. test that the function mutated something outside of Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After trying out varying things, I got the following to work:
use std::cell::Cell;
use std::sync::Arc;
pub type VoidFn = Arc<dyn Fn()>;
fn main() {
let integer = Arc::new(Cell::new(5));
let inner = integer.clone();
let closure = Arc::new(move || {
inner.set(inner.get() + 1);
});
execute_closure(closure);
println!("After closure: {}", integer.get());
}
fn execute_closure(func: VoidFn)
{
func();
}I may be missing something though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, by definition a destructor will mutate some state (visible or not), so it seems FnMut may be fine too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is awesome! Thanks a lot for the help and insight @pitrou .
The destructor mutates state. Shouldn't that state be only the state of self? IMO that is the reason rust's Drop requires a mutable ref of self, fn drop(&mut self).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea, I am out of my depth here (not a Rust developer :-)).
The existing implementation was not useful to support FFI as it did not specify how to release memory.
|
Closing in favor of #8401 |
Background
Currently, a memory region (
arrow::buffer::BufferData) always knows its capacity, that it uses todropitself once it is no longer needed. It also knows whether it needs to be dropped or not viaBufferData::owner: bool.However, this is insufficient for the purposes of supporting the C Data Interface, which requires informing the owner that the region is no longer needed, typically via a function call (
release), for reference counting by the owner of the region.This PR
This PR generalizes
BufferData(and renames it toBytes, which is more natural name for this structure, a-labytes::Bytes) to support foreign deallocators. Specifically, it accepts two deallocation modes:Native(usize): the current implementationForeign(Fn): an implementation that calls a function (which can be used to call a FFI)FYI @pitrou , @nevi-me @paddyhoran @sunchao
Related to #8052 , which IMO is blocked by this functionality.