Conversation
eba4608 to
db08691
Compare
|
Thanks 🙏! I’ve also tried the alternative approach proposed: #64. The shim layer is not that complex and allows the Rust API to remain idiomatic Another reason for preferring this approach is that Let me know what are your thoughts. |
|
@osmaczko I might push back on some of your claims. Though I don't disagree with your proposal.
Disclaimer: I think this is one of those I have been enjoying having the nim bindings within this repository for 2 reasons. 1. A Product reminder:Its a reminder that the only consumer to be considered at this time is logos-chat.
Strictly speaking libchat exists to support 2. Offset boundaries.Currently the boundary between rust and nim lives within a single repository. This has been helpful for catching integration issues and such as this ABI issue. Every commit to main, ships with updated and tested nim-bindings which forces them to remain in lockstep. CI Actions can detect mismatches and errors, before they are published. Personally as a contributor this has been very helpful. If the repos aligned on language barriers then there is little observability into the highest failure point - FFI layers. The boundary between repositories is then aligned on separation of logic and functionality. Libchat provides implementations of the protocols; logos-chat provides implementation details such as IO and networking. FutureThere are many benefits to separating the bindings. Improved build times, separation of concerns, etc, etc. I feel this hold especially pronounced when there are multiple bindings. In the future I think this argument gets quite strong. For now I find it helps to think of libchat as a nim library (which happens to use alittle bit of rust in the backend). Very open to counter points and discussion here. |
|
@jazzz thanks, I agree with your reasoning. Let's follow YAGNI and postpone more complex solution. |
|
LGTM, I'm also wondering if similar issues / topics been discussed else where in Nim community? |
* chore(nim-bindings): replace dynlib dlopen with plain importc
The dynlib pragma hard-coded a library path and resolved it via dlopen() at
runtime, preventing static linking and forcing a specific load-time path.
Using bare {.importc.} lets consumers choose: link liblibchat dynamically
at link time (--passL:-llibchat) or link it statically into their binary.
* Rust -> Nim ABI (#62)
* Use correct build hook
* force sret like return from rust code for nim compatibility
* Fix target mismatch
* Update usages
* ci: add nim-bindings-test
* fix(nim-bindings): fix ABI mismatch in destroy_* FFI functions and add defer-based cleanup
Nim's C backend silently transforms large struct parameters (>16 bytes) into
pointer parameters when calling importc functions. The destroy_* functions were
declared taking T by value in Rust, but Nim always passed &T — causing Rust to
read garbage from the stack on x86-64 (SIGILL on CI) while accidentally working
on ARM64 macOS due to that ABI coincidentally also using pointers for large structs.
Fix by changing all destroy_* functions to take &mut T and using drop_in_place,
which is the correct idiom for dropping a value through a pointer.
On the Nim side, replace scattered manual destroy calls with defer, which
guarantees cleanup on all exit paths and prevents use-after-destroy bugs.
---------
Co-authored-by: Jazz Turner-Baggs <473256+jazzz@users.noreply.github.com>
Problem
From what I can see, there is an ABI incompatibility between NIM and safer_ffi. The issue appears to be caused by a difference in the calling conventions used by nim.
From initial testing. safer_ffi and c-binding interfaces use a convention called
sretto handle the return of large structs. Where nim overwrites the signature to include an extraoutparameter and does not adhere to thesretconvention. AARM64 defines this behavior inaapcs64This was surfaced by #61 which exposed a SegFault. Having spent more time that I'd like digging around in compiler code, I was unable to find concrete reason of why this was able to function before.
The reason for the segfault on my Apple Silicone mac is clear. According to aapcs64, the rust function expected the caller to allocate memory and write the address to register
x8. As nim did not follow this convention, it never initialized registerx8, resulting to undefined behavior when the memory location is accessed.Solution
This PR removes the calling convention ambiguity and forces the rust functions to follow nim convention .
Other approaches considered:
sretconventions. This produces yet another layer and seemed more problematic.Notes