Skip to content

rusk sdk v2 init#500

Open
samuel100 wants to merge 25 commits intomainfrom
feature/rust-sdk-v2
Open

rusk sdk v2 init#500
samuel100 wants to merge 25 commits intomainfrom
feature/rust-sdk-v2

Conversation

@samuel100
Copy link
Contributor

Add Rust SDK v2

Introduces the Foundry Local Rust SDK (sdk_v2/rust), updated Rust samples (samples/rust), and CI pipeline support.

SDK (sdk_v2/rust)

A safe, async Rust wrapper over the Foundry Local Core native library, providing:

  • FoundryLocalManager — singleton entry point that initializes the native engine, exposes the model catalog, and manages the local web service lifecycle.
  • Catalog — async model discovery with a 6-hour TTL cache. Lookup by alias or variant ID, with helpful error messages listing available alternatives on miss.
  • Model / ModelVariant — full model lifecycle: download (with streaming progress), load, unload, cache inspection, and removal.
  • ChatClient — OpenAI-compatible chat completions (non-streaming and streaming via futures_core::Stream). Supports temperature, top-p/top-k, response format (text, JSON, JSON schema, Lark grammar), and tool calling.
  • AudioClient — OpenAI-compatible audio transcription (non-streaming and streaming). Accepts impl AsRef<Path> for file paths.
  • Error handlingthiserror-based FoundryLocalError enum with discriminated variants (LibraryLoad, CommandExecution, ModelOperation, Validation, etc.) and a crate-wide Result<T> alias.
  • FFI bridge (detail/core_interop.rs) — dynamically loads the native .dll/.so/.dylib via libloading, with platform-aware memory deallocation and a trampoline pattern for streaming callbacks. Async methods use tokio::task::spawn_blocking to keep the runtime unblocked.
  • Build script (build.rs) — automatically downloads NuGet packages (Core, ORT, GenAI) at compile time, extracts platform-specific native libraries, and caches them across builds. Supports winml and nightly feature flags.

Re-exports async-openai request/response types for convenience so callers don't need a direct dependency.

Samples (samples/rust)

Replaces the old hello-foundry-local sample with four focused examples:

  • native-chat-completions — basic sync + streaming chat completion
  • tool-calling-foundry-local — multi-turn tool calling with streaming chunk assembly
  • audio-transcription-example — audio file transcription (sync and streaming)
  • foundry-local-webserver — starts the local web service and makes requests via the OpenAI-compatible HTTP endpoint

Each sample has its own Cargo.toml, README.md, and is included in the workspace.

CI (.github/workflows)

  • build-rust-steps.yml — reusable composite action for Rust SDK build + test steps.
  • foundry-local-sdk-build.yml — workflow that triggers on sdk_v2/rust/** and samples/rust/** changes.

Testing

  • Integration tests mirror the JavaScript SDK test suite structure: manager_test, catalog_test, model_test, model_load_manager_test, chat_client_test, audio_client_test.
  • All tests are #[ignore] by default (require native library at runtime) and can be enabled in CI with --include-ignored.
  • Shared test utilities (tests/common/mod.rs) provide config, model aliases, and helper functions.
  • cargo clippy --all-targets --all-features passes with zero code warnings.

@vercel
Copy link

vercel bot commented Mar 9, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
foundry-local Ready Ready Preview, Comment Mar 12, 2026 5:29pm

Request Review

Cargo compiles each tests/*.rs as a separate binary. The native .NET
core has global state — initializing from a second binary causes
'FoundryLocalCore has already been initialized'. Consolidating all
test modules into tests/integration.rs ensures the OnceLock singleton
initializes once and all 30 tests share it.

Also adds --test-threads=1 to CI since tests share model load/unload
state and cannot safely run in parallel.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Audio tests without explicit temperature produced non-deterministic
  transcription output. Set language('en') and temperature(0.0) on all
  audio test variants for consistent results across platforms.
- Fix catalog unknown alias assertion to match actual error message
  ('Unknown model alias' not 'not found').

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The free_native_buffer function uses CoTaskMemFree on Windows, which
lives in ole32.lib. Without this link directive, the integration test
binary fails to link with LNK2019 unresolved external symbol.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All chat and audio tests now properly unload models after use, matching
the try/finally pattern in the JS SDK tests. Setup helpers return the
Model alongside the client so tests can call model.unload() at the end.

Also fixes model_load_manager should_load_model test to unload after
loading.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Covers previously untested Model public API: alias(), id(), variants(),
selected_variant(), is_cached(), path(), select_variant() (success and
error paths). Total integration tests: 38.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Tests the full web server lifecycle: start service → load model →
REST call to v1/chat/completions (non-streaming and SSE streaming) →
verify response → stop service → unload model.

Also tests that urls() returns the correct addresses after start.
Total integration tests: 41.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Packages the crate with cargo package and uploads the .crate file,
matching the pattern used by CS (.nupkg) and JS (.tgz) builds.
Also uploads flcore logs on all runs (including failures).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
All tests that produce response content now println! the actual output
(chat completions, transcriptions, REST responses, model metadata).
Output is visible with --nocapture (added to CI integration test step).
Helps diagnose failures by showing actual vs expected values.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
});
}
self.update_models().await?;
let map = self.models_by_alias.lock().unwrap();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unwraps in SDKs are dangeorus, and it should be used rarely, if it's going to crash the error should be returned as you already return Result.

Something like this is probably better, although we would have to discuss this.

let map = self.models_by_alias.lock().map_err(|_| FoundryLocalError::Internal {
    reason: "Unexpected error occured in the SDK, please retry".into(),
})?;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced all 8 .lock().unwrap() calls in catalog.rs with .map_err(|_| FoundryLocalError::Internal { ... })?. Added a new Internal variant to FoundryLocalError.

core: Arc<CoreInterop>,
model_load_manager: Arc<ModelLoadManager>,
name: String,
models_by_alias: Mutex<HashMap<String, Model>>,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very inefficient, especially cause you keep cloning in method below. I don't think Clone is right thing for Model to do, but fixing that will be pretty hard, what we can do for a quick fix is the following. Wrap Model and ModelVariant to arc like: Arc, so models_by_alias: Mutex<HashMap<String, Arc>>.

Arc is a reference (like a pointer in C++), it will allow you to access without cloning full object, cloning Arc is cheap, making all model management methods faster/more efficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Maps now use Arc<Model> and Arc<ModelVariant>. Public methods return Arc-wrapped types. Cloning is now cheap (reference count bump instead of full object copy).

command: *const i8,
command_length: i32,
data: *const i8,
data_length: i32,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

consider making it u32, length cannot be negative

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed ResponseBuffer length fields from signed to u32.


/// Read the response buffer, free the native memory, and return the data
/// string or raise an error.
fn process_response(response: &ResponseBuffer) -> Result<String> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you have cheap objects like ResponseBuffer (but many times even not cheap) consider not using ref, but just move it with (response: ResponseBuffer), this way you force caller not to call the method twice which would be wrong here. In C++ this would be copied, but Rust by default moves, invalidating the input data (in this case response).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. process_response now takes ResponseBuffer by value (move semantics), enforcing single-use.

};

let data_str = if !response.data.is_null() && response.data_length > 0 {
Some(unsafe {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay for now, but this unsafe code needs to live separately, so its always easily findable, this is often a source of bugs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added // SAFETY: comments to every unsafe block throughout core_interop.rs. Extracted read_native_buffer() as a dedicated unsafe helper with safety documentation.

}

/// Load a model by its identifier.
pub async fn load(&self, model_id: &str) -> Result<String> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldnt return anything

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. load() now returns Result<()> instead of Result<String>. Callers updated.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once execute_command_async is fixed, this will return Result<(), FoundryLocalError>

| `ModelVariant` | Single variant — download, load, unload |
| `ChatClient` | OpenAI-compatible chat completions (sync + streaming) |
| `AudioClient` | OpenAI-compatible audio transcription (sync + streaming) |
| `CreateChatCompletionResponse` | Typed chat completion response (from `async-openai`) |

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to use type from async-openai?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes — we re-export async-openai types for user convenience so callers don't need a direct dependency. Keeping them in the generated docs makes sense.

}

/// Return the local file-system path of the selected variant.
pub async fn path(&self) -> Result<String> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should return PathBuf probably

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. path() now returns PathBuf instead of String.

/// Created once via [`FoundryLocalManager::create`]; subsequent calls return
/// the existing instance.
pub struct FoundryLocalManager {
_config: Configuration,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you marked it as _ if its never used, just remove it, same goes for model_load_manager below.
if it's used shouldn't be marked with underscore in public API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed unused _config and _model_load_manager fields entirely.

}

/// Start the local web service and return the listening URLs.
pub async fn start_web_service(&self) -> Result<Vec<String>> {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Public API should return something like Result<Vec, FoundryLocalError> and not just swallow errors.

The same goes for some other methods, if it can have an issue, we do not want to swallow it and pretend it didn't happen, we need to return it to the user.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. start_web_service now propagates errors via Result instead of silently falling back.


mod common;

mod manager_tests {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can these be split up into separate test files similar to js/cs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Split into 6 separate test modules under tests/integration/: manager_test.rs, catalog_test.rs, model_test.rs, chat_client_test.rs, audio_client_test.rs, web_service_test.rs. Uses directory-based test pattern with shared common/mod.rs.

required: false
type: boolean
default: false
run-integration-tests:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests should run on all platforms

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed run-integration-tests default to true so tests run on all platforms.

working-directory: sdk_v2/rust

env:
CARGO_FEATURES: ${{ inputs.useWinML && '--features winml' || '' }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

evaluates to true/false and ''? the after || should be false to be more readable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed || '' to || false for boolean readability.

- name: Check formatting
run: cargo fmt --all -- --check

- name: Run clippy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is clippy for? would be helpful to include a comment since its new for rust and not in js/cs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added comment: Clippy is Rust's official linter for catching common mistakes, enforcing idioms, and improving code quality.

run: cargo test --test '*' ${{ env.CARGO_FEATURES }} -- --test-threads=1 --nocapture

- name: Package crate
run: cargo package ${{ env.CARGO_FEATURES }} --allow-dirty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is allow dirty? implies some sort of non-compliant build which is OK for public pipelines but would be good to document for clarity

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Added comment explaining --allow-dirty flag purpose.


// ── Library name helpers ─────────────────────────────────────────────────────

#[cfg(target_os = "windows")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just define the extension based on the platform instead of the entire filename and dynamically generate the full filename inline? that is how our other sdks do it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Now defines only LIB_EXTENSION per platform and builds the full filename dynamically via format!.

///
/// Runs the blocking FFI call on a dedicated thread via
/// [`tokio::task::spawn_blocking`] so the async runtime is never blocked.
pub async fn execute_command_async(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want async commands here? I believe executeCommand should be blocking

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been looking at this as well, it's an async method, however tokio is blocking, look at the method spawn_blocking.

I'm not convinced Rust should be async accross the board, it would be good if someone else with Rust expertise weight in. We are making async methods which internally does sync thing, making our code more complicated accross the board cause async immediately means we have to use Arc and Mutex which are more complicated and heavy interfaces. Async vs sync code is very different in Rust.

I would have to think about this, but I would probably try to make this method sync, and then update everything upstream where it can be sync, simplify things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acknowledged — both reviewers lean toward sync. The native calls are blocking (spawn_blocking wraps them). Deferring this architectural change to a follow-up PR after explicit sign-off, as it would require changes across many files (removing Arc/Mutex, changing async method signatures).

}
}

// 2. Compile-time environment variable set by build.rs.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only #1 and #4 are necessary, we don't provide env var override or runtime var support in the other sdks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the runtime env var override (option #3). Kept: (1) explicit config path, (2) compile-time option_env!("FOUNDRY_NATIVE_DIR") from build.rs (needed for dev builds), (3) exe-sibling directory.

@@ -0,0 +1,7 @@
mod audio_client;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are these mod files for? is it for exporting objects or more like a utils file? if not required, can they be removed?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is required, this tells you where code lives (e.g. audio_client.rs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confirmed — mod.rs files are required in Rust to declare submodules. No change needed.

@@ -0,0 +1,110 @@
# Foundry Local Rust SDK
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should be expanded to call out features and provide examples on how to do each feature similar to js/c#. AI can help you with this :P

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. README expanded from ~110 lines to ~477 lines with features list, installation guide, quick start, per-feature usage examples, error handling, configuration reference, and sample links.

Copy link
Contributor

@prathikr prathikr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

awesome first iteration! some things to look over but great start

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Rust SDK v2 under sdk_v2/rust that wraps the Foundry Local Core native library (FFI + async APIs), alongside refreshed Rust samples and GitHub Actions CI support for building/testing the Rust SDK.

Changes:

  • Introduces the foundry-local-sdk Rust crate (manager/catalog/model lifecycle + OpenAI-compatible chat/audio clients).
  • Adds Rust integration tests and shared test utilities.
  • Updates Rust samples workspace and CI workflows for Rust build/test packaging (including WinML option on Windows).

Reviewed changes

Copilot reviewed 45 out of 45 changed files in this pull request and generated 15 comments.

Show a summary per file
File Description
sdk_v2/rust/tests/integration.rs Single integration test binary covering manager/catalog/model/chat/audio/web service flows
sdk_v2/rust/tests/common/mod.rs Shared test config/utilities (test-data-shared pathing, tool definition helper)
sdk_v2/rust/src/types.rs Shared SDK types (ModelInfo, response format, tool choice, etc.)
sdk_v2/rust/src/openai/mod.rs Exposes OpenAI-compatible client modules
sdk_v2/rust/src/openai/chat_client.rs Chat client + streaming wrapper and settings serialization
sdk_v2/rust/src/openai/audio_client.rs Audio transcription client + streaming wrapper and settings serialization
sdk_v2/rust/src/model_variant.rs Model variant operations (download/load/unload/path/cache) + client factories
sdk_v2/rust/src/model.rs Alias-level model wrapper with variant selection logic
sdk_v2/rust/src/lib.rs Crate root exports and re-exports of OpenAI request/response types
sdk_v2/rust/src/foundry_local_manager.rs Singleton manager initialization, catalog access, web service lifecycle
sdk_v2/rust/src/error.rs FoundryLocalError + crate-wide Result<T> alias
sdk_v2/rust/src/detail/model_load_manager.rs Load/unload/list logic via HTTP when external service is configured, else via core
sdk_v2/rust/src/detail/mod.rs Internal module wiring / re-exports
sdk_v2/rust/src/detail/core_interop.rs FFI bridge: dynamic library loading, command execution, streaming callback trampoline
sdk_v2/rust/src/configuration.rs User config -> native parameter map conversion + unit tests
sdk_v2/rust/src/catalog.rs Model discovery + TTL cache + alias/id lookup helpers
sdk_v2/rust/examples/tool_calling.rs Tool-calling example using streaming tool_calls assembly
sdk_v2/rust/examples/interactive_chat.rs Interactive streaming chat example
sdk_v2/rust/examples/chat_completion.rs Basic non-streaming + streaming chat completion example
sdk_v2/rust/build.rs Build-time NuGet download/extract of native libs + RID selection
sdk_v2/rust/README.md Rust SDK README (installation, features, platform support, quick start)
sdk_v2/rust/GENERATE-DOCS.md Rust docs generation instructions (cargo doc)
sdk_v2/rust/Cargo.toml New Rust crate manifest (deps/features/examples)
sdk_v2/rust/.rustfmt.toml Rustfmt settings
sdk_v2/rust/.clippy.toml Clippy config (MSRV)
samples/rust/tool-calling-foundry-local/src/main.rs Updated tool-calling sample using SDK v2
samples/rust/tool-calling-foundry-local/README.md Tool-calling sample instructions
samples/rust/tool-calling-foundry-local/Cargo.toml Tool-calling sample manifest
samples/rust/native-chat-completions/src/main.rs Native (non-HTTP) chat completions sample
samples/rust/native-chat-completions/README.md Native chat sample instructions
samples/rust/native-chat-completions/Cargo.toml Native chat sample manifest
samples/rust/foundry-local-webserver/src/main.rs Web service start + REST streaming sample
samples/rust/foundry-local-webserver/README.md Web service sample instructions
samples/rust/foundry-local-webserver/Cargo.toml Web service sample manifest
samples/rust/audio-transcription-example/src/main.rs Audio transcription (sync + streaming) sample
samples/rust/audio-transcription-example/README.md Audio transcription sample instructions
samples/rust/audio-transcription-example/Cargo.toml Audio transcription sample manifest
samples/rust/README.md Samples index updated to new set of examples
samples/rust/Cargo.toml Workspace members updated (replaces old hello sample)
samples/rust/hello-foundry-local/src/main.rs Removed legacy hello sample
samples/rust/hello-foundry-local/README.md Removed legacy hello sample README
samples/rust/hello-foundry-local/Cargo.toml Removed legacy hello sample manifest
.github/workflows/rustfmt.yml Removed legacy rustfmt workflow targeting old paths
.github/workflows/foundry-local-sdk-build.yml Adds Rust build jobs (win/winml/ubuntu/macos)
.github/workflows/build-rust-steps.yml New reusable workflow for Rust fmt/clippy/build/test/package/artifacts

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


- name: Run integration tests
if: ${{ inputs.run-integration-tests }}
run: cargo test --test '*' ${{ env.CARGO_FEATURES }} -- --test-threads=1 --nocapture
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cargo test --test '*' is not a valid way to run all integration tests; --test expects a concrete test target name, so this will typically fail with "no test target named '*'" (or run nothing). To execute all integration tests, use cargo test --tests (or list the specific integration test target like --test integration).

Suggested change
run: cargo test --test '*' ${{ env.CARGO_FEATURES }} -- --test-threads=1 --nocapture
run: cargo test ${{ env.CARGO_FEATURES }} --tests -- --test-threads=1 --nocapture

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed cargo test --test '*' to cargo test --tests. Also added --include-ignored to run ignored integration tests in CI.

Comment on lines +580 to +587
#[tokio::test]
async fn should_throw_when_completing_streaming_chat_with_invalid_callback() {
let (client, model) = setup_chat_client().await;
let messages: Vec<ChatCompletionRequestMessage> = vec![];

let result = client.complete_streaming_chat(&messages, None).await;
assert!(result.is_err(), "Expected error even with empty messages");

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test name mentions an "invalid callback", but the body only passes an empty messages array (same as the preceding test) and doesn't exercise any callback validation. This makes the intent unclear and duplicates coverage. Consider removing this test or rewriting it to actually validate the intended error condition.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed the duplicate should_throw_when_completing_streaming_chat_with_invalid_callback test.

println!("Downloading model...");
model
.download(Some(|progress: &str| {
print!("\r {progress:.1}%");
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

progress is a &str; using {progress:.1}% applies string precision and will truncate the progress string to 1 character. If you want to show the reported progress value, print {progress} directly (or parse to a number and then format with one decimal).

Suggested change
print!("\r {progress:.1}%");
print!("\r {progress}%");

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed {progress:.1}% to {progress}% in all 4 sample apps.

Comment on lines +182 to +186
let mut request = self.settings.serialize(&self.model_id, path_str);
if let Some(map) = request.as_object_mut() {
map.insert("stream".into(), json!(true));
}

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For audio transcription streaming, this adds stream: true into the OpenAI request payload. The JS and C# SDKs implement streaming via the native callback without adding a stream field to the audio_transcribe request (see sdk_v2/js/src/openai/audioClient.ts and sdk_v2/cs/src/OpenAI/AudioClient.cs), so this is an API-shape inconsistency that may cause request validation differences across SDKs. Consider removing the stream field (or aligning all SDKs / documenting why Rust needs it).

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Removed stream: true from the audio transcription streaming request. Streaming is now handled purely via the native callback, matching JS/C# SDKs.

Comment on lines +103 to +118
/// C-ABI trampoline that forwards chunks from the native library into a Rust
/// closure stored behind `user_data`.
unsafe extern "C" fn streaming_trampoline(
data: *const u8,
length: i32,
user_data: *mut std::ffi::c_void,
) {
if data.is_null() || length <= 0 {
return;
}
let closure = &mut *(user_data as *mut Box<dyn FnMut(&str)>);
let slice = std::slice::from_raw_parts(data, length as usize);
if let Ok(chunk) = std::str::from_utf8(slice) {
closure(chunk);
}
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

streaming_trampoline invokes a Rust closure directly from an extern "C" callback. If the user callback panics, the unwind would cross the FFI boundary, which is undefined behavior (and may abort or corrupt state). Please wrap the callback invocation in std::panic::catch_unwind and convert any panic into a captured error (or abort the stream safely) so no panic can unwind past the C ABI boundary.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Wrapped streaming_trampoline body in std::panic::catch_unwind(AssertUnwindSafe(...)) to prevent UB from panics crossing the FFI boundary.

Comment on lines +821 to +829
#[tokio::test]
async fn should_transcribe_audio_without_streaming_with_temperature() {
let (mut client, model) = setup_audio_client().await;
client.language("en").temperature(0.0);

let response = client
.transcribe(&audio_file())
.await
.expect("transcribe with temperature failed");
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These "with_temperature" transcription tests set temperature(0.0), which is the same value used in the non-"with_temperature" variants. As written, they don't actually verify that changing temperature is accepted/applied. Consider using a non-default temperature (or asserting the request/behavior difference) so the test meaningfully covers the setting.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed temperature from 0.0 to 0.5 in the with_temperature transcription test.

Comment on lines +867 to +878
#[tokio::test]
async fn should_transcribe_audio_with_streaming_with_temperature() {
let (mut client, model) = setup_audio_client().await;
client.language("en").temperature(0.0);

let mut full_text = String::new();

let mut stream = client
.transcribe_streaming(&audio_file())
.await
.expect("transcribe_streaming with temperature setup failed");

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This streaming "with_temperature" transcription test sets temperature(0.0), matching the other streaming test. That means it doesn't actually validate that temperature changes are supported. Consider using a different temperature value (or asserting on request metadata) to make this test cover the intended behavior.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed temperature from 0.0 to 0.5 in the streaming with_temperature transcription test.

Comment on lines +39 to +71
pub fn create(config: FoundryLocalConfig) -> Result<&'static Self> {
// If already initialised, return the existing instance.
if let Some(mgr) = INSTANCE.get() {
return Ok(mgr);
}

let internal_config = Configuration::new(config)?;
let core = Arc::new(CoreInterop::new(&internal_config)?);

// Send the configuration map to the native core.
let init_params = json!({ "Params": internal_config.params });
core.execute_command("initialize", Some(&init_params))?;

let service_endpoint = internal_config.params.get("WebServiceExternalUrl").cloned();

let model_load_manager =
Arc::new(ModelLoadManager::new(Arc::clone(&core), service_endpoint));

let catalog = Catalog::new(Arc::clone(&core), Arc::clone(&model_load_manager))?;

let manager = Self {
_config: internal_config,
core,
_model_load_manager: model_load_manager,
catalog,
urls: std::sync::Mutex::new(Vec::new()),
};

// Attempt to store; if another thread raced us, return whichever won.
match INSTANCE.set(manager) {
Ok(()) => Ok(INSTANCE.get().unwrap()),
Err(_) => Ok(INSTANCE.get().unwrap()),
}
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FoundryLocalManager::create is intended to be a singleton initializer, but the current pattern (INSTANCE.get() check followed by initialization and then INSTANCE.set) is not atomic. If two threads call create concurrently, both can run initialize and build a catalog before one loses the set race, which can reintroduce the "already initialized" failure this singleton is meant to prevent. Consider using OnceLock::get_or_try_init (or equivalent) so the initialization closure runs at most once and only one thread executes the native initialize call.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Replaced non-atomic get()+set() pattern with Once + OnceLock to ensure initialization runs exactly once (stable Rust compatible — get_or_try_init is nightly-only).

Comment on lines +28 to +36
model
.download(Some(|progress: &str| {
print!("\r {progress:.1}%");
io::stdout().flush().ok();
}))
.await?;
println!();
}

Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this progress callback, progress is a &str, and the format specifier {progress:.1} applies string precision (truncation), so it will only print the first character of the progress string. If the intent is to display the full progress (or a numeric percentage), print {progress} as-is or parse to a number before formatting.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed {progress:.1} to {progress}%.

print!("Downloading model {model_alias}...");
model
.download(Some(move |progress: &str| {
print!("\rDownloading model... {progress:.1}%");
Copy link

Copilot AI Mar 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here progress is a &str, and {progress:.1}% will truncate the string to a single character (string precision) rather than formatting a numeric percentage. Consider printing {progress} as-is, or parsing it to a numeric type before applying float formatting.

Suggested change
print!("\rDownloading model... {progress:.1}%");
print!("\rDownloading model... {progress}%");

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Changed {progress:.1}% to {progress}%.

samuel100 and others added 2 commits March 12, 2026 12:47
Add comprehensive documentation matching the quality and structure of the
JS and C# SDK READMEs. The expanded README now includes:

- Feature overview listing all SDK capabilities
- Installation instructions with tokio dependency guidance
- Quick Start example with numbered steps
- Usage sections for catalog browsing, model lifecycle, chat completions,
  streaming responses, tool calling, response format options, audio
  transcription, and embedded web service
- Chat client settings reference table
- Error handling guide with FoundryLocalError enum variants
- Configuration reference table with all FoundryLocalConfig fields
- Links to sample applications in samples/rust/

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Split the monolithic integration.rs into separate test modules while
keeping a single test binary (tests/integration/main.rs) to avoid
.NET native runtime re-initialization errors:

- manager_test.rs: FoundryLocalManager tests
- catalog_test.rs: Catalog operation tests
- model_test.rs: Model operations (load, unload, cache, introspection)
- chat_client_test.rs: ChatClient (completions, streaming, tool calling)
- audio_client_test.rs: AudioClient (transcription, streaming)
- web_service_test.rs: REST API tests

Additional fixes:
- Remove duplicate 'invalid callback' test (was identical to empty
  messages streaming test)
- Use temperature(0.5) instead of 0.0 in with_temperature audio tests
  so they actually validate non-default temperature behavior

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
samuel100 added a commit that referenced this pull request Mar 12, 2026
…rapping

Resolve all 37 review comments from nenad1002, prathikr, and copilot-bot:

Safety & correctness:
- Fix Windows buffer deallocation: CoTaskMemFree -> LocalFree (matches FreeHGlobal)
- Wrap FFI streaming callback in catch_unwind to prevent UB from panics
- Fix singleton TOCTOU race with Once + OnceLock (stable Rust compatible)
- Replace all .unwrap() on mutex locks with FoundryLocalError::Internal

API design:
- Wrap Model/ModelVariant in Arc for efficient sharing (no full clones)
- Change ResponseBuffer length to u32, move by value (consume-once)
- Return PathBuf for file system paths, remove unused struct fields
- Public APIs return Result instead of swallowing errors
- Define only platform extension, build native lib filename dynamically
- Reduce resolve_library_path to 3 strategies (config, build.rs, exe-sibling)

CI/workflow:
- Weave Rust jobs into platform-specific blocks (match cs/js format)
- Remove ubuntu (known issues), enable tests on all platforms
- Fix cargo test command (--tests, --include-ignored), fix boolean default
- Add comments for clippy and --allow-dirty

Tests:
- Split monolithic integration.rs into 6 per-feature test modules
- Remove duplicate invalid-callback test
- Use non-default temperature (0.5) in with_temperature tests

Samples:
- Fix {progress:.1}% -> {progress}% in all 4 samples (string truncation bug)

Consistency & performance:
- Remove stream:true from audio transcription (match JS/C# SDKs)
- Isolate unsafe code with SAFETY comments throughout core_interop
- Reuse reqwest::Client in ModelLoadManager (connection pooling)
- Fix libs_already_present to check specific core library file

Bug fix (not in review):
- Auto-detect WinAppSDK Bootstrap DLL and set Bootstrap=true in config
  params before native initialize (matches JS SDK behavior, required for
  WinML/OpenVINO execution providers)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@samuel100
Copy link
Contributor Author

Additional fix: WinAppSDK Bootstrap auto-detection

During testing, discovered that the Rust SDK was failing to load WinML/OpenVINO execution providers because the native core's WinAppSDK bootstrapper was never being triggered.

Root cause: The Rust SDK pre-loaded Microsoft.WindowsAppRuntime.Bootstrap.dll but never set Bootstrap = true in the config params passed to the native initialize command. Without this param, the native core skips the WinAppSDK bootstrapper, which means plug-in EPs (like OpenVINO) are unavailable.

Fix: CoreInterop::new now auto-detects the Bootstrap DLL next to the resolved core library and sets Bootstrap = true in config params before initialize is called. This matches the JS SDK behavior (see sdk_v2/js/src/detail/coreInterop.ts:56-63).

…rapping

Resolve all 37 review comments from nenad1002, prathikr, and copilot-bot:

Safety & correctness:
- Fix Windows buffer deallocation: CoTaskMemFree -> LocalFree (matches FreeHGlobal)
- Wrap FFI streaming callback in catch_unwind to prevent UB from panics
- Fix singleton TOCTOU race with Once + OnceLock (stable Rust compatible)
- Replace all .unwrap() on mutex locks with FoundryLocalError::Internal

API design:
- Wrap Model/ModelVariant in Arc for efficient sharing (no full clones)
- Change ResponseBuffer length to u32, move by value (consume-once)
- Return PathBuf for file system paths, remove unused struct fields
- Public APIs return Result instead of swallowing errors
- Define only platform extension, build native lib filename dynamically
- Reduce resolve_library_path to 3 strategies (config, build.rs, exe-sibling)

CI/workflow:
- Weave Rust jobs into platform-specific blocks (match cs/js format)
- Remove ubuntu (known issues), enable tests on all platforms
- Fix cargo test command (--tests, --include-ignored), fix boolean default
- Add comments for clippy and --allow-dirty

Tests:
- Split monolithic integration.rs into 6 per-feature test modules
- Remove duplicate invalid-callback test
- Use non-default temperature (0.5) in with_temperature tests

Samples:
- Fix {progress:.1}% -> {progress}% in all 4 samples (string truncation bug)

Consistency & performance:
- Remove stream:true from audio transcription (match JS/C# SDKs)
- Isolate unsafe code with SAFETY comments throughout core_interop
- Reuse reqwest::Client in ModelLoadManager (connection pooling)
- Fix libs_already_present to check specific core library file

Bug fix (not in review):
- Auto-detect WinAppSDK Bootstrap DLL and set Bootstrap=true in config
  params before native initialize (matches JS SDK behavior, required for
  WinML/OpenVINO execution providers)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
.await
.map_err(|e| FoundryLocalError::CommandExecution {
reason: format!("task join error: {e}"),
})?
Copy link

@nenad1002 nenad1002 Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this is swallowed by tokio, if you use Claude ask it to return an error out, then change the method execute_command_async to return Result<String, FoundryLocalError>.

This will force you to return Result<(), FoundryLocalError> or similar on above funs.

load() should return Error if there is an error.

Copy link
Contributor Author

@samuel100 samuel100 Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've restructured this so the error can no longer be swallowed.

Rather than just fixing the consumer side, I moved the error propagation into execute_command_streaming_channel itself. The channel now carries Result<String> items instead of plain String, and after the FFI call completes, any error from the native core response buffer is sent as the final channel item:

   tokio::task::spawn_blocking(move || {
       let tx_chunk = tx.clone();
       let result = this.execute_command_streaming(
           &command,
           params.as_ref(),
           move |chunk: &str| {
               let _ = tx_chunk.send(Ok(chunk.to_owned()));
           },
       );

       // Surface any error through the channel so it
       // cannot be silently swallowed.
       if let Err(e) = result {
           let _ = tx.send(Err(e));
       }
   });

This eliminates the JoinHandle<Result<String>> from the return type entirely — it's now impossible for consumers to accidentally drop the error. ChatCompletionStream and AudioTranscriptionStream are much simpler as a result:
no handle field, no close() method, and poll_next just forwards Ok/Err items from the channel. Net ~100 lines deleted.

samuel100 and others added 4 commits March 12, 2026 16:42
…wing them

When the mpsc channel closes in poll_next, the Stream implementations now
check the JoinHandle for errors from the native core response buffer.
Previously, these errors were silently dropped by tokio if the consumer
only iterated the stream without calling close().

The fix polls the JoinHandle when Ready(None) is received from the channel:
- Ok(Ok(_)): stream ends normally
- Ok(Err(e)): surfaces the FFI error as the final stream item
- Err(e): surfaces the tokio join error as the final stream item

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Move error propagation into execute_command_streaming_channel itself:
the channel now carries Result<String> items, and the blocking task
sends any error from the native core response buffer as the final
channel item.  This eliminates the JoinHandle<Result<String>> from the
return type, making it impossible for consumers to silently swallow
errors.

Simplifies ChatCompletionStream and AudioTranscriptionStream:
- Remove handle field and close() method
- poll_next now just forwards Ok/Err items from the channel
- Errors surface naturally as stream items

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Follow-up to the channel restructure — samples also referenced the
now-removed close() method on ChatCompletionStream and
AudioTranscriptionStream.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants