fix: async adapter string params + shim routing#96

Merged
avrabe merged 16 commits into main from fix/async-string-params-and-issue-92
Apr 21, 2026

Conversation

@avrabe avrabe commented Apr 14, 2026

Summary

Follow-up to PR #95. Fixes task.return shim routing for the component wrapper and adds infrastructure for string param cross-memory copy.

Changes

  • Export task.return shims from fused module ($task_return_shim_N)
  • Component wrapper aliases shim exports instead of canonical task.return for shimmed imports (fixes intra-component forwarding through indirect table)
  • Add string param cross-memory copy via cabi_realloc + memory.copy
  • Fix component_realloc_index to prefer module 0's realloc

Status

P3 compute functions (prime, fibonacci, factorial, collatz) produce correct results on stock wasmtime. String-returning functions need a resolver-level fix for pointer_pair_positions offset mapping with mixed-type params.

73/73 P2 runtime tests pass.

🤖 Generated with Claude Code

avrabe and others added 15 commits April 13, 2026 21:25
Add cross-memory copy for string/list INPUT params in the async
callback adapter. When the call crosses a memory boundary, the
adapter allocates in callee memory via cabi_realloc and copies
string data from caller memory before calling [async-lift].
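
The copy step described above can be modelled on the host side. This is a minimal sketch, not the adapter's emitted wasm: `LinearMemory`, its bump allocator `alloc` (standing in for `cabi_realloc`), and `copy_string_param` are hypothetical names introduced only for illustration.

```rust
/// Toy stand-in for one core module's linear memory (hypothetical type).
struct LinearMemory {
    bytes: Vec<u8>,
    next_free: usize,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        // Start allocating at 8 so that offset 0 never looks like a valid pointer.
        LinearMemory { bytes: vec![0; size], next_free: 8 }
    }

    /// Bump allocator playing the role of cabi_realloc for a fresh allocation.
    /// `align` must be non-zero. Returns None when the memory is exhausted.
    fn alloc(&mut self, align: usize, size: usize) -> Option<u32> {
        let start = self.next_free.checked_add(align - 1)? / align * align;
        let end = start.checked_add(size)?;
        if end > self.bytes.len() {
            return None;
        }
        self.next_free = end;
        Some(start as u32)
    }
}

/// Copies `len` bytes at `ptr` in caller memory into a fresh callee-side
/// allocation and returns the callee-side pointer to pass to the callee.
fn copy_string_param(
    caller: &LinearMemory,
    callee: &mut LinearMemory,
    ptr: u32,
    len: u32,
) -> Option<u32> {
    let (src, len) = (ptr as usize, len as usize);
    // Reject source ranges that fall outside caller memory.
    let data = caller.bytes.get(src..src.checked_add(len)?)?;
    let dst = callee.alloc(1, len)? as usize;
    callee.bytes[dst..dst + len].copy_from_slice(data);
    Some(dst as u32)
}
```

The callee then receives the returned pointer together with the original length; the caller-side pointer is never dereferenced in callee memory.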

Text functions still fail because the text_processor uses intra-
component forwarding functions (call_indirect through fixup table)
that route to canonical task.return. These forwarding functions
bypass our shim. Fix requires the component wrapper to provide
shim functions instead of canonical task.return for shimmed imports.

73/73 P2 runtime tests pass. P3 compute functions correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Export task.return shims from fused module ($task_return_shim_N)
- Component wrapper aliases shim exports instead of using canonical
  task.return for shimmed imports (fixes intra-component forwarding)
- Fix component_realloc_index to prefer module 0's realloc

Text functions: shim routing works (no more "invalid task.return
signature"), but param copy uses the wrong pointer_pair_positions for
mixed-type params: it copies the shift param instead of text_ptr. Needs
the sync adapter's position mapping logic.

73/73 P2 tests pass. P3 compute functions correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Export task.return shims ($task_return_shim_N) from fused module
- Component wrapper aliases shim exports instead of canonical task.return
- Add string param cross-memory copy (caller→callee via cabi_realloc)
- Fix component_realloc_index to prefer module 0
- Debug logging for param copy positions

Text functions: shim routing works but pointer_pair_positions has
wrong offsets for mixed-type params (e.g., caesar(u32, string) reports
position [0] instead of [1]). Needs resolver-level fix for async
adapter param flattening.

73/73 P2 tests pass. P3 compute functions correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Compute pointer pair positions from CALLER's flat param types
instead of the CALLEE's component type order. The caller's locals
are in canon-lower order which may differ from the callee's
component type param order.
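
To illustrate why positions must be counted in flat core values, here is a small sketch. `ParamTy` and this simplified `pointer_pair_positions` are hypothetical and assume the slice is already in the caller's canon-lower order; the real resolver handles many more types.

```rust
/// Hypothetical, simplified parameter types.
#[derive(Clone, Copy)]
enum ParamTy {
    U32,
    U64,
    String,
    List,
}

/// Number of core wasm values a parameter flattens to.
fn flat_len(ty: ParamTy) -> usize {
    match ty {
        ParamTy::String | ParamTy::List => 2, // (ptr, len)
        ParamTy::U32 | ParamTy::U64 => 1,
    }
}

/// Flat indices at which a (ptr, len) pair starts.
fn pointer_pair_positions(params: &[ParamTy]) -> Vec<usize> {
    let mut positions = Vec::new();
    let mut flat_idx = 0;
    for &ty in params {
        if matches!(ty, ParamTy::String | ParamTy::List) {
            positions.push(flat_idx);
        }
        flat_idx += flat_len(ty);
    }
    positions
}
```

For a signature shaped like caesar(u32, string), the flat params are (shift, text_ptr, text_len), so the pair starts at flat index 1, not 0.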

Results:
  caesar 3 "hello" → runs without crash (param copy works!)
    but result not printed (string result delivery incomplete)
  analyze "hello world" → partial output with garbage pointers
    (complex record type not fully handled)
  prime/fibonacci/factorial/collatz → all correct

73/73 P2 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Reverting the forwarding function body replacement approach — the
forwarding functions have different types than the shims (dispatch
type (i32,i32) vs actual task.return type). The wrapper-level shim
routing handles the import table correctly.

The forwarding function's call_indirect table[N] resolves to the
import function, which the wrapper provides as the shim export.
The types match for (i32,i32) → () shims (string results).

Caesar runs without crash but produces empty output — the shim
globals may not be receiving values. Needs deeper table/import
tracing.

73/73 P2 tests pass. Compute functions correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Patch element segments to replace task.return import references with
shim function references. This ensures call_indirect through
element-segment-initialized tables calls the shim instead of the
stub import.

Root cause identified for remaining string function issue: the
forwarding function's table index (i32.const 2) uses ORIGINAL
component import numbering, but the element segment uses MERGED
import numbering. Position 2 in the merged element segment is
'analyze', not 'caesar-cipher' as the forwarding function expects.
This is a merger-level table index remapping issue.

73/73 P2 tests pass. Compute functions correct.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Fix task.return shim matching by using element segment positions
instead of name-based matching. Each component's forwarding module
has an element segment that maps table positions to task.return
imports. After patching these with shim references, build a
(comp_idx, func_name) → globals lookup using the correct positions.

For components with direct task.return calls (no forwarding module),
fall back to name-based matching using original function names
extracted from the main module's imports.

Results on stock wasmtime 41:
  prime 7      → 7 is prime              ok
  fibonacci 10 → fibonacci(10) = 55      ok
  factorial 5  → 120                     ok
  collatz 27   → 111 steps               ok
  uppercase    → HELLO                   ok
  reverse      → dlrow olleh             ok
  caesar       → ifmmp                   ok
  count_chars  → 11                      ok
  search       → returns ptr values      todo
  frequencies  → out-of-bounds           todo

8/10 commands correct. 73/73 P2 tests still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Plumb component-level result types from CanonicalEntry::TaskReturn
through to TaskReturnShimInfo and the adapter, so cross-memory copy
of list returns can use element-aware byte counts.

For list<T>, byte_count = len * cabi_size_of(T) instead of the
previous byte-string assumption. This fixes search-positions
(list<u32>: now 4 bytes per element) and unblocks analyze
(text-stats record return). Adds cabi_size_align() implementing
the canonical ABI size/alignment rules for value types.
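
As a rough sketch of the sizing rule, assuming 32-bit memories and only a subset of value types: `ValType`, `size_align`, and `list_byte_count` below are illustrative names, not the signatures of the cabi_size_align() added in this commit.

```rust
/// Hypothetical subset of component value types.
enum ValType {
    Bool,
    U8,
    U16,
    U32,
    U64,
    F32,
    F64,
    Char,
    String,
    List(Box<ValType>),
    Record(Vec<ValType>),
}

fn align_up(offset: u32, align: u32) -> u32 {
    (offset + align - 1) / align * align
}

/// Returns (size, alignment) in bytes.
fn size_align(ty: &ValType) -> (u32, u32) {
    match ty {
        ValType::Bool | ValType::U8 => (1, 1),
        ValType::U16 => (2, 2),
        ValType::U32 | ValType::F32 | ValType::Char => (4, 4),
        ValType::U64 | ValType::F64 => (8, 8),
        // Strings and lists are stored as an (i32 ptr, i32 len) pair.
        ValType::String | ValType::List(_) => (8, 4),
        ValType::Record(fields) => {
            let (mut size, mut align) = (0, 1);
            for field in fields {
                let (field_size, field_align) = size_align(field);
                // Each field starts at the next offset aligned for it.
                size = align_up(size, field_align) + field_size;
                align = align.max(field_align);
            }
            // The record size is padded to a multiple of its alignment.
            (align_up(size, align), align)
        }
    }
}

/// Bytes occupied by `len` elements; None if the product overflows u32.
fn list_byte_count(len: u32, elem: &ValType) -> Option<u32> {
    len.checked_mul(size_align(elem).0)
}
```

Under these rules a list<u32> of 3 elements occupies 12 bytes, and record<string,u32> has size 12 with alignment 4.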

Result-type plumbing strategy:
  - For each async callee component, build name -> result_type map
    from canonical Lift entries (core_func_index -> export name via
    core_entity_order traversal of CoreInstanceExport aliases, plus
    type_index -> ComponentTypeKind::Function results).
  - Match each shim to a result type by original_func_name, falling
    back to ordered claim from the source's TaskReturn list when
    the shim's source core import was rerouted to a forwarding
    function.
  - Pre-resolve Type(idx) references into self-contained value
    types via resolve_component_val_type so the adapter doesn't
    need source-component access at codegen time.

Also fixes an index-space bug where merged_idx was compared to
import-vector position instead of merged function index, causing
original_func_name extraction to fail for any component with
non-function imports interleaved before its task-return imports.

Adds emit_patch_nested_indirections + collect_indirections for
walking list<record-with-string> elements and patching nested
(ptr, len) pairs after bulk copy. Currently disabled for nested
record types -- the inner cross-memory string copy traps OOB and
needs further investigation. Lists of plain types work end-to-end.

Results on stock wasmtime 41:
  prime, fibonacci, factorial, collatz   ok
  transform uppercase/reverse, caesar    ok
  count_chars                            ok
  analyze (text-stats record)            ok (was: garbage)
  search (list<u32>)                     ok (was: panic)
  frequencies (list<record<string,u32>>) todo (still panics)

9/10 P3 commands correct. 203/203 P2 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes to the nested-indirection patching helper:

  1. Read (string ptr, len) directly from CALLEE memory at
     `list_ptr + i*elem_size` instead of from caller memory after
     bulk copy. The bulk copy is still needed for the record body
     (count field, etc.), but for indirect fields whose values are
     callee-side addresses we want the source-of-truth value
     regardless of what the bulk copy produced.

  2. Wrap the inner cross-memory copy in a bounds check. If
     `old_ptr + buf_len` exceeds the callee's current linear memory
     size, skip the patch instead of triggering an unrecoverable
     OOB trap inside the adapter. With the skip in place, frequencies
     still fails (the unpatched callee pointer panics in the runner),
     but the failure mode is now contained to the runner side rather
     than the adapter, and other typed lists with valid pointer
     fields will be patched correctly.
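
The bounds check in item 2 amounts to the following comparison, shown here as host-side Rust rather than emitted wasm. `copy_is_in_bounds` is a hypothetical name, and the sketch assumes the memory size is available as a page count (as `memory.size` reports it).

```rust
const WASM_PAGE_SIZE: u64 = 65_536;

/// True when [old_ptr, old_ptr + buf_len) lies inside a memory of
/// `memory_pages` pages. The sum is widened to u64 so it cannot wrap.
fn copy_is_in_bounds(old_ptr: u32, buf_len: u32, memory_pages: u32) -> bool {
    let end = old_ptr as u64 + buf_len as u64;
    end <= memory_pages as u64 * WASM_PAGE_SIZE
}
```

Widening before the add matters: a 32-bit add of two large values could wrap to a small number and pass the check spuriously.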

The frequencies failure root cause is still under investigation:
values loaded from the apparently correct callee record offsets
exceed the callee memory size, suggesting either the layout is not
canonical or the list pointer doesn't point where expected. Search,
analyze, and the other 7 P3 commands continue to work.

276/276 P2 tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The wit-bindgen async lowering for component functions returning
`list<record>` allocates the records buffer via `Cleanup::new`, whose
drop guard fires at the end of the async block, i.e. between EXIT and
the point where our adapter reads the shim globals. By then the records
buffer has been freed and overwritten with allocator free-list patterns
(0xFFFFFFFF sentinels for multi-record cases; less visible corruption for
single records because the dealloc is lazy enough for the data to survive).

The fix: extend the task.return shim to deep-copy both the records
buffer and each indirect string into a fresh callee-side allocation
before storing the stabilized pointer to globals. Because the stable
buffer is owned by the callee allocator directly (not by the Cleanup
guard), it survives the async-block teardown. The adapter's existing
cross-memory copy then operates on stable data.

Implementation:
- generate_stabilizing_shim emits callee-side memcpy for the records
  buffer, then loops over indirection offsets to copy each sub-buffer
  (currently strings). Written into the shim body when result_type is
  list<T> with at least one indirection via collect_indirections.
- collect_indirections / cabi_size_align in adapter/fact.rs are now
  pub(crate) so the shim generator can use them; adapter/fact is also
  pub(crate).
- Adapter retains its memory-size bounds check on the inner patch copy
  as a safety net for other typed-list cases not yet stabilized.
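
The deep copy can be pictured with a host-side model of a single linear memory. This sketch assumes a record<string,u32> layout of ptr at offset 0, len at 4, and count at 8; `alloc`, `read_u32`, `write_u32`, and `stabilize_list` are hypothetical helpers, not the code emitted by generate_stabilizing_shim.

```rust
/// Assumed layout of record<string, u32>: ptr at 0, len at 4, count at 8.
const RECORD_SIZE: usize = 12;

fn read_u32(mem: &[u8], at: usize) -> u32 {
    u32::from_le_bytes([mem[at], mem[at + 1], mem[at + 2], mem[at + 3]])
}

fn write_u32(mem: &mut [u8], at: usize, value: u32) {
    mem[at..at + 4].copy_from_slice(&value.to_le_bytes());
}

/// Bump-allocates `size` bytes at 4-byte alignment by growing the memory;
/// stands in for a callee-side cabi_realloc call.
fn alloc(mem: &mut Vec<u8>, size: usize) -> usize {
    let start = (mem.len() + 3) / 4 * 4;
    mem.resize(start + size, 0);
    start
}

/// Copies `count` records starting at `list_ptr`, then copies the string each
/// record points at and patches the copied record to reference the new string.
/// Returns the pointer to the stable records buffer.
fn stabilize_list(mem: &mut Vec<u8>, list_ptr: usize, count: usize) -> usize {
    let stable = alloc(mem, count * RECORD_SIZE);
    mem.copy_within(list_ptr..list_ptr + count * RECORD_SIZE, stable);
    for i in 0..count {
        let rec = stable + i * RECORD_SIZE;
        let s_ptr = read_u32(mem, rec) as usize;
        let s_len = read_u32(mem, rec + 4) as usize;
        let s_copy = alloc(mem, s_len);
        mem.copy_within(s_ptr..s_ptr + s_len, s_copy);
        write_u32(mem, rec, s_copy as u32);
    }
    stable
}
```

After this runs, overwriting the original records and strings (as the drop guard's dealloc may do) leaves the stable copy intact, which is the property the adapter's later cross-memory copy relies on.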

Results on stock wasmtime 41:
  prime, fibonacci, factorial, collatz    ok
  transform uppercase/reverse, caesar     ok
  analyze (text-stats record)             ok
  search (list<u32>)                      ok
  frequencies (list<record<string,u32>>)  ok (was: trap / garbage strings)

10/10 P3 commands correct. 276/276 P2 tests pass.

Known follow-up: batch (list<u64> input + list<record<string,u64,u64>>
output) still returns partial garbage because param-side copy of
list<u64> uses byte-count = element-count instead of element-count * 8.
Separate bug on the input-side typed copy — will fix in a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The async adapter's caller→callee param copy treated len as a byte
count, which only matches string (1 byte per code unit). For typed
lists like list<u32> or list<u64>, len is the element count and the
actual buffer size is len * sizeof(T) — so the adapter was copying
too few bytes, truncating the input.

Fix: consult `site.requirements.param_copy_layouts` (same source the
sync adapter uses) and multiply len by the per-element byte size when
allocating and copying. Mirrors the result-side typed sizing that
landed in commits 4569b55 / e5cd80f for the output path.

Before: `batch 5 7 10` returned fibonacci(5), ..., collatz(5), then
garbage for the next inputs, because only the first u64 was copied.
After: all 9 records correct.

  batch 5 10 15 →
    fibonacci: input=5 output=5
    is_prime: input=5 output=1
    collatz_steps: input=5 output=5
    fibonacci: input=10 output=55
    is_prime: input=10 output=0
    collatz_steps: input=10 output=6
    fibonacci: input=15 output=610
    is_prime: input=15 output=0
    collatz_steps: input=15 output=17

11/11 P3 commands correct on stock wasmtime 41.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address the confirmed Mythos finding LS-A-7: the three transcoding
emitters (emit_utf8_to_utf16_transcode, emit_utf16_to_utf8_transcode,
emit_latin1_to_utf8_transcode) were computing the destination
allocation size via LocalGet(len); I32Const(K); I32Mul; Call(realloc)
with no upper-bound check on len and no null check on the realloc
return.

Impact on unfixed code:
  (a) For len > u32::MAX / K the i32.mul wraps mod 2^32 to a small
      alloc_size; cabi_realloc returns a short buffer; the transcode
      loop writes up to K*len bytes out of bounds.
  (b) For OOM, cabi_realloc returns 0; without a null check the
      transcode loop writes into callee memory starting at offset 0.

This commit adds two guards to each emitter:
  1. Before the multiply:
       LocalGet(len); I32Const(u32::MAX/K); I32GtU; If; Unreachable; End
  2. After the realloc call:
       LocalGet(out_ptr); I32Eqz; If; Unreachable; End

Both guards trap via wasm `unreachable`, which matches the Canonical
ABI's requirement that lift/lower trap rather than silently overwrite.
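
The wrap in leg (a) and the effect of guard 1 can be checked with plain integer arithmetic. The two functions below are a host-side model with hypothetical names; `None` stands for the wasm trap.

```rust
/// Models the unguarded `i32.mul`: the product wraps modulo 2^32.
fn unchecked_alloc_size(len: u32, k: u32) -> u32 {
    len.wrapping_mul(k)
}

/// Models the guarded sequence: returns None (a trap) when len > u32::MAX / k.
fn guarded_alloc_size(len: u32, k: u32) -> Option<u32> {
    if k > 1 && len > u32::MAX / k {
        return None; // corresponds to `unreachable`
    }
    // Cannot overflow here: either k <= 1 or len <= u32::MAX / k.
    Some(len * k)
}
```

With K = 2 and len = 0x8000_0001, the unguarded product wraps to 2, so a 2-byte buffer would be requested for a transcode that writes far more; the guarded version traps instead.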

Added three byte-scan regression tests (LS-A-7 PoC):
  - ls_a_7_utf8_to_utf16_emits_overflow_and_null_guards
  - ls_a_7_utf16_to_utf8_emits_overflow_and_null_guards
  - ls_a_7_latin1_to_utf8_emits_overflow_and_null_guards

Each test emits the corresponding transcode body and scans the raw
bytes for the two guard sequences. They fail on the unfixed emitter
and pass once both guards are present.

Also appended LS-A-7 as `approved` to safety/stpa/loss-scenarios.yaml.

169/169 lib tests pass. 11/11 P3 commands correct on stock wasmtime 41.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit fcad26a fixed the three transcoding emitters. A follow-up audit
found 15 more sites across adapter/fact.rs and lib.rs with the same
class of bug — an i32.mul on a caller- or callee-controlled length
feeding cabi_realloc, with no overflow check and no null check on
the return. Every fused adapter emitted by the affected sites was
potentially exploitable for out-of-bounds writes into the target
memory (wrap-to-small on large len; write-at-zero on OOM).

Introduces two pub(crate) helpers in adapter/fact.rs:

  emit_checked_realloc(body, realloc_func, result_local)
      consume 4 stack-pushed realloc args, store result, trap via
      unreachable if the pointer is 0 (LS-A-7 leg b).

  emit_overflow_guard(body, len_local, k)
      trap via unreachable if len_local > u32::MAX / k before the
      multiply that feeds the allocation (LS-A-7 leg a). No-op when
      k <= 1.
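
A sketch of what the two helpers emit, using a hypothetical `Instr` enum and a `Vec<Instr>` body in place of the real encoder types so the example stays self-contained. The instruction order follows the guard sequences listed for LS-A-7; the real helpers' parameter types may differ.

```rust
/// Hypothetical stand-in for the encoder's instruction type.
#[derive(Debug, Clone, PartialEq)]
enum Instr {
    LocalGet(u32),
    LocalSet(u32),
    I32Const(i32),
    I32GtU,
    I32Eqz,
    Call(u32),
    If,
    Unreachable,
    End,
}

/// Traps if len_local > u32::MAX / k. No-op when k <= 1.
fn emit_overflow_guard(body: &mut Vec<Instr>, len_local: u32, k: u32) {
    if k <= 1 {
        return;
    }
    body.extend([
        Instr::LocalGet(len_local),
        // i32.const carries the same bit pattern as the unsigned bound.
        Instr::I32Const((u32::MAX / k) as i32),
        Instr::I32GtU,
        Instr::If,
        Instr::Unreachable,
        Instr::End,
    ]);
}

/// Assumes the 4 realloc args are already on the stack. Calls realloc,
/// stores the result, and traps if the returned pointer is 0.
fn emit_checked_realloc(body: &mut Vec<Instr>, realloc_func: u32, result_local: u32) {
    body.extend([
        Instr::Call(realloc_func),
        Instr::LocalSet(result_local),
        Instr::LocalGet(result_local),
        Instr::I32Eqz,
        Instr::If,
        Instr::Unreachable,
        Instr::End,
    ]);
}
```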

Applies both helpers at the audited sites. HIGH-severity (caller-len
drives the mul): 13 sites including both bulk copies and variant-arm
copies in generate_memory_copy_adapter, generate_params_ptr_adapter,
generate_retptr_adapter, emit_inner_pointer_fixup,
generate_async_callback_adapter, plus the stabilizing shim in lib.rs.
Null-check only: 3 sites where length is constant or bounded
(emit_patch_nested_indirections, a single-alloc result path in
generate_memory_copy_adapter, and the outer params-area alloc in
generate_params_ptr_adapter).

  169/169 lib tests pass (including the 3 LS-A-7 byte-scan regressions)
  11/11 P3 commands correct on stock wasmtime 41
  clippy clean

Follow-up: add a cross-file CI lint that scans any emitted adapter for
a bare Call(realloc) without an immediately following null guard —
catches future regressions without requiring each site to be re-audited.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… math

Mythos discover on attestation.rs flagged real correctness bugs but
couldn't satisfy the oracle because every defect depends on
SystemTime::now(), which Kani can't model symbolically.

This commit isolates the SystemTime dependency behind thin wrappers:

  chrono_timestamp()      -> SystemTime::now() -> chrono_timestamp_from(secs: u64)
  generate_uuid()         -> SystemTime::now() -> generate_uuid_from(entropy: u128)

The pure functions become directly testable with pinned inputs.

Additionally, chrono_timestamp_from now uses Howard Hinnant's
civil_from_days algorithm (400-year Gregorian cycle) instead of the
broken 365-days-per-year + 30-days-per-month approximation. The old
code drifted one day every 4 years and produced invalid ISO 8601
dates like 2026-02-30 because every month was modeled as 30 days.
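
For reference, a compact sketch of the civil_from_days computation in Rust. It follows Hinnant's published algorithm; `timestamp_from_secs` is a hypothetical wrapper and is not claimed to match the body of chrono_timestamp_from.

```rust
/// Converts days since 1970-01-01 to (year, month, day) in the proleptic
/// Gregorian calendar, following Howard Hinnant's civil_from_days.
fn civil_from_days(days: i64) -> (i64, u32, u32) {
    let z = days + 719_468; // shift the epoch to 0000-03-01
    let era = (if z >= 0 { z } else { z - 146_096 }) / 146_097; // 400-year cycle
    let doe = z - era * 146_097; // day of era, [0, 146096]
    let yoe = (doe - doe / 1_460 + doe / 36_524 - doe / 146_096) / 365; // [0, 399]
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100); // March-based day of year
    let mp = (5 * doy + 2) / 153; // month index, 0 = March
    let day = (doy - (153 * mp + 2) / 5 + 1) as u32;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u32;
    // January and February belong to the following civil year.
    let year = yoe + era * 400 + if month <= 2 { 1 } else { 0 };
    (year, month, day)
}

/// Formats seconds since the Unix epoch as an ISO 8601 UTC timestamp.
fn timestamp_from_secs(secs: u64) -> String {
    let (year, month, day) = civil_from_days((secs / 86_400) as i64);
    let rem = secs % 86_400;
    format!(
        "{:04}-{:02}-{:02}T{:02}:{:02}:{:02}Z",
        year,
        month,
        day,
        rem / 3_600,
        rem % 3_600 / 60,
        rem % 60
    )
}
```

Because the year is treated as starting on March 1, the leap day falls at the end of the cycle and month lengths come out of the 153-day formula, so dates such as February 30 cannot be produced.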

New pinned tests (all 6 passing):
  - test_chrono_timestamp_from_epoch (1970-01-01T00:00:00Z)
  - test_chrono_timestamp_from_2025_new_year (boundary after 2024 leap)
  - test_chrono_timestamp_from_2025_march_boundary
  - test_chrono_timestamp_from_2024_leap_march (Feb 29 -> Mar 1)
  - test_generate_uuid_from_pinned_zero
  - test_generate_uuid_from_distinct_entropy_differs

All 16 attestation tests pass. No external crate dependency added.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the portable Mythos-style pipeline scripts that were used to
discover LS-A-7 (transcoder overflow) and its class-wide siblings:

  scripts/mythos/HOWTO.md      the 4-phase workflow + rationale
  scripts/mythos/rank.md       per-file threat-model ranking rubric
  scripts/mythos/discover.md   Mythos-verbatim discovery prompt
  scripts/mythos/validate.md   fresh-validator prompt
  scripts/mythos/emit.md       loss-scenario YAML template

First run outcome: 1 confirmed vulnerability + 15 class-sibling sites
found and fixed on this branch. Keeping the scripts in-tree so future
runs can replay the pipeline against new code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds meld-core/tests/realloc_safety.rs as a cross-cutting regression
gate for loss-scenario LS-A-7. Where the 3 byte-scan tests in
adapter/fact.rs exercise each transcode emitter individually, this
test fuses two programmatic components end-to-end and then walks
every function body in the output, verifying that every call whose
target is a cabi_realloc-family function is followed by a contiguous
i32.eqz; if; unreachable; end null guard within 8 instructions.
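
The scan can be sketched over a decoded instruction list. `Instr` and `unguarded_realloc_calls` are hypothetical; the sketch reads "within 8 instructions" as requiring the whole contiguous guard to fit in the 8 instructions after the call, which may be stricter or looser than the actual test.

```rust
/// Hypothetical decoded instruction; `Other` covers everything irrelevant here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Instr {
    Call(u32),
    LocalSet(u32),
    LocalGet(u32),
    I32Eqz,
    If,
    Unreachable,
    End,
    Other,
}

const GUARD: [Instr; 4] = [Instr::I32Eqz, Instr::If, Instr::Unreachable, Instr::End];
const WINDOW: usize = 8;

/// Returns the instruction index of every realloc call that is not followed
/// by a contiguous null guard inside the window.
fn unguarded_realloc_calls(body: &[Instr], realloc_funcs: &[u32]) -> Vec<usize> {
    let mut missing = Vec::new();
    for (i, instr) in body.iter().enumerate() {
        let Instr::Call(target) = instr else { continue };
        if !realloc_funcs.contains(target) {
            continue;
        }
        let end = (i + 1 + WINDOW).min(body.len());
        let guarded = body[i + 1..end]
            .windows(GUARD.len())
            .any(|w| w == &GUARD[..]);
        if !guarded {
            missing.push(i);
        }
    }
    missing
}
```

Returning the offending indices, rather than a bare boolean, mirrors the idea of reporting each unguarded site as a concrete entry when the gate fails.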

The intent is to catch future regressions without requiring every
new emitter to be audited or have a bespoke byte-scan test — any
site that forgets the guard shows up as a concrete
(function_idx, byte_offset, target) entry when the test fails.

Fixtures are a self-contained caller+callee pair (string -> u32)
reused from the SR-12 shape in tests/adapter_safety.rs, built inline
with wasm-encoder. No filesystem assets.

Test name: ls_a_7_every_realloc_call_has_null_guard
Run: cargo test --test realloc_safety

On the current branch tip this scans 1 call site and passes; the
test guards against vacuous success by asserting at least one
realloc index was resolved and at least one call was scanned.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@avrabe avrabe merged commit 30cf088 into main Apr 21, 2026
4 checks passed
@avrabe avrabe deleted the fix/async-string-params-and-issue-92 branch April 21, 2026 19:33