
fix(extension): harden runtime resource validation — bounds extension ingress surfaces #94

Closed
Lythaeon wants to merge 5 commits into main from fix/security-hardening-audit

Lythaeon (owner) commented Apr 10, 2026

Description

Hardens externally facing runtime surfaces so invalid extension manifests and abusive websocket provider responses fail fast before SOF allocates unbounded buffers or drifts into ambiguous runtime state. This PR also bumps the workspace patch version to 0.18.2.

Changes

  • crates/sof-observer/src/framework/extension_host.rs
    • reject empty runtime extension names
    • reject empty resource_id values in runtime extension manifests
    • reject empty shared visibility tags
    • bound read_buffer_bytes with a startup-time maximum
    • apply explicit websocket frame/message caps for extension websocket connectors
    • add regression tests for stricter validation and transport config
  • crates/sof-observer/src/provider_stream/websocket.rs
    • bound provider websocket frame/message sizes for built-in websocket sources
    • harden replay HTTP companion calls with bounded client timeouts
    • reject non-success replay HTTP status codes before decode
    • reject oversized replay HTTP bodies before full read / decode
    • add regression tests for websocket transport bounds and replay HTTP failures
  • docs/architecture/runtime-extension-hooks.md
    • document stricter runtime extension validation rules
  • crates/sof-observer/README.md
    • document stricter runtime extension metadata requirements
  • workspace crate manifests / docs / lockfile
    • bump published workspace version from 0.18.1 to 0.18.2
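The manifest checks listed above can be sketched as a single fail-closed validation pass run at startup. This is a minimal sketch assuming a simplified manifest shape: ExtensionManifest, ManifestError, validate_manifest, and the 16 MiB MAX_READ_BUFFER_BYTES value are hypothetical names and numbers for illustration, since the PR does not state the real SOF types or the actual maximum.

```rust
use std::fmt;

// Hypothetical ceiling: the PR says read_buffer_bytes gets a startup-time
// maximum but does not state the value, so 16 MiB is a placeholder.
const MAX_READ_BUFFER_BYTES: usize = 16 * 1024 * 1024;

/// Hypothetical, trimmed-down manifest used only for this sketch; the real
/// SOF manifest type has other fields and may differ in shape.
#[derive(Clone)]
struct ExtensionManifest {
    name: String,
    resource_ids: Vec<String>,
    shared_visibility_tags: Vec<String>,
    read_buffer_bytes: usize,
}

#[derive(Debug, PartialEq)]
enum ManifestError {
    EmptyName,
    EmptyResourceId { index: usize },
    EmptySharedVisibilityTag { index: usize },
    ReadBufferTooLarge { requested: usize, max: usize },
}

impl fmt::Display for ManifestError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::EmptyName => write!(f, "extension name must not be empty"),
            Self::EmptyResourceId { index } => write!(f, "resource_id at index {index} is empty"),
            Self::EmptySharedVisibilityTag { index } => {
                write!(f, "shared visibility tag at index {index} is empty")
            }
            Self::ReadBufferTooLarge { requested, max } => {
                write!(f, "read_buffer_bytes {requested} exceeds maximum {max}")
            }
        }
    }
}

/// Returns the first violation as an error; nothing is clamped or rewritten.
fn validate_manifest(m: &ExtensionManifest) -> Result<(), ManifestError> {
    // Treating whitespace-only strings as empty is a choice made for this sketch.
    if m.name.trim().is_empty() {
        return Err(ManifestError::EmptyName);
    }
    if let Some(index) = m.resource_ids.iter().position(|id| id.trim().is_empty()) {
        return Err(ManifestError::EmptyResourceId { index });
    }
    if let Some(index) = m.shared_visibility_tags.iter().position(|t| t.trim().is_empty()) {
        return Err(ManifestError::EmptySharedVisibilityTag { index });
    }
    if m.read_buffer_bytes > MAX_READ_BUFFER_BYTES {
        return Err(ManifestError::ReadBufferTooLarge {
            requested: m.read_buffer_bytes,
            max: MAX_READ_BUFFER_BYTES,
        });
    }
    Ok(())
}
```

Returning a typed error instead of adjusting the value keeps the operator's mistake visible, which matches the PR's stated reason for rejecting soft-clamping.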

Slice-related changes:

  • Affected slices: framework, provider_stream, app/runtime
  • Cross-slice communication changes (if any) and why: none; validation is tightened before runtime dispatch and replay hydration
  • Migration requirements (if any): invalid manifests now fail startup instead of being tolerated; oversized upstream websocket / replay HTTP payloads are rejected earlier
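To illustrate how oversized websocket payloads can be rejected before they are buffered, here is a std-only sketch assuming two caps modeled on the max frame size and max message size limits that websocket clients such as tungstenite expose. TransportCaps, caps_from_chunk_size, MessageAssembler, and MAX_MESSAGE_BYTES are hypothetical; deriving the frame cap from the read chunk size is only suggested by the test name extension_websocket_transport_config_caps_frames_from_chunk_size, and the real derivation may differ. A real client enforces these limits while parsing frames, whereas this sketch checks each payload as it arrives.

```rust
// Hypothetical ceiling for a fully reassembled message; not taken from SOF.
const MAX_MESSAGE_BYTES: usize = 16 * 1024 * 1024;

/// Frame and message limits modeled as a plain struct.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TransportCaps {
    max_frame_bytes: usize,
    max_message_bytes: usize,
}

/// Hypothetical derivation: a single frame may not exceed the configured
/// read chunk, and the frame cap never exceeds the message cap.
fn caps_from_chunk_size(read_buffer_bytes: usize) -> TransportCaps {
    let max_frame_bytes = read_buffer_bytes.min(MAX_MESSAGE_BYTES);
    TransportCaps { max_frame_bytes, max_message_bytes: MAX_MESSAGE_BYTES }
}

#[derive(Debug, PartialEq, Eq)]
enum InboundError {
    FrameTooLarge { len: usize, max: usize },
    MessageTooLarge { len: usize, max: usize },
}

/// Reassembles fragmented messages, rejecting any frame or running total
/// that would exceed the caps before it is appended to the buffer.
struct MessageAssembler {
    caps: TransportCaps,
    buf: Vec<u8>,
}

impl MessageAssembler {
    fn new(caps: TransportCaps) -> Self {
        Self { caps, buf: Vec::new() }
    }

    /// Ok(Some(message)) on the final fragment, Ok(None) while more are expected.
    fn push_frame(&mut self, payload: &[u8], fin: bool) -> Result<Option<Vec<u8>>, InboundError> {
        if payload.len() > self.caps.max_frame_bytes {
            self.buf.clear();
            return Err(InboundError::FrameTooLarge {
                len: payload.len(),
                max: self.caps.max_frame_bytes,
            });
        }
        let total = self.buf.len().saturating_add(payload.len());
        if total > self.caps.max_message_bytes {
            // Drop the partial message instead of holding on to it.
            self.buf.clear();
            return Err(InboundError::MessageTooLarge { len: total, max: self.caps.max_message_bytes });
        }
        self.buf.extend_from_slice(payload);
        Ok(if fin { Some(std::mem::take(&mut self.buf)) } else { None })
    }
}
```

Checking the running total, not just individual frames, matters because a peer can stay under the frame cap while still streaming an unbounded fragmented message.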

Motivation

Business motivation:

  • Reduce operational risk on the direct ingress and provider paths users are most likely to run in production.
  • Keep malformed configs and hostile upstream payloads from turning into oversized allocations, indefinite waits, or ambiguous runtime metadata.

Technical motivation:

  • Runtime extension manifests accepted empty names, resource IDs, and shared visibility tags, as well as an unbounded read_buffer_bytes.
  • Provider websocket clients lacked explicit tungstenite frame/message caps.
  • Websocket replay HTTP calls did not fail early on non-success responses or oversized bodies.
  • These are correctness and security-hardening gaps on externally facing runtime surfaces.
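One plausible ordering for the replay HTTP checks (status first, declared length second, capped read last) can be sketched with std::io::Read alone. read_replay_body and ReplayError are hypothetical names, the caller supplies the byte cap, and the bounded client timeouts mentioned in the PR are out of scope here because they belong to whichever HTTP client SOF actually uses.

```rust
use std::io::{self, Read};

#[derive(Debug)]
enum ReplayError {
    Status(u16),
    DeclaredTooLarge { declared: u64, max: usize },
    BodyTooLarge { max: usize },
    Io(io::Error),
}

/// Hypothetical guard applied to a replay HTTP response before decoding.
fn read_replay_body<R: Read>(
    status: u16,
    content_length: Option<u64>,
    body: R,
    max: usize,
) -> Result<Vec<u8>, ReplayError> {
    // 1. Fail on any non-2xx status before touching the body.
    if !(200..300).contains(&status) {
        return Err(ReplayError::Status(status));
    }
    // 2. If the server declares a length, reject it before reading anything.
    if let Some(declared) = content_length {
        if declared > max as u64 {
            return Err(ReplayError::DeclaredTooLarge { declared, max });
        }
    }
    // 3. Content-Length may be missing or understated, so also cap the read.
    //    Reading at most max + 1 bytes is enough to detect an overrun.
    let mut buf = Vec::new();
    body.take((max as u64).saturating_add(1))
        .read_to_end(&mut buf)
        .map_err(ReplayError::Io)?;
    if buf.len() > max {
        return Err(ReplayError::BodyTooLarge { max });
    }
    Ok(buf)
}
```

Because the read is capped at max + 1 bytes, a hostile body that omits or understates Content-Length still cannot force more than one byte past the limit into memory.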

Alternative approaches considered:

  • Leave validation to extension authors or deploy tooling. Rejected because the runtime owns these invariants and must fail closed.
  • Soft-clamp invalid manifest values instead of rejecting startup. Rejected because silent mutation hides operator mistakes.
  • Depend on upstream provider limits alone. Rejected because SOF still owns its local memory and timeout bounds.

Scope and impact

  • Affected slices: framework, provider_stream, app/runtime
  • Data/API changes: no public API shape changes; invalid manifests, oversized websocket frames/messages, and non-success or oversized replay HTTP responses now fail earlier
  • Backward compatibility: preserved for valid manifests and normal provider responses
  • Performance impact: none intended; this PR is hardening-only
  • Security impact: reduces memory abuse risk, ambiguous extension metadata, and replay/provider fail-open behavior

Testing

  • Unit tests
  • Integration tests
  • Manual verification
  • Performance checks (if applicable)
  • Security checks (if applicable)

Commands/results:

cargo test -p sof --lib startup_rejects_empty_extension_name -- --nocapture
cargo test -p sof --lib startup_rejects_empty_resource_id -- --nocapture
cargo test -p sof --lib startup_rejects_empty_shared_visibility_tag -- --nocapture
cargo test -p sof --lib startup_rejects_oversized_read_buffer_bytes -- --nocapture
cargo test -p sof --lib extension_websocket_transport_config_caps_frames_from_chunk_size -- --nocapture
cargo test -p sof --lib provider_stream::websocket:: --features provider-websocket -- --nocapture
cargo test -p sof --lib websocket_rpc_get_ --features provider-websocket -- --nocapture
cargo clippy -p sof --all-features --tests -- -D warnings
cargo check --workspace
cargo make ci
cargo audit

Results:

  • targeted regression tests passed
  • websocket provider module tests passed with provider-websocket
  • cargo clippy passed for sof all-features test build
  • cargo check --workspace passed
  • cargo make ci passed
  • cargo audit still reports upstream dependency issues in vendored Solana / helius-laserstream; this PR does not change those trees

Related issues and documentation

  • Fixes:
  • Related:
  • Architecture docs: docs/architecture/README.md
  • Relevant ARD/ADR:
    • docs/architecture/ard/0003-slice-dependency-contracts.md
    • docs/architecture/ard/0007-infrastructure-composition-and-runtime-model.md
    • docs/architecture/ard/0008-observability-and-operability-standards.md
  • Operations/runbook updates: none

Reviewer checklist

  • Code follows project standards and architecture constraints
  • Slice boundaries are respected (docs/architecture/ard/0003-slice-dependency-contracts.md)
  • Tests added/updated and passing
  • Documentation updated (README/docs/operations as needed)
  • No undocumented breaking change
  • Performance trade-offs documented where relevant
  • Security considerations addressed where relevant

Additional notes

  • Remaining cargo audit findings are upstream in dependency trees already present on main.
  • Patch bump is included because runtime validation behavior and published docs were updated together.

Lythaeon (owner, author) commented:

Superseded: as requested, the same hardening work was moved onto the perf/runtime-churn-reduction branch. Closing this branch-specific PR.

Lythaeon closed this on Apr 10, 2026
Lythaeon deleted the fix/security-hardening-audit branch on April 10, 2026 at 19:02