Add resilience test coverage for frozen kernel, backpressure, and re-auth flow #52

Open
jverdicc wants to merge 1 commit into main from codex/create-agent-resilience-test-suite

@jverdicc
Owner

Motivation

  • Provide deterministic integration-level coverage for agent resilience cases around a Frozen kernel response, heavy client probing, and ledger re-authentication so regressions in fail-safe and backpressure handling are caught early.

Description

  • Add a new test harness at tests/agent_resilience.rs implementing a FakeKernel and AgentRuntime to simulate kernel status, bounded worker concurrency, and agent queueing semantics.
  • Implement three scenarios: kernel cut-off/frozen detection and fail-safe entry, heavy probing of 5,000 ClaimCapsule submissions to validate backpressure and bounded in-flight concurrency, and re-authentication after a ledger reset to ensure frozen-session state is cleared.
  • Wire the root-level test into the discos-client crate by adding a [[test]] entry in crates/discos-client/Cargo.toml so the integration runs with crate tests.
  • Tests are deterministic, self-contained, and do not require external services.
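The backpressure scenario can be pictured with a bounded queue that rejects submissions once it is full. The following is a minimal sketch using only the standard library, not the harness from this PR: `submit_without_consumer`, the `usize` payload, and the capacity value are hypothetical stand-ins for the `ClaimCapsule` queue, whose actual types and limits are not shown here.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Hypothetical helper: pushes `total` items into a bounded queue of
/// `capacity` while no consumer is draining it.
/// Returns `(accepted, rejected)`.
fn submit_without_consumer(total: usize, capacity: usize) -> (usize, usize) {
    // `_rx` is kept alive for the whole function so the channel stays
    // connected; nothing ever receives from it.
    let (tx, _rx) = sync_channel::<usize>(capacity);
    let mut accepted = 0;
    let mut rejected = 0;

    for id in 0..total {
        match tx.try_send(id) {
            // There was room in the buffer.
            Ok(()) => accepted += 1,
            // The buffer is full: this is the backpressure signal.
            Err(TrySendError::Full(_)) => rejected += 1,
            Err(TrySendError::Disconnected(_)) => {
                unreachable!("receiver is held for the whole loop")
            }
        }
    }

    (accepted, rejected)
}
```

With no consumer running, the number of accepted items equals the queue capacity regardless of how many submissions are attempted, which makes the bound easy to assert without any timing assumptions.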

Testing

  • Ran cargo fmt --check and it succeeded.
  • Ran cargo clippy --workspace --all-targets -- -D warnings and it succeeded.
  • Ran cargo test -p discos-client --test agent_resilience and the added resilience suite passed (3 tests Ok).
  • Ran cargo test --workspace; the full run failed, but only because of two pre-existing discos-core topicid golden-vector test failures that are unrelated to these changes. The new tests did not contribute to the failure.

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 09039ec872


let queue_peak = runtime.queue_peak().await;
drop(runtime);
sleep(Duration::from_millis(250)).await;

P1: Replace fixed sleep with completion synchronization

This assertion depends on a hard-coded 250ms delay instead of waiting for the worker task to finish draining the channel, so the test can fail nondeterministically on slower or contended CI runners even when the runtime is correct. The test enqueues 5,000 capsules and then asserts an exact processed count; without an explicit completion signal (e.g., joining the worker/drain future), timing variance makes the suite flaky and violates the deterministic-resilience coverage goal.
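One way to read this suggestion is sketched below: close the channel, then join the worker and take the count from its return value. This is an illustration under stated assumptions, not the PR's code. It uses std threads and `std::sync::mpsc` rather than the async runtime in the test, and `process_all` is a hypothetical name. In the async harness the analogous step would be awaiting the worker's join handle, assuming the runtime can expose one.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: enqueues `total` items and returns how many the
/// worker processed, waiting for the worker to finish instead of sleeping.
fn process_all(total: usize) -> usize {
    let (tx, rx) = mpsc::channel::<usize>();

    // The worker drains the channel until every sender has been dropped,
    // then returns its processed count.
    let worker = thread::spawn(move || {
        let mut processed = 0usize;
        for _capsule in rx {
            processed += 1;
        }
        processed
    });

    for id in 0..total {
        tx.send(id).expect("worker should still be receiving");
    }

    // Dropping the only sender closes the channel, which ends the
    // worker's loop once the remaining items are drained.
    drop(tx);

    // Joining is the completion signal: it returns only after the worker
    // has drained everything, however slow the machine is.
    worker.join().expect("worker thread panicked")
}
```

Because the count is read only after the join returns, an exact assertion such as a processed count of 5,000 no longer depends on how long the drain takes.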
