Skip to content

feat(tests): add end-to-end tests for shell PTY and session management#1424

Merged
RealKai42 merged 7 commits intomainfrom
kaiyi/e2e-test-plan
Mar 13, 2026
Merged

feat(tests): add end-to-end tests for shell PTY and session management#1424
RealKai42 merged 7 commits intomainfrom
kaiyi/e2e-test-plan

Conversation

@RealKai42
Copy link
Copy Markdown
Collaborator

@RealKai42 RealKai42 commented Mar 12, 2026

Checklist

  • I have read the CONTRIBUTING document.
  • I have linked the related issue, if any.
  • I have added tests that prove my fix is effective or that my feature works.
  • I have run make gen-changelog to update the changelog.
  • I have run make gen-docs to update the user documentation.

Open with Devin

Copilot AI review requested due to automatic review settings March 12, 2026 14:37
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds end-to-end coverage for the interactive shell PTY flow and tightens session/context rotation expectations, alongside improving shell tool cancellation behavior.

Changes:

  • Add a Unix-only PTY-driven E2E test suite covering multi-turn chat, approvals, questions, mode toggling, session resume/replay, /clear, and cancellation recovery.
  • Update wire session tests to reflect _system_prompt persistence and new context rotation naming (context_N.jsonl).
  • Ensure Shell tool cancels by killing the underlying process when the task is cancelled.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
tests_e2e/test_wire_sessions.py Updates session/context assertions for _system_prompt and rotated context file naming.
tests/tools/test_shell_bash.py Adds an async test to verify task cancellation kills the underlying KAOS process.
tests/e2e/test_shell_pty_e2e.py New PTY-based E2E tests covering interactive shell behaviors and session management flows.
tests/e2e/shell_pty_helpers.py New helper utilities for running the CLI in a PTY, driving input, and parsing wire/session artifacts.
tests/e2e/__init__.py Marks tests/e2e as a package for imports.
src/kimi_cli/tools/shell/__init__.py Adds CancelledError handling to kill the subprocess when a shell command run is cancelled.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +122 to 126
await process.kill()
raise
except TimeoutError:
await process.kill()
raise
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On cancellation/timeout you call process.kill() and immediately re-raise, but you never wait() afterwards. With asyncio subprocesses (and KAOS’ own tests), killing without waiting can leave a zombie process until GC. Consider ensuring the process is reaped (e.g., kill then wait with a short timeout) in both the CancelledError and TimeoutError paths.

Suggested change
await process.kill()
raise
except TimeoutError:
await process.kill()
raise
try:
await process.kill()
try:
# Ensure the process is reaped to avoid zombies.
await asyncio.wait_for(process.wait(), 1.0)
except Exception:
# Ignore errors during best-effort cleanup.
pass
finally:
raise
except TimeoutError:
try:
await process.kill()
try:
# Ensure the process is reaped to avoid zombies.
await asyncio.wait_for(process.wait(), 1.0)
except Exception:
# Ignore errors during best-effort cleanup.
pass
finally:
raise

Copilot uses AI. Check for mistakes.

after_mark = shell.mark()
shell.send_line("history-after-clear")
shell.read_until_contains("Before clear result.", after=after_mark)
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After clearing the context, the second scripted turn should assert the post-clear scripted response. Right now it waits for "Before clear result." again, which contradicts the scripted config ("After clear result.") and would allow a replay bug to slip through or fail incorrectly.

Suggested change
shell.read_until_contains("Before clear result.", after=after_mark)
shell.read_until_contains("After clear result.", after=after_mark)

Copilot uses AI. Check for mistakes.
Comment on lines +468 to +469
time.sleep(2.3)
assert not (work_dir / "cancel_output.txt").exists()
Copy link

Copilot AI Mar 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test uses a fixed time.sleep(2.3) to assert the cancelled command didn’t create the output file. Fixed sleeps are prone to CI flakiness (slow machines) and unnecessarily slow fast runs; prefer polling with a deadline (e.g., loop until either the file appears or a timeout expires).

Suggested change
time.sleep(2.3)
assert not (work_dir / "cancel_output.txt").exists()
deadline = time.monotonic() + 5.0
cancel_output = work_dir / "cancel_output.txt"
while time.monotonic() < deadline:
if cancel_output.exists():
pytest.fail("cancel_output.txt was created despite command cancellation")
time.sleep(0.1)
assert not cancel_output.exists()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown
Contributor

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 3 additional findings.

Open in Devin Review

- Update expected text for "Other" input in question test to match
  _PromptLiveView output ("Enter the custom answer, then press Enter.")
- Add _send_key_and_wait helper with retry logic for flaky key presses
- Increase default timeouts from 8s to 15s for CI environments
- Skip test_shell_clear test (Reload hangs in inline prompt mode)
- Remove NO_COLOR from PTY env to ensure Rich renders properly on CI
@RealKai42 RealKai42 force-pushed the kaiyi/e2e-test-plan branch from 8154cc5 to bd7b171 Compare March 12, 2026 15:43
@RealKai42 RealKai42 closed this Mar 12, 2026
@RealKai42 RealKai42 reopened this Mar 12, 2026
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 777862971b

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

)
return await process.wait()
except asyncio.CancelledError:
await process.kill()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard kill call when cancelled command already exited

If cancellation is delivered just after the subprocess has already finished (for example, user presses Esc as a short command exits), this unconditional kill() can raise ProcessLookupError (as asyncio.subprocess.Process.kill() does on exited processes). That replaces the expected CancelledError with an unexpected failure, so the shell turn may surface an internal error instead of a clean interruption; check process.returncode or suppress ProcessLookupError during cleanup.

Useful? React with 👍 / 👎.

The question roundtrip test was failing because prompt_toolkit's
differential renderer fragments text across cursor-positioning escape
sequences. After CSI stripping, the literal "Need anything else?" was
mangled (e.g. "Neednything else"), so read_until_contains never matched.
The retry loop then accidentally answered the second question too.

Fix: wait for the "✓" checkmark in the tab bar instead – it's a Unicode
character unaffected by CSI stripping and uniquely signals that Q1 was
answered and Q2 is now displayed.

Also fix the reject-and-recover test by waiting for the "Used Shell"
marker before looking for the prompt, avoiding a mid-turn ✨ match.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f03cb659df

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

)
return await process.wait()
except asyncio.CancelledError:
await process.kill()
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Guard kill when cancellation races process exit

When cancellation lands right after the subprocess has already exited, await process.kill() can raise ProcessLookupError (the local KAOS wrapper directly calls asyncio.subprocess.Process.kill()), which replaces the expected CancelledError with an internal failure. This can surface sporadic tool errors when users interrupt near command completion; check process.returncode or suppress ProcessLookupError before re-raising cancellation.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants