feat(tests): add end-to-end tests for shell PTY and session management#1424
feat(tests): add end-to-end tests for shell PTY and session management#1424
Conversation
fix(tests): ensure cancelled commands properly kill processes
There was a problem hiding this comment.
Pull request overview
Adds end-to-end coverage for the interactive shell PTY flow and tightens session/context rotation expectations, alongside improving shell tool cancellation behavior.
Changes:
- Add a Unix-only PTY-driven E2E test suite covering multi-turn chat, approvals, questions, mode toggling, session resume/replay, /clear, and cancellation recovery.
- Update wire session tests to reflect
_system_promptpersistence and new context rotation naming (context_N.jsonl). - Ensure Shell tool cancels by killing the underlying process when the task is cancelled.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
tests_e2e/test_wire_sessions.py |
Updates session/context assertions for _system_prompt and rotated context file naming. |
tests/tools/test_shell_bash.py |
Adds an async test to verify task cancellation kills the underlying KAOS process. |
tests/e2e/test_shell_pty_e2e.py |
New PTY-based E2E tests covering interactive shell behaviors and session management flows. |
tests/e2e/shell_pty_helpers.py |
New helper utilities for running the CLI in a PTY, driving input, and parsing wire/session artifacts. |
tests/e2e/__init__.py |
Marks tests/e2e as a package for imports. |
src/kimi_cli/tools/shell/__init__.py |
Adds CancelledError handling to kill the subprocess when a shell command run is cancelled. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| await process.kill() | ||
| raise | ||
| except TimeoutError: | ||
| await process.kill() | ||
| raise |
There was a problem hiding this comment.
On cancellation/timeout you call process.kill() and immediately re-raise, but you never wait() afterwards. With asyncio subprocesses (and KAOS’ own tests), killing without waiting can leave a zombie process until GC. Consider ensuring the process is reaped (e.g., kill then wait with a short timeout) in both the CancelledError and TimeoutError paths.
| await process.kill() | |
| raise | |
| except TimeoutError: | |
| await process.kill() | |
| raise | |
| try: | |
| await process.kill() | |
| try: | |
| # Ensure the process is reaped to avoid zombies. | |
| await asyncio.wait_for(process.wait(), 1.0) | |
| except Exception: | |
| # Ignore errors during best-effort cleanup. | |
| pass | |
| finally: | |
| raise | |
| except TimeoutError: | |
| try: | |
| await process.kill() | |
| try: | |
| # Ensure the process is reaped to avoid zombies. | |
| await asyncio.wait_for(process.wait(), 1.0) | |
| except Exception: | |
| # Ignore errors during best-effort cleanup. | |
| pass | |
| finally: | |
| raise |
|
|
||
| after_mark = shell.mark() | ||
| shell.send_line("history-after-clear") | ||
| shell.read_until_contains("Before clear result.", after=after_mark) |
There was a problem hiding this comment.
After clearing the context, the second scripted turn should assert the post-clear scripted response. Right now it waits for "Before clear result." again, which contradicts the scripted config ("After clear result.") and would allow a replay bug to slip through or fail incorrectly.
| shell.read_until_contains("Before clear result.", after=after_mark) | |
| shell.read_until_contains("After clear result.", after=after_mark) |
| time.sleep(2.3) | ||
| assert not (work_dir / "cancel_output.txt").exists() |
There was a problem hiding this comment.
This test uses a fixed time.sleep(2.3) to assert the cancelled command didn’t create the output file. Fixed sleeps are prone to CI flakiness (slow machines) and unnecessarily slow fast runs; prefer polling with a deadline (e.g., loop until either the file appears or a timeout expires).
| time.sleep(2.3) | |
| assert not (work_dir / "cancel_output.txt").exists() | |
| deadline = time.monotonic() + 5.0 | |
| cancel_output = work_dir / "cancel_output.txt" | |
| while time.monotonic() < deadline: | |
| if cancel_output.exists(): | |
| pytest.fail("cancel_output.txt was created despite command cancellation") | |
| time.sleep(0.1) | |
| assert not cancel_output.exists() |
- Update expected text for "Other" input in question test to match
_PromptLiveView output ("Enter the custom answer, then press Enter.")
- Add _send_key_and_wait helper with retry logic for flaky key presses
- Increase default timeouts from 8s to 15s for CI environments
- Skip test_shell_clear test (Reload hangs in inline prompt mode)
- Remove NO_COLOR from PTY env to ensure Rich renders properly on CI
8154cc5 to
bd7b171
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 777862971b
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ) | ||
| return await process.wait() | ||
| except asyncio.CancelledError: | ||
| await process.kill() |
There was a problem hiding this comment.
Guard kill call when cancelled command already exited
If cancellation is delivered just after the subprocess has already finished (for example, user presses Esc as a short command exits), this unconditional kill() can raise ProcessLookupError (as asyncio.subprocess.Process.kill() does on exited processes). That replaces the expected CancelledError with an unexpected failure, so the shell turn may surface an internal error instead of a clean interruption; check process.returncode or suppress ProcessLookupError during cleanup.
Useful? React with 👍 / 👎.
The question roundtrip test was failing because prompt_toolkit's differential renderer fragments text across cursor-positioning escape sequences. After CSI stripping, the literal "Need anything else?" was mangled (e.g. "Neednything else"), so read_until_contains never matched. The retry loop then accidentally answered the second question too. Fix: wait for the "✓" checkmark in the tab bar instead – it's a Unicode character unaffected by CSI stripping and uniquely signals that Q1 was answered and Q2 is now displayed. Also fix the reject-and-recover test by waiting for the "Used Shell" marker before looking for the prompt, avoiding a mid-turn ✨ match.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f03cb659df
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ) | ||
| return await process.wait() | ||
| except asyncio.CancelledError: | ||
| await process.kill() |
There was a problem hiding this comment.
Guard kill when cancellation races process exit
When cancellation lands right after the subprocess has already exited, await process.kill() can raise ProcessLookupError (the local KAOS wrapper directly calls asyncio.subprocess.Process.kill()), which replaces the expected CancelledError with an internal failure. This can surface sporadic tool errors when users interrupt near command completion; check process.returncode or suppress ProcessLookupError before re-raising cancellation.
Useful? React with 👍 / 👎.
Checklist
make gen-changelogto update the changelog.make gen-docsto update the user documentation.