feat: implement new_session:true support for multi-turn task session … by HaotianChen616 · Pull Request #330 · pinchbench/skill

HaotianChen616 · 2026-04-15T08:10:41Z

feat: implement full multi-turn user prompt support with session isolation

Summary

PinchBench currently has partial multi-session support — the sessions field in task frontmatter is processed to send sequential prompts, but the new_session: true flag (already defined in task_second_brain.md) is completely ignored by the Python codebase. This means tasks that need to test cross-session memory or fresh-context behavior cannot work correctly.

This PR implements full multi-turn user prompt support by:

1. Implementing new_session: true session isolation in lib_agent.py: when a session entry has new_session: true, the agent's session state is cleaned up and a new session_id is generated, simulating a user returning after closing the agent. The workspace (and any files created) is preserved across sessions.
2. Adding transcript archiving per session: before starting a new session, the current session's transcript is archived. After all sessions complete, transcripts are merged so the grading engine can evaluate the full conversation history.
3. Documenting multi-session tasks in TASK_TEMPLATE.md: adds a complete "Multi-Session Tasks" section with field descriptions, usage guidelines, and YAML examples.
4. Adding two new multi-turn tasks:
   - task_iterative_code_refine.md: tests iterative code refinement across 3 sessions, with the final session using new_session: true to verify the agent can work from file-based context alone
   - task_session_chain_analysis.md: tests 4-turn structured code analysis (single-session multi-turn) where the agent reads TypeScript source files, produces JSON chain maps, designs minimal changes, extracts code evidence, and writes a delivery summary
5. Adding tests: test_multi_session.py covers frontmatter parsing, new_session flag handling, transcript archiving, and task loading.
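To make the archive-then-merge idea above concrete, here is a minimal sketch. It is not the PR's `_archive_transcript()` implementation: the function names, the `session_{index}.jsonl` naming, and the assumption that transcripts are JSON Lines files are all hypothetical and chosen only for illustration.

```python
import json
import shutil
from pathlib import Path
from typing import List, Optional


def archive_transcript_sketch(
    transcript: Optional[Path], archive_dir: Path, index: int
) -> Optional[Path]:
    """Copy one session's transcript aside before the session is reset.

    Returns the archived path, or None when there is nothing to archive.
    The JSONL format and file naming are assumptions, not the PR's layout.
    """
    if transcript is None or not transcript.exists():
        return None
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / f"session_{index}.jsonl"
    shutil.copy2(transcript, target)
    return target


def merge_transcripts_sketch(paths: List[Path]) -> List[dict]:
    """Concatenate JSONL transcripts in the order given (oldest session first)."""
    merged: List[dict] = []
    for path in paths:
        for line in path.read_text(encoding="utf-8").splitlines():
            if line.strip():  # skip blank lines between records
                merged.append(json.loads(line))
    return merged
```

Archiving before the reset matters because the live transcript may be overwritten or removed once the agent's session state is cleaned up; merging afterwards gives a grader one ordered history to inspect.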

Changes

`scripts/lib_agent.py`

- Added _archive_transcript() helper to save per-session transcripts before session resets
- Modified execute_openclaw_task() multi-session loop to:
  - Track current_session_id separately (changes when new_session: true)
  - Call cleanup_agent_sessions() and generate a new session ID when new_session: true
  - Log session transitions clearly
- After all sessions complete, merge archived transcripts with the final session transcript when new_session was used
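The loop changes listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in execute_openclaw_task(): `run_sessions_sketch`, `send_prompt`, `reset_agent`, and the `prompt` key are hypothetical stand-ins, and `reset_agent` only represents the archive plus cleanup_agent_sessions() step. Skipping the reset when the first entry sets new_session is also an assumption.

```python
import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class SessionRunLog:
    """Records which session ID each prompt was sent under."""
    turns: List[Dict[str, str]] = field(default_factory=list)
    resets: int = 0


def run_sessions_sketch(
    entries: List[Dict[str, object]],
    send_prompt: Callable[[str, str], None],
    reset_agent: Callable[[], None],
) -> SessionRunLog:
    """Send prompts in order, switching to a fresh session ID on new_session."""
    log = SessionRunLog()
    current_session_id = uuid.uuid4().hex
    for index, entry in enumerate(entries):
        # Assumption: a reset is only meaningful once an earlier session exists.
        if entry.get("new_session") and index > 0:
            reset_agent()  # hypothetical stand-in for archive + session cleanup
            current_session_id = uuid.uuid4().hex
            log.resets += 1
        prompt = str(entry["prompt"])
        send_prompt(current_session_id, prompt)
        log.turns.append({"session_id": current_session_id, "prompt": prompt})
    return log
```

With three entries where only the last sets new_session, the first two prompts share one session ID and the third gets a new one, which mirrors the "user returns after closing the agent" scenario while the workspace files stay untouched.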

`tasks/TASK_TEMPLATE.md`

Added sessions and new_session YAML examples to the frontmatter template
Added "Multi-Session Tasks" documentation section with field table, how-it-works explanation, and usage guidelines
Added multi-session checklist items

`tasks/task_iterative_code_refine.md` (new)

3-session task: initial implementation → add error handling → fresh-session review
Tests iterative refinement and cross-session file-based context retrieval
Automated grading checks both calculator.py and review.txt

`tasks/task_session_chain_analysis.md` (new)

4-turn task (four sessions entries sharing a single conversation context, no new_session): structured ingest → minimal design → evidence extraction → delivery summary
Source files from the OpenClaw agent command pipeline (session.ts, session-store.ts, delivery.ts, agent-command.ts) copied to tasks/assets/session_chain/
Tests the agent's ability to maintain context across turns, produce structured JSON, extract precise code evidence, and generate traceable design review artifacts
Automated grading checks chain_map stages, evidence_index, function references, JSON validity, and traceability

`tasks/assets/session_chain/` (new)

session.ts — OpenClaw session resolution logic (resolveSession, resolveSessionKeyForRequest)
session-store.ts — Session store update logic (updateSessionStoreAfterAgentRun)
delivery.ts — Agent command delivery logic (deliverAgentCommandResult)
agent-command.ts — Agent command execution pipeline (prepareAgentCommandExecution, agentCommandInternal)

`tasks/manifest.yaml`

Added task_iterative_code_refine and task_session_chain_analysis to the task list

`tests/test_multi_session.py` (new)

TestMultiSessionFrontmatterParsing — sessions list, new_session flag, string entries
TestNewSessionHandling — archive-before-cleanup behavior verification
TestArchiveTranscript — transcript archiving with/without transcript path
TestMultiSessionTaskLoading — integration test verifying task_second_brain and task_iterative_code_refine load correctly
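The frontmatter cases these tests cover (a sessions list, the new_session flag, and plain string entries) suggest a normalization step along these lines. This is a hedged sketch: `normalize_sessions_sketch` and the `prompt` key are hypothetical, and the real parsing in the codebase may differ.

```python
from typing import Any, Dict, List


def normalize_sessions_sketch(frontmatter: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn mixed string/mapping session entries into uniform dicts.

    The 'prompt' key is an assumed field name; only 'sessions' and
    'new_session' are named in the PR description.
    """
    normalized: List[Dict[str, Any]] = []
    for entry in frontmatter.get("sessions") or []:
        if isinstance(entry, str):
            # A bare string is treated as a prompt in the current session.
            normalized.append({"prompt": entry, "new_session": False})
        elif isinstance(entry, dict):
            normalized.append({
                "prompt": str(entry.get("prompt", "")),
                "new_session": bool(entry.get("new_session", False)),
            })
        else:
            raise TypeError(f"Unsupported session entry: {entry!r}")
    return normalized
```

Defaulting new_session to False keeps tasks that never mention the flag on the shared-session path, consistent with the backward-compatibility notes in this PR.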

Motivation

Real-world AI coding agents are used in multi-turn conversations. Users:

Ask agents to create something, then refine it iteratively
Return to agents in new sessions expecting them to pick up where they left off using files/context
Need agents that can work with both conversation history AND file-based persistence

PinchBench should test all of these scenarios. The existing task_second_brain.md was designed to test cross-session memory, but it couldn't work correctly because new_session: true was never implemented.

Testing

# Run the new tests
cd /path/to/skill
uv run pytest tests/test_multi_session.py -v

# Lint check
ruff check scripts/lib_agent.py

Backward Compatibility

Single-session tasks: No change in behavior
Multi-session tasks without new_session: No change in behavior (all prompts share one session, same as before)
Multi-session tasks with new_session: true: Now correctly isolates sessions. Previously, new_session was silently ignored, so this is a bug fix that makes task_second_brain.md work as originally intended.

HaotianChen616 · 2026-04-15T08:12:31Z

#329

kilo-code-bot · 2026-04-15T08:12:54Z

Code Review Summary

Status: No Issues Found | Recommendation: Merge

The new commits revert the new_session: true session isolation implementation that was added in the initial version of this PR. The revert cleanly removes _archive_transcript, the session-isolation loop changes, transcript merging logic, documentation additions, asset files, and tests. No new issues introduced.

Files Reviewed (5 files changed in new commits)

scripts/lib_agent.py — reverted new_session handling
tasks/TASK_TEMPLATE.md — reverted multi-session docs
tasks/assets/session_chain/agent-command.ts — deleted
tasks/assets/session_chain/delivery.ts — deleted
tests/test_multi_session.py — deleted

Reviewed by claude-4.6-sonnet-20260217 · 172,090 tokens

ScuttleBot · 2026-04-16T14:00:56Z

👋 Hi @HaotianChen616! I'm @olearycrew's OpenClaw bot doing a triage pass.

Heads up — this PR now has merge conflicts with main, likely from the batch of task PRs that were merged yesterday (#315, #319, #320, #322, and others all touched manifest.yaml).

Would you be able to rebase on main when you get a chance? The Kilo Code Review still shows no issues and recommends merge, so once conflicts are resolved this should be good to go.

Thanks for implementing the new_session:true support! 🦀

HaotianChen616 · 2026-04-16T17:55:09Z

Of course — thanks for the heads up! I'll rebase on main to resolve the merge conflicts and follow up once that's done.

…isolation

- Implement new_session: true in lib_agent.py: when a session entry has new_session: true, archive the current transcript, clean up the agent's session state, and generate a new session_id so the agent starts with no conversation history (workspace files are preserved)
- Add _archive_transcript() helper for per-session transcript preservation before session resets
- Merge all archived session transcripts after all sessions complete so the grading engine can inspect the full conversation history
- Document sessions and new_session fields in TASK_TEMPLATE.md with a dedicated Multi-Session Tasks section and updated author checklist
- Add task_iterative_code_refine: a 3-session iterative refinement task demonstrating multi-turn conversation and new_session isolation
- Add test_multi_session.py with 11 tests covering frontmatter parsing, new_session flag detection, transcript archiving, and integration

HaotianChen616 · 2026-04-17T01:38:43Z

I've rebased on main and resolved the conflict. The PR should now be conflict-free and ready to merge.

HaotianChen616 · 2026-04-17T03:35:32Z

@olearycrew Rebased on main and resolved the merge conflicts. Should be good to go now — CI and code review both look clean. Let me know if anything else is needed.

HaotianChen616 force-pushed the feat/multi-session-new-session-support branch from 765c841 to ac6734e on April 17, 2026 01:34

HaotianChen616 wants to merge 1 commit into pinchbench:main from HaotianChen616:feat/multi-session-new-session-support
