feat: support checkpoint-based continuation after unexpected long-task interruptions #217

@huangrichao2020

Scenario

This is a real pitfall ordinary developers will hit.

A long task can be cut off halfway in many ways: the machine may lose network access, go to sleep, or run out of memory; the process may be cleaned up; the user may close the window manually; or GenericAgent itself may exit unexpectedly. For users, the worst part is not the failure itself, but losing all task context and not knowing what had already been done.

Current Pain Points

  • After an unexpected interruption, GenericAgent can only restart the task from scratch.
  • There is no task-level checkpoint covering the current goal, last tool call, key files, phase state, and unfinished items.
  • After restart, GenericAgent does not know whether there was an unfinished task.
  • The user has to describe the context again, which makes long tasks painful.

Suggested Direction

Add a checkpoint-first task continuation mechanism:

  • Save task_id, user intent, source, cwd, and start time when a task starts.
  • Save tool name, argument summary, result summary, and current phase before/after each tool call.
  • Save a short summary, next step, and risks after each LLM turn.
  • Archive the checkpoint when the task completes.
  • On startup, detect fresh checkpoints. If the interruption was recent, generate an automatic resume prompt; if it is too old, ask the user whether to continue.
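
The bullets above could map onto a single record plus a startup decision. This is only a sketch under stated assumptions: the field names, the `resume_action` helper, and the 30-minute freshness window are all hypothetical and not part of GenericAgent today.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed threshold; the proposal does not say how recent "fresh" is.
FRESH_WINDOW_SECONDS = 30 * 60


@dataclass
class Checkpoint:
    # Saved once, when the task starts.
    task_id: str
    user_intent: str
    source: str
    cwd: str
    started_at: float
    # Updated before/after each tool call.
    phase: str = "started"
    last_tool: Optional[str] = None
    last_tool_args_summary: str = ""
    last_tool_result_summary: str = ""
    # Updated after each LLM turn.
    summary: str = ""
    next_step: str = ""
    risks: List[str] = field(default_factory=list)
    # Lifecycle: in_progress | interrupted | completed
    status: str = "in_progress"
    updated_at: float = 0.0


def resume_action(cp: Optional[Checkpoint], now: Optional[float] = None,
                  fresh_window: float = FRESH_WINDOW_SECONDS) -> str:
    """Decide what to do with a checkpoint found at startup."""
    if cp is None or cp.status == "completed":
        return "none"
    now = time.time() if now is None else now
    age = now - cp.updated_at
    # Recent interruption: build a resume prompt automatically.
    # Older interruption: ask the user whether to continue.
    return "auto_resume" if age <= fresh_window else "ask_user"
```

Keeping the decision in a pure function that takes `now` as a parameter makes the freshness rule easy to test without touching the clock.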

A lightweight implementation could use an atomic state file such as temp/ga_state.json, with checkpoint.status = in_progress/interrupted, plus a recent task journal. This does not require a heavy database and can already cover most real-world cases.
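
One way to make the state file atomic is to write a temporary file in the same directory and then rename it over the target, so a crash mid-write never leaves a half-written `ga_state.json`. The helper names `save_state_atomic` and `load_state` below are hypothetical; only the `temp/ga_state.json` path comes from this proposal.

```python
import json
import os
import tempfile


def save_state_atomic(state: dict, path: str = "temp/ga_state.json") -> None:
    """Write state to path so readers see either the old or the new file."""
    directory = os.path.dirname(path) or "."
    os.makedirs(directory, exist_ok=True)
    # The temp file must live in the same directory so the rename stays
    # on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ga_state.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX and Windows
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def load_state(path: str = "temp/ga_state.json"):
    """Return the saved state, or None if it is missing or unreadable."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None
```

Calling `save_state_atomic` before and after each tool call would keep the file at most one step behind the agent, at the cost of one small write per step.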

Acceptance Criteria

  • After force-killing the process and restarting, GenericAgent can detect the last unfinished task.
  • It can display the last task, breakpoint, last tool, and whether the checkpoint is fresh.
  • When the user replies “continue”, the agent resumes from the checkpoint instead of starting from zero.
  • After completion, the checkpoint is archived and does not pollute the next task.
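
The criteria above could be checked against two small helpers: one that reports an unfinished task at startup, and one that moves a finished checkpoint out of the active path. Both function names, the dict keys, and the default 30-minute window are assumptions made for illustration.

```python
import json
import os
import time


def describe_unfinished(checkpoint, now=None, fresh_window=1800):
    """Return a short status report for an unfinished task, or None."""
    if not checkpoint or checkpoint.get("status") not in ("in_progress", "interrupted"):
        return None
    now = time.time() if now is None else now
    fresh = (now - checkpoint.get("updated_at", 0)) <= fresh_window
    return (
        f"Unfinished task: {checkpoint.get('user_intent', '?')}\n"
        f"Breakpoint: {checkpoint.get('phase', '?')}\n"
        f"Last tool: {checkpoint.get('last_tool') or 'none'}\n"
        f"Checkpoint fresh: {'yes' if fresh else 'no'}"
    )


def archive_checkpoint(state_path, archive_dir):
    """Move a finished checkpoint aside so the next task starts clean."""
    with open(state_path, encoding="utf-8") as f:
        checkpoint = json.load(f)
    os.makedirs(archive_dir, exist_ok=True)
    target = os.path.join(archive_dir, f"{checkpoint.get('task_id', 'unknown')}.json")
    # After the move, nothing remains at state_path for startup to detect.
    os.replace(state_path, target)
    return target
```

A force-kill test would then amount to: start a task, kill the process, restart, and assert that `describe_unfinished` returns a report rather than `None`.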
