feat(waypoint): session checkpoint plugin for multi-session progress tracking #2

Merged
armstrongl merged 7 commits into main from feat/waypoint-plugin
Apr 21, 2026

Conversation

@armstrongl

Summary

  • Adds waypoint plugin: lightweight, automatic session checkpoints for multi-session projects
  • Stop hook blocks session end until .context/checkpoint.md is written with structured handoff context
  • CLAUDE.md instructions handle session-start checkpoint reading, plan checkbox auto-completion, and commit message plan anchoring
  • Includes ideation doc with full exploration of 40 candidates narrowed to 6 survivors

How it works

  1. Session end: Stop hook detects stale/missing checkpoint, blocks the agent, and instructs it to write one
  2. Session start: CLAUDE.md tells the agent to read the checkpoint before doing anything else
  3. During work: Agent updates plan checkboxes and includes [unit:N] in commit messages
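
The session-end step can be sketched as a small decision function. This is a hypothetical TypeScript illustration, not the contents of stop-checkpoint.sh (which is a shell script): the function name, the reason strings, and the approve/block result shape are assumptions. Only the checkpoint path and the 5-minute staleness threshold come from this PR.

```typescript
// Hypothetical sketch of the Stop-hook decision. The result shape and the
// reason strings are assumptions made for illustration only.
const MAX_AGE_MS = 5 * 60 * 1000; // 5-minute staleness threshold from the Files list

type HookDecision =
  | { decision: "approve" }
  | { decision: "block"; reason: string };

/**
 * mtimeMs: last-modified time of .context/checkpoint.md in milliseconds,
 *          or null when the file does not exist.
 * nowMs:   current time in milliseconds.
 */
function decideStop(mtimeMs: number | null, nowMs: number): HookDecision {
  if (mtimeMs === null) {
    return {
      decision: "block",
      reason: "No checkpoint found. Write .context/checkpoint.md before stopping.",
    };
  }
  if (nowMs - mtimeMs > MAX_AGE_MS) {
    return {
      decision: "block",
      reason: "Checkpoint is stale. Update .context/checkpoint.md before stopping.",
    };
  }
  return { decision: "approve" };
}
```

Taking the modification time as a parameter keeps the function free of filesystem access, so the three cases (missing, stale, fresh) can be checked without touching disk.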

Files

  • plugins/waypoint/.claude-plugin/plugin.json — manifest with Stop hook
  • plugins/waypoint/.cursor-plugin/plugin.json — Cursor compat
  • plugins/waypoint/hooks/stop-checkpoint.sh — blocks stop if checkpoint stale (>5 min)
  • plugins/waypoint/CLAUDE.md — session-start, during-work, and session-end instructions
  • docs/ideation/2026-04-13-session-progress-tracking-ideation.md — ideation artifact

Test plan

  • Install plugin via claude plugin add ./plugins/waypoint
  • Start a session, verify CLAUDE.md instructions load
  • End a session without writing checkpoint — verify hook blocks and instructs
  • Write checkpoint, end session — verify hook approves
  • Start new session — verify agent reads checkpoint first

…and sequencing

120 raw candidates from 12 sub-agents across 10 ideation frames
(information theory, PL semantics, systems architecture, biology,
organizational process, economics, linguistics, physics, design,
music/military/game theory). 30 ranked ideas, 69 rejections.

New ideas EveryInc#14-30 cover: empirical ablation, module unbundling,
circuit breaker, JIT specialization, Kolmogorov compression,
carrying cost budgeting, register mismatch correction, pidgin
instruction language, Schelling point architecture, renormalization
group flow, cartographic zoom levels, OODA decision manifests,
and congestion pricing.

Added implementation sequencing with dependency graph, phase
estimates, and speed x impact ranking. Recommended first move:
EveryInc#26 (Register Mismatch Correction).
…ch correction

Provides stakeholder-facing framing of the tutorial-to-specification register
correction initiative, with real before/after examples from ce-review and
projected per-skill savings estimates.
…sification and savings estimates

Catalogs 6 tutorial-register pattern classes (progressive explanation, redundant
clarification, motivational framing, inline rationale, hedging markers, indirect
speech acts) with structural detection heuristics, 2-axis severity ratings, and
transformation rules aligned to the skill compliance checklist.

Includes 12 before/after examples from ce-review, a rationale classification
framework with 6 worked borderline examples, HTML comment preservation rules,
and sampling-based savings estimates for all top-7 skills (19.6-25.4KB estimated
net savings across 283KB, correcting prior 20-30% projections to 8-13%).
Strip tutorial-register content (progressive explanation, redundant
clarification, motivational framing, inline rationale, hedging markers,
indirect speech acts) from the 7 largest skills using the methodology
from docs/references/register-mismatch-correction-methodology.md.

Total savings: 29,894 bytes (10.6% of 283,324 bytes across 7 skills).
22 HTML comments preserved for compliance-critical rationale.

Per-skill results:
- ce-work: 3,467 bytes (12.8%)
- ce-compound: 6,363 bytes (20.4%)
- ce-plan: 3,723 bytes (8.9%)
- ce-compound-refresh: 5,485 bytes (11.4%)
- ce-review: 6,145 bytes (11.2%)
- ce-work-beta: 3,952 bytes (12.3%)
- orchestrating-swarms: 759 bytes (1.6%)
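
The total and percentage above can be re-derived from the per-skill figures. The sketch below only repeats the arithmetic; the helper names are illustrative and not part of the repository.

```typescript
// Per-skill byte savings as listed in the commit message above.
const savedBytes: Array<[string, number]> = [
  ["ce-work", 3467],
  ["ce-compound", 6363],
  ["ce-plan", 3723],
  ["ce-compound-refresh", 5485],
  ["ce-review", 6145],
  ["ce-work-beta", 3952],
  ["orchestrating-swarms", 759],
];

// Combined size of the 7 skills before the correction, from the same message.
const TOTAL_BEFORE_BYTES = 283324;

// Sum the second element of every [skill, bytes] pair.
function totalSaved(perSkill: Array<[string, number]>): number {
  return perSkill.reduce((sum, pair) => sum + pair[1], 0);
}

// Express part/whole as a percentage string with one decimal place.
function percentOf(part: number, whole: number): string {
  return ((part / whole) * 100).toFixed(1);
}

const total = totalSaved(savedBytes); // 29894
const percent = percentOf(total, TOTAL_BEFORE_BYTES); // "10.6"
```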
…ess tracking

Lightweight plugin that automatically drops checkpoint breadcrumbs at session
boundaries. A Stop hook blocks the agent from ending until it writes a structured
checkpoint to .context/checkpoint.md. CLAUDE.md instructions handle session-start
reading, plan checkbox auto-completion, and commit message anchoring.
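
A minimal sketch of how commit message anchoring and plan checkbox completion could fit together. In the plugin these behaviors are driven by CLAUDE.md instructions that the agent follows, not by code, so everything here is hypothetical: the function names and the assumed plan line format (`- [ ] Unit N: ...`) are illustrations. Only the `[unit:N]` tag itself comes from this PR.

```typescript
// Extract N from a "[unit:N]" tag in a commit message, or null if absent.
function parseUnitTag(message: string): number | null {
  const match = /\[unit:(\d+)\]/.exec(message);
  return match ? Number(match[1]) : null;
}

// Tick the checkbox for the given unit in a plan document.
// ASSUMPTION: plan items look like "- [ ] Unit 3: description".
function completeUnit(plan: string, unit: number): string {
  // \b stops "Unit 1" from matching "Unit 12".
  const pattern = new RegExp(`^([ \\t]*- )\\[ \\]( Unit ${unit}\\b)`, "m");
  return plan.replace(pattern, "$1[x]$2");
}
```

For example, a commit message ending in `[unit:2]` yields `2`, and passing that to `completeUnit` changes only the `Unit 2` line from `[ ]` to `[x]`.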
# Conflicts:
#	plugins/compound-engineering/skills/ce-plan/SKILL.md
#	plugins/compound-engineering/skills/ce-work-beta/SKILL.md
#	plugins/compound-engineering/skills/ce-work/SKILL.md
#	plugins/compound-engineering/skills/orchestrating-swarms/SKILL.md
armstrongl added a commit that referenced this pull request Apr 16, 2026
…yInc#21 for Batch 3

- Brainstorm #2 Lean Agent Dispatch: shared context dedup delivers
  ~140-195 KB/review by writing dispatch content to .context/ once
  instead of inlining ~16 KB per reviewer. Archetype cleanup deferred.
- Plan #3 Diff-Proportional Scaling: 5 units adding 4-tier caps
  (trivial/small/medium/large) with priority-ordered selection across
  Tier 1-4 reviewer categories.
- Plan EveryInc#8+EveryInc#21 Combined Dedup: 7 units covering AGENTS.md canonical
  sections, native tool guidance removal from 14 files, phase-scoping
  the critical "NEVER CODE" interference, semantic compression, and
  staleness checks.

Updates meta-plan tracking for all Batch 3 items.
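
The diff-proportional scaling idea (a tier chosen from diff size, a per-tier cap, then priority-ordered selection) might look roughly like the sketch below. The line thresholds and the small/medium/large caps are invented for illustration. Only the four tier names, priority ordering across Tier 1-4 categories, and the trivial cap of 8 (stated in the execution commit) come from this PR.

```typescript
type Tier = "trivial" | "small" | "medium" | "large";

// ASSUMPTION: thresholds and all caps except trivial = 8 are placeholders.
const TIER_RULES: Array<{ tier: Tier; maxLines: number; cap: number }> = [
  { tier: "trivial", maxLines: 20, cap: 8 },
  { tier: "small", maxLines: 100, cap: 12 },
  { tier: "medium", maxLines: 500, cap: 16 },
  { tier: "large", maxLines: Infinity, cap: 24 },
];

interface Reviewer {
  name: string;
  priority: number; // 1 = highest priority category, 4 = lowest
}

// Map an executable-line count to a tier and its reviewer cap.
function classify(executableLines: number): { tier: Tier; cap: number } {
  for (const rule of TIER_RULES) {
    if (executableLines <= rule.maxLines) {
      return { tier: rule.tier, cap: rule.cap };
    }
  }
  // Not reached in practice: the last rule accepts any finite size.
  return { tier: "large", cap: 24 };
}

// Keep the highest-priority reviewers, up to the cap for this diff size.
function selectReviewers(candidates: Reviewer[], executableLines: number): Reviewer[] {
  const { cap } = classify(executableLines);
  return candidates
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.priority - b.priority)
    .slice(0, cap);
}
```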
armstrongl added a commit that referenced this pull request Apr 16, 2026
…ryInc#8+EveryInc#21, #2

Batch 3 (Phase 2 - Structural Dedup) execution:

- #3 Diff-Proportional Scaling: 5 units adding cap: arg, EXECUTABLE_LINES
  counting, tier table, priority ordering, and capped team announcement to
  ce-review SKILL.md. Trivial diffs now dispatch max 8 agents (was unbounded).

- EveryInc#8+EveryInc#21 Cross-Skill Pipeline Dedup: 7 units across 25 files. AGENTS.md
  canonical Cross-Platform Interaction Convention section, native tool guidance
  removal from 14 agent/skill files, phase-scoping "NEVER CODE" interference
  (ce-plan/ce-work/ce-review), semantic compression of 3 pipeline skills,
  question-tool drift staleness check, cross-reference matrix, and
  carrying-waste manifest (97% of pipeline content is phase-specific).

- #2 Lean Agent Dispatch: planned + executed 3 units. Write-once dispatch
  context deduplicates 15.7KB of shared template/schema/scope across N
  sub-agents. ~144KB savings per 10-reviewer dispatch. Lean prompts pass
  paths instead of inlining shared content.

All 701 tests pass (4 new staleness tests). release:validate clean.
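
The Lean Agent Dispatch saving follows from simple arithmetic: inlining sends the shared content once per sub-agent, while write-once dispatch stores it a single time. The sketch below is a back-of-the-envelope model with illustrative function names; it ignores the small cost of the path references that replace the inlined content.

```typescript
// Bytes (in KB) sent when every sub-agent prompt inlines the shared content.
function inlineCostKb(agents: number, sharedKb: number): number {
  return agents * sharedKb;
}

// Bytes (in KB) when the shared content is written once and referenced by path.
function writeOnceCostKb(sharedKb: number): number {
  return sharedKb;
}

// Savings are the difference: (agents - 1) copies are avoided.
function savingsKb(agents: number, sharedKb: number): number {
  return inlineCostKb(agents, sharedKb) - writeOnceCostKb(sharedKb);
}

const roundedEstimate = savingsKb(10, 16); // 144, using the ~16 KB per-reviewer figure
const measuredEstimate = savingsKb(10, 15.7); // about 141.3, using the 15.7KB figure
```

With 10 reviewers, the rounded ~16 KB figure reproduces the ~144KB quoted above, and the measured 15.7KB gives roughly 141KB.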
armstrongl added a commit that referenced this pull request Apr 17, 2026
…mework and EveryInc#20 carrying cost

Build the two Phase 0 measurement tools that unlock all Phase 3 work:

EveryInc#14 Empirical Ablation Framework:
- Section parser (src/analysis/sections.ts): splits markdown by H2/H3 headers
- Variant generator (src/analysis/variants.ts): creates ablated skill copies
- Evaluator + quality scorer (src/analysis/evaluator.ts): single-prompt
  evaluation with four-axis scoring (coverage, precision, calibration, compliance)
- CLI scripts (scripts/ablation/run.ts, report.ts): orchestrate ablation runs
  with --dry-run, --section, --fixture flags and generate ranked reports
- 3 diff fixtures extracted from repo history
- 41 new tests
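
A minimal sketch of the section-splitting and ablation idea, assuming a flat split on H2/H3 header lines. It is not the code in src/analysis/sections.ts or variants.ts: the `Section` shape and function names are hypothetical, and this version does not skip header-like lines inside fenced code blocks.

```typescript
interface Section {
  heading: string; // "" for the preamble before the first H2/H3
  level: number;   // 0 for the preamble, otherwise 2 or 3
  content: string; // header line plus body, each line newline-terminated
}

// Split markdown into a preamble plus one section per H2/H3 header line.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section = { heading: "", level: 0, content: "" };
  for (const line of markdown.split("\n")) {
    const match = /^(#{2,3})\s+(.*)$/.exec(line);
    if (match) {
      sections.push(current);
      current = { heading: match[2], level: match[1].length, content: line + "\n" };
    } else {
      current.content += line + "\n";
    }
  }
  sections.push(current);
  return sections;
}

// Build an ablated variant: the document with exactly one section removed.
// The split is flat, so removing an H2 does not remove the H3s that follow it.
function ablate(sections: Section[], index: number): string {
  return sections
    .filter((_, i) => i !== index)
    .map((s) => s.content)
    .join("");
}
```

Generating one variant per section index and scoring each against the unmodified skill is the basic loop an ablation run would follow.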

EveryInc#20 Carrying Cost Budgeting:
- scripts/skill/stats.ts: ranks 91 skills/agents by carrying cost
  (file_size x estimated_tool_calls) with recursive sub-agent cost
- Heuristic detects phase headers, tool instructions, agent dispatches,
  loop constructs, and bash code blocks
- Key finding: system cost ranking differs from file-size ranking;
  ce-optimize jumps from #5 to #2 due to 146 estimated tool calls
- 20 new tests
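
The carrying-cost formula (file_size x estimated_tool_calls, with recursive sub-agent cost) can be sketched as follows. This is not scripts/skill/stats.ts: the `SkillStats` shape is hypothetical, and adding each sub-agent's cost to its parent is an assumption about what "recursive" means here.

```typescript
interface SkillStats {
  name: string;
  fileSize: number;           // bytes
  estimatedToolCalls: number; // heuristic estimate
  subAgents: string[];        // names of skills/agents this one dispatches
}

// Own cost plus the cost of every dispatched sub-agent (ASSUMED to be additive).
function carryingCost(
  name: string,
  skills: Map<string, SkillStats>,
  visiting: Set<string> = new Set<string>(),
): number {
  const skill = skills.get(name);
  // Unknown names and dispatch cycles contribute nothing.
  if (!skill || visiting.has(name)) return 0;
  visiting.add(name);
  let cost = skill.fileSize * skill.estimatedToolCalls;
  for (const child of skill.subAgents) {
    cost += carryingCost(child, skills, visiting);
  }
  visiting.delete(name);
  return cost;
}

// Names ordered from highest to lowest carrying cost.
function rankByCost(skills: Map<string, SkillStats>): string[] {
  return Array.from(skills.keys()).sort(
    (a, b) => carryingCost(b, skills) - carryingCost(a, skills),
  );
}
```

Because the tool-call estimate multiplies the file size, a mid-sized skill with many estimated tool calls can outrank a larger file that is read only a few times, which is the kind of reordering the key finding describes.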

Also re-created missing brainstorm docs for EveryInc#14 and EveryInc#20 (original
2026-04-13 artifacts were never committed).
@armstrongl
Author

@copilot resolve the merge conflicts in this pull request

# Conflicts:
#	plugins/compound-engineering/skills/ce-code-review/SKILL.md
#	plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
#	plugins/compound-engineering/skills/ce-compound/SKILL.md

Co-authored-by: armstrongl <29762984+armstrongl@users.noreply.github.com>

Copilot AI commented Apr 21, 2026

@copilot resolve the merge conflicts in this pull request

Resolved. The three conflicted skill files (ce-code-review/SKILL.md, ce-compound-refresh/SKILL.md, ce-compound/SKILL.md) had conflicts between the register-mismatch corrections in this branch and the updates that landed in main (renamed agents, updated instructions). Resolved by taking main's version for those files, then committed the merge in ca0eca6.

@armstrongl armstrongl merged commit b9266f2 into main Apr 21, 2026
armstrongl added a commit that referenced this pull request Apr 21, 2026
…yInc#21 for Batch 3

- Brainstorm #2 Lean Agent Dispatch: shared context dedup delivers
  ~140-195 KB/review by writing dispatch content to .context/ once
  instead of inlining ~16 KB per reviewer. Archetype cleanup deferred.
- Plan #3 Diff-Proportional Scaling: 5 units adding 4-tier caps
  (trivial/small/medium/large) with priority-ordered selection across
  Tier 1-4 reviewer categories.
- Plan EveryInc#8+EveryInc#21 Combined Dedup: 7 units covering AGENTS.md canonical
  sections, native tool guidance removal from 14 files, phase-scoping
  the critical "NEVER CODE" interference, semantic compression, and
  staleness checks.

Updates meta-plan tracking for all Batch 3 items.
armstrongl added a commit that referenced this pull request Apr 21, 2026
…ryInc#8+EveryInc#21, #2

Batch 3 (Phase 2 - Structural Dedup) execution:

- #3 Diff-Proportional Scaling: 5 units adding cap: arg, EXECUTABLE_LINES
  counting, tier table, priority ordering, and capped team announcement to
  ce-review SKILL.md. Trivial diffs now dispatch max 8 agents (was unbounded).

- EveryInc#8+EveryInc#21 Cross-Skill Pipeline Dedup: 7 units across 25 files. AGENTS.md
  canonical Cross-Platform Interaction Convention section, native tool guidance
  removal from 14 agent/skill files, phase-scoping "NEVER CODE" interference
  (ce-plan/ce-work/ce-review), semantic compression of 3 pipeline skills,
  question-tool drift staleness check, cross-reference matrix, and
  carrying-waste manifest (97% of pipeline content is phase-specific).

- #2 Lean Agent Dispatch: planned + executed 3 units. Write-once dispatch
  context deduplicates 15.7KB of shared template/schema/scope across N
  sub-agents. ~144KB savings per 10-reviewer dispatch. Lean prompts pass
  paths instead of inlining shared content.

All 701 tests pass (4 new staleness tests). release:validate clean.
armstrongl added a commit that referenced this pull request Apr 21, 2026
…mework and EveryInc#20 carrying cost

Build the two Phase 0 measurement tools that unlock all Phase 3 work:

EveryInc#14 Empirical Ablation Framework:
- Section parser (src/analysis/sections.ts): splits markdown by H2/H3 headers
- Variant generator (src/analysis/variants.ts): creates ablated skill copies
- Evaluator + quality scorer (src/analysis/evaluator.ts): single-prompt
  evaluation with four-axis scoring (coverage, precision, calibration, compliance)
- CLI scripts (scripts/ablation/run.ts, report.ts): orchestrate ablation runs
  with --dry-run, --section, --fixture flags and generate ranked reports
- 3 diff fixtures extracted from repo history
- 41 new tests

EveryInc#20 Carrying Cost Budgeting:
- scripts/skill/stats.ts: ranks 91 skills/agents by carrying cost
  (file_size x estimated_tool_calls) with recursive sub-agent cost
- Heuristic detects phase headers, tool instructions, agent dispatches,
  loop constructs, and bash code blocks
- Key finding: system cost ranking differs from file-size ranking;
  ce-optimize jumps from #5 to #2 due to 146 estimated tool calls
- 20 new tests

Also re-created missing brainstorm docs for EveryInc#14 and EveryInc#20 (original
2026-04-13 artifacts were never committed).