Problem
Every CLI response enters the agent's context window and persists for the conversation. In a typical 12-scenario session with ~36 SpecLeft CLI calls, unoptimised responses accumulate ~10,800 tokens of JSON output — more than double the MCP declaration overhead. CLI response size is the largest controllable token cost in SpecLeft's footprint.
Goal
Make SpecLeft the most context-efficient developer tool for AI coding agents. Target: 49% total session token reduction compared to current unoptimised output.
Scope
1. TTY-aware default output format
Current: --format table is the default. Agents must pass --format json on every call.
Change: Auto-detect output format based on terminal attachment.
import sys

if sys.stdout.isatty():
    default_format = "table"   # Human at terminal
else:
    default_format = "json"    # Agent, pipe, subprocess, CI
- JSON is automatic when called from agents or scripts
- Table is automatic when a human is at the terminal
- --format flag overrides auto-detection in either direction
- --pretty flag outputs indented JSON for human-readable debugging
Impact: Removes --format json from every agent command invocation (~165 tokens/session). Eliminates a class of agent errors (forgetting the flag, getting unparseable table output). Reduces skill file size by ~57 tokens.
Apply to: All commands.
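The detection-plus-override logic can be sketched in a few lines (resolve_format is a hypothetical helper name, not an existing SpecLeft function):

```python
import sys

def resolve_format(flag=None):
    """Resolve the output format: an explicit --format value always
    wins; otherwise fall back to TTY detection."""
    if flag is not None:
        return flag
    return "table" if sys.stdout.isatty() else "json"
```

When stdout is a pipe or subprocess capture, `resolve_format()` returns `"json"` with no flag needed.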
2. Compact JSON output (no indentation)
Current: JSON output uses json.dumps(result, indent=2) (or similar).
Change: Default JSON output uses minimal separators, no indentation.
json.dumps(result, separators=(',', ':'))
Example:
# Before: ~95 tokens
{
  "next": [
    {
      "feature_id": "user-auth",
      "scenario_id": "register-with-valid-email",
      "priority": "critical",
      "test_file": "tests/test_user_auth.py",
      "test_function": "test_register_with_valid_email"
    }
  ]
}
# After: ~45 tokens
{"next":[{"feature_id":"user-auth","scenario_id":"register-with-valid-email","priority":"critical","test_file":"tests/test_user_auth.py","test_function":"test_register_with_valid_email"}]}
- --pretty flag available for indented JSON when needed
- Key names remain full-length (no abbreviation — clarity outweighs token savings)
Impact: ~53% reduction per response. ~1,800 tokens saved over a 36-call session.
Apply to: All commands with --format json.
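The entire saving comes from whitespace; the payload is byte-for-byte identical once parsed. A minimal demonstration with a trimmed version of the example above:

```python
import json

result = {"next": [{"feature_id": "user-auth", "priority": "critical"}]}

pretty = json.dumps(result, indent=2)                # --pretty / debugging
compact = json.dumps(result, separators=(',', ':'))  # default for agents

# Compact output drops all inter-token whitespace:
# {"next":[{"feature_id":"user-auth","priority":"critical"}]}
assert json.loads(compact) == json.loads(pretty)     # identical data
```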
3. Minimal success responses
Responses should include only information the agent needs to act on. On success, confirm the action. On failure, include detail for remediation.
specleft features validate
# Success: ~5 tokens
{"valid":true}
# Failure: full detail
{"valid":false,"errors":[{"file":"features/auth.md","line":12,"message":"Missing priority for scenario 'login-timeout'","fix":"Add 'priority: medium' to scenario metadata"}]}
Omit errors, warnings, features_checked, scenarios_checked when empty/irrelevant.
specleft coverage
# Pass (with --threshold): ~15 tokens
{"passed":true,"overall":100.0,"threshold":100}
# Fail: include only features below threshold
{"passed":false,"overall":64.6,"threshold":100,"below_threshold":[{"feature_id":"payments","coverage":37.5},{"feature_id":"sharing","coverage":50.0}]}
specleft features add / specleft features add-scenario
# Confirmation only: ~20 tokens
{"created":true,"feature_id":"user-auth","file":"features/user-auth.md"}
Do not echo back title, priority, description, or other fields the agent just sent.
specleft test skeleton
# Dry-run success: ~25 tokens
{"dry_run":true,"files_planned":3,"files":["tests/test_user_auth.py","tests/test_task_mgmt.py","tests/test_sharing.py"]}
# Write success: ~15 tokens
{"created":true,"files_written":3}
specleft init
# Success: ~40 tokens
{"success":true,"health":{"ok":true},"skill_file":".specleft/specleft_skill.md","skill_file_hash":"a1b2c3d4..."}
Only expand health detail on failure:
{"success":false,"health":{"ok":false,"python_version":"3.8.0","error":"Requires Python >= 3.10"}}
Impact: ~720 tokens saved over a session from reduced success response sizes.
Apply to: All commands.
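One way to implement the omission rule is a small pruning pass over the response dict before serialisation (prune is a hypothetical helper; under SPECLEFT_COMPACT the dropped-values set could be extended to zero counts):

```python
def prune(payload):
    """Drop keys whose values are empty containers or None,
    so success responses carry only actionable fields."""
    return {k: v for k, v in payload.items() if v not in ([], {}, None)}
```

For example, a passing validation payload of `{"valid": true, "errors": [], "warnings": []}` prunes down to `{"valid": true}`.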
4. --verbose flag on specleft status
Current: Status always returns full per-feature breakdown.
Change: Add a --verbose flag that returns the full per-feature breakdown when needed during implementation loops. The default status command returns a summary only.
# Summary (default, for planning): ~30 tokens
{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3}
# Verbose (for full breakdown):
{"initialised":true,"summary":{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3},"by_priority":{...},"features":[...]}
Impact: ~170 tokens saved per status check during implementation. ~510 tokens over a session (3 mid-session checks).
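Both shapes can be derived from one source of truth, so the summary and verbose fields never drift apart. A sketch, assuming a per-feature record structure (the field names follow the examples above; the input shape is an assumption):

```python
def status_payload(features, verbose=False):
    """Build the summary by default; attach the full breakdown
    only when --verbose is passed."""
    scenarios = [s for f in features for s in f["scenarios"]]
    implemented = sum(1 for s in scenarios if s["implemented"])
    summary = {
        "features": len(features),
        "scenarios": len(scenarios),
        "implemented": implemented,
        "skipped": len(scenarios) - implemented,
        "coverage_percent": round(100 * implemented / len(scenarios), 1),
    }
    if not verbose:
        return summary
    return {"initialised": True, "summary": summary, "features": features}
```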
5. Error responses with fix commands
When validation or enforcement fails, include the exact CLI command to fix the issue where possible.
{
  "valid": false,
  "errors": [
    {
      "file": "features/auth.md",
      "line": 12,
      "message": "Missing priority for scenario 'login-timeout'",
      "fix_command": "specleft features add-scenario --feature auth --title 'login-timeout' --priority medium"
    }
  ]
}
Rationale: Without fix_command, the agent reasons about how to fix the error (50-100 tokens of thinking, potential retry). With it, the agent executes directly. Net saving: ~30-80 tokens per error plus avoided incorrect fix attempts.
Apply to: features validate, enforce, contract test, doctor.
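Building the remediation command for the missing-priority case above might look like this (fix_command_for_missing_priority is a hypothetical helper; shlex.quote guards against titles that need shell quoting):

```python
import shlex

def fix_command_for_missing_priority(feature_id, title, priority="medium"):
    """Emit the exact CLI invocation that remediates a
    missing-priority validation error."""
    return (
        "specleft features add-scenario"
        f" --feature {shlex.quote(feature_id)}"
        f" --title {shlex.quote(title)}"
        f" --priority {priority}"
    )
```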
6. SPECLEFT_COMPACT environment variable
A single environment variable that enables all compact optimisations simultaneously.
export SPECLEFT_COMPACT=1
When set:
- JSON output uses minimal separators (same as default non-TTY behaviour)
- Success responses are minimal (omit empty arrays, zero counts)
- status returns summary-only by default
- next defaults to --limit 1
This allows the skill file to set the mode once at the top rather than passing optimisation flags to every command:
## Setup
export SPECLEFT_COMPACT=1
All commands below use compact output mode.
Implementation: Check os.environ.get("SPECLEFT_COMPACT") in the CLI output layer. Individual flags (e.g., --summary, --limit) still override when explicitly passed.
Impact: Zero per-command overhead. Single setup instruction in skill file. Ensures all optimisations are active without relying on the agent to remember flags.
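The precedence rule (explicit flags beat the environment variable) can be sketched as follows (compact_enabled is a hypothetical helper name):

```python
import os

def compact_enabled(cli_flag=None):
    """An explicit per-command flag always wins; otherwise the
    SPECLEFT_COMPACT environment variable decides."""
    if cli_flag is not None:
        return cli_flag
    return os.environ.get("SPECLEFT_COMPACT") == "1"
```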
7. Skill file updates for optimised workflow
Update the generated skill file to reflect all output optimisations:
## Setup
export SPECLEFT_COMPACT=1
## Workflow
1. specleft next --limit 1 → pick one scenario
2. Implement test logic
3. specleft features validate → exit code 0 = valid, only parse output on failure
4. pytest → run test
5. Repeat
## Quick checks
- Validation: check exit code first, parse JSON only on failure
- Coverage: specleft coverage --threshold 100 → exit code 0 = met
- Status: specleft status for progress during implementation
Teach the exit-code-first pattern: agents check $? before parsing JSON, saving ~25 tokens per successful validation call.
Out of scope (deferred)
| Item | Reason |
| --- | --- |
| Abbreviated key names (f instead of feature_id) | Readability cost outweighs token savings |
| --since-hash differential status | Implementation complexity; --summary covers the need |
| --after stateless pagination on next | Quality-of-life, not token optimisation |
| Response deduplication (grouped next output) | --limit 1 already eliminates duplication |
Token impact summary
| Optimisation | Per-session savings |
| --- | --- |
| Compact JSON (no indentation) | ~1,800 tokens |
| --limit 1 default in COMPACT mode | ~2,160 tokens |
| --summary on status checks | ~510 tokens |
| Minimal success responses | ~720 tokens |
| Confirmation-only on writes | ~480 tokens |
| TTY-aware JSON default (flag removal) | ~165 tokens |
| Skill file reduction | ~57 tokens |
| Total | ~5,892 tokens |
Baseline unoptimised CLI response cost: ~10,800 tokens/session.
Optimised CLI response cost: ~4,908 tokens/session.
Reduction: 55%.
Combined with MCP declaration overhead (~2,550) and one-time reads (~1,523), total SpecLeft footprint per 30-turn session: ~8,981 tokens.
Acceptance criteria
- JSON output uses separators=(',', ':') with no indentation by default
- --pretty flag available on all commands for indented JSON
- --format table explicitly overrides auto-detection for human use
- specleft status returns the summary object only
- specleft status --verbose returns the full status of the project
- SPECLEFT_COMPACT=1 activates all compact defaults
- Error responses include fix_command where a CLI fix is deterministic
- Skill file documents SPECLEFT_COMPACT setup and the --limit 1 workflow