
CLI Output Optimisation for Agent Context Efficiency #94

@Dimwiddle

Problem

Every CLI response enters the agent's context window and persists for the rest of the conversation. In a typical 12-scenario session with ~36 SpecLeft CLI calls, unoptimised responses accumulate ~10,800 tokens of JSON output, roughly four times the ~2,550-token MCP declaration overhead. CLI response size is the largest controllable token cost in SpecLeft's footprint.

Goal

Make SpecLeft the most context-efficient developer tool for AI coding agents. Target: 49% total session token reduction compared to current unoptimised output.


Scope

1. TTY-aware default output format

Current: --format table is the default. Agents must pass --format json on every call.

Change: Auto-detect output format based on terminal attachment.

import sys

if sys.stdout.isatty():
    default_format = "table"   # Human at terminal
else:
    default_format = "json"    # Agent, pipe, subprocess, CI

  • JSON is automatic when called from agents or scripts
  • Table is automatic when a human is at the terminal
  • --format flag overrides auto-detection in either direction
  • --pretty flag outputs indented JSON for human-readable debugging

Impact: Removes --format json from every agent command invocation (~165 tokens/session). Eliminates a class of agent errors (forgetting the flag, getting unparseable table output). Reduces skill file size by ~57 tokens.

Apply to: All commands.
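
The precedence described above (explicit flag first, then TTY detection) could be sketched roughly as follows. The function name resolve_format and its parameters are hypothetical, not existing SpecLeft code; the stream is passed in so the behaviour can be exercised without a real terminal.

```python
import io
import sys
from typing import Optional, TextIO


def resolve_format(explicit: Optional[str] = None, stream: TextIO = sys.stdout) -> str:
    """Pick the output format; an explicit --format value always wins."""
    if explicit is not None:
        return explicit
    # isatty() is True only when the stream is attached to a terminal.
    return "table" if stream.isatty() else "json"


# A StringIO stands in for a pipe or subprocess capture: it is never a TTY.
piped = io.StringIO()
print(resolve_format(None, piped))     # json
print(resolve_format("table", piped))  # table
```

Keeping the decision in one function means every command shares the same default rather than re-implementing the check.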


2. Compact JSON output (no indentation)

Current: JSON output uses json.dumps(result, indent=2) (or similar).

Change: Default JSON output uses minimal separators, no indentation.

json.dumps(result, separators=(',', ':'))

Example:

# Before: ~95 tokens
{
    "next": [
        {
            "feature_id": "user-auth",
            "scenario_id": "register-with-valid-email",
            "priority": "critical",
            "test_file": "tests/test_user_auth.py",
            "test_function": "test_register_with_valid_email"
        }
    ]
}

# After: ~45 tokens
{"next":[{"feature_id":"user-auth","scenario_id":"register-with-valid-email","priority":"critical","test_file":"tests/test_user_auth.py","test_function":"test_register_with_valid_email"}]}

  • --pretty flag available for indented JSON when needed
  • Key names remain full-length (no abbreviation — clarity outweighs token savings)

Impact: ~53% reduction per response. ~1,800 tokens saved over a 36-call session.

Apply to: All commands with --format json.
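
A minimal sketch of a shared serialiser that is compact by default and indented on request. render_json and its pretty parameter are illustrative names, not existing SpecLeft code, and character counts are used here only as a rough proxy for tokens.

```python
import json
from typing import Any


def render_json(result: Any, pretty: bool = False) -> str:
    """Serialise a response: compact by default, indented when pretty=True."""
    if pretty:
        return json.dumps(result, indent=2)
    # (',', ':') removes the spaces json.dumps adds after commas and colons.
    return json.dumps(result, separators=(",", ":"))


result = {"next": [{"feature_id": "user-auth", "priority": "critical"}]}
compact = render_json(result)
indented = render_json(result, pretty=True)
print(compact)
print(len(compact), "characters compact vs", len(indented), "indented")
```

Both forms parse to the same object, so switching the default does not change what consumers read.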


3. Minimal success responses

Responses should include only information the agent needs to act on. On success, confirm the action. On failure, include detail for remediation.

specleft features validate

# Success: ~5 tokens
{"valid":true}

# Failure: full detail
{"valid":false,"errors":[{"file":"features/auth.md","line":12,"message":"Missing priority for scenario 'login-timeout'","fix":"Add 'priority: medium' to scenario metadata"}]}

Omit errors, warnings, features_checked, scenarios_checked when empty/irrelevant.
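
One way the validate response could be assembled so that optional fields only appear when they carry information. This is a sketch under assumptions: validate_response and its arguments are hypothetical names, and the real command may collect errors differently.

```python
import json
from typing import Any, Dict, Sequence


def validate_response(errors: Sequence[Dict[str, Any]],
                      warnings: Sequence[Dict[str, Any]] = ()) -> str:
    """Return {"valid":true} on success; add detail only when there is some."""
    payload: Dict[str, Any] = {"valid": not errors}
    if errors:
        payload["errors"] = list(errors)
    if warnings:
        payload["warnings"] = list(warnings)
    return json.dumps(payload, separators=(",", ":"))


print(validate_response([]))
print(validate_response([{"file": "features/auth.md", "line": 12,
                          "message": "Missing priority for scenario 'login-timeout'"}]))
```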

specleft coverage

# Pass (with --threshold): ~15 tokens
{"passed":true,"overall":100.0,"threshold":100}

# Fail: include only features below threshold
{"passed":false,"overall":64.6,"threshold":100,"below_threshold":[{"feature_id":"payments","coverage":37.5},{"feature_id":"sharing","coverage":50.0}]}
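
The pass/fail shapes above could be produced by a filter like the one below. It is a sketch, assuming the overall figure is computed elsewhere and that "passed" means overall >= threshold; coverage_response is a hypothetical name.

```python
import json
from typing import Any, Dict


def coverage_response(overall: float, per_feature: Dict[str, float], threshold: float) -> str:
    """Report pass/fail; list only the features below the threshold on failure."""
    passed = overall >= threshold  # assumed pass criterion
    payload: Dict[str, Any] = {"passed": passed, "overall": overall, "threshold": threshold}
    if not passed:
        payload["below_threshold"] = [
            {"feature_id": feature_id, "coverage": coverage}
            for feature_id, coverage in per_feature.items()
            if coverage < threshold
        ]
    return json.dumps(payload, separators=(",", ":"))


per_feature = {"user-auth": 100.0, "payments": 37.5, "sharing": 50.0}
print(coverage_response(64.6, per_feature, 100))
print(coverage_response(100.0, {"user-auth": 100.0}, 100))
```

Features that already meet the threshold never reach the agent's context.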

specleft features add / specleft features add-scenario

# Confirmation only: ~20 tokens
{"created":true,"feature_id":"user-auth","file":"features/user-auth.md"}

Do not echo back title, priority, description, or other fields the agent just sent.

specleft test skeleton

# Dry-run success: ~25 tokens
{"dry_run":true,"files_planned":3,"files":["tests/test_user_auth.py","tests/test_task_mgmt.py","tests/test_sharing.py"]}

# Write success: ~15 tokens
{"created":true,"files_written":3}

specleft init

# Success: ~40 tokens
{"success":true,"health":{"ok":true},"skill_file":".specleft/specleft_skill.md","skill_file_hash":"a1b2c3d4..."}

Only expand health detail on failure:

{"success":false,"health":{"ok":false,"python_version":"3.8.0","error":"Requires Python >= 3.10"}}

Impact: ~720 tokens saved over a session from reduced success response sizes.

Apply to: All commands.


4. --verbose flag on specleft status

Current: Status always returns full per-feature breakdown.

Change: Add a --verbose flag for full output during implementation loops. The default status command returns a summary only.

# Summary (default, for planning): ~30 tokens
{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3}

# Verbose (for full breakdown):
{"initialised":true,"summary":{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3},"by_priority":{...},"features":[...]}

Impact: ~170 tokens saved per status check during implementation. ~510 tokens over a session (3 mid-session checks).
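
A rough sketch of the summary/verbose split, assuming the full status object already carries a "summary" key as in the verbose example above. status_response is a hypothetical name, and the empty by_priority and features values are placeholders, not real output.

```python
import json
from typing import Any, Dict


def status_response(full_status: Dict[str, Any], verbose: bool = False) -> str:
    """Emit only the summary unless --verbose was requested."""
    payload = full_status if verbose else full_status["summary"]
    return json.dumps(payload, separators=(",", ":"))


full_status = {
    "initialised": True,
    "summary": {"features": 3, "scenarios": 12, "implemented": 7,
                "skipped": 5, "coverage_percent": 58.3},
    "by_priority": {},  # placeholder: real content elided in this issue
    "features": [],     # placeholder: real content elided in this issue
}
print(status_response(full_status))
print(status_response(full_status, verbose=True))
```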


5. Error responses with fix commands

When validation or enforcement fails, include the exact CLI command to fix the issue where possible.

{
  "valid": false,
  "errors": [
    {
      "file": "features/auth.md",
      "line": 12,
      "message": "Missing priority for scenario 'login-timeout'",
      "fix_command": "specleft features add-scenario --feature auth --title 'login-timeout' --priority medium"
    }
  ]
}

Rationale: Without fix_command, the agent reasons about how to fix the error (50-100 tokens of thinking, potential retry). With it, the agent executes directly. Net saving: ~30-80 tokens per error plus avoided incorrect fix attempts.

Apply to: features validate, enforce, contract test, doctor.
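
A sketch of how an error entry might attach fix_command only when a deterministic fix exists. build_error and fix_argv are hypothetical names. Building the command from an argument list with shlex.join is one option, chosen here so that values containing spaces are quoted correctly.

```python
import shlex
from typing import Any, Dict, List, Optional


def build_error(file: str, line: int, message: str,
                fix_argv: Optional[List[str]] = None) -> Dict[str, Any]:
    """Build one error entry; fix_command is omitted when no fix is known."""
    error: Dict[str, Any] = {"file": file, "line": line, "message": message}
    if fix_argv is not None:
        # shlex.join quotes each argument so the command can be run as-is in a shell.
        error["fix_command"] = shlex.join(fix_argv)
    return error


error = build_error(
    "features/auth.md", 12, "Missing priority for scenario 'login-timeout'",
    ["specleft", "features", "add-scenario", "--feature", "auth",
     "--title", "login-timeout", "--priority", "medium"],
)
print(error["fix_command"])
```

Errors without a known fix simply omit the key, so the agent can treat its presence as a signal that the command is safe to execute directly.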


6. SPECLEFT_COMPACT environment variable

A single environment variable that enables all compact optimisations simultaneously.

export SPECLEFT_COMPACT=1

When set:

  • JSON output uses minimal separators (same as default non-TTY behaviour)
  • Success responses are minimal (omit empty arrays, zero counts)
  • status returns summary-only by default
  • next defaults to --limit 1

This allows the skill file to set the mode once at the top rather than passing optimisation flags to every command:

## Setup
export SPECLEFT_COMPACT=1
All commands below use compact output mode.

Implementation: Check os.environ.get("SPECLEFT_COMPACT") in the CLI output layer. Individual flags (e.g., --verbose, --limit) still override when explicitly passed.

Impact: Zero per-command overhead. Single setup instruction in skill file. Ensures all optimisations are active without relying on the agent to remember flags.
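
The override rule (explicit flag beats the environment default) could look something like this. It is a sketch under assumptions: the helper names are hypothetical, the set of accepted truthy values is a guess, and None stands in for whatever the non-compact default limit is.

```python
import os
from typing import Mapping, Optional


def compact_enabled(env: Mapping[str, str] = os.environ) -> bool:
    """Assumption: "1", "true", "yes" enable compact mode; unset, empty, or "0" do not."""
    return env.get("SPECLEFT_COMPACT", "").strip().lower() in {"1", "true", "yes"}


def resolve_limit(explicit: Optional[int],
                  env: Mapping[str, str] = os.environ) -> Optional[int]:
    """An explicit --limit wins; otherwise compact mode defaults to 1."""
    if explicit is not None:
        return explicit
    # None means "no limit", assumed here to be the non-compact default.
    return 1 if compact_enabled(env) else None


print(resolve_limit(None, {"SPECLEFT_COMPACT": "1"}))  # 1
print(resolve_limit(5, {"SPECLEFT_COMPACT": "1"}))     # 5
print(resolve_limit(None, {}))                         # None
```

Parsing the value rather than testing for mere presence avoids SPECLEFT_COMPACT=0 silently enabling the mode.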


7. Skill file updates for optimised workflow

Update the generated skill file to reflect all output optimisations:

## Setup
export SPECLEFT_COMPACT=1

## Workflow
1. specleft next --limit 1 → pick one scenario
2. Implement test logic
3. specleft features validate → exit code 0 = valid, only parse output on failure
4. pytest → run test
5. Repeat

## Quick checks
- Validation: check exit code first, parse JSON only on failure
- Coverage: specleft coverage --threshold 100 → exit code 0 = met
- Status: specleft status for progress during implementation

Teach the exit-code-first pattern: agents check $? before parsing JSON, saving ~25 tokens per successful validation call.
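
For agents or scripts that drive the CLI from Python rather than a shell, the same exit-code-first pattern might be sketched as below. run_exit_code_first is a hypothetical helper, and it assumes failing commands exit non-zero and print JSON on stdout. The stand-in commands use the Python interpreter so the sketch runs without SpecLeft installed.

```python
import json
import subprocess
import sys
from typing import Any, List, Optional, Tuple


def run_exit_code_first(argv: List[str]) -> Tuple[bool, Optional[Any]]:
    """Return (ok, detail); stdout is parsed only when the exit code is non-zero."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 0:
        return True, None  # success: output is never read into context
    try:
        return False, json.loads(proc.stdout)
    except json.JSONDecodeError:
        return False, None  # failure without parseable JSON


# Stand-ins for `specleft features validate` succeeding and failing.
ok_cmd = [sys.executable, "-c", "print('{\"valid\":true}')"]
fail_cmd = [sys.executable, "-c",
            "import sys; print('{\"valid\":false,\"errors\":[]}'); sys.exit(1)"]
print(run_exit_code_first(ok_cmd))
print(run_exit_code_first(fail_cmd))
```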


Out of scope (deferred)

| Item | Reason |
| --- | --- |
| Abbreviated key names (f instead of feature_id) | Readability cost outweighs token savings |
| --since-hash differential status | Implementation complexity; the default status summary covers the need |
| --after stateless pagination on next | Quality-of-life, not token optimisation |
| Response deduplication (grouped next output) | --limit 1 already eliminates duplication |

Token impact summary

| Optimisation | Per-session savings |
| --- | --- |
| Compact JSON (no indentation) | ~1,800 tokens |
| --limit 1 default in COMPACT mode | ~2,160 tokens |
| Summary-by-default status checks | ~510 tokens |
| Minimal success responses | ~720 tokens |
| Confirmation-only on writes | ~480 tokens |
| TTY-aware JSON default (flag removal) | ~165 tokens |
| Skill file reduction | ~57 tokens |
| Total | ~5,892 tokens |

Baseline unoptimised CLI response cost: ~10,800 tokens/session.
Optimised CLI response cost: ~4,908 tokens/session.
Reduction: 55%.

Combined with MCP declaration overhead (~2,550) and one-time reads (~1,523), total SpecLeft footprint per 30-turn session: ~8,981 tokens.
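
The figures above can be re-derived with a few lines of arithmetic. The numbers are taken directly from this issue; only the variable names are new.

```python
# Per-session savings from the token impact summary.
savings = {
    "compact_json": 1_800,
    "limit_1_default": 2_160,
    "status_summary": 510,
    "minimal_success": 720,
    "confirmation_only_writes": 480,
    "tty_aware_default": 165,
    "skill_file_reduction": 57,
}

baseline = 10_800                      # unoptimised CLI response cost
total_saved = sum(savings.values())
optimised = baseline - total_saved
reduction_percent = round(100 * total_saved / baseline, 1)
footprint = optimised + 2_550 + 1_523  # plus MCP declarations and one-time reads

print(total_saved, optimised, reduction_percent, footprint)
# 5892 4908 54.6 8981
```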


Acceptance criteria

  • All commands auto-detect TTY and default to JSON for non-TTY stdout
  • All JSON output uses separators=(',', ':') with no indentation by default
  • --pretty flag available on all commands for indented JSON
  • --format table explicitly overrides auto-detection for human use
  • Success responses contain only actionable fields (no empty arrays, no echoed input)
  • specleft status returns the summary object only by default
  • specleft status --verbose returns full status of the project
  • SPECLEFT_COMPACT=1 activates all compact defaults
  • Error responses include fix_command where a CLI fix is deterministic
  • Skill file references SPECLEFT_COMPACT setup and --limit 1 workflow
  • MCP resource responses use compact JSON serialisation
