CLI Output Optimisation for Agent Context Efficiency


## Problem

Every CLI response enters the agent's context window and persists for the rest of the conversation. In a typical 12-scenario session with ~36 SpecLeft CLI calls, unoptimised responses accumulate ~10,800 tokens of JSON output, roughly four times the MCP declaration overhead (~2,550 tokens). CLI response size is the largest controllable token cost in SpecLeft's footprint.

## Goal

Make SpecLeft the most context-efficient developer tool for AI coding agents. Target: **49% total session token reduction** compared to current unoptimised output.

---

## Scope

### 1. TTY-aware default output format

**Current:** `--format table` is the default. Agents must pass `--format json` on every call.

**Change:** Auto-detect output format based on terminal attachment.

```python
import sys

# Auto-detect: humans at a terminal get a table, everything else gets JSON
if sys.stdout.isatty():
    default_format = "table"   # Human at terminal
else:
    default_format = "json"    # Agent, pipe, subprocess, CI
```

- JSON is automatic when called from agents or scripts
- Table is automatic when a human is at the terminal
- `--format` flag overrides auto-detection in either direction
- `--pretty` flag outputs indented JSON for human-readable debugging
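The precedence between `--pretty`, an explicit `--format`, and TTY auto-detection can be captured in one small function. This is a sketch only: `resolve_format` and the `"json-pretty"` mode name are hypothetical, and it assumes `--pretty` implies JSON output regardless of the terminal.

```python
import sys
from typing import Optional, TextIO


def resolve_format(explicit: Optional[str], pretty: bool, stream: Optional[TextIO] = None) -> str:
    """Pick the output mode for one invocation (hypothetical helper).

    Returns "table", "json" or "json-pretty".
    """
    if stream is None:
        stream = sys.stdout
    if pretty:
        # Assumption: --pretty implies JSON, whatever the terminal
        return "json-pretty"
    if explicit is not None:
        # An explicit --format wins over auto-detection in either direction
        return explicit
    # No flags: table for a human at a terminal, JSON for pipes, agents and CI
    return "table" if stream.isatty() else "json"
```

Taking the stream as a parameter keeps the detection testable without a real terminal; in the CLI it would simply default to `sys.stdout`.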

**Impact:** Removes `--format json` from every agent command invocation (~165 tokens/session). Eliminates a class of agent errors (forgetting the flag, getting unparseable table output). Reduces skill file size by ~57 tokens.

**Apply to:** All commands.

---

### 2. Compact JSON output (no indentation)

**Current:** JSON output is indented (e.g. `json.dumps(result, indent=2)`).

**Change:** Default JSON output uses minimal separators, no indentation.

```python
import json

result = {"valid": True}  # any response payload
print(json.dumps(result, separators=(',', ':')))  # prints {"valid":true}
```

**Example:**

```bash
# Before: ~95 tokens
{
    "next": [
        {
            "feature_id": "user-auth",
            "scenario_id": "register-with-valid-email",
            "priority": "critical",
            "test_file": "tests/test_user_auth.py",
            "test_function": "test_register_with_valid_email"
        }
    ]
}

# After: ~45 tokens
{"next":[{"feature_id":"user-auth","scenario_id":"register-with-valid-email","priority":"critical","test_file":"tests/test_user_auth.py","test_function":"test_register_with_valid_email"}]}
```

- `--pretty` flag available for indented JSON when needed
- Key names remain full-length (no abbreviation — clarity outweighs token savings)

**Impact:** ~53% reduction per response. ~1,800 tokens saved over a 36-call session.

**Apply to:** All commands with `--format json`.
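A minimal sketch of the serialisation step, assuming a single hypothetical `render_json` helper sits between every command and stdout. Both branches use only the standard `json` module; using `indent=2` for `--pretty` is an assumption.

```python
import json
from typing import Any


def render_json(payload: Any, pretty: bool = False) -> str:
    """Serialise a response: compact by default, indented only with --pretty."""
    if pretty:
        return json.dumps(payload, indent=2)
    # No indentation and no spaces after ',' or ':'
    return json.dumps(payload, separators=(",", ":"))
```

Both forms parse to the same object, so agents lose nothing by receiving the compact one. Comparing `len()` of the two strings gives a quick sanity check, though character count is only a rough proxy for tokens.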

---

### 3. Minimal success responses

Responses should include only information the agent needs to act on. On success, confirm the action. On failure, include detail for remediation.

#### `specleft features validate`

```bash
# Success: ~5 tokens
{"valid":true}

# Failure: full detail
{"valid":false,"errors":[{"file":"features/auth.md","line":12,"message":"Missing priority for scenario 'login-timeout'","fix":"Add 'priority: medium' to scenario metadata"}]}
```

Omit `errors`, `warnings`, `features_checked`, `scenarios_checked` when empty/irrelevant.
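One way to apply the omission rule uniformly is to prune each payload just before serialisation. The sketch assumes commands build a full dict and set irrelevant fields to `None`; `prune_empty` is a hypothetical helper, not existing SpecLeft code. It deliberately keeps `false` and `0`, because `{"valid":false}` is exactly the signal the agent acts on.

```python
from typing import Any


def prune_empty(payload: dict[str, Any]) -> dict[str, Any]:
    """Drop keys whose value is None or an empty list/dict.

    Recurses into nested dicts (not into list items). Booleans and numbers
    are kept even when falsy.
    """
    pruned: dict[str, Any] = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            value = prune_empty(value)
        if value is None or (isinstance(value, (list, dict)) and not value):
            continue
        pruned[key] = value
    return pruned
```

Dropping zero counts as well, as COMPACT mode does, would need an extra field-specific rule on top of this.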

#### `specleft coverage`

```bash
# Pass (with --threshold): ~15 tokens
{"passed":true,"overall":100.0,"threshold":100}

# Fail: include only features below threshold
{"passed":false,"overall":64.6,"threshold":100,"below_threshold":[{"feature_id":"payments","coverage":37.5},{"feature_id":"sharing","coverage":50.0}]}
```
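A sketch of the pass/fail branching, assuming the command already has the overall percentage and a per-feature map. The helper name and the exit code of 1 on failure are assumptions; the skill file only relies on exit code 0 meaning the threshold was met.

```python
from typing import Any


def coverage_response(
    overall: float, threshold: float, per_feature: dict[str, float]
) -> tuple[dict[str, Any], int]:
    """Return (payload, exit_code) for `specleft coverage --threshold N`."""
    passed = overall >= threshold
    payload: dict[str, Any] = {"passed": passed, "overall": round(overall, 1), "threshold": threshold}
    if not passed:
        # Only failing features are listed; passing ones are noise for the agent
        payload["below_threshold"] = [
            {"feature_id": feature_id, "coverage": pct}
            for feature_id, pct in per_feature.items()
            if pct < threshold
        ]
    return payload, 0 if passed else 1
```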

#### `specleft features add` / `specleft features add-scenario`

```bash
# Confirmation only: ~20 tokens
{"created":true,"feature_id":"user-auth","file":"features/user-auth.md"}
```

Do not echo back title, priority, description, or other fields the agent just sent.

#### `specleft test skeleton`

```bash
# Dry-run success: ~25 tokens
{"dry_run":true,"files_planned":3,"files":["tests/test_user_auth.py","tests/test_task_mgmt.py","tests/test_sharing.py"]}

# Write success: ~15 tokens
{"created":true,"files_written":3}
```
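The dry-run/write split reduces to one branch. `skeleton_response` is a hypothetical helper that assumes the caller already knows which files it plans to write; it never echoes generated test code.

```python
from typing import Any


def skeleton_response(planned_files: list[str], dry_run: bool) -> dict[str, Any]:
    """Summarise a skeleton run with counts and paths only."""
    if dry_run:
        # Paths let the agent decide whether to proceed with the write
        return {"dry_run": True, "files_planned": len(planned_files), "files": planned_files}
    # After writing, a count is enough to confirm the action
    return {"created": True, "files_written": len(planned_files)}
```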

#### `specleft init`

```bash
# Success: ~40 tokens
{"success":true,"health":{"ok":true},"skill_file":".specleft/specleft_skill.md","skill_file_hash":"a1b2c3d4..."}
```

Only expand `health` detail on failure:

```bash
{"success":false,"health":{"ok":false,"python_version":"3.8.0","error":"Requires Python >= 3.10"}}
```
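A sketch of the "expand only on failure" rule, assuming a Python-version check is one of the health checks (the 3.10 minimum comes from the failure example above). The helper name and the injectable `version` parameter are hypothetical; the skill file hash is passed in rather than computed, since the hashing scheme is not specified here.

```python
import sys
from typing import Any, Optional

MIN_PYTHON = (3, 10)


def init_response(
    skill_file: str, skill_file_hash: str, version: Optional[tuple[int, int, int]] = None
) -> dict[str, Any]:
    """Build the `specleft init` payload; health detail appears only on failure."""
    if version is None:
        version = tuple(sys.version_info[:3])
    if version < MIN_PYTHON:
        return {
            "success": False,
            "health": {
                "ok": False,
                "python_version": ".".join(map(str, version)),
                "error": "Requires Python >= " + ".".join(map(str, MIN_PYTHON)),
            },
        }
    # Healthy: a single ok flag replaces the full health report
    return {"success": True, "health": {"ok": True}, "skill_file": skill_file, "skill_file_hash": skill_file_hash}
```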

**Impact:** ~720 tokens saved over a session from reduced success response sizes.

**Apply to:** All commands.

---

### 4. `--verbose` flag on `specleft status`

**Current:** Status always returns full per-feature breakdown.

**Change:** Make `status` return a summary object by default, and add a `--verbose` flag for the full per-feature breakdown.

```bash
# Summary (default): ~30 tokens
{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3}

# Verbose (for full breakdown):
{"initialised":true,"summary":{"features":3,"scenarios":12,"implemented":7,"skipped":5,"coverage_percent":58.3},"by_priority":{...},"features":[...]}

```

**Impact:** ~170 tokens saved per status check during implementation. ~510 tokens over a session (3 mid-session checks).
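A sketch of how summary and verbose output could share one code path. It assumes a hypothetical in-memory model where each feature has a list of scenarios with `priority` and `implemented` fields, counts every unimplemented scenario as `skipped`, and gives `by_priority` a simple count-per-priority shape; the real verbose schema is elided above and may differ.

```python
from collections import Counter
from typing import Any


def status_payload(features: list[dict[str, Any]], verbose: bool = False) -> dict[str, Any]:
    """Summary by default; the full breakdown only with verbose=True."""
    scenarios = [s for f in features for s in f["scenarios"]]
    implemented = sum(1 for s in scenarios if s["implemented"])
    summary = {
        "features": len(features),
        "scenarios": len(scenarios),
        "implemented": implemented,
        "skipped": len(scenarios) - implemented,
        "coverage_percent": round(100 * implemented / len(scenarios), 1) if scenarios else 0.0,
    }
    if not verbose:
        return summary
    by_priority = Counter(s["priority"] for s in scenarios)
    return {"initialised": True, "summary": summary, "by_priority": dict(by_priority), "features": features}
```

With 3 features, 12 scenarios and 7 implemented, the default call returns the ~30-token summary object shown above.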

---

### 5. Error responses with fix commands

When validation or enforcement fails, include the exact CLI command to fix the issue where possible.

```json
{
  "valid": false,
  "errors": [
    {
      "file": "features/auth.md",
      "line": 12,
      "message": "Missing priority for scenario 'login-timeout'",
      "fix_command": "specleft features add-scenario --feature auth --title 'login-timeout' --priority medium"
    }
  ]
}
```

**Rationale:** Without `fix_command`, the agent reasons about how to fix the error (50-100 tokens of thinking, potential retry). With it, the agent executes directly. Net saving: ~30-80 tokens per error plus avoided incorrect fix attempts.

**Apply to:** `features validate`, `enforce`, `contract test`, `doctor`.
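A sketch of attaching `fix_command` only when the fix is deterministic. Building the command from an argument list with the standard `shlex.join` (Python 3.8+) keeps quoting correct when titles contain spaces or quotes; `validation_error` itself is a hypothetical helper.

```python
import shlex
from typing import Any, Optional, Sequence


def validation_error(
    file: str, line: int, message: str, fix_argv: Optional[Sequence[str]] = None
) -> dict[str, Any]:
    """Build one error entry; fix_command is present only for known CLI fixes."""
    error: dict[str, Any] = {"file": file, "line": line, "message": message}
    if fix_argv is not None:
        # shlex.join quotes only the arguments that need it
        error["fix_command"] = shlex.join(fix_argv)
    return error
```

`shlex.join` leaves `login-timeout` unquoted; the shell treats that the same as the hand-quoted `'login-timeout'` in the example above.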

---

### 6. `SPECLEFT_COMPACT` environment variable

A single environment variable that enables all compact optimisations simultaneously.

```bash
export SPECLEFT_COMPACT=1
```

When set:
- JSON output uses minimal separators (same as default non-TTY behaviour)
- Success responses are minimal (omit empty arrays, zero counts)
- `status` returns summary-only by default
- `next` defaults to `--limit 1`

This allows the skill file to set the mode once at the top rather than passing optimisation flags to every command:

```markdown
## Setup
export SPECLEFT_COMPACT=1
All commands below use compact output mode.
```

**Implementation:** Check `os.environ.get("SPECLEFT_COMPACT")` in the CLI output layer. Individual flags (e.g., `--verbose`, `--limit`) still override when explicitly passed.

**Impact:** Zero per-command overhead. Single setup instruction in skill file. Ensures all optimisations are active without relying on the agent to remember flags.
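A sketch of resolving `SPECLEFT_COMPACT` together with explicit flags. The source only shows `SPECLEFT_COMPACT=1`; treating `true`, `yes` and `on` as enabled too is an assumption, as are the `OutputOptions` and `resolve_options` names. Explicit flags always win, matching the override rule above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: any other value (including unset) leaves COMPACT off
TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class OutputOptions:
    compact: bool
    status_verbose: bool
    next_limit: Optional[int]  # None means "use next's normal default"


def resolve_options(
    env: Mapping[str, str] = os.environ,
    verbose: bool = False,
    limit: Optional[int] = None,
) -> OutputOptions:
    compact = env.get("SPECLEFT_COMPACT", "").strip().lower() in TRUTHY
    return OutputOptions(
        compact=compact,
        # status is summary-only unless --verbose is passed explicitly
        status_verbose=verbose,
        # An explicit --limit always wins; COMPACT only changes the default
        next_limit=limit if limit is not None else (1 if compact else None),
    )
```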

---

### 7. Skill file updates for optimised workflow

Update the generated skill file to reflect all output optimisations:

```markdown
## Setup
export SPECLEFT_COMPACT=1

## Workflow
1. specleft next --limit 1 → pick one scenario
2. Implement test logic
3. specleft features validate → exit code 0 = valid, only parse output on failure
4. pytest → run test
5. Repeat

## Quick checks
- Validation: check exit code first, parse JSON only on failure
- Coverage: specleft coverage --threshold 100 → exit code 0 = met
- Status: specleft status for progress during implementation
```

Teach the exit-code-first pattern: agents check `$?` before parsing JSON, saving ~25 tokens per successful validation call.
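For an agent or script that drives the CLI from Python rather than a shell, the same exit-code-first pattern might look like the sketch below. It assumes `features validate` exits non-zero on failure and writes its JSON to stdout; `validation_errors` is a hypothetical helper.

```python
import json
import subprocess
from typing import Any, Optional


def validation_errors(result: subprocess.CompletedProcess) -> Optional[dict[str, Any]]:
    """Return parsed JSON only when the command failed; None means success."""
    if result.returncode == 0:
        # Success: the exit code says everything, so stdout is never parsed
        return None
    return json.loads(result.stdout)
```

In practice it would wrap a call such as `subprocess.run(["specleft", "features", "validate"], capture_output=True, text=True)`.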

---

## Out of scope (deferred)

| Item | Reason |
|---|---|
| Abbreviated key names (`f` instead of `feature_id`) | Readability cost outweighs token savings |
| `--since-hash` differential status | Implementation complexity; the default `status` summary covers the need |
| `--after` stateless pagination on `next` | Quality-of-life, not token optimisation |
| Response deduplication (grouped `next` output) | `--limit 1` already eliminates duplication |

---

## Token impact summary

| Optimisation | Per-session savings |
|---|---|
| Compact JSON (no indentation) | ~1,800 tokens |
| `--limit 1` default in COMPACT mode | ~2,160 tokens |
| Summary-only `status` checks (default) | ~510 tokens |
| Minimal success responses | ~720 tokens |
| Confirmation-only on writes | ~480 tokens |
| TTY-aware JSON default (flag removal) | ~165 tokens |
| Skill file reduction | ~57 tokens |
| **Total** | **~5,892 tokens** |

Baseline unoptimised CLI response cost: ~10,800 tokens/session.
Optimised CLI response cost: ~4,908 tokens/session.
**Reduction: 55%.**

Combined with MCP declaration overhead (~2,550) and one-time reads (~1,523), total SpecLeft footprint per 30-turn session: **~8,981 tokens**.
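The totals above can be rechecked with a few lines of arithmetic, using the figures from the table and the two paragraphs above:

```python
# Per-session savings, copied from the token impact table
SAVINGS = {
    "Compact JSON (no indentation)": 1_800,
    "--limit 1 default in COMPACT mode": 2_160,
    "Summary-only status checks": 510,
    "Minimal success responses": 720,
    "Confirmation-only on writes": 480,
    "TTY-aware JSON default (flag removal)": 165,
    "Skill file reduction": 57,
}
BASELINE_CLI = 10_800
MCP_DECLARATIONS = 2_550
ONE_TIME_READS = 1_523

total_saved = sum(SAVINGS.values())             # 5,892
optimised_cli = BASELINE_CLI - total_saved      # 4,908
reduction = total_saved / BASELINE_CLI          # ~0.546, reported as 55%
footprint = optimised_cli + MCP_DECLARATIONS + ONE_TIME_READS  # 8,981
```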

---

## Acceptance criteria

- [x] All commands auto-detect TTY and default to JSON for non-TTY stdout
- [x] All JSON output uses `separators=(',', ':')` with no indentation by default
- [x] `--pretty` flag available on all commands for indented JSON
- [x] `--format table` explicitly overrides auto-detection for human use
- [x] Success responses contain only actionable fields (no empty arrays, no echoed input)
- [x] `specleft status` returns summary object only
- [x] `specleft status --verbose` returns full status of the project
- [x] `SPECLEFT_COMPACT=1` activates all compact defaults
- [x] Error responses include `fix_command` where a CLI fix is deterministic
- [x] Skill file references `SPECLEFT_COMPACT` setup and `--limit 1` workflow
- [x] MCP resource responses use compact JSON serialisation
