2 changes: 1 addition & 1 deletion CLAUDE.md
@@ -40,7 +40,7 @@ Commands with Teams Variant ship as `{name}.md` (parallel subagents) and `{name}

**Working Memory**: Four shell-script hooks (`scripts/hooks/`) provide automatic session continuity. Toggleable via `devflow memory --enable/--disable/--status` or `devflow init --memory/--no-memory`. UserPromptSubmit (`prompt-capture-memory`) captures user prompt to `.memory/.pending-turns.jsonl` queue. Stop hook captures `assistant_message` (on `end_turn` only) to same queue, then spawns throttled background `claude -p --model haiku` updater (skips if triggered <2min ago; concurrent sessions serialize via mkdir-based lock). Background updater uses `mv`-based atomic handoff to process all pending turns in batch (capped at 10 most recent), with crash recovery via `.pending-turns.processing` file. Updates `.memory/WORKING-MEMORY.md` with structured sections (`## Now`, `## Progress`, `## Decisions`, `## Modified Files`, `## Context`, `## Session Log`). SessionStart hook → injects previous memory + git state as `additionalContext` on `/clear`, startup, or compact (warns if >1h stale; injects pre-compact memory snapshot when compaction happened mid-session). PreCompact hook → saves git state + WORKING-MEMORY.md snapshot + bootstraps minimal WORKING-MEMORY.md if none exists. Disabling memory removes all four hooks. Use `devflow memory --clear` to clean up pending queue files across projects. Zero-ceremony context preservation.

**Ambient Mode**: Three-layer architecture for always-on intent classification. SessionStart hook (`session-start-classification`) reads lean classification rules (`~/.claude/skills/devflow:router/references/classification-rules.md`, ~30 lines) and injects as `additionalContext` — once per session, deterministic, zero model overhead. UserPromptSubmit hook (`preamble`) injects a one-sentence prompt per message triggering classification + router loading via Skill tool. Router SKILL.md is a pure skill lookup table (~50 lines) loaded on-demand only for GUIDED/ORCHESTRATED depth — maps intent×depth to domain and orchestration skills. Toggleable via `devflow ambient --enable/--disable/--status` or `devflow init`.
**Ambient Mode**: Three-layer architecture for always-on intent classification. SessionStart hook (`session-start-classification`) reads lean classification rules (`~/.claude/skills/devflow:router/classification-rules.md`, ~30 lines) and injects as `additionalContext` — once per session, deterministic, zero model overhead. UserPromptSubmit hook (`preamble`) injects a one-sentence prompt per message triggering classification + conditional router loading via Skill tool. Router SKILL.md is a pure skill lookup table (~50 lines) loaded on-demand only for GUIDED/ORCHESTRATED depth — maps intent×depth to domain and orchestration skills. Toggleable via `devflow ambient --enable/--disable/--status` or `devflow init`.

**Self-Learning**: A SessionEnd hook (`session-end-learning`) accumulates session IDs and triggers a background `claude -p --model sonnet` every 3 sessions (5 at 15+ observations) to detect **4 observation types** — workflow, procedural, decision, and pitfall — from batch transcripts. Transcript content is split into two channels by `scripts/hooks/lib/transcript-filter.cjs`: `USER_SIGNALS` (plain user messages, feeds workflow/procedural detection) and `DIALOG_PAIRS` (prior-assistant + user turns, feeds decision/pitfall detection). Detection uses per-type linguistic markers and quality gates stored in each observation as `quality_ok`. Per-type thresholds govern promotion (workflow: 3 required; procedural: 4 required; decision/pitfall: 2 required), each with independent temporal spread requirements. Observations accumulate in `.memory/learning-log.jsonl`; their lifecycle is `observing → ready → created → deprecated`. When thresholds are met, `json-helper.cjs render-ready` renders deterministically to 4 targets: slash commands (`.claude/commands/self-learning/`), skills (`.claude/skills/{slug}/`), decisions.md ADR entries, and pitfalls.md PF entries. A session-start feedback reconciler (`json-helper.cjs reconcile-manifest`) checks the manifest at `.memory/.learning-manifest.json` against the filesystem to detect deletions (applies 0.3× confidence penalty) and edits (ignored per D13). The reconciler also **self-heals** from render-ready crash-window states: when a knowledge file contains an ADR/PF anchor that is absent from the manifest *and* the section carries the `- **Source**: self-learning:` marker, the heal scans the log for `status: 'ready'` observations matching by normalized pattern (exactly one match = upgrade to `status: 'created'` and reconstruct manifest entry; zero or multiple matches = silently skipped). The marker check excludes pre-v2 seeded entries from the heal path so they cannot be falsely paired with a current ready obs. 
Loaded artifacts are reinforced locally (no LLM) on each session end. Single toggle mechanism: hook presence in `settings.json` IS the enabled state — no `enabled` field in `learning.json`. Toggleable via `devflow learn --enable/--disable/--status` or `devflow init --learn/--no-learn`. Configurable model/throttle/caps/debug via `devflow learn --configure`. Use `devflow learn --reset` to remove all artifacts + log + transient state. Use `devflow learn --purge` to remove invalid observations. Use `devflow learn --review` to inspect observations needing attention. Debug logs stored at `~/.devflow/logs/{project-slug}/`. The `knowledge-persistence` skill is a format specification only; the actual writer is `scripts/hooks/background-learning` via `json-helper.cjs render-ready`.

2 changes: 1 addition & 1 deletion scripts/hooks/preamble
@@ -34,6 +34,6 @@ fi

# Minimal preamble — classification rules injected at SessionStart, not here.
# SYNC: must match tests/ambient.test.ts preamble drift detection
PREAMBLE="Classify this request's intent and depth, then load devflow:router via Skill tool."
PREAMBLE="Classify this request's intent and depth. If GUIDED or ORCHESTRATED, load devflow:router via Skill tool."

json_prompt_output "$PREAMBLE"
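
The new preamble makes router loading conditional on depth. A minimal sketch of that gate follows, assuming the model emits `Devflow: INTENT/DEPTH` as the unit tests in this PR describe. The regex here is a hypothetical stand-in (the real `CLASSIFICATION_PATTERN` in `tests/integration/helpers.ts` is not shown in this diff), and `parseClassification` and `shouldLoadRouter` are illustrative names, not part of the codebase.

```typescript
// Hypothetical pattern: an assumption modelled on the variations the unit tests
// exercise (any casing, optional whitespace around ":" and "/").
const CLASSIFICATION_PATTERN = /devflow:\s*([a-z]+)\s*\/\s*([a-z]+)/i;

interface Classification {
  intent: string;
  depth: string;
}

/** Extract intent and depth from model output, or null when no tag is present. */
function parseClassification(text: string): Classification | null {
  const match = CLASSIFICATION_PATTERN.exec(text);
  if (!match) return null;
  return { intent: match[1].toUpperCase(), depth: match[2].toUpperCase() };
}

/** Mirrors the preamble wording: load the router only for GUIDED or ORCHESTRATED. */
function shouldLoadRouter(text: string): boolean {
  const classification = parseClassification(text);
  return (
    classification !== null &&
    (classification.depth === 'GUIDED' || classification.depth === 'ORCHESTRATED')
  );
}
```

Under this reading, a QUICK request (or one with no classification tag at all) never triggers a Skill call, which is what the added `hasClassification(result)` assertions in the integration tests appear to check.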
5 changes: 1 addition & 4 deletions scripts/hooks/session-start-classification
@@ -15,12 +15,9 @@ INPUT=$(cat)
CWD=$(printf '%s' "$INPUT" | json_field "cwd" "")
if [ -z "$CWD" ]; then exit 0; fi

CLASSIFICATION_RULES="$HOME/.claude/skills/devflow:router/references/classification-rules.md"
CLASSIFICATION_RULES="$HOME/.claude/skills/devflow:router/classification-rules.md"
if [ -f "$CLASSIFICATION_RULES" ]; then
CONTEXT=$(cat "$CLASSIFICATION_RULES")
elif [ -f "$HOME/.claude/skills/devflow:router/SKILL.md" ]; then
# Fallback for upgrade window: old install without classification-rules.md
CONTEXT=$(awk '/^---$/{n++; next} n>=2' "$HOME/.claude/skills/devflow:router/SKILL.md")
else
exit 0
fi
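
With the `SKILL.md` fallback removed, the hook's lookup reduces to "read the rules file if present, otherwise do nothing". The sketch below restates that in TypeScript for illustration only; the actual hook is a shell script, `loadClassificationRules` is a hypothetical name, and the demo assumes a filesystem that permits `:` in directory names.

```typescript
import { existsSync, mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

/**
 * Hypothetical TypeScript mirror of the hook's lookup after this change.
 * Returns the rules text, or null where the shell hook would `exit 0`.
 */
function loadClassificationRules(homeDir: string): string | null {
  const rulesPath = join(homeDir, '.claude', 'skills', 'devflow:router', 'classification-rules.md');
  if (!existsSync(rulesPath)) return null; // no SKILL.md fallback any more
  return readFileSync(rulesPath, 'utf-8');
}

// Demo against a throwaway home directory (illustrative paths only).
const fakeHome = mkdtempSync(join(tmpdir(), 'devflow-demo-'));
const before = loadClassificationRules(fakeHome);

const skillDir = join(fakeHome, '.claude', 'skills', 'devflow:router');
mkdirSync(skillDir, { recursive: true });
writeFileSync(join(skillDir, 'classification-rules.md'), '## Intent Signals\n');
const after = loadClassificationRules(fakeHome);
```

One consequence worth noting: an install that still has the rules file only under `references/` would now get no injected context, since neither the old path nor `SKILL.md` is consulted.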
70 changes: 63 additions & 7 deletions tests/ambient.test.ts
@@ -7,8 +7,8 @@ import {
hasClassification,
extractIntent,
extractDepth,
hasDevFlowBranding,
hasSkillInvocations,
parseStreamEvent,
} from './integration/helpers.js';

/** Helper to create a StreamResult from text for unit-testing classification helpers. */
@@ -421,12 +421,68 @@ describe('classification helpers', () => {
expect(extractDepth(textResult('no classification here'))).toBeNull();
});

it('detects Devflow branding', () => {
expect(hasDevFlowBranding(textResult('Devflow: IMPLEMENT/GUIDED. Loading: devflow:patterns.'))).toBe(true);
it('CLASSIFICATION_PATTERN matches model output variations', () => {
// Canonical format (model instruction says "Devflow: INTENT/DEPTH")
expect(hasClassification(textResult('Devflow: IMPLEMENT/GUIDED'))).toBe(true);
// Lowercase (model might vary casing)
expect(hasClassification(textResult('devflow: implement/guided'))).toBe(true);
// No space after colon
expect(hasClassification(textResult('Devflow:CHAT/QUICK'))).toBe(true);
// Extra whitespace
expect(hasClassification(textResult('Devflow: PLAN / ORCHESTRATED'))).toBe(true);
});
});

it('returns false for non-Devflow branding', () => {
expect(hasDevFlowBranding(textResult('Some random text without branding.'))).toBe(false);
describe('parseStreamEvent', () => {
it('extracts skills from assistant tool_use events', () => {
const event = {
type: 'assistant',
message: { content: [
{ type: 'tool_use', name: 'Skill', input: { skill: 'devflow:router' } },
] },
};
const parsed = parseStreamEvent(event);
expect(parsed.skills).toEqual(['devflow:router']);
expect(parsed.textFragments).toEqual([]);
});

it('extracts text from assistant text blocks', () => {
const event = {
type: 'assistant',
message: { content: [
{ type: 'text', text: 'Devflow: IMPLEMENT/GUIDED' },
] },
};
const parsed = parseStreamEvent(event);
expect(parsed.textFragments).toEqual(['Devflow: IMPLEMENT/GUIDED']);
expect(parsed.skills).toEqual([]);
});

it('returns empty arrays for non-assistant events', () => {
expect(parseStreamEvent({ type: 'user', message: { content: [] } })).toEqual({ skills: [], textFragments: [] });
expect(parseStreamEvent({ type: 'system', message: { content: [] } })).toEqual({ skills: [], textFragments: [] });
});

it('returns empty arrays for malformed events', () => {
expect(parseStreamEvent(null)).toEqual({ skills: [], textFragments: [] });
expect(parseStreamEvent(undefined)).toEqual({ skills: [], textFragments: [] });
expect(parseStreamEvent({})).toEqual({ skills: [], textFragments: [] });
expect(parseStreamEvent({ type: 'assistant' })).toEqual({ skills: [], textFragments: [] });
expect(parseStreamEvent({ type: 'assistant', message: {} })).toEqual({ skills: [], textFragments: [] });
});

it('handles mixed content blocks', () => {
const event = {
type: 'assistant',
message: { content: [
{ type: 'text', text: 'Devflow: DEBUG/ORCHESTRATED' },
{ type: 'tool_use', name: 'Skill', input: { skill: 'devflow:debug:orch' } },
{ type: 'text', text: 'Loading debug orchestrator.' },
] },
};
const parsed = parseStreamEvent(event);
expect(parsed.skills).toEqual(['devflow:debug:orch']);
expect(parsed.textFragments).toEqual(['Devflow: DEBUG/ORCHESTRATED', 'Loading debug orchestrator.']);
});
});

Expand Down Expand Up @@ -489,7 +545,7 @@ function parseClassificationIntents(content: string): string[] {

describe('router structural validation', () => {
const routerPath = path.resolve(__dirname, '../shared/skills/router/SKILL.md');
const rulesPath = path.resolve(__dirname, '../shared/skills/router/references/classification-rules.md');
const rulesPath = path.resolve(__dirname, '../shared/skills/router/classification-rules.md');
const sharedSkillsDir = path.resolve(__dirname, '../shared/skills');

it('router covers all ORCHESTRATED intents (every non-CHAT intent has a row)', async () => {
@@ -594,7 +650,7 @@ describe('preamble drift detection', () => {
});

it('classification-rules.md contains required classification elements', async () => {
const rulesPath = path.resolve(__dirname, '../shared/skills/router/references/classification-rules.md');
const rulesPath = path.resolve(__dirname, '../shared/skills/router/classification-rules.md');
const rulesContent = await fs.readFile(rulesPath, 'utf-8');

// Must contain Intent Signals heading
5 changes: 5 additions & 0 deletions tests/integration/ambient-activation.test.ts
@@ -4,6 +4,7 @@ import {
runClaudeStreaming,
runClaudeStreamingWithRetry,
hasSkillInvocations,
hasClassification,
getSkillInvocations,
hasRequiredSkills,
} from './helpers.js';
@@ -34,26 +35,30 @@ describe.skipIf(!isClaudeAvailable())('devflow classification', () => {
// "thanks" is ≤2 words — preamble's word-count filter skips it before classification runs
const result = await runClaudeStreaming('thanks', { timeout: 20000 });
expect(hasSkillInvocations(result)).toBe(false);
expect(hasClassification(result)).toBe(false);
console.log(`preamble filter (single-word): no skills (${result.durationMs}ms)`);
});

it('QUICK — explore: "where is the config?" loads no skills', async () => {
const result = await runClaudeStreaming('where is the config file?', { timeout: 20000 });
expect(hasSkillInvocations(result)).toBe(false);
expect(hasClassification(result)).toBe(false);
console.log(`QUICK explore: no skills (${result.durationMs}ms)`);
});

it('CHAT/QUICK — multi-word chat passes preamble but classified QUICK', async () => {
// Passes preamble's word-count filter (>2 words) but classified CHAT/QUICK — no skills loaded
const result = await runClaudeStreaming('sounds good, thanks for explaining that', { timeout: 20000 });
expect(hasSkillInvocations(result)).toBe(false);
expect(hasClassification(result)).toBe(false);
console.log(`CHAT/QUICK (multi-word): no skills (${result.durationMs}ms)`);
});

it('preamble filter — slash command prefix skipped before classification', async () => {
// Preamble filters prompts starting with "/" — no classification or skill loading
const result = await runClaudeStreaming('/help with something', { timeout: 20000 });
expect(hasSkillInvocations(result)).toBe(false);
expect(hasClassification(result)).toBe(false);
console.log(`preamble filter (slash command): no skills (${result.durationMs}ms)`);
});

98 changes: 53 additions & 45 deletions tests/integration/helpers.ts
@@ -21,13 +21,48 @@ export function isClaudeAvailable(): boolean {
* Simulates SessionStart injection for integration tests.
*/
function loadRouterContext(): string {
const rulesPath = resolve(import.meta.dirname, '../../shared/skills/router/references/classification-rules.md');
const rulesPath = resolve(import.meta.dirname, '../../shared/skills/router/classification-rules.md');
return readFileSync(rulesPath, 'utf-8').trim();
}

// Simulates SessionStart injection (classification rules) + per-message preamble
const DEVFLOW_PREAMBLE = loadRouterContext() +
'\nClassify this request\'s intent and depth, then load devflow:router via Skill tool.';
'\nClassify this request\'s intent and depth. If GUIDED or ORCHESTRATED, load devflow:router via Skill tool.';

/** Parsed fields from a single streaming event */
export interface ParsedStreamEvent {
skills: string[];
textFragments: string[];
}

/**
* Extract skill invocations and text fragments from a single streaming event.
* Only processes assistant messages with content arrays.
*/
export function parseStreamEvent(event: unknown): ParsedStreamEvent {
const skills: string[] = [];
const textFragments: string[] = [];

if (
typeof event !== 'object' || event === null ||
(event as Record<string, unknown>).type !== 'assistant' ||
!Array.isArray((event as { message?: { content?: unknown } }).message?.content)
) {
return { skills, textFragments };
}

const msg = event as { type: string; message: { content: Record<string, unknown>[] } };
for (const block of msg.message.content) {
if (block.type === 'tool_use' && block.name === 'Skill' && typeof (block.input as Record<string, unknown>)?.skill === 'string') {
skills.push((block.input as Record<string, unknown>).skill as string);
}
if (block.type === 'text' && typeof block.text === 'string') {
textFragments.push(block.text as string);
}
}

return { skills, textFragments };
}

/** Result from a streaming claude invocation */
export interface StreamResult {
@@ -106,33 +141,14 @@ export function runClaudeStreaming(
if (!line.trim()) continue;
try {
const event: unknown = JSON.parse(line);
const parsed = parseStreamEvent(event);
skills.push(...parsed.skills);
textFragments.push(...parsed.textFragments);

// Detect Skill tool_use in assistant messages
if (
typeof event === 'object' && event !== null &&
(event as Record<string, unknown>).type === 'assistant' &&
Array.isArray((event as Record<string, unknown>).message?.content)
) {
const msg = event as { type: string; message: { content: Record<string, unknown>[] } };
for (const block of msg.message.content) {
// tool_use block for Skill
if (block.type === 'tool_use' && block.name === 'Skill' && typeof (block.input as Record<string, unknown>)?.skill === 'string') {
skills.push((block.input as Record<string, unknown>).skill as string);
}
// text block — capture for classification detection
if (block.type === 'text' && typeof block.text === 'string') {
textFragments.push(block.text);
}
}

// Once we have skills, give a brief window for more, then finish
if (skills.length > 0 && !graceTimer) {
graceTimer = setTimeout(() => {
finish(true);
}, 8000); // 8s grace for additional skill loads after first detection
}
// Once we have skills, give a brief window for more, then finish
if (skills.length > 0 && !graceTimer) {
graceTimer = setTimeout(() => finish(true), 8000);
}

} catch {
// Partial JSON line, skip
}
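
The refactored loop above parses each stream line independently and accumulates the results. A self-contained sketch of that flow is given here, assuming newline-delimited JSON input. `collectFromNdjson` and `isRecord` are hypothetical names, and the parser is a compact restatement that uses a type guard in place of the casts in the diff; it is not the code under review.

```typescript
interface ParsedStreamEvent {
  skills: string[];
  textFragments: string[];
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null;
}

// Compact restatement of parseStreamEvent (type guard instead of casts).
function parseStreamEvent(event: unknown): ParsedStreamEvent {
  const skills: string[] = [];
  const textFragments: string[] = [];
  if (!isRecord(event) || event.type !== 'assistant') return { skills, textFragments };

  const message = event.message;
  if (!isRecord(message)) return { skills, textFragments };
  const content = message.content;
  if (!Array.isArray(content)) return { skills, textFragments };

  for (const block of content) {
    if (!isRecord(block)) continue;
    const input = block.input;
    const text = block.text;
    if (block.type === 'tool_use' && block.name === 'Skill' && isRecord(input)) {
      const skill = input.skill;
      if (typeof skill === 'string') skills.push(skill);
    }
    if (block.type === 'text' && typeof text === 'string') textFragments.push(text);
  }
  return { skills, textFragments };
}

/** Accumulate skills and text across every complete JSON line of a stream. */
function collectFromNdjson(ndjson: string): ParsedStreamEvent {
  const total: ParsedStreamEvent = { skills: [], textFragments: [] };
  for (const line of ndjson.split('\n')) {
    if (!line.trim()) continue;
    try {
      const parsed = parseStreamEvent(JSON.parse(line));
      total.skills.push(...parsed.skills);
      total.textFragments.push(...parsed.textFragments);
    } catch {
      // Partial or malformed JSON line: skip it, as the helper in the diff does.
    }
  }
  return total;
}
```

Keeping the per-event parser pure (no timers, no process state) is what allows the unit tests added in `tests/ambient.test.ts` to exercise it without spawning `claude`.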
@@ -210,19 +226,6 @@ export function extractDepth(result: StreamResult): string | null {
return match ? match[2].toUpperCase() : null;
}

/**
* Check whether the result contains a Devflow classification tag.
*
* @see hasClassification — functionally identical after both helpers were
* unified on {@link CLASSIFICATION_PATTERN}. Kept as a distinct export so
* existing test assertions that describe "branding presence" (vs. "a
* classification exists") remain self-documenting at the call site.
*/
export function hasDevFlowBranding(result: StreamResult): boolean {
const text = result.textFragments.join(' ');
return CLASSIFICATION_PATTERN.test(text);
}

/**
* Check if required skills are present in the result.
* Uses bounded matching: exact match, namespace-suffixed, or devflow-prefixed.
@@ -360,13 +363,18 @@ function parsePreloadedSkills(transcriptPath: string): string[] {
}

/**
* Find the most recent subagent transcript written at or after `since` and
* return the preloaded skill names from its initial user message.
* Find all subagent transcripts written at or after `since` and return the
* preloaded skill names from each transcript's initial user message.
*
* Returns one string[] per transcript. The caller can assert that at least one
* transcript contains the expected skills — this avoids a race condition where
* Claude spawns auxiliary subagents (e.g., Git) alongside the target agent,
* and the auxiliary transcript has a later mtime.
*
* Returns an empty array if no transcript is found or the directory structure
* Returns an empty array if no transcripts are found or the directory structure
* has changed (graceful degradation).
*/
export function getLatestSubagentPreloadedSkills(since: Date): string[] {
export function getAllSubagentPreloadedSkills(since: Date): string[][] {
const homeDir = process.env.HOME ?? process.env.USERPROFILE ?? '';
const cwd = process.cwd();
// Claude Code encodes the project path by replacing / with -
@@ -379,7 +387,7 @@ export function getLatestSubagentPreloadedSkills(since: Date): string[] {

// Most recent transcript first
transcripts.sort((a, b) => b.mtime.getTime() - a.mtime.getTime());
return parsePreloadedSkills(transcripts[0].path);
return transcripts.map((t) => parsePreloadedSkills(t.path));
} catch {
// Project dir doesn't exist or structure changed — return empty gracefully
return [];
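
Because `getAllSubagentPreloadedSkills` now returns one `string[]` per transcript, a caller would check that at least one transcript carries every expected skill. A minimal sketch of such a check follows. `anyTranscriptHasSkills` is a hypothetical helper, and it uses exact name matching as a simplification; the bounded matching that `hasRequiredSkills` is documented to use is not reproduced here.

```typescript
/**
 * Hypothetical caller-side check: true when some single transcript
 * contains every expected skill name (exact match only).
 */
function anyTranscriptHasSkills(perTranscript: string[][], expected: string[]): boolean {
  return perTranscript.some((skills) => expected.every((name) => skills.includes(name)));
}

// Illustrative data: an auxiliary subagent transcript listed first (newest),
// followed by the target agent's transcript.
const perTranscript: string[][] = [
  ['git-helper'],
  ['devflow:patterns', 'devflow:testing'],
];

const targetFound = anyTranscriptHasSkills(perTranscript, ['devflow:patterns', 'devflow:testing']);
const splitAcrossTranscripts = anyTranscriptHasSkills(perTranscript, ['git-helper', 'devflow:patterns']);
```

Checking per transcript rather than flattening the arrays keeps the assertion strict: skills spread across two different subagents do not count as one agent having all of them.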