Skip to content

Add inline /wizard command for Auto Run document creation#112

Merged
pedramamini merged 58 commits intomainfrom
autorun-wizard
Jan 5, 2026
Merged

Add inline /wizard command for Auto Run document creation#112
pedramamini merged 58 commits intomainfrom
autorun-wizard

Conversation

@pedramamini
Copy link
Collaborator

Summary

Adds a new /wizard slash command that enables users to create or iterate on Auto Run documents directly within their existing AI session. This provides a faster, more contextual alternative to the full-screen onboarding wizard.

Key Features

  • Two wizard modes:

    • New mode - Create Auto Run documents from scratch through conversational discovery
    • Iterate mode - Add tasks to or modify existing Auto Run documents
  • Natural language intent parsing - The wizard intelligently detects user intent from command arguments:

    • /wizard alone prompts for mode selection (if docs exist)
    • /wizard add auth feature → iterate mode with goal
    • /wizard start fresh → new mode
  • Conversational interface with confidence gauge - AI asks clarifying questions and builds confidence (0-100%). "Let's Go" button appears at 80%+ confidence.

  • Streaming document generation - Watch documents generate in real-time with live preview

  • Document editing before save - Review and edit generated documents before finalizing

  • Iterate mode merging - Intelligently merges new tasks into existing phase documents

Components Added

  • WizardPill - Mode indicator pill (NEW/ITERATE)
  • WizardConfidenceGauge - Visual confidence progress indicator
  • WizardInputPanel - Custom input panel during wizard mode
  • WizardModePrompt - Modal for choosing between new/iterate modes
  • WizardMessageBubble - Chat bubble for wizard conversations
  • WizardConversationView - Full conversation view for wizard mode
  • WizardExitConfirmDialog - Confirmation when exiting wizard
  • DocumentGenerationView - Document generation and editing view
  • StreamingDocumentPreview - Live preview during generation
  • GenerationCompleteOverlay - Completion celebration with confetti

Services Added

  • wizardIntentParser - Parses natural language to determine new vs iterate mode
  • existingDocsDetector - Detects existing Auto Run documents in project
  • inlineWizardConversation - Manages AI conversation for inline wizard
  • Inline wizard prompts (wizard-inline-*.md) for session-aware planning

Hooks & Context

  • useInlineWizard - Hook for managing inline wizard state
  • InlineWizardContext - Cross-component wizard state provider

Test Plan

  • Unit tests for wizard intent parser (all intent patterns covered)
  • Unit tests for useInlineWizard hook (state management, mode selection)
  • Integration tests for full wizard flow (new mode, iterate mode, error handling)
  • Manual testing of /wizard command in AI mode
  • Manual testing of document generation and editing
  • Manual testing of iterate mode document merging

@claude
Copy link

claude bot commented Dec 28, 2025

Code Review - PR #112: Inline Wizard Feature

Completed thorough review. Overall: excellent work with comprehensive test coverage and clean architecture.

Strengths

Architecture & Code Quality

  • Clean separation of concerns (service layer vs UI)
  • Stateless service functions integrate well with React hooks
  • Comprehensive TypeScript typing with clear documentation
  • useInlineWizard hook provides clean API
  • InlineWizardContext eliminates prop drilling

Security

  • Path traversal prevention: sanitizeFilename() properly strips dangerous characters
  • No dangerous innerHTML/outerHTML usage
  • No eval or dynamic code execution
  • All template substitution uses safe string replacement

Testing

  • Excellent coverage: 68 tests for intent parser, 25+ for hook, 31 for context
  • Integration tests: InlineWizardFlow.test.tsx (983 lines) covers end-to-end scenarios
  • Edge cases covered: timeouts, errors, mode transitions, UI restoration

User Experience

  • Natural language intent parsing
  • Color-coded confidence gauge (red→yellow→green)
  • Streaming document preview with real-time feedback
  • Exit confirmation prevents data loss
  • UI state restoration works properly

Potential Issues

1. Race Condition in startWizard
Location: useInlineWizard.ts:startWizard()

  • Async function with multiple state updates
  • Could have race conditions if called rapidly (double-click)
  • Recommendation: Add guard to prevent concurrent calls

2. Large Document Handling
Location: inlineWizardDocumentGeneration.ts:splitIntoPhases()

  • No maximum total size limit?
  • Could malicious/broken AI generate gigabytes?
  • Recommendation: Add upper bound (e.g., 10MB)

3. Error Recovery UX

  • What happens to conversation history after retry?
  • Recommendation: Document retry behavior, preserve context

Minor Suggestions

Template Variables: Uses manual .replace() chains vs centralized substituteTemplateVariables. Consider consistency.

Accessibility: Good escape handling & focus management. Consider adding ARIA labels for icon-only buttons, live regions for confidence updates.

Memory Management: Verify process cleanup on unmount, refresh, session switch.

Test Plan

Completed:

  • Unit tests for intent parser ✓
  • Unit tests for useInlineWizard ✓
  • Integration tests ✓

TODO before merge:

  • Manual test /wizard command
  • Manual test document generation/editing
  • Manual test iterate mode merging (critical!)
  • Cross-platform testing (Windows paths)

Documentation

Excellent JSDoc and PR description. Minor gap: Update CLAUDE.md with inline wizard docs.

Security Summary

Secure ✓ - No critical issues. Good practices observed throughout.

Recommendation

Approve with minor suggestions

High-quality, production-ready code. Before merge:

  1. Add startWizard race condition guard
  2. Consider max size limit for documents
  3. Complete manual testing
  4. Update CLAUDE.md

Nice-to-haves (follow-up PRs):

  • Enhanced accessibility
  • Error recovery suggestions
  • Performance monitoring

Great work! Exemplary code quality and testing rigor. 🎉

@claude
Copy link

claude bot commented Dec 29, 2025

Code Review: PR #112 - Inline /wizard Command

Executive Summary

This PR adds a substantial new feature (~13,500 lines of code and tests) enabling inline Auto Run document creation. The code demonstrates excellent test coverage and follows project conventions well. However, there are critical security vulnerabilities, potential memory leaks, and race conditions that must be addressed before merge.


Critical Issues (Must Fix Before Merge)

1. Path Traversal Vulnerability - HIGH PRIORITY

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:104-118

The current sanitizeFilename() function can be bypassed with inputs like ....// which becomes ../ after the first replace.

Attack vector: Input like ....//....//etc/passwd becomes ../../etc/passwd after first replace

Recommended fix: Iteratively remove .. sequences until none remain, and enforce length limits (255 chars max).

Also check: src/main/ipc/handlers/autorun.ts:204,244,289,370 - similar vulnerable patterns


2. Process Leak from Timeout Race Condition - HIGH PRIORITY

Location: src/renderer/services/inlineWizardConversation.ts:564-600

If the exit callback resolves before timeout fires, the spawned process may become orphaned. Add an isResolved guard flag and ensure the process is killed on timeout.

Same issue in: src/renderer/services/inlineWizardDocumentGeneration.ts:508-517


3. State Race Condition in sendMessage - MEDIUM PRIORITY

Location: src/renderer/hooks/useInlineWizard.ts:494-528

The function updates state via setState() but then reads from the stale closed-over state variable at line 528. This means the AI receives incomplete conversation history.

Recommended fix: Use refs to track latest state or restructure to avoid closure issues.


4. Unbounded Memory Growth - MEDIUM PRIORITY

Location: src/renderer/services/inlineWizardConversation.ts:588-592

Output buffer grows indefinitely without size limits. Add a MAX_OUTPUT_SIZE constant (10MB suggested) and abort if exceeded.

Same issue in: src/renderer/services/inlineWizardDocumentGeneration.ts:531-537


5. Debug Logging with Sensitive Data - MEDIUM PRIORITY

Location: src/main/ipc/handlers/autorun.ts:237-242

Document content may contain secrets or PII that should not be logged. Remove or gate behind process.env.DEBUG_AUTO_RUN flag.


Important Issues (Fix Soon After)

6. Event Listener Cleanup Edge Case

Location: src/renderer/services/inlineWizardConversation.ts:587-655

If spawn() throws synchronously before listener assignments complete, cleanup may fail. Add try-catch blocks to cleanup functions.


7. Missing Input Validation

Location: src/renderer/hooks/useInlineWizard.ts:354-465

No validation of startWizard() parameters (projectPath, sessionName, agentType). Add validation for:

  • projectPath must be absolute
  • sessionName length <= 100 chars
  • agentType must be available

8. Data URL Injection Risk

Location: src/renderer/components/InlineWizard/DocumentGenerationView.tsx:256-287

Data URLs from filesystem are inserted without validation - SVG data URLs could contain JavaScript. Validate image types against allowlist.


Code Quality Improvements

9. Duplicated Code

extractResultFromStreamJson() appears identically in inlineWizardConversation.ts:422-487 and inlineWizardDocumentGeneration.ts:328-393

Recommendation: Extract to src/renderer/utils/agentOutputParser.ts


10. Auto-save Effect Dependencies

Location: src/renderer/components/InlineWizard/DocumentGenerationView.tsx:847-876

The effect includes onContentChange and currentDoc which change frequently. Extract stable values (currentDoc.filename) to dependencies instead.


11. Type Safety Issues

Multiple uses of any type defeat TypeScript benefits. Replace with proper interfaces.

Example: src/renderer/services/inlineWizardConversation.ts:491


Test Coverage Gaps

Excellent Coverage:

  • Comprehensive integration tests (983 lines)
  • Edge case coverage for intent parsing
  • Proper mocking

Missing Tests:

  • Path traversal sanitization security tests
  • Timeout/cleanup behavior tests
  • Race condition tests
  • File I/O error handling tests
  • Performance/stress tests for large documents

Positive Aspects

  1. Excellent test coverage - 8,900+ lines of tests covering user flows
  2. Well-documented - Clear JSDoc comments explaining complex logic
  3. Follows project conventions - Consistent with existing wizard patterns
  4. Great UX - Confidence gauge, streaming preview, intelligent intent parsing
  5. Proper separation - Services, hooks, components well-organized
  6. Accessibility - Screen reader support, keyboard navigation

Summary

Critical Issues: 5
Important Issues: 3
Code Quality: 3
Test Gaps: 5

Overall Assessment: This is a well-implemented feature with excellent test coverage, but it has critical security and resource management issues that must be resolved before merge.

Recommended Action: Request changes to address the 5 critical issues (path traversal, process leaks, race conditions, unbounded memory, debug logging) before approval.

Once these are fixed, this will be a solid addition to Maestro!

@claude
Copy link

claude bot commented Dec 30, 2025

Pull Request Review: Inline Wizard for Auto Run Documents

Summary

This PR adds a comprehensive inline wizard system that enables users to create or iterate on Auto Run documents directly within their AI session using the /wizard command. This is an excellent addition that provides a more contextual alternative to the full-screen onboarding wizard.

Code Quality: ⭐⭐⭐⭐⭐ (Excellent)

Strengths

1. Excellent Architecture & Design Patterns

  • Separation of Concerns: The implementation follows a clean architecture with well-separated services, hooks, and components
    • useInlineWizard hook manages state
    • inlineWizardConversation.ts handles AI interactions
    • inlineWizardDocumentGeneration.ts manages document generation
    • wizardIntentParser.ts handles natural language parsing
  • Context Provider Pattern: Properly implements React context (InlineWizardContext) to avoid prop drilling
  • Stateless Services: Unlike the onboarding wizard's singleton pattern, this uses stateless functions that work with hook state - a more modern and testable approach

2. Security - Well Implemented

  • Path Traversal Prevention: The sanitizeFilename function (lines 104-118 in inlineWizardDocumentGeneration.ts) properly sanitizes filenames:
    • Removes path separators (/, \\)
    • Removes directory traversal sequences (..)
    • Removes null bytes and control characters
    • Removes leading dots
    • This prevents malicious AI-generated filenames from escaping the Auto Run directory
  • Input Validation: Proper validation of user input throughout the wizard flow
  • Template Variable Substitution: Uses the existing substituteTemplateVariables utility to safely handle variable replacement

3. Exceptional Test Coverage

  • 983 lines of integration tests covering the full wizard flow
  • 10 test files dedicated to inline wizard functionality
  • Test files cover:
    • Intent parsing (237 tests)
    • Hook state management (917 tests)
    • All UI components (WizardConfidenceGauge, WizardConversationView, WizardInputPanel, etc.)
    • Document generation service (360 tests)
    • Existing docs detection (379 tests)
  • Tests include edge cases, error handling, and accessibility concerns
  • This is exemplary test coverage - far exceeds typical standards

4. Intent Parsing - Smart Implementation

  • Natural language understanding for /wizard command arguments
  • Multi-tier keyword matching:
    1. Prefix matching (strict)
    2. Anywhere matching (loose)
    3. Pattern-based detection with regex
  • Handles ambiguous cases intelligently by prompting user
  • Goal extraction properly preserves user's original intent
  • Code in wizardIntentParser.ts is clean, well-documented, and thoroughly tested

5. Consistent with Codebase Patterns

  • Follows established conventions from CLAUDE.md:
    • Uses Tailwind for layout, inline styles for theme colors ✅
    • Proper IPC handler implementation ✅
    • Layer stack integration for modals ✅
    • Matches naming conventions and file organization ✅
  • Reuses existing utilities (templateVariables, austinFacts, fillerPhrases)
  • Properly integrates with existing Auto Run infrastructure

6. User Experience - Thoughtful Design

  • Confidence gauge with visual feedback (color transitions)
  • Streaming document preview with auto-scroll
  • "Let's Go" button appears at 80% confidence threshold
  • Previous UI state restoration when wizard ends
  • Error handling with retry functionality
  • Loading indicators and typing animations
  • Celebration confetti on completion (accessibility-aware with prefers-reduced-motion)

7. Documentation & Code Comments

  • Comprehensive JSDoc comments throughout
  • File-level documentation explaining purpose and patterns
  • Interface documentation with examples
  • Clear function signatures with TypeScript
  • Zero TODO/FIXME comments found (clean implementation)

Areas for Improvement

1. Performance Consideration ⚠️

Issue: The DocumentGenerationView component is 1,055 lines in a single file with multiple sub-components (DocumentEditor, DocumentSelector, StreamingDocumentPreview, etc.).

Recommendation: While this was intentionally done "for maintainability" per the commit message, consider:

  • The components were later extracted into separate files anyway
  • Large single files can impact IDE performance and code navigation
  • Consider breaking into smaller, focused components with clear responsibilities

Current State: The components were subsequently extracted (DocumentEditor.tsx, DocumentSelector.tsx, StreamingDocumentPreview.tsx), which is good. The DocumentGenerationView.tsx should now be refactored to primarily import and compose these components.

2. Error Handling - Minor Enhancement Opportunity ⚠️

Location: inlineWizardConversation.ts lines 285-350

Observation: The sendWizardMessage function has a 5-minute timeout and handles various error cases, but:

  • The timeout error message could be more actionable (suggest checking agent logs, network, etc.)
  • No exponential backoff for retries
  • Could benefit from more granular error classification (network vs. agent vs. parsing errors)

Recommendation: Consider adding error categorization:

export type WizardErrorType = 'timeout' | 'agent_unavailable' | 'parse_error' | 'network' | 'unknown';

export interface InlineWizardSendResult {
  success: boolean;
  response?: WizardResponse;
  error?: string;
  errorType?: WizardErrorType; // Add error categorization
  rawOutput?: string;
}

This would enable more specific UI feedback and recovery strategies.

3. Magic Numbers ⚠️

Location: Various files

Observation: Some hardcoded values that could be constants:

  • WizardConversationView.tsx line 22: const READY_CONFIDENCE_THRESHOLD = 80; (duplicated from service)
  • TypingIndicator line 106: 30ms character typing interval
  • Line 116: 5000ms phrase rotation interval
  • inlineWizardDocumentGeneration.ts line 29: GENERATION_TIMEOUT = 300000 (5 minutes)

Recommendation: Extract to a central configuration object or import from the service where appropriate.

4. Conversation History Memory Management 💭

Observation: The conversation history is stored in memory and grows unbounded during a wizard session.

Question: For very long wizard conversations (e.g., 100+ messages), should there be a practical limit or truncation strategy?

Current State: Not a critical issue since wizard sessions are typically short-lived, but worth considering for edge cases.

Potential Bugs/Issues

1. Session Cleanup ⚠️

Location: useInlineWizard.ts - endWizard and generateDocuments functions

Observation: The conversation session is tracked via conversationSessionRef, but cleanup happens in endWizard(). If generateDocuments() is called and errors occur before the user explicitly ends the wizard, there might be orphaned agent processes.

Recommendation: Ensure generateDocuments has proper cleanup in its error handlers:

catch (error) {
  // ... existing error handling ...
  // Ensure conversation session is cleaned up
  if (conversationSessionRef.current) {
    await endInlineWizardConversation(conversationSessionRef.current);
    conversationSessionRef.current = null;
  }
}

2. Race Condition Potential ⚠️

Location: useInlineWizard.ts - sendMessage function

Observation: Multiple concurrent calls to sendMessage could potentially overwrite isWaiting state or create duplicate agent processes.

Recommendation: Add a guard:

const sendMessage = useCallback(async (content: string) => {
  if (state.isWaiting) {
    console.warn('Already waiting for response, ignoring duplicate send');
    return;
  }
  // ... rest of implementation
}, [state.isWaiting, ...]);

Security Concerns

No critical security issues found

  • Path traversal properly mitigated
  • No SQL injection vectors (uses filesystem, not database)
  • No XSS vulnerabilities (React handles escaping, markdown rendering uses react-markdown)
  • Command injection prevented (uses window.maestro IPC, not shell execution)
  • Input validation present throughout

Performance Considerations

Good Practices

  • Lazy loading: Wizard components only mount when active
  • Memoization: Uses useCallback and useRef appropriately
  • Debouncing: Auto-scroll detection and resume button
  • Context optimization: Provider wraps only necessary components

💭 Potential Optimizations

  1. Large Document Parsing: The parseGeneratedDocuments function processes entire agent output in memory. For very large documents (>10MB), consider streaming parsing.

  2. File Reading in Iterate Mode: All existing documents are loaded into memory at once. For projects with 50+ Auto Run documents, this could be optimized with lazy loading.

  3. React Flow Integration: If the wizard is used while the Document Graph view is open, ensure the graph isn't unnecessarily re-rendering during wizard updates.

Test Coverage: ⭐⭐⭐⭐⭐ (Exceptional)

  • 983 lines of integration tests
  • 10 test files covering all major components
  • Edge cases well covered
  • Error scenarios tested
  • Accessibility considerations in tests
  • Mocking properly implemented

Outstanding test coverage - this is a model for the rest of the codebase.

Documentation & Maintainability

Excellent

  • Clear JSDoc comments throughout
  • Type definitions well documented
  • File-level documentation explains purpose
  • Follows established patterns from CLAUDE.md
  • Consistent naming conventions

Integration with Existing Code

Seamless Integration

  • Properly integrates with Auto Run infrastructure
  • Uses existing IPC handlers and extends where needed
  • Reuses existing utilities (templateVariables, confetti, etc.)
  • Follows established modal priority system
  • Correctly wired into App.tsx without disrupting existing flows

Accessibility

Good Accessibility Practices

  • Keyboard navigation supported (Escape to exit)
  • Confetti respects prefers-reduced-motion
  • Loading states with appropriate indicators
  • Error messages are user-friendly
  • Focus management in modals

Recommended Actions Before Merge

High Priority

  1. ✅ Verify CI/tests pass
  2. ⚠️ Add session cleanup in generateDocuments error handlers
  3. ⚠️ Add guard against concurrent sendMessage calls

Medium Priority

  1. 💭 Consider refactoring DocumentGenerationView.tsx to use extracted components
  2. 💭 Centralize magic numbers (typing intervals, timeouts)
  3. 💭 Add error type categorization for better user feedback

Low Priority (Future Enhancements)

  1. 💡 Add conversation history truncation for very long sessions
  2. 💡 Consider streaming parser for very large documents
  3. 💡 Add telemetry/analytics for wizard usage patterns

Final Verdict

APPROVE WITH MINOR RECOMMENDATIONS

This is exceptionally well-crafted code with:

  • ✅ Excellent architecture and separation of concerns
  • ✅ Outstanding test coverage (983 integration tests!)
  • ✅ Proper security measures (path traversal prevention)
  • ✅ Thoughtful UX with streaming, confidence gauges, and visual feedback
  • ✅ Seamless integration with existing codebase
  • ✅ Comprehensive documentation

The minor issues identified are not blockers and can be addressed in follow-up PRs if needed. The test coverage alone demonstrates the quality and thoroughness of this implementation.

Congratulations on an excellent feature implementation! 🎉


Compliance with CLAUDE.md

Follows all project guidelines:

  • Uses standardized vernacular
  • Proper TypeScript strict mode
  • Theme color usage (inline styles for theme, Tailwind for layout)
  • Proper IPC handler implementation
  • Layer stack integration
  • Correct commit message format
  • No CHANGELOG.md created (per guidelines)

Review conducted by Claude Code Assistant
Reviewed: 55 files changed, ~15,500 lines added
Time spent: Comprehensive analysis of architecture, security, tests, and integration

@pedramamini pedramamini self-assigned this Dec 30, 2025
@claude
Copy link

claude bot commented Jan 2, 2026

Comprehensive Code Review: PR #112

This is a well-architected feature with strong security practices and comprehensive test coverage (1,277+ lines of tests). Overall Assessment: Code Quality 8.5/10, Security 9/10, Test Coverage 9/10, Architecture 8/10.

Recommendation: Approve with required changes

Critical Issues (Must Fix Before Merge)

1. Race Condition in Conversation Session Management (HIGH)

File: src/renderer/hooks/useInlineWizard.ts:514-524
The sendMessage function reads conversationSessionRef.current but state updates happen asynchronously. If wizard is ended while a message is being sent, setState calls on inactive wizard can occur.
Fix: Add guards to check state.isActive before setState operations.

2. Stale Closure in sendMessage Callback (HIGH)

File: src/renderer/hooks/useInlineWizard.ts:528
The state.conversationHistory is captured at call time but may be outdated when async operation completes.
Fix: Use functional setState form to access most recent state.

3. Missing Cleanup on Component Unmount (MEDIUM)

File: src/renderer/hooks/useInlineWizard.ts:475-483
endWizard is called manually but no automatic cleanup on unmount. Listeners may leak.
Fix: Add useEffect cleanup calling endInlineWizardConversation on unmount.

4. Timeout Not Cleared on Early Exit (MEDIUM)

Files: src/renderer/services/inlineWizardConversation.ts:565-573, inlineWizardDocumentGeneration.ts:508-517
5-minute timeouts only cleared in success/error paths, continue running if user exits.
Fix: Track timeout IDs in refs and clear in cleanup.

Security Analysis - EXCELLENT

Path Traversal Protection ✅

File: src/renderer/services/inlineWizardDocumentGeneration.ts:104-118
Excellent sanitization with path separators, traversal sequences, control chars, leading dots removal.

Command Injection Protection ✅

All spawning uses safe IPC via window.maestro.process.spawn(). No shell execution.

XSS Protection ✅

ReactMarkdown used correctly with remarkGfm. No dangerouslySetInnerHTML.

Architecture & Code Quality

State Management: Well-designed hook-based pattern with good separation ✅
Type Safety: Strong TypeScript with documented interfaces ✅
Test Coverage: 917 lines hook tests, 360 lines doc generation tests, 237 lines parser tests ✅

Additional Recommendations (P1)

  1. Add document size limit (10MB) for streaming content
  2. Add existence check for UPDATE mode in iterate
  3. Simplify InlineWizardContext useMemo (just return wizardState)
  4. Add guards preventing wizard during active batch/agent execution
  5. Unify template variable substitution (mixed manual/utility usage at line 175)

Minor Performance Issues (P2)

  • String concatenation in streaming hot path (line 826-831) - consider array buffer
  • Template variable handling inconsistency in inlineWizardConversation.ts:175

Conclusion

High-quality work with excellent security and testing. Critical issues are fixable in hours without architectural changes. Once P0 items addressed, ready to merge. Great job! 🎉

@claude
Copy link

claude bot commented Jan 2, 2026

Comprehensive Code Review - PR #112: Inline Wizard Feature

Executive Summary

This is a large, well-structured PR (16,614 additions, 64 files) that adds a /wizard slash command for creating and iterating on Auto Run documents. The implementation demonstrates strong architectural patterns, comprehensive test coverage, and good adherence to the CLAUDE.md conventions. However, there are critical security issues that must be addressed before merging.

Overall Assessment: Request Changes ⚠️


1. Architecture & Code Quality ⭐⭐⭐⭐

Strengths

Excellent separation of concerns - State management, service layer, and UI components are cleanly separated
Strong TypeScript usage - Comprehensive interfaces with clear documentation
Follows CLAUDE.md conventions - Proper modal layer registration, settings persistence, theme handling
Well-organized components - Single-purpose components with clear responsibilities

Minor Concerns

File Path Construction (src/renderer/utils/existingDocsDetector.ts:34):

const normalizedPath = projectPath.endsWith('/') ? projectPath.slice(0, -1) : projectPath;
return `${normalizedPath}/${AUTO_RUN_FOLDER_NAME}`;

Consider using path.join() for cross-platform safety.

Magic Numbers (src/renderer/services/inlineWizardConversation.ts:479):

setTimeout(() => { ... }, 300000); // 5 minutes

Extract as named constants for maintainability.


2. Security ⚠️ CRITICAL ISSUES

🚨 Critical: Path Traversal Vulnerability

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:115-129

export function sanitizeFilename(filename: string): string {
  return filename
    .replace(/[\/\\]/g, '-')      // ⚠️ Happens AFTER traversal check
    .replace(/\.\./g, '')          // ⚠️ Regex doesn't catch all cases
    .replace(/[\x00-\x1f\x7f]/g, '')
    .replace(/^\.+/, '')
    .trim()
    || 'document';
}

Issues:

  1. Incomplete .. filtering - doesn't prevent variants like ..., .%2e/, %2e%2e/
  2. Path separators replaced AFTER the .. check

Exploit Example:

const malicious = "....//etc/passwd";
// Could potentially traverse directories

Recommended Fix:

export function sanitizeFilename(filename: string): string {
  // Decode any URL encoding first
  let safe = decodeURIComponent(filename);
  
  // Remove ALL path separators and traversal attempts
  safe = safe
    .split(/[\/\\]/).join('')  // Remove ALL separators completely
    .replace(/\.{2,}/g, '')    // Remove 2+ consecutive dots
    .replace(/[\x00-\x1f\x7f]/g, '')
    .replace(/^\.+/, '')
    .trim();
  
  // Validate the result doesn't contain path components
  if (!safe || safe.includes('..') || safe.includes('/') || safe.includes('\\')) {
    return 'document';
  }
  
  return safe || 'document';
}

// Also add validation after sanitizing:
const resolvedPath = path.resolve(autoRunFolderPath, filename);
const expectedBase = path.resolve(autoRunFolderPath);
if (!resolvedPath.startsWith(expectedBase)) {
  throw new Error('Invalid file path detected');
}

Medium: Process Cleanup on Timeout

Location: src/renderer/services/inlineWizardConversation.ts:470-479

The timeout doesn't kill the agent process, which could lead to orphaned processes.

Fix:

const timeoutId = setTimeout(() => {
  console.log('[InlineWizard] TIMEOUT fired! Session:', session.sessionId);
  cleanupListeners();
  // Kill the agent process
  window.maestro.process.kill(session.sessionId).catch(() => {});
  resolve({ ... });
}, 300000);

Low: Link Opening Security

Location: src/renderer/components/InlineWizard/AustinFactsDisplay.tsx:111-116

if (!window.maestro?.shell?.openExternal?.(segment.url)) {
  window.open(segment.url, '_blank');  // ⚠️ Fallback could be exploited
}

Remove the window.open() fallback or validate URLs.


3. Performance 🔧 NEEDS IMPROVEMENT

Critical: Missing Cleanup in useEffect

Location: src/renderer/hooks/useInlineWizard.ts:943-988

useEffect(() => {
  const shouldSendGreeting = /* ... */;
  
  if (shouldSendGreeting) {
    setState((prev) => ({ ...prev, initialGreetingSent: true }));
    
    setTimeout(() => {
      sendMessage(initialMessage);
    }, 100);  // ⚠️ No cleanup
  }
}, [/* deps */]);

Fix:

useEffect(() => {
  // ... existing logic
  
  const timeoutId = setTimeout(() => {
    sendMessage(initialMessage);
  }, 100);
  
  return () => clearTimeout(timeoutId);
}, [/* deps */]);

Medium: Image Loading Performance

Location: src/renderer/components/InlineWizard/DocumentGenerationView.tsx:260-287

Every image triggers a separate IPC call. Add caching:

const imageCache = new Map<string, string>();

function MarkdownImage({ src, alt, folderPath, theme }: { ... }): JSX.Element | null {
  useEffect(() => {
    const cacheKey = `${folderPath}/${src}`;
    const cached = imageCache.get(cacheKey);
    
    if (cached) {
      setDataUrl(cached);
      setLoading(false);
      return;
    }
    
    // ... existing fetch logic, then cache result
  }, [src, folderPath]);
}

Low: String Concatenation Inefficiency

Location: src/renderer/services/inlineWizardConversation.ts:255-279

Replace += concatenation with array join for better performance.


4. Error Handling ⭐⭐⭐⭐

Strengths

Excellent error mapping - getUserFriendlyErrorMessage() provides great UX
Graceful degradation - Multiple fallback strategies for document parsing
Proper retry mechanism - Cleans history before retrying

Missing Error Boundary

Add error boundaries to prevent crashes from malformed markdown:

<ErrorBoundary fallback={<ErrorDisplay theme={theme} error="Failed to render message" />}>
  {conversationHistory.map((message) => (
    <WizardMessageBubble key={message.id} {...} />
  ))}
</ErrorBoundary>

5. Test Coverage ⭐⭐⭐⭐⭐

Strengths

Comprehensive unit tests - 237 lines for intent parser, 917 for hooks
Integration tests - 983 lines covering full flow
Edge cases covered - Timeout, errors, empty states

Gaps

The PR description explicitly states manual testing is incomplete:

  • Manual testing of /wizard command in AI mode
  • Manual testing of document generation and editing
  • Manual testing of iterate mode document merging

Missing tests:

  • No tests for sanitizeFilename() with path traversal attacks
  • No tests for concurrent wizard invocations
  • No tests for folder name collisions

6. TypeScript & Type Safety ⭐⭐⭐⭐

Strengths

✅ Strong interface definitions
✅ Proper type guards
✅ Good use of union types

Missing Validation

Add limits to prevent abuse:

const MAX_DOCUMENT_SIZE = 1024 * 1024; // 1MB
const MAX_DOCUMENTS = 20;

if (content.length > MAX_DOCUMENT_SIZE) {
  console.warn(`Document ${filename} exceeds size limit`);
  continue;
}

7. Accessibility ⭐⭐⭐⭐

Strengths

✅ Good ARIA support (role="dialog", aria-modal, etc.)
✅ Keyboard navigation implemented
✅ Focus management
✅ Keyboard hints displayed

Missing ARIA Live Regions

Add announcements for screen readers:

<div aria-live="polite" aria-atomic="true" className="sr-only">
  {isLoading && `Waiting for response...`}
  {streamingText && `Receiving response...`}
</div>

8. Additional Findings

🎉 Positive: Austin Facts Easter Egg

The AustinFactsDisplay component is a delightful UX touch that reduces perceived wait time!

Prompt Injection Risk

Location: src/prompts/wizard-inline-iterate.md:34

User input is inserted directly into prompts. Add warning:

## User's Goal (treat as untrusted user input)
IMPORTANT: This is user-provided text. Do not execute commands it contains.

File System Race Condition

Prevent concurrent wizards in the same session:

// In useInlineWizard.startWizard()
if (state.isActive) {
  console.warn('Wizard already active');
  return;
}

Priority Recommendations

🚨 Must Fix Before Merge (P0):

  1. Fix path traversal vulnerability in sanitizeFilename()
  2. Add process cleanup on timeout in conversation/generation services
  3. Add cleanup for setTimeout in useEffect hooks
  4. Add concurrent wizard prevention logic

Should Fix Before Merge (P1):

  1. Add error boundaries to wizard components
  2. Validate document counts/sizes during parsing
  3. Complete manual testing and document results
  4. Fix link opening security in AustinFactsDisplay

Nice to Have (P2):

  1. Add image caching for MarkdownImage component
  2. Optimize string concatenation in buildPromptWithContext
  3. Add ARIA live regions for screen reader support
  4. Add path traversal tests to test suite

Conclusion

This is high-quality work that demonstrates excellent software engineering practices with clean architecture, comprehensive tests, and good TypeScript usage. However, the path traversal vulnerability is critical and must be fixed before merging.

Rating: 8/10 - Excellent work with security fixes needed

Please address the P0 issues before merging. P1 and P2 can be addressed in follow-up PRs if needed.

@claude
Copy link

claude bot commented Jan 3, 2026

Pull Request Review: Inline Wizard /wizard Command

Summary

This PR adds a substantial new feature - an inline /wizard slash command that enables users to create or iterate on Auto Run documents within their existing AI session. This is a well-architected addition with comprehensive test coverage (79 files changed, 17,080 additions).

Strengths 💪

1. Excellent Architecture

  • Clean separation between services, hooks, and components
  • Stateless service functions work well with React hooks
  • Proper context provider pattern for cross-component state sharing
  • Well-defined TypeScript interfaces throughout

2. Comprehensive Test Coverage

  • 983 integration tests covering end-to-end flows
  • Unit tests for all key services (68 tests for intent parser, 30+ for hooks)
  • Edge cases well-covered (timeout handling, error states, confidence thresholds)

3. Intent Parsing Logic

The wizardIntentParser.ts is particularly well-designed with clear keyword-based detection for new vs iterate modes, fallback to ask mode when ambiguous, and pattern matching helpers.

4. User Experience

  • Confidence gauge provides clear feedback (0-100%)
  • Lets Go button only appears at 80%+ confidence threshold
  • Document generation with streaming preview
  • Exit confirmation to prevent accidental data loss

Issues & Recommendations 🔍

Critical Issues

1. Security: Process Spawning Pattern
Location: src/renderer/services/inlineWizardConversation.ts:577

Issue: Uses window.maestro.process.spawn() with directoryPath but lacks validation. A malicious path could potentially escape intended boundaries.

Recommendation: Add path validation before spawning to ensure absolute paths without traversal.

2. Timeout Handling - Potential Resource Leak
Location: src/renderer/services/inlineWizardConversation.ts:602-609

Issue: If process exits just before timeout, timeout is not cleared until after response parsing. Could trigger both callbacks.

Recommendation: Clear timeout in onDataHandler before resolving to prevent race condition.

High Priority Issues

3. Filename Sanitization Incomplete
Location: src/renderer/services/inlineWizardDocumentGeneration.ts:366

Issue: Sanitization allows filenames starting with - and does not prevent Windows reserved names.

Recommendation: Strip leading dashes and check for reserved names (CON, PRN, AUX, etc).

4. State Management - Race Condition Potential
Location: src/renderer/hooks/useInlineWizard.ts

Issue: conversationSessionRef lacks cleanup if startWizard is called while initializing. Could leave orphaned agent processes.

Recommendation: Add initialization guard and cleanup existing sessions before starting new ones.

Medium Priority Issues

5. Large File Handling
Location: src/renderer/utils/existingDocsDetector.ts:96

Issue: No size limit when loading existing documents. A 100MB+ file could freeze the UI.

Recommendation: Add max file size check (e.g., 10MB limit).

6. Error Messages Expose Internal Paths
Issue: Error messages may leak internal filesystem paths.

Recommendation: Sanitize error messages before displaying to users.

7. Empty Goal Extraction
Location: src/renderer/services/wizardIntentParser.ts:149-166

Issue: For input like add, goal becomes add which is not meaningful.

Recommendation: Return empty string for empty goals and fall back to ask mode.

Code Quality Notes ✨

Positive Patterns

  • Consistent error handling with try-catch and result objects
  • Excellent TypeScript usage with proper interfaces, type guards, generics
  • Proper React patterns with useCallback, useRef, useEffect dependencies
  • Good accessibility with screen reader announcements, keyboard shortcuts

Areas for Improvement

  • Magic numbers (80% confidence) could be named constants
  • Some console.log statements should be console.debug or removed
  • Mix of JSDoc and inline comments could be more consistent

Test Coverage 📊

Excellent coverage with 983 integration tests and 100+ unit tests covering edge cases. Missing end-to-end tests with actual agent processes (understandable).

Security Summary 🔒

Good security practices with IPC boundary isolation, filename sanitization, and timeout protection. Path validation could be stronger, and size limits missing for document loading.

Final Verdict ✅

This is a high-quality PR (8.5/10) with strong engineering practices. Architecture is sound, test coverage is excellent, and the feature adds significant value.

Required Before Merge:

  1. Fix timeout race condition (Critical Add Claude Code GitHub Workflow #2)
  2. Add path validation (Critical Internal Logging #1)
  3. Add size limits for document loading (Medium docs: enhance CLAUDE.md with work items and contribution workflow #5)

Recommended Before Merge:

  1. Improve filename sanitization (High Raychaser/claude path 01 #3)
  2. Add race condition guard (High Major Feature: Web Interface #4)
  3. Handle empty goal extraction (Medium Added Main Panel Tab Support #7)

Great work! The inline wizard will significantly improve UX for Auto Run workflows.


Review follows CLAUDE.md conventions and security best practices

@claude
Copy link

claude bot commented Jan 3, 2026

Code Review: PR #112 - Inline /wizard Command

This is an impressive feature with excellent architecture and comprehensive testing (7,200+ lines!). However, there are 3 critical security issues that must be addressed before merge.


🔴 Critical Issues (Must Fix)

1. Path Traversal Vulnerability in getAutoRunFolderPath()

File: src/renderer/utils/existingDocsDetector.ts:31-34

String concatenation without path normalization allows directory traversal. A malicious projectPath like '/project/../../etc' resolves to '/etc/Auto Run Docs', escaping the project directory.

Fix: Use path.resolve() and path.join() with validation.

2. Filename Sanitization Logic Flaw

File: src/renderer/services/inlineWizardDocumentGeneration.ts:115-129

Sanitization order is wrong - path separators replaced before '..' removal. Input '....//etc/passwd' becomes '....--etc-passwd' then Step 2 finds no '..' to remove.

Fix: Remove '..' FIRST before replacing separators.

3. Race Condition - Double Promise Resolution

File: src/renderer/services/inlineWizardConversation.ts:498-573

If process exits right before timeout fires, promise resolves twice.

Fix: Add 'resolved' flag guard.


⚠️ High Priority (Strongly Recommended)

  1. Memory Leak - Event listeners not cleaned up on unmount (useInlineWizard.ts:598-724)
  2. Concurrent Messages - No guard against concurrent sendMessage() calls
  3. Performance - useMemo with 36 dependencies defeats purpose (InlineWizardContext.tsx:80-116)
  4. Resource Leak - Process not killed on timeout (inlineWizardConversation.ts:500)
  5. UX - Silent failure when /wizard runs without active session (App.tsx:4588)

✅ Strengths

  1. Clean architecture with stateless services
  2. 7,200+ lines of comprehensive tests
  3. Strong TypeScript usage
  4. Excellent UX with confidence gauge
  5. Well-documented with JSDoc
  6. Proper accessibility
  7. Comprehensive error handling

Final Verdict

Conditionally Approve pending fixes for critical security issues #1-3.

Excellent work overall! The inline wizard is a great UX improvement. Just need to address those security gaps before shipping.

Estimated fix time: 2-4 hours for critical issues, 4-6 hours total.

@claude
Copy link

claude bot commented Jan 3, 2026

Code Review - PR #112: Inline /wizard Command

This is an impressive feature addition with excellent test coverage and generally strong code quality. The implementation demonstrates good security awareness with proper IPC patterns and safe command execution. However, I've identified 3 critical security issues that should be addressed before merging.


Critical Security Issues

1. Path Traversal Vulnerability in Filename Sanitization (HIGH)

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:115-129

Issue: The sanitizeFilename function has incomplete protection against path traversal. The regex /../g only removes literal ".." sequences but doesn't handle encoded sequences like %2e%2e, multiple dots like ....//etc/passwd, or unicode variations.

Impact: AI-generated filenames could potentially bypass sanitization and write outside the intended directory.

Recommendation: Replace all dots with dashes, decode URL encoding first, and add shell special character filtering. While you have good defense-in-depth with validatePathWithinFolder in autorun.ts:197-201, sanitization should happen before path construction.


2. XSS Risk via Unsafe URL Handling (MEDIUM-HIGH)

Location: src/renderer/components/InlineWizard/DocumentGenerationView.tsx:635-646

Issue: The markdown link handler opens URLs without validation using window.maestro.shell.openExternal(href) directly.

Attack scenario: AI generates document with click me and user clicks it, potentially executing the JavaScript URI.

Recommendation: Validate URLs against an allowlist (http://, https://, mailto:) before calling openExternal. Render unsafe URLs with visual indication they're blocked.


3. Memory Leak - Event Listener Cleanup (HIGH)

Location: src/renderer/services/inlineWizardConversation.ts:520-573

Issue: Event listeners check session ID match but don't cleanup if IDs don't match. With many wizard invocations, listeners accumulate causing memory leaks.

Recommendation: Store all cleanup functions immediately in an array and ensure they're called on all code paths (timeout, errors, AND success).


Medium Priority Issues

4. Race Condition in Session Auto-Creation

Location: src/renderer/hooks/useInlineWizard.ts:621-656

If two messages are sent rapidly in 'ask' mode, both might pass the !session check before either completes, potentially creating duplicate sessions. Consider adding a session creation lock using a Map of promises.

5. Missing Input Validation

Location: src/renderer/services/wizardIntentParser.ts:183-237

No length limits on natural language parsing. A user could paste megabytes of text causing UI freeze. Add a MAX_INTENT_INPUT_LENGTH constant (suggest 10KB).


Excellent Practices Observed

  1. Proper command execution - Correctly uses window.maestro.* APIs (no shell injection risks)
  2. Settings persistence - Follows the documented pattern perfectly
  3. Theme colors - Uses inline styles for themed colors as required
  4. No dangerous HTML - No dangerouslySetInnerHTML usage
  5. Excellent test coverage - Comprehensive unit and integration tests
  6. Defense-in-depth - Path validation at IPC boundary
  7. Clean separation of concerns - Well-organized services, hooks, and components

Minor Improvements (Post-Merge)

  1. Type safety: Replace any types in markdown components with proper React.ComponentProps
  2. Performance: Add conversation history limits to prevent unbounded state growth
  3. Error handling: Propagate file operation errors to UI instead of silent failures

Summary

Overall assessment: This is a well-architected feature with excellent test coverage and good security awareness. The issues identified are primarily edge cases and hardening opportunities rather than fundamental design flaws.

Recommendation: Fix the 3 critical issues (#1, #2, #3), then this is ready to merge. The medium-priority issues can be addressed in follow-up PRs.

Security test suggestions:

  • Test filename sanitization with encoded sequences (%2e%2e%2f)
  • Test XSS attempts in markdown links (javascript:, data:, file:)
  • Test event listener cleanup with rapid wizard open/close cycles
  • Test large input handling (10MB+ conversation history)

Great work overall!

@claude
Copy link

claude bot commented Jan 4, 2026

Comprehensive Code Review: PR #112 - Inline Wizard Feature

Overview

This is a well-architected, thoughtfully implemented feature with strong code quality and excellent security practices. Overall assessment: APPROVE WITH CHANGES

Critical Issues (Must Fix Before Merge)

Issue 1: Stale Closure in Timeout Handler

File: src/renderer/services/inlineWizardConversation.ts:511-520
Problem: outputBuffer in timeout closure captures initial empty string instead of accumulated output
Fix: Use ref or move timeout logic to access current state

Issue 2: Missing File Watcher Cleanup in Error Path

File: src/renderer/services/inlineWizardDocumentGeneration.ts:867-875
Problem: Spawn error path doesn't call unwatchFolder(), leaving watcher active
Fix: Add unwatchFolder() to cleanupAll()

Issue 3: Memory Leak in Per-Tab Refs

File: src/renderer/hooks/useInlineWizard.ts:348-351
Problem: conversationSessionsMap and previousUIStateRefsMap accumulate orphaned entries when tabs close
Fix: Add useEffect cleanup to call endInlineWizardConversation for all sessions on unmount

Issue 4: Race Condition in File Watcher

File: src/renderer/services/inlineWizardDocumentGeneration.ts:758-802
Problem: readWithRetry() is fire-and-forget, could cause unhandled rejections
Fix: Add .catch() handler

High Priority Issues

Issue 5: Insufficient Validation After Sanitization

File: src/renderer/services/inlineWizardDocumentGeneration.ts:652-654
Problem: sanitizeFolderName() could return empty/invalid folder name
Fix: Validate baseFolderName is non-empty after sanitization

Issue 6: Inefficient Streaming State Updates

File: src/renderer/hooks/useInlineWizard.ts:1095-1107
Problem: Hundreds of chunks/sec trigger full state spread + re-render, causing UI lag
Fix: Batch updates using requestAnimationFrame

Medium Priority Issues

Issue 7: Synchronous Document Loading

File: src/renderer/hooks/useInlineWizard.ts:526-531
Problem: Sequential reads block wizard startup for projects with many docs
Fix: Use Promise.all() for parallel reads

Issue 8: Missing Agent Error Integration

Problem: Wizard doesn't integrate with session.agentError system
Fix: Add integration around inlineWizardConversation.ts:746

Issue 9: Missing Usage Stats Tracking

Problem: Wizard spawns don't record to stats, work log, or context usage
Fix: Add tracking after successful agent calls

Test Coverage Gaps

  • Timeout scenarios (use vi.useFakeTimers())
  • Per-tab state isolation
  • File watcher error handling

Strengths

  • Clean separation of concerns with stateless services
  • Per-tab wizard isolation using Map pattern
  • Robust path sanitization (path traversal, hidden files, etc.)
  • 68+ tests for intent parser, 25+ for hooks, integration tests
  • Reusable components extracted for maintainability

Overall Assessment

Excellent work! Issues are primarily edge cases rather than fundamental flaws. Fix the 4 critical issues, then merge. Other improvements can follow in subsequent PRs.

@claude
Copy link

claude bot commented Jan 4, 2026

Code Review: PR #112 - Inline /wizard Command

Overall Assessment

This is a well-architected feature with good test coverage and clear documentation (~20K additions). The code follows most project patterns correctly. However, there are critical security issues that must be addressed before merge.

Recommendation: Request Changes - Fix security and memory issues, then approve.


Critical Issues (Must Fix)

1. Security: Command Injection Vulnerability

File: src/renderer/services/inlineWizardConversation.ts:433-460

The buildArgsForAgent function spreads user-controlled args without validation. If agent.args comes from custom agent configurations, malicious arguments could be injected.

Fix: Validate args against a whitelist or use safe patterns from src/main/utils/execFile.ts per CLAUDE.md.

2. Security: Path Traversal Protection Incomplete

File: src/renderer/services/inlineWizardDocumentGeneration.ts:178-223

The sanitizeFilename function only removes "..", missing "....", "./..", etc. Add explicit validation to ensure paths stay within Auto Run folder.

3. Performance: Memory Leak Risk

File: src/renderer/hooks/useInlineWizard.ts:329-366

Per-tab wizard state Maps accumulate unbounded. When tabs close, their states remain in memory. Add cleanup in App.tsx tab close handler.

4. Error Handling: Unhandled Promise Rejections

File: src/renderer/services/inlineWizardConversation.ts:622-641

Callbacks in promise chains aren't wrapped in try-catch. If callbacks.onReceiving() throws, it could crash the renderer.


High Priority Issues (Should Fix)

5. Race Condition: State Synchronization

File: src/renderer/App.tsx:4423-4521

Wizard state sync effect has 20+ dependencies and runs frequently, risking race conditions. Use useRef pattern.

6. Performance: Inefficient Regex

File: src/renderer/services/inlineWizardDocumentGeneration.ts:373-419

Three separate regex matches per document block. Combine into single regex with named groups.

7. Architecture: Duplicated Agent Output Parsing

Parsing logic duplicated across 3 files. Extract to shared service delegating to src/main/parsers/.

8. Type Safety: Weak Typing

File: src/renderer/services/inlineWizardConversation.ts:433

Using any type for agent. Import proper Agent type.


Test Coverage

Strengths: Comprehensive unit tests, integration tests, hook tests

Gaps:

  • No tests for document generation streaming
  • No tests for concurrent wizard sessions
  • No tests for error recovery/retry

Summary

Must Fix Before Merge:

  1. Validate agent.args to prevent command injection
  2. Add path traversal validation
  3. Add cleanup for closed tab wizard states
  4. Wrap all callbacks in try-catch

Should Fix: Race condition, performance, architecture, type safety

Great work on this feature! The architecture is solid and test coverage is strong. Please address the security and memory issues, then this will be ready to merge.

@pedramamini pedramamini marked this pull request as draft January 4, 2026 13:39
@claude
Copy link

claude bot commented Jan 4, 2026

Code Review: PR #112 - Inline Wizard Command

Overall Assessment

Code Quality: 8/10 | Architecture: 9/10 | Security: 7/10

Excellent architectural design with per-tab wizard state management. However, some critical issues need addressing before merge.


Critical Issues (Must Fix)

1. Memory Leak Risk - Process Cleanup

File: src/renderer/hooks/useInlineWizard.ts:599-609

The endWizard function calls endInlineWizardConversation, but it's unclear if this terminates spawned AI processes. You'll leak processes when users exit wizard mid-conversation, switch tabs rapidly, or close tabs with active wizards.

Action Required: Verify that endInlineWizardConversation calls window.maestro.process.kill() on spawned PIDs. If not, track PIDs in wizard state and explicitly kill them.

2. Race Condition - Duplicate Session Creation

File: src/renderer/hooks/useInlineWizard.ts:626-687

The sendMessage function has a race condition when auto-creating conversation sessions. If called twice rapidly, both calls might create duplicate sessions, causing duplicate AI processes and orphaned processes.

Fix: Add a creation lock using a Set to track in-progress session creation.

3. Security - Filename Sanitization Insufficient

File: src/renderer/services/inlineWizardDocumentGeneration.ts:178-192

The sanitizeFilename function misses: Windows reserved names (CON, PRN, AUX, etc), length limits (255 chars), trailing dots/spaces, and 3+ dot directory traversal.

Fix: Add checks for Windows reserved names, implement MAX_LENGTH=200, and handle trailing dots/spaces.


High Priority Issues (Should Fix)

4. State Logic - currentTabId Confusion

File: src/renderer/hooks/useInlineWizard.ts:798-816

Line 800 has always-false condition. Simplify the fallback logic.

5. Error Handling - Silent Failures

File: src/renderer/hooks/useInlineWizard.ts:570-582

On Auto Run folder check failure, wizard silently defaults to new mode. User might unknowingly overwrite existing docs. Set mode to null and show error instead.

6. Intent Parser - Ambiguous Keywords

File: src/renderer/services/wizardIntentParser.ts:183-237

Input like '/wizard fix authentication bug' with no existing docs goes to new mode, but fix suggests iteration. Add ambiguous keywords (fix, bug, issue, problem, refactor) that trigger ask mode.

7. Missing Timeout Implementation

File: src/renderer/services/inlineWizardDocumentGeneration.ts:29-30

GENERATION_TIMEOUT is defined but not implemented. Wizard gets stuck if AI hangs.


Excellent Design Decisions

  1. Per-tab wizard state - Allows concurrent wizards, brilliant!
  2. Intent parser separation - Clean, testable architecture
  3. Streaming document preview - Great UX
  4. Austin Facts during generation - Fun + reduces perceived wait
  5. Template variable integration - Consistent patterns

Missing Test Coverage

Add integration tests for:

  • Process cleanup on rapid tab switching
  • Error recovery from failed generation
  • SSH remote Auto Run folder detection
  • Concurrent wizards on different tabs
  • Memory leak verification

Final Verdict

Approve with Required Changes

The architectural design is excellent. The per-tab wizard state is particularly clever.

However, issues #1-3 are critical and must be fixed before merge to prevent memory leaks, race conditions, and security vulnerabilities.

Estimated fix time: 2-4 hours
Risk if merged as-is: HIGH (memory leaks, process accumulation)

Once P0 issues are resolved, this will be a valuable addition to Maestro!

@claude
Copy link

claude bot commented Jan 4, 2026

Code Review: PR #112 - Inline /wizard Command

Executive Summary

This is a substantial, well-architected feature adding ~20k lines of code with comprehensive test coverage (2,855 test lines). The implementation follows established patterns from CLAUDE.md and demonstrates strong engineering practices. However, there are several critical issues that should be addressed before merging.


1. Code Quality & Best Practices

Strengths ✅

  • Excellent TypeScript usage - Comprehensive interfaces, proper type guards, and clear type definitions
  • Consistent patterns - Follows CLAUDE.md conventions (IPC patterns, theme usage, modal priorities)
  • Good separation of concerns - Clear boundaries between hooks, services, and components
  • Comprehensive documentation - Extensive JSDoc comments throughout
  • Test coverage - 2,855 lines of tests covering unit, integration, and edge cases

Issues

🔴 CRITICAL: Context Memo Dependencies Issue

File: src/renderer/contexts/InlineWizardContext.tsx:80-116

The useMemo dependency array includes mutable objects (conversationHistory, generatedDocuments, existingDocuments, state). This will cause constant re-renders because array/object references change on every state update.

const value = useMemo<InlineWizardContextValue>(() => wizardState, [
  // ...
  wizardState.conversationHistory,  // ❌ Array reference changes every message
  wizardState.generatedDocuments,   // ❌ Array reference changes
  wizardState.existingDocuments,    // ❌ Array reference changes
  wizardState.state,                // ❌ Object reference changes
  // ...
]);

Impact: Performance degradation, especially during conversation (every message triggers full context re-render)

Recommendation: Either:

  1. Remove the useMemo entirely (the hook already returns stable references via useCallback)
  2. Use only primitive values in dependencies and rely on the hook's internal memoization
  3. Use a shallow equality check library

🟡 MEDIUM: Potential Race Condition in sendMessage

File: src/renderer/hooks/useInlineWizard.ts:626-768

const sendMessage = useCallback(
  async (content: string, callbacks?: ConversationCallbacks): Promise<void> => {
    const tabId = currentTabId || 'default';
    // Guard against concurrent calls
    const currentState = tabStatesRef.current.get(tabId);
    if (currentState?.isWaiting) {  // ✅ Good guard
      console.warn('[useInlineWizard] Already waiting for response...');
      return;
    }
    
    // But then there's a window before setting isWaiting:
    setTabState(tabId, (prev) => ({
      ...prev,
      isWaiting: true,  // ⚠️ Multiple calls could slip through here
      // ...
    }));

Issue: Between the check and the state update, multiple rapid calls could pass the guard.

Recommendation: Use a ref-based lock:

const isWaitingRef = useRef(false);

if (isWaitingRef.current) return;
isWaitingRef.current = true;

try {
  // ... send logic
} finally {
  isWaitingRef.current = false;
}

🟡 MEDIUM: Logic Error in getEffectiveTabId

File: src/renderer/hooks/useInlineWizard.ts:805-811

const getEffectiveTabId = useCallback(() => {
  const tabId = currentTabId || 'default';
  if (tabId !== currentTabId) {  // ⚠️ This condition is always false
    setCurrentTabId(tabId);      // This code never runs
  }
  return tabId;
}, [currentTabId]);

Issue: Logic error - tabId is derived from currentTabId, so they're always equal.

Recommendation: Simplify:

const getEffectiveTabId = useCallback(() => {
  return currentTabId || 'default';
}, [currentTabId]);

2. Potential Bugs

🔴 CRITICAL: Timeout Cleanup Issues

File: src/renderer/services/inlineWizardConversation.ts:512-524

const timeoutId = setTimeout(() => {
  console.log('[InlineWizard] TIMEOUT fired!');
  cleanupListeners();
  window.maestro.process.kill(session.sessionId).catch((err) => {
    console.warn('[InlineWizard] Failed to kill timed-out process:', err);
  });
  resolve({
    success: false,
    error: 'Response timeout...',
  });
}, 300000);  // 5 minutes

Issue: If onExit fires first, it clears the timeout. But if timeout fires while exit handler is running, both will attempt to resolve the promise, potentially causing Promise already settled errors or cleanup race conditions.

Recommendation: Use a flag to prevent double-resolution:

let isResolved = false;
const safeResolve = (result) => {
  if (!isResolved) {
    isResolved = true;
    cleanupListeners();
    clearTimeout(timeoutId);
    resolve(result);
  }
};

🟡 MEDIUM: File Watcher Not Always Cleaned Up

File: src/renderer/services/inlineWizardDocumentGeneration.ts:725-810

Issue: If the initial watchFolder call is still pending when cleanupAll runs, the watcher might not be registered yet, causing a leak.

Recommendation: Track the watch promise:

let watchPromise: Promise<any> | null = null;

watchPromise = window.maestro.autorun.watchFolder(subfolderPath).then(...);

// In cleanup:
if (watchPromise) {
  watchPromise.then(() => window.maestro.autorun.unwatchFolder(subfolderPath));
}

🟡 MEDIUM: Unbounded Retry Loop

File: src/renderer/services/inlineWizardDocumentGeneration.ts:763-795

Issue: The setTimeout in retry delay isn't tracked. If cleanup happens during retry delay, the timeout continues and could read from a file that's been deleted or moved.

Recommendation: Use AbortController or track timeout IDs for cleanup.


3. Performance

🟢 GOOD: Efficient String Building

File: src/renderer/services/inlineWizardConversation.ts:279-294

const parts: string[] = [session.systemPrompt, ''];
// ... build array
return parts.join('\n');

✅ Uses array.join() instead of string concatenation - excellent for large conversation histories.


🟡 MEDIUM: Streaming Content Accumulation

File: src/renderer/hooks/useInlineWizard.ts:1108-1113

if (displayText) {
  setTabState(tabId, (prev) => ({
    ...prev,
    streamingContent: prev.streamingContent + displayText,  // ⚠️ String concat in hot path
  }));
}

Issue: For long AI responses, this creates a new string on every chunk, potentially thousands of times. This is a hot path during streaming.

Recommendation: Accumulate in a ref and only update state periodically:

const streamBufferRef = useRef('');
const flushTimer = useRef<number>();

// In onChunk:
streamBufferRef.current += displayText;
clearTimeout(flushTimer.current);
flushTimer.current = setTimeout(() => {
  setTabState(tabId, (prev) => ({
    ...prev,
    streamingContent: streamBufferRef.current,
  }));
}, 50); // Flush every 50ms max

4. Security

✅ EXCELLENT: Input Sanitization

File: src/renderer/services/inlineWizardDocumentGeneration.ts:178-223

export function sanitizeFilename(filename: string): string {
  return filename
    .replace(/[\/\\]/g, '-')        // ✅ Path traversal
    .replace(/\.\./g, '')           // ✅ Directory traversal
    .replace(/[\x00-\x1f\x7f]/g, '')// ✅ Control characters
    .replace(/^\.+/, '')            // ✅ Hidden files
    .trim()
    || 'document';                   // ✅ Empty string guard
}

Strengths:

  • Comprehensive path traversal prevention
  • Handles both Unix and Windows paths
  • Removes control characters and null bytes
  • Prevents hidden file creation
  • Good default fallback

🟡 MEDIUM: Command Injection Risk (Low Probability)

File: src/renderer/services/inlineWizardConversation.ts:433-460

Observation: The agent.args come from window.maestro.agents.get(). If that data is user-modifiable or comes from untrusted config, this could be exploited.

Recommendation: Validate or whitelist allowed args before spreading agent.args.


✅ GOOD: No Shell Execution

The code properly uses window.maestro.process.spawn() with args array (not shell strings). Follows CLAUDE.md security requirement.


5. Test Coverage

✅ EXCELLENT Coverage (2,855 test lines)

Breakdown:

  • useInlineWizard.test.ts: 1,114 lines (hook state management)
  • InlineWizardContext.test.tsx: 758 lines (context integration)
  • InlineWizardFlow.test.tsx: 983 lines (end-to-end integration)
  • Plus component tests, intent parser, docs detector

Coverage includes:
✅ Intent parsing edge cases
✅ Mode selection logic
✅ Confidence threshold behavior
✅ Error handling and retry
✅ Per-tab wizard isolation
✅ State restoration


🟡 MISSING: Timeout/Cancellation Tests

Gap: No tests for:

  • Timeout scenarios (5-minute conversation timeout, 20-minute generation timeout)
  • Cleanup during active generation
  • Race conditions when rapidly switching tabs with active wizards

Recommendation: Add integration tests using vi.useFakeTimers() for timeout scenarios.


6. Architecture & Design

✅ EXCELLENT: Follows CLAUDE.md Patterns

  1. Per-tab state isolation - Uses Map<tabId, WizardState> for multi-tab support
  2. IPC patterns - Proper use of window.maestro.* APIs
  3. Theme integration - Inline styles for theme colors (not Tailwind classes)
  4. Modal priorities - Registered in modalPriorities.ts
  5. Service layer - Clear separation between hooks and business logic

🟡 MEDIUM: Context Provider Over-Wrapping

File: src/renderer/contexts/InlineWizardContext.tsx

Issue: The context is a simple wrapper around useInlineWizard hook. Combined with the memo issue, this adds debugging complexity.

Recommendation: Consider whether the context is necessary. If keeping it, fix the memo dependencies.


🟢 GOOD: Service Architecture

The separation into three services is clean:

  • wizardIntentParser.ts - Pure logic, no side effects
  • inlineWizardConversation.ts - Agent communication
  • inlineWizardDocumentGeneration.ts - Document creation

7. Additional Observations

🟡 Inconsistent Error Messages

Some user-facing errors are too technical:

error: 'Failed to spawn agent: ${error.message}'

Recommendation: Map to user-friendly messages.


🟢 Good Accessibility

  • data-testid attributes for testing
  • Keyboard navigation support
  • Theme-aware styling

🟡 Large File Sizes

  • useInlineWizard.ts: 1,268 lines
  • inlineWizardDocumentGeneration.ts: 1,138 lines

Recommendation: Consider splitting into smaller modules for maintainability.


Priority Recommendations

🔴 MUST FIX Before Merge:

  1. Context memo dependencies - Remove mutable objects from dependency array
  2. Timeout cleanup race condition - Add resolution guard
  3. getEffectiveTabId logic error - Fix or remove dead code

🟡 SHOULD FIX Before Merge:

  1. sendMessage race condition - Add ref-based lock
  2. File watcher cleanup - Track watch promise
  3. Retry loop cleanup - Add abort mechanism
  4. Streaming performance - Batch state updates

🟢 NICE TO HAVE:

  1. Add timeout/cancellation tests
  2. Split large files for maintainability
  3. Improve user-facing error messages
  4. Validate agent.args whitelist

Summary

This is high-quality work with excellent test coverage and adherence to project standards. The architecture is sound and the feature is well-documented. However, there are critical performance and cleanup issues that must be addressed before merging.

Recommendation: Request changes - Fix the must-fix items (1-3), then this is good to merge. The should-fix items (4-7) can be addressed in follow-up PRs if time is limited.

Great job on this substantial feature! 🎉

@claude
Copy link

claude bot commented Jan 4, 2026

Pull Request Review: Inline /wizard Command

This is a comprehensive review of PR #112, which adds an inline wizard feature for creating Auto Run documents within existing AI sessions.


Overview

This PR adds substantial functionality with 20,392 additions across 115 files. The feature enables users to invoke /wizard to create or iterate on Auto Run documents directly in their AI session, providing a faster alternative to the full-screen onboarding wizard.

Key Strengths:

  • Well-architected multi-tab wizard state management
  • Comprehensive test coverage (unit + integration tests)
  • Thoughtful natural language intent parsing
  • Excellent code documentation and inline comments
  • Proper error handling and recovery mechanisms

Code Quality & Best Practices

Architecture (Excellent)

The hook design using Map<tabId, InlineWizardState> is excellent for allowing multiple independent wizards across tabs. The conversation and document generation services use stateless functions instead of singletons, which is cleaner than the onboarding wizard's class-based approach.

Testing (Strong)

Coverage is solid with unit tests for wizardIntentParser covering all keyword patterns, unit tests for useInlineWizard hook covering state transitions, and integration tests verifying end-to-end flows.


Potential Issues & Concerns

1. Race Condition in sendMessage (Medium Priority)

Location: useInlineWizard.ts:636

The guard against concurrent calls uses a stale ref check. There's a gap between checking isWaiting and setting it (line 654). If sendMessage is called twice in rapid succession before the state update propagates, both calls could pass the guard.

Recommendation: Use a synchronous lock pattern with a ref that's set immediately.

2. Memory Leak Risk with Conversation Sessions (Low Priority)

Location: useInlineWizard.ts:351

conversationSessionsMap.current is a ref that persists across renders. If a tab is closed via external means (not through endWizard or reset), the session remains in the map.

Recommendation: Add cleanup in a useEffect that monitors tabStates and removes orphaned sessions.

3. Intent Parser Edge Cases (Low Priority)

Location: wizardIntentParser.ts:145-166

If user types 'add' with no goal, it returns 'add' as the goal. Better to return empty string.

4. Hard-coded Timeouts (Low Priority)

The 20-minute generation timeout is fine, but consider making it configurable via settings for users with slower hardware or complex projects.


Security Concerns

Template Variable Substitution

Verify that user input in goal and projectName cannot inject prompts.

Recommendation: Add input validation to reject control characters or unusual Unicode that might confuse the AI agent.

File Path Validation

getAutoRunFolderPath concatenates user-controlled paths. Ensure projectPath is validated elsewhere to prevent path traversal.


Performance Considerations

Streaming Content Accumulation

For long-running document generation, repeatedly concatenating strings and updating state on every chunk could cause performance issues.

Recommendation:

  • Debounce UI updates (batch chunks every 100ms)
  • Consider limiting streamingContent to last N characters for preview purposes

Conversation History Growth

The entire conversation history is passed to the AI agent on each message. For long wizard sessions, this could exceed token limits.

Recommendation: Implement conversation summarization or truncation after N messages.


Test Coverage Gaps

Consider adding:

  1. Error recovery tests: agent crashes during generation, network timeout scenarios, disk full when saving documents
  2. Edge case tests: multiple rapid /wizard invocations, switching tabs mid-wizard, closing app during document generation
  3. Integration tests: end-to-end flow with actual file I/O, multi-tab wizard isolation verification

Recommendations for Merge

Before Merging (High Priority):

  1. Add race condition guard in sendMessage
  2. Add orphaned session cleanup in useInlineWizard
  3. Verify path traversal protection in file operations
  4. Add test for empty goal extraction edge case

After Merging (Low Priority):

  1. Monitor wizard completion rates and error logs
  2. Consider adding streaming content debouncing if performance issues arise
  3. Evaluate conversation history truncation after real-world usage

Final Assessment

Overall Quality: 4.5/5

This is a well-designed feature with solid architecture, good test coverage, and thoughtful UX considerations. The per-tab state management is particularly well-executed. The issues identified are mostly minor and can be addressed in follow-up PRs.

Recommendation: APPROVE with minor fixes

The codebase will benefit from this feature, and the implementation quality is high. Address the race condition and memory leak concerns, then this is ready to merge.


Review completed by Claude Code Assistant
Reviewed: 115 files, 20,392 additions, 1,298 deletions

@claude
Copy link

claude bot commented Jan 4, 2026

Code Review - PR #112: Add inline /wizard command

I've completed a thorough review of this substantial PR (20,400 additions across 116 files). This is an impressive implementation that adds inline wizard functionality for Auto Run document creation.

✅ Strengths

1. Excellent Architecture & Design

  • Well-structured separation of concerns: hooks, contexts, services, and components
  • Per-tab isolation: wizard state management handles multiple concurrent wizards across tabs using Map-based approach
  • Reusable components: Smart extraction of DocumentEditor and DocumentSelector to avoid duplication
  • Intent parsing: Natural language detection is well-designed with comprehensive keyword matching

2. Strong Test Coverage

  • Unit tests for core services (wizardIntentParser, existingDocsDetector)
  • Component tests for all UI components (WizardMessageBubble, WizardInputPanel, etc.)
  • Hook tests for useInlineWizard state management
  • Integration tests for the full wizard flow

3. Code Quality

  • Excellent TypeScript usage with well-defined interfaces
  • Comprehensive JSDoc comments
  • Proper error boundaries and user-facing error messages
  • Good keyboard navigation support and focus management

4. Alignment with Project Standards

  • Follows CLAUDE.md conventions
  • Adheres to CONSTITUTION.md principles
  • Consistent with existing patterns

(Review continues in next comment...)

@claude
Copy link

claude bot commented Jan 4, 2026

⚠️ Issues & Concerns

1. Memory Leak Potential (High Priority) ❗

Location: src/renderer/hooks/useInlineWizard.ts:660-680

The conversationSessionsMap ref is never cleaned up on unmount. If tabs are closed while wizard is active, the session remains in the map.

Recommendation: Add cleanup in endWizard and a useEffect cleanup on unmount:

useEffect(() => {
  return () => {
    conversationSessionsMap.current.forEach(async (session) => {
      try {
        await endInlineWizardConversation(session);
      } catch (e) {
        console.warn('Failed to cleanup session', e);
      }
    });
    conversationSessionsMap.current.clear();
  };
}, []);

2. Race Condition in State Updates (Medium Priority)

Location: src/renderer/hooks/useInlineWizard.ts:634-639

The guard against concurrent sendMessage calls uses tabStatesRef.current.get(tabId) which could be stale. Between the check and setState, another call could slip through.

Recommendation: Use a dedicated lock/semaphore or move to a queue-based approach.

3. Regex Vulnerabilities (Low Priority)

Location: inlineWizardDocumentGeneration.ts:439

Using exec in a while loop without resetting lastIndex can cause infinite loops.

Recommendation: Use matchAll instead:

for (const match of content.matchAll(phaseSectionPattern)) {
  // process match
}

4. File Watcher Cleanup (Medium Priority) ❗

Location: inlineWizardDocumentGeneration.ts

File watcher cleanup depends on cleanupAll() being called. If promise rejects before setting up error handlers, the watcher might leak.

Recommendation: Wrap watcher creation in try/catch and ensure it's always in the cleanup list.

5. Path Traversal Risk (Medium Priority)

Generated file paths include user input - should validate against path traversal.

Recommendation: Add path sanitization:

function sanitizePath(userPath: string, basePath: string): string {
  const normalized = path.normalize(userPath);
  const resolved = path.resolve(basePath, normalized);
  if (!resolved.startsWith(basePath)) {
    throw new Error('Path traversal detected');
  }
  return resolved;
}

@claude
Copy link

claude bot commented Jan 4, 2026

🔍 Code Quality & Performance

Positive Patterns

✅ Excellent use of TypeScript discriminated unions for wizard modes
✅ Proper memoization in React components (useMemo, useCallback)
✅ Good separation of streaming logic from document parsing
✅ Austin Facts feature is a nice touch for user engagement during generation
✅ Good use of requestAnimationFrame for typewriter effects

Areas for Improvement

⚠️ Some functions are quite long (e.g., generateInlineDocuments is 400+ lines) - consider breaking into smaller helpers
⚠️ The extractDisplayTextFromChunk function has repetitive agent-type checks - could use a strategy pattern
⚠️ Context memoization dependencies might cause excessive re-renders - consider splitting into multiple contexts if performance becomes an issue
⚠️ Document scanning in existingDocsDetector.ts is synchronous - could block on large Auto Run folders

Minor Issues

  • Hardcoded timeout of 20 minutes (inlineWizardDocumentGeneration.ts:30) - should be configurable
  • Some error messages are generic and don't provide actionable guidance
  • Conversation history grows unbounded - consider limiting to last N messages

🛡️ Security Assessment

Safe Practices

✅ Uses IPC for all file operations (not direct fs access)
✅ No eval() or Function() constructors
✅ Proper input sanitization in template variable substitution
✅ No SQL injection risks (uses electron-store)

Potential Risks

⚠️ User-provided content is inserted into AI prompts without escaping - could lead to prompt injection (inherent to the feature)
⚠️ Generated file paths include user input - validate against path traversal (see issue #5)
⚠️ Process spawning arguments construction could be more robust (though IPC usage is safe)

🧪 Testing Recommendations

Consider adding edge case tests:

  • Auto Run folder deleted during wizard
  • Disk full during document generation
  • Agent process crashes mid-generation
  • Large conversation histories (50+ messages)
  • Rapid tab switching during wizard
  • Multiple concurrent wizards (stress test)
  • SSH remote session wizard flow

@claude
Copy link

claude bot commented Jan 4, 2026

✅ Final Verdict

This is a high-quality PR that demonstrates strong engineering practices. The implementation is well-architected, properly tested, and aligns with project standards.

Must Fix Before Merge

  1. ❗ Memory leak in conversationSessionsMap cleanup
  2. ❗ File watcher cleanup guarantee in error cases

Should Fix Before Merge

  1. ⚠️ Race condition protection in sendMessage
  2. ⚠️ Regex exec infinite loop potential
  3. ⚠️ Path traversal validation for generated file paths

Nice to Have (Can Address in Follow-up)

  1. 💡 Configurable generation timeout
  2. 💡 Better error messages with recovery actions
  3. 💡 Performance optimization for large Auto Run folders
  4. 💡 Additional edge case test coverage
  5. 💡 "Restart Wizard" button in error states

🎯 Recommendation

APPROVE with minor fixes. The core implementation is solid, but the memory leak and cleanup issues should be addressed before merging. The race condition and path validation are important but lower priority.

Excellent work overall! This feature adds significant value to the Auto Run workflow. 🚀

Additional Notes

  • The code follows functional programming patterns well (pure functions, immutability)
  • The wizard state machine is implicit but clear from the code flow
  • Integration with existing onboarding wizard shows good code reuse
  • The two-mode approach (new vs iterate) is intuitive
  • Confidence gauge at 80% threshold is well-calibrated

Review Confidence: 95% (static analysis only - runtime behavior not tested)

If you'd like me to dive deeper into any specific area or help implement the recommended fixes, let me know!

@claude
Copy link

claude bot commented Jan 4, 2026

Code Review: Inline Wizard Implementation

This is an impressive feature implementation with solid architecture and comprehensive testing. However, there are several critical issues that must be addressed before merging.


🚨 Critical Issues (Must Fix)

1. Path Traversal Vulnerability - SECURITY CRITICAL

Location: Document generation and file saving logic

Issue: The filename sanitization happens after parsing but there's no validation that the final resolved path stays within the Auto Run folder. An AI-generated filename could potentially write files outside the intended directory.

Risk: HIGH - Could allow writing files anywhere on the filesystem

Fix Required:

// In saveDocument function, add path validation:
const fullPath = path.resolve(autoRunFolderPath, filename);
const resolvedFolder = path.resolve(autoRunFolderPath);
if (!fullPath.startsWith(resolvedFolder + path.sep)) {
  throw new Error('Invalid filename: path traversal detected');
}

2. Race Condition in Message Sending

Location: src/renderer/hooks/useInlineWizard.ts:626-639

Issue: The guard against concurrent calls checks isWaiting and then sets it in separate operations. Two rapid calls could both pass the guard before either sets the flag.

Impact: Duplicate AI requests, wasted API calls, state corruption

Fix Required: Use atomic state updates that check and set in a single operation.

3. Resource Leak in Timeout Handlers

Location: src/renderer/services/inlineWizardConversation.ts:516-528

Issue: The timeout is never cleared when the agent completes successfully - clearTimeout only happens in the exit handler.

Impact: Memory leak - timeout callbacks accumulate for every message sent

Fix Required: Add clearTimeout(timeoutId) in the data listener callback when response completes.


⚠️ High Priority Issues (Should Fix)

4. File Watcher Memory Leak

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:745-810

The watcher is started asynchronously but cleanup is registered before the watcher promise resolves. If the process exits early, the watcher may not be cleaned up properly.

Impact: File watchers remain active after wizard ends, consuming resources

5. Stale Closure in Timeout Reset

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:696-712

resetTimeout is defined inside the promise closure, capturing initial values of outputBuffer and cleanupAll. These may be stale when timeout fires.

6. Unbounded Tab State Storage

Location: src/renderer/hooks/useInlineWizard.ts:330-331

Per-tab wizard states are stored in a Map but there's no automatic cleanup when tabs are closed. This creates a gradual memory leak in long-running sessions.

Fix: Add cleanup when tabs are closed to remove their wizard state from the Map.


💡 Code Quality Issues

7. Magic Numbers

Multiple hardcoded values like 300000 (5 minutes) should be extracted to named constants:

const WIZARD_MESSAGE_TIMEOUT_MS = 5 * 60 * 1000; // 5 minutes
const GENERATION_TIMEOUT_MS = 20 * 60 * 1000; // 20 minutes

8. Inconsistent Error Handling

Mixed patterns across the codebase:

  • Some functions return { success: boolean, error?: string }
  • Some throw exceptions
  • Some use callbacks with onError

Recommendation: Standardize on one pattern (Result type for async operations).

9. Missing Input Validation

startWizard accepts many optional parameters but doesn't validate them:

  • projectPath could be empty string
  • sessionName could contain invalid characters
  • agentType could be invalid

10. Unbounded Conversation History

Conversation history grows unbounded. A very long conversation could accumulate hundreds of messages and impact performance.

Recommendation: Add pagination or limit (e.g., keep last 100 messages).


🔒 Security Observations

✅ Good: Restricted Tool Access During Conversation

// Restrict to read-only tools during wizard conversation
args.push('--allowedTools', 'Read', 'Glob', 'Grep', 'LS');

Excellent security practice!

⚠️ Concern: Write Access During Generation

Relying on prompt instructions to prevent writes outside Auto Run folder is insufficient. The agent could ignore instructions or be prompt-injected.

Recommendation: Add filesystem sandbox/jail to enforce write restrictions at the OS level, not just prompt level. At minimum, add the path validation mentioned in Critical Issue #1.


🧪 Test Coverage

Excellent Coverage ✅

  • 27 integration test cases
  • Good edge case coverage (confidence thresholds, error states, retry logic)
  • Comprehensive unit tests for key services

Missing Coverage

  • ❌ Path sanitization edge cases (security-critical!)
  • ❌ Memory cleanup (file watchers, timeouts)
  • ❌ Concurrent wizard sessions on different tabs
  • ❌ Very long conversations (performance testing)

Recommendation: Add security-focused tests for file operations and memory leak tests.


⚡ Performance Concerns

11. Inefficient Retry Logic

If many files are created rapidly during generation, each spawns 3 retry attempts with fixed 200ms delays. This could accumulate hundreds of pending promises.

Fix: Add exponential backoff:

const delayMs = 200 * Math.pow(2, attempt - 1); // 200ms, 400ms, 800ms

12. Context Re-render Optimization

The InlineWizardContext includes every field as a dependency, causing re-renders even when only internal state changes.

Recommendation: Split context into separate providers for read-heavy vs write-heavy state.


✨ Positive Observations

  1. Excellent Documentation - Comprehensive JSDoc comments throughout
  2. Strong Type Safety - Good TypeScript usage with proper interfaces
  3. Clean Architecture - Clear separation between conversation, generation, and UI layers
  4. Great UX - Good error messages, loading states, progress tracking
  5. Accessibility - Screen reader announcements, keyboard navigation
  6. Comprehensive Testing - Integration and unit tests covering major flows

📋 Recommendations

Must Fix Before Merge

Should Fix Before Merge

Nice to Have

  • Standardize error handling patterns
  • Add input validation
  • Optimize context memoization
  • Add conversation history pagination
  • Add filesystem sandbox for Write tool

Summary

This is a well-architected feature with thoughtful design, excellent test coverage, and great UX. The code demonstrates strong TypeScript skills and follows React best practices.

However, there are critical security and reliability issues that must be addressed:

  • 🔴 Path traversal vulnerability in document generation
  • 🔴 Race condition in message sending
  • 🔴 Resource leaks in timeout handlers and file watchers

Overall Assessment: 7.5/10 - Solid implementation that needs critical fixes before production.

I recommend addressing the "Must Fix" items before merging, as they pose real security and stability risks. The "Should Fix" items would improve reliability and should be prioritized for a follow-up PR if not addressed now.

Great work overall! The inline wizard is a powerful feature that will significantly improve the user experience. 🎉

@claude
Copy link

claude bot commented Jan 4, 2026

Comprehensive Code Review: PR #112 - Inline /wizard Command

Executive Summary

This PR implements a well-architected inline wizard feature. The implementation follows solid software engineering principles with proper separation of concerns, comprehensive error handling, and security measures. However, there are 3 critical issues that should be addressed before merging.


Critical Issues (Must Fix)

1. Memory Leak Risk in useInlineWizard.ts (CRITICAL)

Location: src/renderer/hooks/useInlineWizard.ts:342-345

The conversationSessionsMap and previousUIStateRefsMap are never cleaned up when tabs are closed outside of the wizard flow. If a user closes a tab while the wizard is active, orphaned processes will remain.

Fix: Add cleanup on unmount to prevent memory leaks.

2. Race Condition in Document Generation (HIGH)

Location: src/renderer/services/inlineWizardDocumentGeneration.ts:771-776

The file watcher can trigger multiple events for the same file. The current array lookup has a race condition if two events fire simultaneously.

Fix: Use a Set for O(1) lookup and atomic check-and-add.

3. Timeout Cleanup Issue (HIGH)

Location: src/renderer/services/inlineWizardConversation.ts:515-528

If timeout fires while exit listener fires, both will try to resolve the promise, causing unhandled rejection.

Fix: Implement safe resolve pattern to prevent double resolution.


Security Analysis ✅

Path Traversal Prevention: Excellent! The sanitizeFilename() and sanitizeFolderName() functions properly prevent path traversal attacks, null bytes, and provide safe fallbacks.

Agent Tool Restriction: Correct! The wizard restricts agents to read-only tools during discovery phase.


Performance Issues

Inefficient State Updates (MEDIUM): Every setTabState() call copies the entire Map. With frequent updates (streaming text, confidence changes), this becomes expensive.

Recommendation: Use Immer for immutable updates or implement batched updates.


Positive Highlights ✅

  1. Per-Tab Wizard Isolation - Multi-tab architecture is well-designed
  2. Streaming Document Preview - Real-time file watching provides excellent UX
  3. Confidence Gauge Pattern - Smart UX with clear threshold
  4. Structured Response Parsing - Robust JSON parsing with fallbacks
  5. Security-First Design - Strong security awareness throughout
  6. Comprehensive Test Coverage - 3000+ lines of test code
  7. Excellent Separation of Concerns - Clear boundaries between layers

Architecture ✅

The code is well-organized:

  • useInlineWizard.ts → State management (React)
  • InlineWizardContext.tsx → Cross-component state
  • inlineWizardConversation.ts → AI conversation logic
  • inlineWizardDocumentGeneration.ts → Document generation
  • wizardIntentParser.ts → Intent detection (pure functions)

Strong TypeScript usage with comprehensive interfaces and proper cleanup implementations.


Summary

Must Fix Before Merge

  1. ❌ Fix memory leak (add unmount cleanup)
  2. ❌ Fix race condition (use Set for processed files)
  3. ❌ Fix timeout cleanup (safe resolve pattern)

Should Fix (Medium Priority)

  1. Add state growth protection (max 10 wizards)
  2. Optimize Map copying performance
  3. Standardize error handling

Nice to Have (Low Priority)

  1. Replace deprecated substr() with substring()
  2. Extract magic numbers to constants
  3. Use logger instead of console.log

Final Verdict: APPROVE WITH CHANGES ✅

This is a solid implementation with excellent architecture and security practices. The critical issues are fixable with localized changes (est. 4-6 hours). The codebase demonstrates strong TypeScript skills, proper async handling, and commitment to testing.

With the critical fixes applied, this PR will be production-ready.

@claude
Copy link

claude bot commented Jan 4, 2026

Code Review: Inline /wizard Command (PR #112)

This is an excellent, well-architected PR that adds a powerful inline wizard feature. The implementation quality is high with comprehensive test coverage. Here's my detailed review:


✅ Strengths

1. Excellent Architecture & Design

  • Per-tab wizard isolation: The state management correctly handles multiple concurrent wizards on different tabs using a Map<tabId, WizardState> pattern. This is sophisticated and well-implemented.
  • Clean separation of concerns: Services (wizardIntentParser, inlineWizardConversation, inlineWizardDocumentGeneration) are properly separated from UI and state management.
  • Reusable patterns: Leverages existing wizard infrastructure (prompts, confidence scoring, document generation) while adapting it appropriately for inline use.
  • Context provider pattern: InlineWizardContext provides clean access to wizard state without prop drilling.

2. Comprehensive Test Coverage

  • 1,114 lines of hook tests in useInlineWizard.test.ts
  • 983 lines of integration tests in InlineWizardFlow.test.tsx
  • Tests cover intent parsing, mode detection, error handling, retry logic, and per-tab isolation
  • Good use of mocking and async testing patterns

3. Well-Documented Code

  • Excellent JSDoc comments throughout (see useInlineWizard.ts lines 1-327)
  • Clear interface definitions with inline comments
  • The CONSTITUTION.md addition is a nice philosophical grounding document

4. Security Considerations

  • ✅ No eval() or exec() usage found in new code
  • ✅ No dangerouslySetInnerHTML in new wizard components
  • ✅ Prompts include clear file access restrictions (lines 8-24 in wizard-inline-new.md)
  • Input validation through intent parser with keyword-based detection

5. User Experience

  • Natural language intent parsing (/wizard add auth vs /wizard start fresh)
  • Confidence gauge with visual feedback (0-100%)
  • Streaming document generation with live preview
  • Previous UI state restoration when wizard ends
  • Error handling with retry functionality

⚠️ Issues & Recommendations

High Priority

1. Race Condition Prevention (Medium Risk)

File: src/renderer/hooks/useInlineWizard.ts:636-639

The guard checking isWaiting has a potential TOCTOU race. If sendMessage is called twice in rapid succession, both calls might pass the guard before either sets isWaiting=true.

Recommendation: Use a more robust locking mechanism with a Set to track in-flight requests per tab.

2. Conversation Session Cleanup on Errors

File: src/renderer/hooks/useInlineWizard.ts:753-765

The error handling in sendMessage doesn't clean up the conversation session if it was auto-created (lines 661-680). If auto-creation succeeds but the message send fails, the session lingers in conversationSessionsMap.

Recommendation: Add cleanup in the catch block for auto-created sessions.

3. Memory Leak: Document Content Loading

File: src/renderer/hooks/useInlineWizard.ts:393-427

The loadDocumentContents function reads full document contents into memory without size limits. Large markdown files could cause performance issues.

Recommendation: Add size limits (e.g., 500KB) with truncation warnings.

Medium Priority

4. Type Safety: agentSessionId Handling

Line 728 uses || which treats empty string as falsy. Use nullish coalescing (??) if empty strings are valid.

5. Error Message Consistency

Error messages mix different formats. Define error constants for consistency and i18n readiness.

6. getEffectiveTabId Logic

File: src/renderer/hooks/useInlineWizard.ts:805-811

The condition tabId \!== currentTabId is always false (dead code) since tabId is derived from currentTabId.

Recommendation: Simplify or clarify intent.

Low Priority

  1. Console.log Statements: Use logger service consistently instead of console.log
  2. Magic Numbers: Document threshold values with rationale
  3. Accessibility: Add screen reader announcements for state changes (confidence updates, mode changes)

📋 Code Quality Observations

Positive Patterns

  1. ✅ Proper use of useCallback and useMemo for performance
  2. ✅ Ref usage for avoiding stale closures (tabStatesRef, conversationSessionsMap)
  3. ✅ Error boundaries with fallback states
  4. ✅ Progressive disclosure (ask → new/iterate → conversation → generation)
  5. ✅ Template variable substitution for prompts

Adherence to Project Conventions

✅ Follows CLAUDE.md standardized vernacular
✅ Keyboard-first design (aligns with Constitution)
✅ Layer stack integration for modals
✅ Theme color usage (inline styles, not Tailwind classes)
⚠️ Could improve Auto Run autonomy focus (Constitution Tenet #1) - consider automatic error recovery


🧪 Test Coverage Assessment

Coverage appears excellent:

  • Unit tests for intent parser (237 lines)
  • Unit tests for conversation manager (182 lines)
  • Unit tests for document generation (407 lines)
  • Integration tests for full flow (983 lines)
  • Hook tests including edge cases (1,114 lines)

Recommended additional tests:

  1. Concurrent wizard sessions on multiple tabs (race conditions)
  2. Very large document handling (performance)
  3. Network failure during document generation (resilience)
  4. Agent session ID continuity across messages

🔒 Security Assessment

Overall: GOOD

✅ No obvious injection vulnerabilities
✅ File access properly scoped to Auto Run folder (prompt restrictions)
✅ No unsafe DOM manipulation
✅ Input sanitization through structured intent parsing
⚠️ Template variable substitution uses basic .replace() - ensure no XSS if user input reaches prompts

Recommendation: Audit template variable paths to ensure user input is never directly interpolated without sanitization.


🎯 Performance Considerations

  1. State updates: Frequent setTabState calls could cause re-renders. The memo wrapper helps, but monitor in production.
  2. Document loading: Addressed in recommendation Raychaser/claude path 01 #3 (size limits)
  3. Streaming content: Accumulating in state is fine for typical documents but could grow large. Consider chunking for display.

Final Verdict

APPROVE with minor revisions recommended

This is high-quality work that significantly extends Maestro's capabilities. The architecture is sound, test coverage is comprehensive, and the feature aligns well with the project's constitution.

Must-fix before merge:

  1. Race condition prevention (recommendation Internal Logging #1)
  2. Conversation session cleanup on errors (recommendation Add Claude Code GitHub Workflow #2)

Should-fix before merge:
3. Document size limits (recommendation #3)
4. Remove dead code in getEffectiveTabId (recommendation #6)

Nice-to-have:
5-9. Other recommendations enhance code quality but aren't blockers


Estimated Risk: Low-Medium
Estimated Merge Readiness: 90% (after addressing recommendations #1-2)
Test Coverage: Excellent (✅)
Security Posture: Good (✅)

Great work! This feature will significantly improve the Auto Run workflow. 🎉

Created src/renderer/hooks/useInlineWizard.ts with:
- InlineWizardMode type ('new' | 'iterate' | 'ask' | null)
- InlineWizardMessage interface for wizard conversation
- PreviousUIState interface for UI restoration
- InlineGeneratedDocument interface for generated docs
- Full state management with startWizard, endWizard, sendMessage, setConfidence
- Additional helpers: setMode, setGoal, setGeneratingDocs, addAssistantMessage, etc.

Exported from hooks/batch/index.ts with all types.
Create wizardIntentParser.ts service that parses natural language input
to determine wizard mode (new, iterate, or ask). Uses keyword matching
to detect user intent:
- 'new', 'fresh', 'start', 'create', etc. → new mode
- 'add', 'update', 'modify', 'extend', etc. → iterate mode with goal
- ambiguous input → ask mode for clarification

Includes suggestsIterateIntent() and suggestsNewIntent() helpers for
pattern-based detection. Exports types and functions via services/index.ts.

68 tests added covering keyword detection, goal extraction, and edge cases.
- Add /wizard to slashCommands.ts with description for autocomplete
- Add onWizardCommand callback to useInputProcessing deps interface
- Handle /wizard command in useInputProcessing (extracts arg text after command)
- Create handleWizardCommand in App.tsx that captures UI state and calls startInlineWizard
- Use strict matching to prevent /wizardry from triggering handler
- Add 7 unit tests covering command interception, args passing, terminal bypass
Created InlineWizardContext.tsx that wraps useInlineWizard hook and
exposes all state and actions via React context. This enables components
throughout the app to access inline wizard state without prop drilling.

Changes:
- Created InlineWizardContext.tsx with InlineWizardProvider and
  useInlineWizardContext hook
- Added InlineWizardProvider to App.tsx as Phase 7 in provider chain
- Re-exported types for convenience
- Added 31 unit tests covering all functionality
Added per-session wizard state types to support the inline /wizard command:
- WizardMode type ('new' | 'iterate' | null)
- WizardMessage interface for conversation history
- WizardPreviousUIState interface for UI state restoration
- SessionWizardState interface combining all wizard state
- Optional wizardState field on Session interface
- Create InlineWizardFlow.test.tsx with 53 tests covering the complete
  inline wizard end-to-end flow
- Test coverage includes:
  - Intent parsing (/wizard command modes)
  - useInlineWizard hook lifecycle
  - Confidence updates and "Let's Go" button rendering
  - Error handling and retry flow
  - Document generation progress tracking
  - UI state restoration on wizard end
  - Streaming and loading states
  - Context provider behavior
- Update vitest.integration.config.ts to support React component tests
  in integration folder with jsdom environment and setup files
- Fix pre-existing test failures:
  - InlineWizardContext.test.tsx: Add missing state fields
    (streamingContent, generationProgress, lastUserMessageContent)
  - system.test.ts: Add new logger IPC handlers
    (getLogFilePath, isFileLoggingEnabled, enableFileLogging)
- Introduced explicit read/write access rules: Auto Run writes only! 🔒
- Updated all wizard prompt templates with clearer safety constraints 🧭
- Refined tests to validate new “WRITE Limited / READ Unrestricted” wording 🧪
- Synced inline wizard context into `session.wizardState` for stable UI 🧷
- Scoped inline wizard to the tab it started on for isolation 🧩
- Auto-sent an initial wizard greeting to kickstart discovery instantly 🚀
- Added “Exit wizard” support (Escape/pill) via new `onExitWizard` callback 🚪
- Fixed wizard loading indicator to use `wizardState.isWaiting` accurately ⏳
- Restricted wizard conversation tools to read-only operations for safety 📚
- Deduplicated parsing by reusing shared structured-output + confidence utilities 🧰
- Documented multi-pass context compaction with chunking, consolidation, progress indicators 📚
- Compaction now triggers via context percent *or* ~10k-token log estimation 🔍
- Added consolidation passes to keep compacted summaries under ~40k tokens 🧠
- Introduced comprehensive ContextSummarizationService test suite for compaction flows 🧪
- Remote SSH session git info now fetches in background, speeding app startup 🚀
- UI now flags SSH connection failures with clear session tooltip badges 🛑
- Wizard confidence gauge renamed and now displays a dimmed label for clarity 🎛️
- Wizard conversation empty state upgraded with onboarding, benefits, and exit hint 🧙‍♂️
- Wizard input auto-focuses when activated for faster first interaction ⚡
- Wizard document generation saves into sanitized project subfolders and auto-creates playbooks 🗂️
- Added Claude “prompt too long” detection as recoverable token exhaustion 🧩
- Rebranded wizard output from “action plans” to “Playbooks” everywhere 📘
- AutoRun Edit/Preview controls are now icon-only with clearer titles 🎛️
- Introduced Playbook Exchange button styling and richer tooltip copy 🧺
- Wizard now routes chat input directly into its own conversation stream 💬
- Wizard completion now appends conversation logs and a next-steps summary 🧾
- Wizard captures Claude `session_id` and switches tab for continuity 🔗
- Wizard mode becomes tab-scoped, preventing cross-tab activation leakage 🧭
- Confidence gauge now withholds green until readiness threshold (80+) 🎨
- Document generation shows centered Austin Facts during initial loading 🧠
- Claude agents now run read-only tools for safer Playbook generation 🔒
- Wizard pill shows “Thinking…” spinner and pauses pulse while waiting ⏳
- Claude parser now extracts embedded JSON errors from noisy stderr output 🔎
- Token-exhaustion messages show real token counts via dynamic regex capture 📊
- Context usage calculations now include cache-read tokens for accuracy 🧮
- Added IPC + preload API to cancel all active context grooming sessions 🛑
- Summarization cancel now actually kills grooming processes, safely ignores failures 🧯
- Grooming formatter strips full file dumps into file-path references, preserving snippets ✂️
- Grooming formatter strips images by token budget, removing oldest first 🖼️
- Inline Wizard is now truly per-tab, with independent state and accessors 🧩
- Ask-mode wizard auto-creates a session when user sends first message 🚀
- Wizard document generation creates unique project subfolders to avoid collisions 🗂️
- Generated subfolder name is stored and used to rename tabs “Project: …” 🏷️
- Wizard completion now resets hook state and clears input to prevent resync 🧹
- Moved wizard state storage from session-level to per-tab state ✨
- Updated InputArea to drive Wizard UI from active tab 🔮
- Fixed MainPanel wizard rendering to respect active tab wizardState 🧭
- Document generation view now reads progress and streams per-tab 📄
- Wizard completion now converts history using active tab context 🔁
- Wizard completion clears wizardState only on the affected tab 🧹
- Improved tab update flow using captured activeTabId for safety 🛡️
- Refactored tests to attach wizardState on aiTabs correctly 🧪
- Added helper utilities in tests for cleaner wizard session setup 🧰
- Updated wizard mode icon logic to follow per-tab activity 🎛️
- Added `/wizard` inline wizard docs for in-tab Auto Run creation flow 📘
- Inline Wizard now supports streaming “Thinking” with a handy toggle 🧠
- Wizard conversation typing animation upgraded to smooth RAF typewriter ⚡
- Document generation now writes files live via `Write` tool 🚀
- Real-time doc streaming added by watching Auto Run subfolder changes 👀
- Generation UX improved with centered loader, progress text, and cancel button 🛑
- Wizard tabs rename instantly to “Wizard”, then “Wizard: {Project}” 🏷️
- Process Monitor now groups wizard processes with WIZARD/GENERATING badges 🧙
- Context transfer now auto-sends on tab activate for instant injection 🔄
- New session lifecycle stats IPC APIs wired into `maestro.stats` 📊
- Toggle AI thinking instantly with ⌘⇧K, plus visible shortcut hint 🧠
- Thinking stream now uses dedicated callback, not fragile delta parsing 🔍
- Wizard Phase Review supports ⌘E/Ctrl+E edit–preview toggling for docs 📝
- Wizard modal stops ⌘E bubbling into app-level Auto Run shortcuts 🛑
- Closing wizard tabs now asks confirmation—progress can’t be restored ⚠️
- Wizard tabs skip closed-tab history, preventing accidental “reopen” attempts 🗂️
- New `hasActiveWizard()` helper cleanly detects active wizard tabs 🧩
- Auto Run write-mode inputs always queue, avoiding file-conflict races ⏳
- Wizard document generation now writes into `Initiation/` subfolder structure 📁
- DocumentEditor shares unified markdown rendering with slugs and link handling 🔗
- Expanded wizard write access to entire Auto Run folder structure 📂
- Simplified phase file paths by dropping the extra Initiation directory 🧭
- Moved Working scratch space directly under Auto Run for cleaner organization 🧰
- Added global Cmd+E capture handler so edit/preview toggles reliably anywhere ⌘
- Tightened Phase Review keyboard navigation by separating shortcut concerns ⌨️
- Refined DocumentEditor preview styling to avoid prose class side effects 🎨
- Wrapped Markdown rendering in dedicated prose container for consistent layout 🧱
- Exposed active Auto Run batch state to keyboard handler context for smarter gating 🧩
- Prevented Cmd+E markdown toggles while Auto Run is running and locked 🔒
- Blocks URL-encoded “..” filenames to stop sneaky path traversal attacks 🔒
- Prevents duplicate wizard sends while waiting, eliminating race conditions ⚡
- Kills timed-out agent processes automatically to avoid resource leaks 🧹
Two critical bug fixes:

1. Auto Run progress bar showing "0 of N tasks completed" even after
   all tasks finished:
   - Root cause: batchRunStatesRef was only updated on React re-render,
     but debounce callbacks read this ref to compare state changes
   - When dispatches happened faster than re-renders, the ref contained
     stale state, causing progress updates to be dropped
   - Fix: Wrap dispatch to synchronously update batchRunStatesRef
     immediately after each action, ensuring debounced callbacks
     always see current state

2. TourOverlay causing "Maximum update depth exceeded" React warning:
   - Root cause: Unstable callback references (onClose, onTourComplete,
     etc.) passed to useTour caused internalSkipTour to change every
     render, triggering registerLayer effect repeatedly
   - Fix: Use refs for callback props and create stable useCallback
     handlers that reference the refs

Added comprehensive test coverage:
- New batchReducer.test.ts with tests for all reducer actions
- State synchronization tests in useBatchProcessor.test.ts

Claude ID: 1606764d-c4e2-4f1c-a990-84d13438c263
Maestro ID: 4ebabf28-d602-42bf-bb1e-aeb10d2157ab
- Wizard now saves generated docs under `Auto Run Docs/Initiation/` for clarity 📁
- Onboarding copy refresh: “Auto Run Playbook” terminology replaces “document” consistently 📝
- Claude parser now extracts and streams extended “thinking” blocks reliably 🧠
- Added full test coverage for thinking, redaction, and tool-use parsing paths ✅
- Wizard modal gains a real focus trap for rock-solid keyboard navigation ⌨️
- Wizard buttons now show accessible focus rings matching theme colors 🎯
- Phase Review: Escape closes document dropdown before modal closes 🛡️
- Phase Review: cycle multiple generated documents with ⌘/Ctrl+Shift+[ ] shortcuts 🔁
- Directory step: Enter on Browse triggers browse, not accidental “Continue” 🗂️
- Auto Run prompts upgraded with token-efficient task grouping rules and examples ⚡
- Auto-run stop tooltips now say “stop auto-run” consistently everywhere 🛑
- AutoRun pill shows “AutoRun Stopping…” with warning color feedback ⏳
- Cmd/Ctrl+Shift+K thinking toggle now works globally in wizard modal ⌨️
- ConversationScreen thinking toggle state is lifted to MaestroWizard 📤
- Thinking toggle tests updated to click button, not keyboard shortcut 🖱️
- BatchRunner modal rebranded as “Auto Run Configuration” for clarity 🏷️
- Auto-run prompt placeholder clarifies it’s a system prompt 🧠
- “Run batch processing” copy replaced with clearer “Start auto-run” 🚀
- Batch processor now bypasses debouncing for direct progress updates ⚡
- Stop auto-run requests always honored; extra logging aids debugging 🔍
- Launch the in-tab Wizard straight from Auto Run with one click! 🪄
- Wizard “show thinking” now displays live tool executions, not just text! 🧰
- Prevent confusing JSON flashes in wizard thinking stream for cleaner UX! 🧼
- Auto-complete wizard after document generation with a helpful summary log! ✅
- Redesigned document generation view: progress, elapsed time, and drafted files list! 📄
- Auto Run batch processing now fully supports SSH remotes end-to-end! 🔐
- Tab hover menu upgrades: instantly move tabs to first/last position! 🧭
- Keyboard safety: disable Cmd+J mode toggling while an active wizard tab runs! ⌨️
- Session persistence now drops incomplete wizard tabs to avoid broken restores! 💾
- Improved debugging for thinking-chunk streaming with richer ProcessManager logs! 🔎
- SSH command builder is now async and resolves real ssh path 🚀
- Packaged Electron SSH launches now use expanded PATH environment 🧭
- Remote git and shell execution now await SSH command construction 🧰
- IPC process handler updated to await SSH wrapper command generation ⚙️
- Integration SSH test runner updated for async SSH command building 🌐
- Added SSH CLI detection with caching, sync getter, and reset hook 🔎
- Test suite now mocks SSH path resolution for deterministic runs 🧪
- Process IPC tests include robust async ssh-command-builder mock 🎭
- SSH builder logging now reports resolved binary path and full command 📝
- Prompts now enforce structured Markdown artifacts with front matter links 📚
- Stabilized ProcessMonitor duration tests using fixed system time snapshots ⏱️
- Wrapped React renders in `act()` to eliminate async update race conditions ⚛️
- Tightened assertions to exact `2m 5s` formatting for reliable checks ✅
- Improved coverage for hours/minutes and days/hours time displays 🗓️
@claude
Copy link

claude bot commented Jan 5, 2026

Code Review for PR #112: Add inline /wizard command

This is a substantial and well-architected feature that brings in-session Auto Run document creation to Maestro. Excellent work!

✅ Strengths

Architecture & Design

  • Excellent separation of concerns across hooks, contexts, services, and UI components
  • Per-tab wizard isolation using Map pattern prevents state collisions
  • Smart intent parsing with keyword matching and pattern-based detection
  • File access restrictions properly enforced via prompts

Code Quality

  • Comprehensive test coverage (1114 lines for useInlineWizard, 983 lines integration tests)
  • Proper error handling with retry mechanisms
  • Memory-safe cleanup in endWizard and reset functions
  • Strong TypeScript typing throughout

Security

  • No command injection risks - uses safe IPC calls
  • Write access properly scoped to Auto Run folders
  • Input validation and normalization

⚠️ Issues & Concerns

1. Race Condition in sendMessage (Medium Priority)
Location: src/renderer/hooks/useInlineWizard.ts:636-639
The isWaiting check uses tabStatesRef which updates asynchronously. Use a synchronous ref-based lock instead.

2. Unbounded Document Loading (Low-Medium Priority)
Location: src/renderer/hooks/useInlineWizard.ts:393-427
loadDocumentContents reads all documents without size limits. Add 1MB per-doc and 10MB total limits.

3. Missing Cleanup on Unmount (Medium Priority)
Location: src/renderer/contexts/InlineWizardContext.tsx:73-123
InlineWizardProvider needs cleanup effect calling wizardState.reset() on unmount.

4. Tautological Condition (Low Priority)
Location: src/renderer/hooks/useInlineWizard.ts:805-811
The condition tabId !== currentTabId is always false since tabId derives from currentTabId.

5. Streaming Content Growth (Low Priority)
Location: src/renderer/hooks/useInlineWizard.ts:1108-1113
streamingContent accumulates indefinitely. Add 500KB limit with truncation.

6. Error Propagation (Low Priority)
Location: src/renderer/hooks/useInlineWizard.ts:1152-1156
generateDocuments should throw errors for caller handling.

🎯 Performance

  • Large dependency arrays (30+) in InlineWizardContext could cause re-renders
  • Multiple sequential state updates cause multiple renders - consider batching

🧪 Test Coverage Suggestions

  • Concurrent wizards on multiple tabs
  • Large document handling (50+ docs, 10MB+ files)
  • Tab closure cleanup

✨ Final Verdict

APPROVE with minor recommendations

The identified issues are minor and can be addressed in follow-up PRs. This is production-ready code with excellent architecture, testing, and security practices. 🎉

- Select unavailable Claude Code to set a custom binary path 🔧
- Enable “Create Agent” once a custom path makes agent usable ✅
- Treat custom-path agents as available during form validation 🧠
- Always select supported agents on click, even if missing 📌
- Keep expand-and-configure flow working for unavailable agents 🧩
- Transfer pending SSH remote config when switching selected agent 🔁
- Add coverage ensuring immediate selection of unavailable Claude Code 🧪
- Add coverage verifying custom path is passed through onCreate 📤
@claude
Copy link

claude bot commented Jan 5, 2026

Code Review: Inline Wizard Implementation

Thanks for this substantial feature addition! I've completed a comprehensive review of the inline wizard implementation. Overall, the code quality is high with excellent architecture and separation of concerns. Below are my findings:


Strengths

  1. Clean Architecture

    • Excellent separation of concerns: state management (Context), business logic (services), and UI (components)
    • Per-tab state isolation using Map<string, InlineWizardState> prevents cross-tab interference
    • Stateless service functions align well with the per-tab model
  2. Robust Error Handling

    • Comprehensive timeout handling (5 min conversation, 20 min generation)
    • Retry functionality with proper state management
    • Process cleanup prevents orphaned processes
    • User-friendly error messages with technical details available
  3. Race Condition Prevention

    • Guard in sendMessage prevents duplicate sends while isWaiting is true
    • tabStatesRef pattern avoids stale closure issues
    • Proper event listener cleanup in conversation manager
  4. Strong Testing

    • Comprehensive unit tests for intent parser covering edge cases
    • Integration tests for the full wizard flow
    • Good coverage of state management logic
  5. Accessibility & UX

    • Screen reader support
    • Keyboard navigation
    • Visual confidence gauge with clear progress indicators

⚠️ Issues & Recommendations

1. Memory Leak Risk: Tab Close Cleanup (High Priority)

Issue: When a tab with an active wizard is closed, the wizard state isn't explicitly cleaned up from the Map<string, InlineWizardState>.

Location: useInlineWizard.ts manages state per-tab, but I don't see integration with tab close events in App.tsx or TabBar.tsx.

Recommendation:

// In App.tsx or TabBar.tsx handleCloseTab
const handleCloseTab = (tabId: string) => {
  const tab = session.aiTabs.find(t => t.id === tabId);
  
  // Clean up wizard state if this tab has an active wizard
  if (tab?.wizardState?.isActive) {
    // Either call endInlineWizard() or add a new cleanup method
    cleanupWizardForTab(tabId);
  }
  
  // ... existing close logic
};

2. Silent Mode Switching (Medium Priority)

Issue: Lines 661-679 in useInlineWizard.ts auto-switch from 'ask' mode to 'new' mode when user sends a message without selecting a mode:

if (currentState?.mode === 'ask' && \!conversationSessionsMap.current.has(tabId)) {
  session = startInlineWizardConversation({ mode: 'new', ... });
  setTabState(tabId, (prev) => ({ ...prev, mode: 'new' }));
}

Recommendation: Either:

  • Show a toast notification when auto-switching modes
  • Prevent message sending until user explicitly selects a mode in the modal
  • Document this behavior in user-facing help text

3. Streaming Content Accumulation (Medium Priority)

Issue: Line 1109 in useInlineWizard.ts continuously accumulates streaming content:

streamingContent: prev.streamingContent + displayText,

For very long AI responses, this could cause memory/performance issues.

Recommendation:

  • Add a max length limit (e.g., 100KB) and truncate older content
  • Clear streamingContent when switching documents/phases
  • Or use a circular buffer for large streams

4. Write Permission Enforcement (Medium Priority)

Question: During the conversation phase, agents are restricted to read-only via CLI args (inlineWizardConversation.ts:449-453). But during document generation, are write permissions properly scoped to only the Auto Run folder?

Recommendation: Verify that the document generation service in inlineWizardDocumentGeneration.ts properly enforces the Auto Run folder restriction, or add explicit validation before saving files.


5. Constant Deduplication (Low Priority)

Issue: READY_CONFIDENCE_THRESHOLD = 80 is defined in multiple places:

  • inlineWizardConversation.ts:148
  • WizardConversationView.tsx:22
  • Prompts reference {{READY_CONFIDENCE_THRESHOLD}}

Recommendation: Export from a single constants file to prevent drift.


6. Error Message Parsing Fragility (Low Priority)

Issue: WizardConversationView.tsx:370-426 uses string matching for error categorization:

if (lowerError.includes('timeout') || lowerError.includes('timed out')) { ... }

Recommendation: Consider structured error types/codes from the service layer for more robust error handling.


📋 Testing Recommendations

While unit tests are comprehensive, consider adding:

  1. E2E Tests for the complete user flow:

    • /wizard → mode selection → conversation → generation → save
    • Iterate mode document merging
    • Exit/cancel scenarios
  2. Integration Tests for:

    • Tab close during active wizard (cleanup verification)
    • Concurrent wizards on multiple tabs (state isolation)
    • Error recovery scenarios (timeout, process crash, etc.)
  3. Manual Testing Checklist (mentioned in PR but incomplete):

    • Multiple concurrent wizards on different tabs
    • Tab close during conversation phase
    • Tab close during generation phase
    • Very long AI responses (streaming content limits)
    • Write permission enforcement (try writing outside Auto Run folder)

🎯 Suggested Action Items Before Merge

Must Fix:

  1. Add tab close cleanup handler to prevent memory leaks

Should Fix:
2. Add bounds checking for streaming content accumulation
3. Document or improve UX for 'ask' → 'new' auto-switch behavior

Nice to Have:
4. Consolidate confidence threshold constant
5. Add debug mode state consistency checks
6. Structured error codes instead of string matching


Overall Assessment

Code Quality: High
Architecture: Excellent
Test Coverage: Good for core logic, UI could use E2E tests
Documentation: Comprehensive (inline docs + CLAUDE.md)
Ship-blocking issues: None (with tab cleanup fix)

This is a well-engineered feature with clear patterns and good attention to edge cases. The identified issues are mostly minor with straightforward fixes. Great work on the comprehensive test suite and documentation!

Recommendation: Approve with requested changes (address item #1, consider items #2-3)

- Wizard outputs now land in date-stamped `Wizard-YYYY-MM-DD` folders automatically 📅
- Dropped project-name folder sanitization in favor of consistent wizard base naming 🧭
- Inline wizard no longer auto-completes after generation; you decide when to exit 🛑
- Document generation screen now shows exact destination folder on completion 📂
- “Continue” button renamed to clear “Exit Wizard” action for better flow 🚪
- Main panel messaging updated to match the new Exit Wizard completion step 🧩
- File explorer “Retry Connection” now disables and shows “Connecting…” while refreshing 🔄
- Refresh icon now spins during reconnect attempts for instant status feedback 🌀
- Tests revamped to validate date-based wizard folder naming and zero-padding ✅
@pedramamini pedramamini marked this pull request as ready for review January 5, 2026 21:11
@pedramamini pedramamini merged commit 0cd2295 into main Jan 5, 2026
3 checks passed
@pedramamini pedramamini deleted the autorun-wizard branch January 5, 2026 21:12
@claude
Copy link

claude bot commented Jan 5, 2026

Comprehensive Code Review - PR #112: Inline Wizard Feature

Executive Summary

This is a well-architected, feature-complete PR with excellent test coverage and strong adherence to the project's patterns. The inline wizard implementation is sophisticated, handling complex state management across multiple tabs while maintaining security and performance. However, there are several critical security issues, potential race conditions, and memory leak risks that must be addressed before merging.

Recommendation: Request changes for security and concurrency issues.


🎯 Overall Rating: ⭐⭐⭐⭐☆ (4/5)

Category Rating Notes
Code Quality 4/5 Excellent architecture, minor race conditions
Security 2/5 Critical path traversal vulnerability
Performance 4/5 Good optimization, minor retry logic concerns
Testing 5/5 Outstanding coverage (1,114 lines of tests!)
Architecture 5/5 Excellent design patterns and consistency
Error Handling 3/5 Good try-catch, but silent error swallowing

🚨 Critical Issues (MUST FIX Before Merge)

1. HIGH SEVERITY: Path Traversal Vulnerability

File: src/renderer/services/inlineWizardDocumentGeneration.ts:178-192

Issue: The sanitizeFilename() function is insufficient and allows path traversal via URL encoding, Unicode homoglyphs, or NTFS alternate data streams.

Current Code:

export function sanitizeFilename(filename: string): string {
  return filename
    .replace(/[\/\\]/g, '-')
    .replace(/\.\./g, '')
    .replace(/[\x00-\x1f\x7f]/g, '')
    .replace(/^\.+/, '')
    .trim()
    || 'document';
}

Attack Vector:

// AI generates: "Phase-01%2F..%2F..%2Fetc%2Fpasswd.md"
// After sanitization: unchanged (URL encoding bypasses regex)
// When passed to writeDoc, could escape sandbox

Fix Required:

export function sanitizeFilename(filename: string): string {
  return filename
    // Decode any URL encoding first
    .replace(/%[0-9A-F]{2}/gi, '')
    // Remove path separators (both Unix and Windows)
    .replace(/[\/\\]/g, '-')
    // Remove directory traversal sequences
    .replace(/\.\./g, '')
    // Remove null bytes and control characters
    .replace(/[\x00-\x1f\x7f]/g, '')
    // Remove Unicode directional override characters
    .replace(/[\u202A-\u202E]/g, '')
    // Remove leading dots (hidden files / relative paths)
    .replace(/^\.+/, '')
    // Remove NTFS alternate data streams
    .replace(/:/g, '-')
    // Limit to safe characters only
    .replace(/[^a-zA-Z0-9\s\-_.]/g, '')
    .trim()
    || 'document';
}

2. MEDIUM SEVERITY: Race Condition in Message Sending

File: src/renderer/hooks/useInlineWizard.ts:635-639

Issue: Check-then-act race condition. Two rapid calls could both pass the isWaiting check before either sets it to true.

Current Code:

const currentState = tabStatesRef.current.get(tabId);
if (currentState?.isWaiting) {
  console.warn('[useInlineWizard] Already waiting for response, ignoring duplicate send');
  return;
}

Fix Required:

// Add atomic state update
const wasPending = tabStatesRef.current.get(tabId)?.isWaiting;
setTabState(tabId, prev => ({ ...prev, isWaiting: true }));
if (wasPending) {
  setTabState(tabId, prev => ({ ...prev, isWaiting: false }));
  return;
}

3. MEDIUM SEVERITY: Event Listener Memory Leak

File: src/renderer/services/inlineWizardConversation.ts:514-595

Issue: If the Promise never resolves (process hangs indefinitely), listeners are never cleaned up.

Current Code:

const result = await new Promise<InlineWizardSendResult>((resolve) => {
  // Set up listeners
  dataListenerCleanup = window.maestro.process.onData(...);
  // Timeout cleanup happens inside Promise
});

Fix Required:

const cleanup = () => {
  clearTimeout(timeoutId);
  dataListenerCleanup?.();
  exitListenerCleanup?.();
  thinkingListenerCleanup?.();
  toolExecutionListenerCleanup?.();
};

// Add failsafe cleanup after extended timeout
const failsafeTimeout = setTimeout(() => {
  console.error('[InlineWizard] Failsafe cleanup triggered');
  cleanup();
}, 600000); // 10 minutes hard cap

try {
  const result = await new Promise<InlineWizardSendResult>(...);
  return result;
} finally {
  clearTimeout(failsafeTimeout);
  cleanup();
}

4. HIGH SEVERITY: Server-Side Write Path Validation Needed

File: src/renderer/services/inlineWizardDocumentGeneration.ts:552

Issue: While the prompt restricts writes to Auto Run folder, a malicious or buggy prompt could allow writes elsewhere. The Write tool has no server-side path validation.

Current Code:

args.push('--allowedTools', 'Read', 'Glob', 'Grep', 'LS', 'Write');

Recommendation: Implement server-side path validation in the Write tool handler to enforce Auto Run folder restriction. This should be done in the main process, not just trusted to the AI agent.


⚠️ Should Fix (High Priority)

5. Silent Error Swallowing

Multiple Locations:

  • useInlineWizard.ts:996-999 - Reset cleanup
  • inlineWizardDocumentGeneration.ts:693, 706, 728 - Process kill failures

Issue: Empty catch blocks make debugging impossible and hide orphaned processes.

Fix: Replace all instances of .catch(() => {}) with logging:

.catch((err) => {
  console.error('[useInlineWizard] Failed to cleanup session:', err);
});

6. File Watcher Duplicate Processing Race

File: src/renderer/services/inlineWizardDocumentGeneration.ts:759-763

Issue: Two rename events could fire simultaneously and both pass the duplicate check.

Current Code:

const alreadyProcessed = documentsFromWatcher.some(d => d.filename === filenameWithExt);
if (alreadyProcessed) return;

Fix:

const processedFiles = new Set<string>();
// In watcher callback:
if (processedFiles.has(filenameWithExt)) return;
processedFiles.add(filenameWithExt);

7. Input Length Validation Missing

File: src/renderer/services/wizardIntentParser.ts

Issue: No input length limits could cause regex DoS in matchesKeywordAnywhere().

Fix:

export function parseWizardIntent(input: string, hasExistingDocs: boolean) {
  if (input.length > 10000) { // 10KB max
    return { mode: hasExistingDocs ? 'ask' : 'new' };
  }
  // ... rest of function
}

8. Runtime Type Validation for IPC

File: src/renderer/services/existingDocsDetector.ts:99-102

Issue: result.files is typed as any[] from IPC. Runtime type could be wrong.

Fix:

if (!Array.isArray(result.files) || !result.files.every(f => typeof f === 'string')) {
  throw new TypeError('Invalid files array from listDocs');
}

✅ Outstanding Strengths

Architecture (5/5)

  • Per-tab wizard state allows multiple independent wizards on different tabs
  • Stateless services with pure functions instead of singletons
  • Context provider pattern for centralized state management
  • Separation of generation modes with distinct prompts for new vs iterate
  • Perfect consistency with existing onboarding wizard patterns

Testing (5/5)

  • 1,114 lines of hook tests with comprehensive coverage
  • 983 lines of integration tests
  • All edge cases covered: empty input, ambiguous input, mode transitions
  • 68 intent parser tests covering all 60+ keywords
  • Proper mocking of IPC APIs and agent responses

UI/UX (5/5)

  • Typewriter effect for natural conversation feel
  • Confidence gauge with clear progress feedback
  • Streaming document preview with real-time generation
  • Austin facts for entertainment during long operations
  • Confetti celebration on completion
  • Excellent accessibility with ARIA labels and keyboard navigation

📝 Nice to Have (Future Enhancements)

  1. Use crypto.randomUUID() instead of Math.random() for ID generation (useInlineWizard.ts:262)
  2. Implement exponential backoff in file read retries
  3. Add Error Boundaries around wizard components
  4. Implement orphan process detection and cleanup
  5. Add performance benchmarks to documentation

🎬 Action Items

Before Merge:

  1. ✅ Fix path sanitization vulnerability (Internal Logging #1)
  2. ✅ Add server-side Write tool path validation (Major Feature: Web Interface #4)
  3. ✅ Fix race condition in sendMessage (Add Claude Code GitHub Workflow #2)
  4. ✅ Add failsafe cleanup for event listeners (Raychaser/claude path 01 #3)
  5. ✅ Replace empty catch blocks with error logging (docs: enhance CLAUDE.md with work items and contribution workflow #5)
  6. ✅ Add input length validation (Added Main Panel Tab Support #7)
  7. ✅ Add runtime type validation for IPC (Achievements Support and Pedram's List 11/29 #8)

After Merge (Can be separate PR):

  • File watcher deduplication (Laundry List of Fixes #6)
  • Error boundaries
  • Crypto-secure ID generation
  • Performance optimizations

🏆 Conclusion

This is exceptional work with a sophisticated architecture and excellent test coverage. The implementation demonstrates deep understanding of React patterns, async state management, and the project's conventions. The security vulnerabilities are addressable with the fixes outlined above.

Estimated Fix Time: 4-6 hours for critical issues, 2-3 hours for should-fix items.

Once the critical security and concurrency issues are resolved, this will be an excellent addition to Maestro! 🎉

Great job on the comprehensive test coverage and thoughtful architecture! 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments