refactor(ai): LangChain/LangGraph + qortex — unified AI layer#253
4 Icelandic scripts for FaceTime-style onboarding:
- odin-intro: introduction and first question
- odin-response-1: follow-up about learning progress
- odin-response-2: assessment wrap-up
- odin-farewell: welcome to Interlinear

Pipeline: ElevenLabs → Supabase → RunPod SONIC → MP4
Code Review - PR #253
Create the lib/ai/ infrastructure that all subsequent phases depend on:
- Model factory: getModel(task) returns configured ChatOpenAI instances with per-task routing (model, temperature, maxTokens)
- Cost tracker: LangChain BaseCallbackHandler that logs token usage and calculates costs per generation
- Prompt extraction: all inline prompts from tutor-tools.ts (8 tools), the onboarding routes, the Odin service, and the translate route extracted into dedicated prompt files under lib/ai/prompts/
- Shared utilities: detectLanguage() and retryWithBackoff()/invokeWithTimeout() extracted from tutor-tools.ts into lib/ai/tools/shared/

No behavior changes; this is pure infrastructure addition. Existing code is untouched; the new code's imports will be wired in Phase 3+.

Closes phase 1 of #252

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
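The per-task routing and pricing the commit describes can be sketched framework-free. The task names, model choices, and prices below are illustrative assumptions, not the project's actual routing table:

```typescript
// Hypothetical sketch of per-task model routing plus cost calculation.
// Task names, models, and prices are illustrative, not the real config.
type ModelTask = 'tutor-start-dialog' | 'translate' | 'odin'

interface TaskConfig {
  model: string
  temperature: number
  maxTokens: number
}

const TASK_CONFIGS: Record<ModelTask, TaskConfig> = {
  'tutor-start-dialog': { model: 'gpt-4', temperature: 0.7, maxTokens: 1024 },
  'translate': { model: 'gpt-3.5-turbo', temperature: 0.2, maxTokens: 512 },
  'odin': { model: 'gpt-4', temperature: 0.8, maxTokens: 2048 },
}

// USD per 1K tokens as [prompt, completion]; assumed values for illustration
const PRICING: Record<string, [number, number]> = {
  'gpt-4': [0.03, 0.06],
  'gpt-3.5-turbo': [0.0005, 0.0015],
}

function getModelConfig(task: ModelTask): TaskConfig {
  return TASK_CONFIGS[task]
}

function calculateCost(model: string, promptTokens: number, completionTokens: number): number {
  // Unknown models fall back to zero cost; the real code warns on this path
  const [promptRate, completionRate] = PRICING[model] ?? [0, 0]
  return (promptTokens / 1000) * promptRate + (completionTokens / 1000) * completionRate
}
```

In the real factory, getModel(task) would feed such a config into a ChatOpenAI constructor; keeping routing as plain data makes the 18-task table easy to test in isolation.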
PR Review: LangChain/LangGraph + qortex — unified AI layer

Overview

This PR represents a significant architectural improvement by consolidating the AI layer. Based on my analysis, this refactoring addresses critical technical debt by extracting modular components and establishing a unified model-factory pattern.

✅ Strengths

1. Excellent Code Organization
2. Unified Model Factory Pattern
3. Strong TypeScript Practices
4. Cost Tracking Architecture
…ts, utils

Test coverage for all Phase 1 infrastructure:
- models.test.ts (35 tests): getModelConfig for all 18 tasks, pricing calculations, the getModel factory, getTrackedModel, and the CostTracker callback handler with event recording, duration measurement, and metadata
- prompts.test.ts (61 tests): all 8 tutor prompt generators across 3 languages (es/la/is), the Odin system prompt, the translation prompt, and content verification (language-specific labels, JSON format, etc.)
- language-detection.test.ts (34 tests): mirrors the existing tutor-tools tests to verify the extracted function behaves identically
- retry.test.ts (14 tests): retryWithBackoff exponential backoff, invokeWithTimeout, error propagation, edge cases

All 1126 tests pass (982 existing + 144 new), 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
PR Review: LangChain/LangGraph + qortex Refactoring

Thank you for this comprehensive AI layer refactoring. I've reviewed the changes and have feedback on code quality, architecture, testing, and concerns.

✅ Strengths

1. Excellent Centralized Model Configuration
2. Strong Test Coverage
3. Good Separation of Concerns
4. Documentation
…ty, error classification

Critical:
- Add XML delimiter wrapping (wrapUserContent/sanitizeInput) around all user-supplied content in prompt templates to mitigate prompt injection
- New lib/ai/tools/shared/sanitize.ts with wrapUserContent() and sanitizeInput()

Major:
- Fix timer leak in invokeWithTimeout: clearTimeout in a finally block
- Add isRetryableError(): skip retries for 400/401/403/404/422 client errors
- Replace any[] with BaseCallbackHandler[] in GetModelOptions
- Add console.warn for unknown model pricing fallback
- Test NODE_ENV development/production branches in CostTracker
- Test the llmOutput-undefined branch in CostTracker
- Add expect.assertions(N) to all try/catch and rejection tests
- Test that clearTimeout is called in invokeWithTimeout (timer-leak regression test)
- Add onboarding prompt re-export test coverage
- New sanitize.test.ts (15 tests)

Tests: 195 passing (up from 144)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
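The sanitization and error-classification utilities named in this commit can be sketched as follows. The delimiter format and the status-code list are taken from the commit message; everything else (function bodies, regexes) is an illustrative assumption:

```typescript
// Sketch of the sanitize.ts utilities and isRetryableError. The XML
// delimiter name and the escaping strategy are assumptions.

// Strip sequences that could break out of the XML delimiters
function sanitizeInput(text: string): string {
  return text.replace(/<\/?user_content>/g, '')
}

// Wrap user-supplied text so the model can distinguish data from instructions
function wrapUserContent(text: string): string {
  return `<user_content>\n${sanitizeInput(text)}\n</user_content>`
}

// Client errors are deterministic; retrying them only wastes time and tokens
const NON_RETRYABLE_STATUSES = new Set([400, 401, 403, 404, 422])

function isRetryableError(error: { status?: number }): boolean {
  // Errors without a status (network failures, timeouts) stay retryable
  return error.status === undefined || !NON_RETRYABLE_STATUSES.has(error.status)
}
```

A retryWithBackoff() implementation would consult isRetryableError() before each sleep, bailing out immediately on client errors.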
PR Review: LangChain/LangGraph + qortex — unified AI layer

Overview

This is a well-structured foundational refactor that adds a new lib/ai/ infrastructure layer.

✅ Strengths

1. Excellent Architecture & Organization
2. Comprehensive Test Coverage
3. Security Best Practices
4. Cost Tracking & Observability
5. Error Handling & Resilience
Break lib/tutor-tools.ts (1,649 lines) into individual tool files under
lib/ai/tools/tutor/. Each tool imports prompts from Phase 1 and uses
the model factory. Backward-compat shim preserves all existing imports.
New files:
- lib/ai/tools/tutor/{start-dialog,continue-dialog,analyze-errors,
generate-overview,analyze-message,professor-review,start-roleplay,
continue-roleplay}.ts — 8 individual LangChain tools
- lib/ai/tools/tutor/{types,schemas,index}.ts — shared types and barrel
- lib/ai/tools/shared/parse-json.ts — JSON response parser utility
- lib/ai/__tests__/{parse-json,tutor-schemas,tutor-tools}.test.ts — 48 tests
lib/tutor-tools.ts reduced from 1,649 to 34 lines (re-export shim).
243 tests passing in lib/ai/, 92 existing tests still passing.
Zero type errors in lib/ai/ and lib/tutor-tools.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
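The parse-json.ts utility listed above presumably handles the common failure mode of models wrapping JSON in markdown fences. A hedged sketch of that idea (not the actual implementation):

```typescript
// Illustrative lenient JSON-response parser: strips the markdown code
// fences LLMs often wrap around JSON before parsing. Assumption only.
function parseJsonResponse<T>(raw: string): T {
  const trimmed = raw.trim()
  // Remove a leading ```json / ``` fence and a trailing ``` fence if present
  const unfenced = trimmed
    .replace(/^```(?:json)?\s*/i, '')
    .replace(/\s*```$/, '')
  try {
    return JSON.parse(unfenced) as T
  } catch (err) {
    // Surface a tool-friendly error instead of a bare SyntaxError
    throw new Error(`Failed to parse LLM JSON response: ${(err as Error).message}`)
  }
}
```

Centralizing this in one shared utility means all 8 tutor tools fail with the same, searchable error message when a model returns malformed output.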
Code Review: LangChain/LangGraph Migration (PR #253)

Overall Assessment

This is a well-structured architectural refactor that consolidates multiple AI SDK paths into a unified LangChain/LangGraph layer. The PR demonstrates excellent code organization, thorough testing, and a clear migration strategy.

✅ Strengths
🔍 Issues & Recommendations

Critical Issues

1. Incomplete Migration: Old Dependencies Still Present

The PR description claims to remove Mastra and Vercel AI SDK, but:
Action Required:
High Priority Issues

2. Language Detection Logic Inconsistency

```ts
const language = session.dialog_id ? 'la' : 'es' // Simple heuristic for now
```

This fragile heuristic assumes that the presence of dialog_id means Latin and its absence means Spanish, and it ignores Icelandic entirely.

Recommendation:
3. Missing Error Handling in JSON Parsing

4. Timer Memory Leak Risk in Retry Logic

5. Cost Tracking Never Persisted
Medium Priority Issues

6. Model Fallback Warning Could Cause Noise
7. Timeout Not Configurable Per Tool

All tools hardcode 30-second timeouts. Some operations (e.g., generate-exercises) may need longer. Consider adding an optional timeout parameter.

8. Schema Validation Only at Tool Boundary

The schemas validate input params but not LLM output before parsing. The LLM could return data that passes JSON parsing but fails business logic. Consider validating LLM responses with Zod schemas.

9. Transcript Labels Language Heuristic

Low Priority / Nice to Have

10. Test Coverage Gaps
11. Documentation
12. Magic Numbers

13. Inconsistent Naming

Performance Considerations

✅ Good:
Security Assessment

✅ Secure:
Test Plan Verification

Based on the PR description checklist:
Recommendations for Next Steps
Conclusion

This is high-quality refactoring work with excellent code organization and testing. The main blocker is the incomplete removal of Mastra dependencies. Once those stale imports are addressed and the critical issues fixed, this PR will significantly improve codebase maintainability and consistency.

Recommendation: Request changes to address the Mastra cleanup and the language detection logic before merge.

Reviewed by: Claude Code (Sonnet 4.5)
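On the language-detection concern (item 2 above), the project already ships a detectLanguage() helper (extracted in Phase 1). A framework-free sketch of a character-based detector for the three supported languages; the specific character sets and the Latin fallback are assumptions, not the actual implementation:

```typescript
// Illustrative character-heuristic detector for es/la/is.
// The character classes and the 'la' fallback are assumptions.
type LanguageCode = 'es' | 'la' | 'is'

function detectLanguage(text: string): LanguageCode {
  // Letters essentially unique to Icelandic
  if (/[þðæöÞÐÆÖ]/.test(text)) return 'is'
  // Spanish punctuation and letters absent from classical Latin
  if (/[¿¡ñÑ]/.test(text)) return 'es'
  // Fall back to Latin when no language-specific characters appear
  return 'la'
}
```

A heuristic like this is still a fallback; the cleaner fix the review suggests is to persist the language explicitly on the session and only detect when that field is missing.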
Phase 4: LangGraph StateGraphs + Processor Migration (Updated Plan)

Key Change: Microservice-Ejectable Architecture

Zero monolith imports inside lib/ai/.
Dependency injection via adapter interfaces:
What CAN be imported:

Implementation Steps

Step 1: Processor migration with adapter injection
Step 2: Graph state types
Step 3: Content generation graph
Step 4: Onboarding assessment graph
Step 5: Word of day graph
Step 6: Tutor turn graph with injected persistence
Step 7: Barrel exports + upgrade langchain tools
Step 8: Ollama integration tests
Testing Strategy (per step)
Verification Checklist
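The adapter-injection pattern this plan describes can be sketched as follows. The interface shape and names are assumptions based on the step list, not the project's actual adapter contract:

```typescript
// Sketch of dependency injection via an adapter interface, so graph code
// never imports the monolith directly. Names and shape are illustrative.
interface PersistenceAdapter {
  saveTurn(sessionId: string, message: string): Promise<void>
}

let persistenceAdapter: PersistenceAdapter | null = null

// Called once at startup by the host application (monolith or microservice)
function configureTutorGraph(adapter: PersistenceAdapter): void {
  persistenceAdapter = adapter
}

// Graph nodes fetch the adapter through this guard instead of importing app code
function getPersistenceAdapter(): PersistenceAdapter {
  if (!persistenceAdapter) {
    throw new Error('configureTutorGraph() must be called before running the tutor graph')
  }
  return persistenceAdapter
}
```

With this shape, ejecting the AI layer into a microservice only requires the new host to supply its own adapter; the graphs themselves stay untouched.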
…a live tests

Replace Mastra workflow orchestration with 4 deterministic LangGraph StateGraphs:
- Content generation (vocab → grammar → exercises with startFrom routing)
- Onboarding assessment (chat/evaluate conditional routing)
- Word of day (deterministic word selection + sentence generation)
- Tutor turn (loadContext → analyze → respond → persist with DI)

Migrate processors to lib/ai/processors/ with dependency-injection adapters; the original files become re-export shims. Upgrade langchain content tools to use the getModel() factory.

395 tests total (383 unit + 12 Ollama live LLM tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
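The startFrom routing in the content-generation graph can be illustrated framework-free. This is not the LangGraph API, just the routing idea; node names and state shape are assumptions:

```typescript
// Framework-free sketch of the vocab → grammar → exercises pipeline with
// startFrom routing, mirroring a deterministic StateGraph. Illustrative only.
type Stage = 'vocab' | 'grammar' | 'exercises'

interface ContentState {
  completed: Stage[]
}

const PIPELINE: Stage[] = ['vocab', 'grammar', 'exercises']

// Each node transforms state; real nodes would call LLM tools here
const NODES: Record<Stage, (s: ContentState) => ContentState> = {
  vocab: (s) => ({ completed: [...s.completed, 'vocab'] }),
  grammar: (s) => ({ completed: [...s.completed, 'grammar'] }),
  exercises: (s) => ({ completed: [...s.completed, 'exercises'] }),
}

// startFrom lets a re-run skip stages that already succeeded
function runContentGraph(startFrom: Stage = 'vocab'): ContentState {
  let state: ContentState = { completed: [] }
  for (const stage of PIPELINE.slice(PIPELINE.indexOf(startFrom))) {
    state = NODES[stage](state)
  }
  return state
}
```

In LangGraph terms, startFrom would become a conditional edge from the entry point; the determinism comes from the fixed stage order rather than LLM-driven branching.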
Code Review: LangChain/LangGraph Refactor

This is a substantial and well-executed refactor that consolidates 4 different AI SDK approaches into a unified LangChain/LangGraph architecture.

Strengths
Areas for Improvement

1. Error Handling: Missing Timeout Protection
LLM calls have no timeout (lib/ai/graphs/tutor-turn.ts:129). A slow API call could block indefinitely.
Recommendation: Use invokeWithTimeout from lib/ai/tools/shared/retry.ts consistently.

2. Security: Prompt Sanitization Coverage
Sanitization utilities exist but are not used consistently across all prompts.
Recommendation: Audit all prompt templates to ensure user data is wrapped with wrapUserContent().

3. Type Safety: Loose Type Assertions
Type assertions like (dialogLang as LanguageCode) bypass validation (lib/ai/graphs/tutor-turn.ts:93).
Recommendation: Use Zod parsing for runtime validation instead.

4. Performance: Potential Memory Leaks
The CostTracker events array grows unbounded (lib/ai/models/cost-tracker.ts:65).
Recommendation: Add a size limit or periodic flush.

5. Documentation
Missing migration guide, architecture diagrams, and adapter configuration examples.
Recommendation: Add docs/ai/ARCHITECTURE.md and docs/ai/MIGRATION.md.

Critical Issues

PersistenceAdapter Not-Configured Guard
If configureTutorGraph() is never called, runtime errors occur deep in request handling.
Recommendation: Add startup validation in app/layout.tsx to ensure adapters are configured.

Summary

Overall Assessment: Approve with minor revisions

This is high-quality work with excellent test coverage and clean architecture.

Required before merge:
Estimated effort: 2-4 hours

Great work on this refactor! The architecture is solid and the migration path is clear.
…anup Add console.error with tagged prefixes to all 5 bare catch blocks in graph nodes (onboarding, word-of-day, tutor-turn). Add afterEach vi.unstubAllEnvs() to all 4 graph test files for proper environment cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull Request Review: LangChain/LangGraph + qortex — Unified AI Layer

Overview Assessment

This is an ambitious and well-executed architectural refactoring that consolidates 4 different SDK patterns (Mastra, LangChain, Vercel AI SDK, direct OpenAI) into a unified LangChain/LangGraph orchestration layer. The PR demonstrates strong software engineering practices with excellent test coverage (395 tests), modular design, and security considerations.

Overall Grade: 🟢 Strong Approval with Minor Recommendations

✅ Strengths

1. Excellent Architecture & Separation of Concerns
2. Security: Prompt Injection Mitigation ⭐
3. Cost Tracking & Observability
4. Test Coverage 🎯
5. Error Handling & Resilience
6. Backward Compatibility
🟡 Areas for Improvement

1. Missing Test Plan Completion

Issue: The PR description shows 6 unchecked test plan items.

Recommendation:
Location: PR #253 test plan

2. Potential Type Safety Issues in Graph State

Issue: In tutor-turn.ts:

```ts
language = text.language as LanguageCode // Line 86
```

Similarly on line 93:

```ts
language = (dialogLang as LanguageCode) || undefined // Line 93
```

Risk: If the database contains invalid language codes, runtime errors could occur downstream.

Recommendation: Add runtime validation using the shared schema:

```ts
import { LanguageCodeSchema } from '@/lib/ai/types/languages'

const parsed = LanguageCodeSchema.safeParse(text.language)
language = parsed.success ? parsed.data : undefined
```

Locations:
3. Hard-Coded Turn Limit

Issue:

```ts
const shouldEnd = turnNumber >= 10 // End after 10 turns
```

Concern: This business logic is embedded in the graph implementation rather than being configurable.

Recommendation: Make this configurable via the graph state or a constant:

```ts
// At top of file
const DEFAULT_MAX_TURNS = 10

// In configureTutorGraph()
export function configureTutorGraph(
  adapter: PersistenceAdapter,
  options?: { maxTurns?: number }
): void {
  persistenceAdapter = adapter
  maxTurns = options?.maxTurns ?? DEFAULT_MAX_TURNS
}

// In loadContextNode
const shouldEnd = turnNumber >= maxTurns
```

Location:

4. Silent Failure in Cost Tracker

Issue:

```ts
if (!usage) return // No log, no tracking
```

Concern: Cost tracking failures are invisible in production, making budget monitoring unreliable.

Recommendation: Log a warning when usage is missing:

```ts
if (!usage) {
  console.warn(`[CostTracker] ${this.taskType} — no token usage in LLM response`)
  return
}
```

Location:

5. Processor Factory Error Handling

Issue:

Recommendation: Review
Location:

6. Mastra Dependency Still Present

Issue:

```json
"@mastra/core": "^0.24.0",
```

Question: The PR claims to "Remove Mastra and Vercel AI SDK dependencies," but @mastra/core is still listed.

Recommendation:
Location:

🔴 Critical Issues

None Identified ✅

No blocking issues found. All concerns are minor improvements that can be addressed post-merge or in follow-up PRs.

🔍 Code Quality Details

Naming Conventions: ✅ Excellent
Documentation: ✅ Strong
Type Safety: 🟡 Good with caveats
Performance: ✅ Optimized
📋 Recommendations Summary
Estimated Total Effort: ~1.5 hours

🎯 Final Verdict

Approve and Merge ✅

This PR represents a major step forward in codebase quality:
The identified issues are minor and can be addressed in follow-up PRs without blocking this merge. The foundational work here is solid, well-tested, and follows CLAUDE.md conventions.

Great work! This refactor sets a strong precedent for future AI-layer improvements.

🔗 Reference Files Reviewed
Phase 5 Checkpoint: Complete Route Migration — Zero Direct SDK Imports

Plan

Migrate the 15 files that still instantiate LLM clients directly, bypassing the unified AI layer.

Steps (exhaustive testing + gauntlet after each):
Completion criteria:

Also preparing for:

Files modified: 17 source files, 2 deleted, 8+ new test files

Phase 5 of AI Layer Refactor (#252)
…+ workflows migration

Step 1: Add 5 new ModelTask types (icelandic-lookup, ai-chat, feedback-analysis, tutor-generate-examples, image-prompt) with configs and pricing for gpt-4 and gpt-3.5-turbo.

Step 2: Migrate content-generation tools (identify-grammar, generate-exercises, generate-dialogs) from the Vercel AI SDK's generateObject() to getModel() + withStructuredOutput(zodSchema). Structured output is now provider-agnostic: it works on both ChatOpenAI and ChatOllama without manual JSON parsing.

Step 3: Migrate Mastra workflows (overviewGeneration, wordOfDayGeneration) from the direct OpenAI SDK to getModel(). wordOfDayGeneration uses withStructuredOutput for JSON responses; overviewGeneration uses plain invoke() for free-form text.

Model factory: the OLLAMA_BASE_URL env var switches getModel() from ChatOpenAI to ChatOllama. The return type is narrowed to BaseChatModel for provider neutrality.

Tests: 486 unit tests pass, 8/8 Ollama live integration tests pass (NO MOCKS).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
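The OLLAMA_BASE_URL switch can be sketched as a pure function. The provider names come from the commit message; the OpenAI-to-Ollama model-name mapping is an illustrative assumption:

```typescript
// Sketch of the provider switch: when OLLAMA_BASE_URL is set, route to
// ChatOllama with a locally available model name; otherwise ChatOpenAI.
// The model mapping below is an assumption, not the project's real table.
interface ProviderChoice {
  provider: 'openai' | 'ollama'
  model: string
}

const OLLAMA_MODEL_MAP: Record<string, string> = {
  'gpt-4': 'llama3.1:8b',
  'gpt-3.5-turbo': 'llama3.1:8b',
}

// Takes env as a parameter so the routing decision stays pure and testable
function chooseProvider(openaiModel: string, env: { OLLAMA_BASE_URL?: string }): ProviderChoice {
  if (env.OLLAMA_BASE_URL) {
    return { provider: 'ollama', model: OLLAMA_MODEL_MAP[openaiModel] ?? openaiModel }
  }
  return { provider: 'openai', model: openaiModel }
}
```

Keeping the decision in one pure function is what makes the "ChatOllama returned when OLLAMA_BASE_URL set" unit tests from the follow-up commit cheap to write.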
…test hygiene

- Wrap user content with wrapUserContent() in identify-grammar, generate-exercises, and generate-dialogs prompts to prevent prompt injection
- Sanitize word/definitions input in wordOfDayGeneration with sanitizeInput()
- Fix overviewGeneration error logging: extract the message instead of logging the raw object
- Replace silent `if (!ollamaAvailable) return` with describe.runIf() in both Ollama test files; tests now show as skipped, not falsely green
- Add Ollama provider-switch unit tests to models.test.ts (ChatOllama returned when OLLAMA_BASE_URL is set, model name mapping verified)
- Assert the Zod schema passed to withStructuredOutput() in all 4 migration test files

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…on, Ollama live tests

- Migrate onboarding/chat and onboarding/assess from the direct OpenAI SDK to the runOnboardingChat() and runOnboardingAssessment() graph wrappers
- Add sanitizeInput() to graph convenience wrappers (gauntlet: prompt injection)
- Fix OnboardingChatInput to import LanguageCode instead of an inline literal union
- Replace em dashes with colons/semicolons in onboarding-assessment.ts
- Rewrite route tests to mock @/lib/ai/graphs instead of the openai SDK (16 pass)
- Add 2 Ollama live tests for graph wrappers (14/14 total)
- Add a fallback reasoning assertion to the onboarding graph test
- File GH #254 for the unauthenticated onboarding route auth gap

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The translate, tutor/generate-examples, ai/chat, and feedback routes now use the getModel() factory instead of direct ChatOpenAI instantiation. Added wrapUserContent sanitization, Zod input validation, and parseJsonResponse. Gauntlet fixes: error-detail leak in the feedback 500 response, an unused import, and a Zod schema. 12 Ollama live tests passing, 19 mock tests, 392 AI unit tests green. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Icelandic lookup: Vercel AI SDK generateText → getModel('icelandic-lookup'),
added wrapUserContent + parseJsonResponse. Odin speak: OpenAI SDK streaming →
getModel('odin').stream() via LangChain, added sanitizeInput, fixed error
message leak in SSE. 11 mock tests, 2 Ollama live tests added. Gauntlet clean.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deleted 4 files with zero consumers:
- lib/content-generation/mastra.config.ts (Vercel AI SDK provider)
- lib/mastra/providers/openai.ts (OpenAI SDK wrapper)
- lib/mastra/workflows/contentGeneration.ts (old workflow engine)
- scripts/test-mastra.ts (test script for deleted barrel exports)

Simplified lib/mastra/index.ts to type-only re-exports. Removed dead test blocks for deleted modules.

427 unit tests + 77 route tests + 14 Ollama live tests all passing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move OpenAI (DALL-E) and @fal-ai/client SDK imports exclusively into
lib/ai/models/image.ts. The T2I service becomes a thin consumer that
delegates to getImageModel('flashcard-image'). Zero direct SDK imports
outside the AI plane.
- New image-config.ts: routing table, pricing, availability checks
- New image.ts: DallEImageModel, FALImageModel, getImageModel() factory
- Rewrite T2I service (197→116 lines) as thin consumer with PROVIDER_MAP
- Simplify T2I types (94→29 lines) to deprecated aliases
- Delete dalle.ts (-126), fal.ts (-162), dalle.test.ts (-140)
- 21 new AI plane image tests, 5 rewritten T2I service tests
- Tighten API route Zod schema to only supported providers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
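The getImageModel() factory behind a PROVIDER_MAP could look roughly like this. The interface shape and provider classes are assumptions, with the actual SDK calls replaced by placeholder URLs:

```typescript
// Sketch of an image-model factory that confines provider SDK imports to
// one module. Provider internals are stubbed; the real classes would call
// the OpenAI images API and @fal-ai/client respectively.
interface ImageModel {
  generate(prompt: string): Promise<string> // resolves to an image URL
}

class DallEImageModel implements ImageModel {
  async generate(prompt: string): Promise<string> {
    // Placeholder for the OpenAI DALL-E call
    return `https://example.test/dalle/${encodeURIComponent(prompt)}`
  }
}

class FALImageModel implements ImageModel {
  async generate(prompt: string): Promise<string> {
    // Placeholder for the @fal-ai/client call
    return `https://example.test/fal/${encodeURIComponent(prompt)}`
  }
}

type ImageTask = 'flashcard-image'
type Provider = 'dalle' | 'fal'

// Which provider serves which task; illustrative routing only
const PROVIDER_MAP: Record<ImageTask, Provider> = { 'flashcard-image': 'dalle' }

function getImageModel(task: ImageTask): ImageModel {
  return PROVIDER_MAP[task] === 'fal' ? new FALImageModel() : new DallEImageModel()
}
```

With this shape, the T2I service never sees a provider SDK; swapping DALL-E for FAL on a task is a one-line change to PROVIDER_MAP.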
PR Review: LangChain/LangGraph + qortex — unified AI layer

Overview

This is an ambitious and well-executed consolidation PR that unifies 4 different SDK paths into a single LangChain/LangGraph orchestration layer. The refactoring demonstrates strong architectural discipline and significantly improves code organization.

Scope: 13,674 additions / 6,356 deletions across 90+ files

✅ Strengths

1. Excellent Architectural Separation
2. Strong Type Safety
3. Modular Tool Organization
4. Observability & Cost Tracking
5. Comprehensive Test Coverage
6. Model Configuration
Summary
- Converts `langchain-qortex` to TypeScript and integrates it for knowledge graph-backed vocabulary and grammar
- Splits the `tutor-tools.ts` monolith into modular tool files

Closes #252
Test plan
- [ ] `npm run build` passes after each phase
- [ ] `npm run type-check` passes after each phase
- [ ] … (`@mastra`, `tutor-tools`, `lib/langchain`, `content-generation`)

🤖 Generated with Claude Code