refactor(tutor): decouple tools from Supabase via TutorPersistence by Peleke · Pull Request #260 · Peleke/interlinear

Peleke · 2026-02-22T20:39:12Z

Summary

Extract TutorPersistence interface + TutorPersistenceSupabase class (lib/services/tutor-persistence.ts) — all DB queries in clean, injectable methods
Each of the 7 DB-touching tutor tools now exports a pure core function (accepts TutorPersistence + userId) alongside the existing LangChain tool() shim
tutor-supabase.ts rewired to use core functions with injected persistence — no more tool.invoke() pass-through, ServiceContext actually used now
All 8 API routes in app/api/tutor/ migrated from direct tool imports to createServices() service bus

Why

Tools were hardwired to Supabase (createClient() + auth.getUser() in every tool), blocking backend switching (Qortex), making LLM logic untestable without Supabase, and duplicating auth across 8 files. The service bus wrapper was a no-op pass-through.

Test plan

npm run type-check — zero errors in changed files
npx vitest run lib/services/__tests__/tutor.integration.test.ts — 16 tests pass (added mock persistence tests)
Manual E2E: tutor dialog start/continue, roleplay start/continue, analyze, overview, review
Verify Qortex backend still works (JWT forwarding unchanged)

Closes #256

🤖 Generated with Claude Code

4 Icelandic scripts for FaceTime-style onboarding: - odin-intro: Introduction and first question - odin-response-1: Follow-up about learning progress - odin-response-2: Assessment wrap-up - odin-farewell: Welcome to Interlinear Pipeline: ElevenLabs → Supabase → RunPod SONIC → MP4

Create lib/ai/ infrastructure that all subsequent phases depend on: - Model factory: getModel(task) returns configured ChatOpenAI instances with per-task routing (model, temperature, maxTokens) - Cost tracker: LangChain BaseCallbackHandler that logs token usage and calculates costs per generation - Prompt extraction: All inline prompts from tutor-tools.ts (8 tools), onboarding routes, Odin service, and translate route extracted into dedicated prompt files under lib/ai/prompts/ - Shared utilities: detectLanguage() and retryWithBackoff()/invokeWithTimeout() extracted from tutor-tools.ts into lib/ai/tools/shared/ No behavior changes — this is pure infrastructure addition. Existing code untouched; new code imports will be wired in Phase 3+. Closes phase 1 of #252 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ts, utils Test coverage for all Phase 1 infrastructure: - models.test.ts (35 tests): getModelConfig for all 18 tasks, pricing calculations, getModel factory, getTrackedModel, CostTracker callback handler with event recording, duration measurement, metadata - prompts.test.ts (61 tests): all 8 tutor prompt generators across 3 languages (es/la/is), Odin system prompt, translation prompt, content verification (language-specific labels, JSON format, etc.) - language-detection.test.ts (34 tests): mirrors existing tutor-tools tests to verify extracted function behaves identically - retry.test.ts (14 tests): retryWithBackoff exponential backoff, invokeWithTimeout, error propagation, edge cases All 1126 tests pass (982 existing + 144 new), 0 failures. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ty, error classification Critical: - Add XML delimiter wrapping (wrapUserContent/sanitizeInput) around all user-supplied content in prompt templates to mitigate prompt injection - New lib/ai/tools/shared/sanitize.ts with wrapUserContent() and sanitizeInput() Major: - Fix timer leak in invokeWithTimeout — clearTimeout in finally block - Add isRetryableError() — skip retries for 400/401/403/404/422 client errors - Replace any[] with BaseCallbackHandler[] in GetModelOptions - Add console.warn for unknown model pricing fallback - Test NODE_ENV development/production branches in CostTracker - Test llmOutput undefined branch in CostTracker - Add expect.assertions(N) to all try/catch and rejection tests - Test clearTimeout called in invokeWithTimeout (timer leak regression test) - Add onboarding prompt re-export test coverage - New sanitize.test.ts (15 tests) Tests: 195 passing (up from 144) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Break lib/tutor-tools.ts (1,649 lines) into individual tool files under lib/ai/tools/tutor/. Each tool imports prompts from Phase 1 and uses the model factory. Backward-compat shim preserves all existing imports. New files: - lib/ai/tools/tutor/{start-dialog,continue-dialog,analyze-errors, generate-overview,analyze-message,professor-review,start-roleplay, continue-roleplay}.ts — 8 individual LangChain tools - lib/ai/tools/tutor/{types,schemas,index}.ts — shared types and barrel - lib/ai/tools/shared/parse-json.ts — JSON response parser utility - lib/ai/__tests__/{parse-json,tutor-schemas,tutor-tools}.test.ts — 48 tests lib/tutor-tools.ts reduced from 1,649 to 34 lines (re-export shim). 243 tests passing in lib/ai/, 92 existing tests still passing. Zero type errors in lib/ai/ and lib/tutor-tools. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…a live tests Replace Mastra workflow orchestration with 4 deterministic LangGraph StateGraphs: - Content generation (vocab → grammar → exercises with startFrom routing) - Onboarding assessment (chat/evaluate conditional routing) - Word of day (deterministic word selection + sentence generation) - Tutor turn (loadContext → analyze → respond → persist with DI) Migrate processors to lib/ai/processors/ with dependency injection adapters, original files become re-export shims. Upgrade langchain content tools to use getModel() factory. 395 tests total (383 unit + 12 Ollama live LLM tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…anup Add console.error with tagged prefixes to all 5 bare catch blocks in graph nodes (onboarding, word-of-day, tutor-turn). Add afterEach vi.unstubAllEnvs() to all 4 graph test files for proper environment cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…+ workflows migration Step 1: Add 5 new ModelTask types (icelandic-lookup, ai-chat, feedback-analysis, tutor-generate-examples, image-prompt) with configs and pricing for gpt-4, gpt-3.5-turbo. Step 2: Migrate content-generation tools (identify-grammar, generate-exercises, generate-dialogs) from Vercel AI SDK generateObject() to getModel() + withStructuredOutput(zodSchema). Structured output is now provider-agnostic — works on both ChatOpenAI and ChatOllama without manual JSON parsing. Step 3: Migrate Mastra workflows (overviewGeneration, wordOfDayGeneration) from direct OpenAI SDK to getModel(). wordOfDayGeneration uses withStructuredOutput for JSON responses; overviewGeneration uses plain invoke() for free-form text. Model factory: OLLAMA_BASE_URL env var switches getModel() from ChatOpenAI to ChatOllama. Return type narrowed to BaseChatModel for provider neutrality. Tests: 486 unit tests pass, 8/8 Ollama live integration tests pass (NO MOCKS). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…test hygiene - Wrap user content with wrapUserContent() in identify-grammar, generate-exercises, generate-dialogs prompts to prevent prompt injection - Sanitize word/definitions input in wordOfDayGeneration with sanitizeInput() - Fix overviewGeneration error logging: extract message instead of logging raw object - Replace silent `if (!ollamaAvailable) return` with describe.runIf() in both Ollama test files — tests now show as skipped, not falsely green - Add Ollama provider switch unit tests to models.test.ts (ChatOllama returned when OLLAMA_BASE_URL set, model name mapping verified) - Assert Zod schema passed to withStructuredOutput() in all 4 migration test files Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…on, Ollama live tests - Migrate onboarding/chat and onboarding/assess from direct OpenAI SDK to runOnboardingChat() and runOnboardingAssessment() graph wrappers - Add sanitizeInput() to graph convenience wrappers (gauntlet: prompt injection) - Fix OnboardingChatInput to import LanguageCode instead of inline literal union - Replace em dashes with colons/semicolons in onboarding-assessment.ts - Rewrite route tests to mock @/lib/ai/graphs instead of openai SDK (16 pass) - Add 2 Ollama live tests for graph wrappers (14/14 total) - Add fallback reasoning assertion to onboarding graph test - File GH #254 for unauthenticated onboarding route auth gap Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Translate, tutor/generate-examples, ai/chat, and feedback routes now use getModel() factory instead of direct ChatOpenAI instantiation. Added wrapUserContent sanitization, Zod input validation, parseJsonResponse. Gauntlet fixes: error detail leak in feedback 500, unused import, Zod schema. 12 Ollama live tests passing, 19 mock tests, 392 AI unit tests green. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Icelandic lookup: Vercel AI SDK generateText → getModel('icelandic-lookup'), added wrapUserContent + parseJsonResponse. Odin speak: OpenAI SDK streaming → getModel('odin').stream() via LangChain, added sanitizeInput, fixed error message leak in SSE. 11 mock tests, 2 Ollama live tests added. Gauntlet clean. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Deleted 4 files with zero consumers: - lib/content-generation/mastra.config.ts (Vercel AI SDK provider) - lib/mastra/providers/openai.ts (OpenAI SDK wrapper) - lib/mastra/workflows/contentGeneration.ts (old workflow engine) - scripts/test-mastra.ts (test script for deleted barrel exports) Simplified lib/mastra/index.ts to type-only re-exports. Removed dead test blocks for deleted modules. 427 unit tests + 77 route tests + 14 Ollama live tests all passing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move OpenAI (DALL-E) and @fal-ai/client SDK imports exclusively into lib/ai/models/image.ts. The T2I service becomes a thin consumer that delegates to getImageModel('flashcard-image'). Zero direct SDK imports outside the AI plane. - New image-config.ts: routing table, pricing, availability checks - New image.ts: DallEImageModel, FALImageModel, getImageModel() factory - Rewrite T2I service (197→116 lines) as thin consumer with PROVIDER_MAP - Simplify T2I types (94→29 lines) to deprecated aliases - Delete dalle.ts (-126), fal.ts (-162), dalle.test.ts (-140) - 21 new AI plane image tests, 5 rewritten T2I service tests - Tighten API route Zod schema to only supported providers Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Swap remaining imports to lib/ai/ across 8 tutor routes, overview-cache, odin client, content-generation workflow, and wordOfDay tool. Delete orphaned files: langchain tools/agents, mastra tools, prompts, debug scripts, and the v1 content-generate route. Strip dead test code from OdinLiveHead and useOdinSpeak tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Introduce typed service bus (`createServices()`) that creates a single Supabase client and auth check per request, then fans out to domain services. Includes `getDomainConfig()` for per-domain backend routing via `BACKEND_<DOMAIN>` env vars (defaults to 'supabase'). - `lib/services/types.ts` — ServiceContext, BackendProvider, error classes, UserPreferences, StreakResult - `lib/services/bus.ts` — createServices() factory - `lib/services/user.ts` — UserService with getPreferences(), updatePreferences(), checkStreak() (consolidates duplicated streak logic from preferences + flashcard review routes) - Refactored `user/preferences` route: 145 → 30 lines - Refactored `user/streak-check` route: 97 → 15 lines Closes nothing — part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Introduce `dictionary_entries` as single source of truth for all words. New tables: `user_words` (user encounters), `lesson_words` (lesson edges), `reading_words` (reading edges). Adds `word_id` FK to flashcards table. Migration: - Expand dictionary_entries language CHECK to include 'is' - Create user_words, lesson_words, reading_words tables with RLS - Migrate lesson_vocabulary_items → dictionary_entries - Migrate lesson_vocabulary → lesson_words - Truncate old vocabulary table (user data blown away per decision) Service: `createDictionaryService(ctx)` with lookup, lookupOrCreate, search, recordEncounter, getUserWords, getUserWordByWord, linkToLesson, getLessonWords, bulkUpsert, getStats. New routes: - POST /api/dictionary/encounter — reader word-click → dictionary - GET /api/dictionary/user-words — list user's word encounters Refactored routes: - GET/PUT /api/lessons/[lessonId]/vocabulary → dictionary service - POST /api/lessons/[lessonId]/vocabulary/approve → dictionary service - GET /api/vocabulary/search → dictionary.search() - POST /api/workflows/content-generation → dictionary.bulkUpsert + dictionary.linkToLesson (replaces 100-line inline auto-save) - TextRenderPanel word-click → /api/dictionary/encounter endpoint Part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add createFlashcardService(ctx) with 14 methods covering all flashcard domain operations: deck CRUD, card CRUD, due cards (RPC), review with XP/streak, stats, gamification toggle. Key improvements: - recordReview() uses user.checkStreak() instead of duplicating 50-line streak calculation inline - recordReview() records dictionary encounter when card has word_id, connecting SRS reviews to the universal dictionary - createCard() accepts optional word_id for dictionary linking - All cloze text parsing reuses existing utility functions Routes not yet refactored — service extraction only in this commit. Route refactoring will follow after integration tests. Part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…vices Real integration tests that hit actual Supabase — no mocks. - integration-helpers.ts: creates service-role admin client, test contexts with fake user IDs, and cleanup utilities - user.integration.test.ts: getPreferences defaults, updatePreferences roundtrip, unknown field filtering, empty update rejection, streak creation + maintain on same day - dictionary.integration.test.ts: lookup miss/hit, lookupOrCreate, search by prefix, recordEncounter + increment, getUserWords, getUserWordByWord, bulkUpsert + idempotency, getStats - flashcards.integration.test.ts: deck CRUD (create/list/update/delete), card CRUD (basic + cloze), stats, gamification toggle, cleanup All tests use describe.runIf(canRunIntegration) to skip gracefully when Supabase project is paused or env vars are missing. Part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

TutorService interface with 9 methods: startDialog, continueDialog, analyzeErrors, generateOverview, generateReview, startRoleplay, continueRoleplay, analyzeMessage, generateExamples. Two backends: - tutor-supabase.ts: delegates to existing lib/ai/tools/tutor/ tools - tutor-qortex.ts: HTTP client forwarding JWT as Bearer token to QORTEX_URL/tutor/<endpoint> Bus routing: BACKEND_TUTOR=qortex switches backend. Defaults to supabase. Tests (7/7 passing): - Supabase backend: all 9 methods exist on factory output - Qortex backend: JWT forwarding, correct URL construction, error handling - Bus routing: env var switching, default behavior Part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ples Hits real local Ollama (qwen2.5:14b) to verify the full LLM pipeline: prompt construction → model invocation → JSON parsing → validation. Tests: - generates 2-3 Spanish examples for "casa" with definition - generates examples for "perro" without definition - Validates each example contains the target word Skips automatically when Ollama is not running (e.g., CI). describe.runIf checks localhost:11434 reachability at setup time. Part of #255 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…rse_id - Create lib/services/lessons.ts: factory-pattern LessonService with create, getById, update, delete, list, complete methods - Create lib/services/courses.ts: factory-pattern CourseService with full CRUD, lesson management (add/remove/reorder), enrollment (enroll/unenroll/getEnrolled), publish/unpublish, getOrCreateCourseDeck - Wire both into bus.ts (Services now includes all 6 domains) - Migration: migrate remaining lessons.course_id data to lesson_course_ordering, then drop the legacy column - Refactor all lesson routes (POST/GET/PATCH/DELETE, complete, incomplete) to use createServices() instead of static LessonService class - Refactor all course routes (CRUD, lessons, enroll, publish) to use createServices() instead of static CourseService class - Update publish + validate-for-publish routes to get course_id from lesson_course_ordering instead of lessons.course_id - Integration tests: 7 lesson tests + 8 course tests (real Supabase) - Unit tests: 6 wiring tests (factory shape + domain config) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Consolidate streak calculation: lessons.complete() now delegates to user.checkStreak({ xpDelta }) instead of reimplementing streak logic. Single profile write handles streak + XP atomically. - checkStreak() gains optional xpDelta param for callers that add XP. - createLessonService() now accepts { user: UserService } dependency. - Expand Ollama integration tests from 2 → 11 covering: - Dialog loop: startDialog, continueDialog, shouldEnd wrap-up - Error analysis: analyzeErrors (JSON), analyzeMessage (correct + incorrect) - Roleplay loop: startRoleplay, continueRoleplay, shouldEnd wrap-up All tests pass against local qwen2.5:14b via Ollama. Gauntlet majors addressed: #2 (streak dedup), #3 (Ollama tutor tests). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…t fix - Integration test helpers now create real auth.users via admin API, fixing FK constraint violations on lessons.author_id / courses.created_by that appeared after Supabase project restore. - createTestContext() is now async, returns Promise<ServiceContext>. - deleteTestUser() cleans up auth users in afterAll. - Fix CourseService.reorderLessons: delete-then-insert instead of upsert to avoid unique constraint violation on (course_id, display_order) during intermediate upsert state. - All 5 test files updated to use async context + cleanup pattern. Results: 77/91 pass. 14 remaining failures are pre-existing schema gaps (dictionary_entries table, preferred_image_provider column not yet migrated). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ns, JWT validation - courses.ts: update/delete now enforce created_by === user.id (was relying on RLS alone, breaks when backend switches to Qortex) - flashcards.ts: recordReview verifies card belongs to user via deck ownership join before recording review + awarding XP - lessons.ts: complete() uses upsert with ignoreDuplicates to prevent TOCTOU race condition that could double-award XP on concurrent requests - tutor-qortex.ts: fail fast if session JWT is missing instead of silently sending unauthenticated requests to Qortex Gauntlet result: 0 criticals, 0 majors, 4 minors (accepted tech debt). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…terface Extract TutorPersistence interface + TutorPersistenceSupabase impl to decouple LLM logic from database access. Each of the 7 DB-touching tutor tools now exports a pure core function (persistence-injected, no auth) alongside the LangChain tool() shim. Service bus (tutor-supabase.ts) uses the core functions with injected persistence, eliminating the tool.invoke() pass-through. All 8 API routes migrated to createServices(). Closes #259 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

claude · 2026-02-22T20:44:20Z

Code Review: refactor(tutor): decouple tools from Supabase via TutorPersistence

Overall assessment: The architectural direction is sound and the motivation is clear, but the PR as submitted has several issues that need resolution before merging.

Scope Mismatch

The title and description describe extracting TutorPersistence (the final commit), but the branch contains 25 commits spanning 6+ phases: a full lib/ai/ rewrite, LangGraph migration, processor refactoring, a complete service bus for all API routes, and only finally the TutorPersistence extraction. This makes atomic review difficult and increases risk considerably.

The core files described in the PR summary — lib/services/tutor-persistence.ts, lib/services/tutor-supabase.ts, lib/services/__tests__/tutor.integration.test.ts — don't appear in the 100-file diff because they were introduced in earlier commits in the branch and aren't deltas relative to staging. Reviewers can't verify the central claimed change without checking the branch tree directly.

Recommendation: Consider splitting this into smaller PRs scoped to each phase, or at minimum update the PR description to accurately document the full scope of changes.

CI Failure

The Quality Checks job is failing. The local test results mentioned in the PR test plan (npm run type-check clean, 16 tests pass) don't reflect CI state. This needs to be green before merging.

Test Coverage Regression — Odin Subsystem

Two test files have been nearly entirely deleted without explanation:

hooks/__tests__/useOdinSpeak.test.ts: 856 lines deleted, 5 remaining
components/odin/__tests__/OdinLiveHead.test.tsx: 708 lines deleted, 5 remaining

That's ~1,564 lines of test coverage for the Odin/audio subsystem removed. The PR description doesn't mention this. Are these being replaced elsewhere? Moved? Deprecated because the component changed? This needs explanation. If the tests are genuinely obsolete (e.g., the components were refactored), that should be documented explicitly. If they're not, restoring coverage is a prerequisite for merging.

Integration Test Gaps

The commit history notes 77/91 integration tests passing, with 14 failures attributed to schema gaps (dictionary_entries table, preferred_image_provider column). These are described as "pre-existing" but they're failing against code in this PR. Options:

Fix the schema gaps in this PR (preferred)
Add explicit .skip with a linked tracking issue for each failure
Document them as blocked by a specific pending migration PR

Merging with known test failures — even "pre-existing" ones — sets a bad precedent and makes future regressions harder to detect.

Positive Observations

Good fixes included in this PR:

Auth hardening in tutor-qortex.ts: Failing fast when session JWT is missing prevents silently sending unauthenticated requests to Qortex. This was a real vulnerability.
Race condition fix in lessons.ts: Using upsert with ignoreDuplicates for lesson completion prevents TOCTOU double-awarding of XP on concurrent requests. Correct pattern.
Architectural direction: The TutorPersistence interface + injectable persistence is the right design. Decoupling LLM logic from Supabase auth calls makes the tutor testable without a live DB and enables backend switching (Qortex).
Service bus consolidation: The route simplifications (e.g., lesson routes going from 127 to 19 lines) show the service bus is doing real work, not just indirecting.
Test additions: The new lib/ai/__tests__/ suite adds substantial coverage for the model factory, processors, graphs, and prompt rendering.

Minor Issues

app/api/v1/content/generate/route.ts is entirely deleted (97 lines). If this endpoint was public or used anywhere, callers need to be notified or redirected. Confirm no external consumers.
The docs/odin/VIDEO_SCRIPTS.md addition (+84 lines) appears unrelated to the stated refactor — consider whether this belongs in a separate PR or commit.
components/reader/TextRenderPanel.tsx (+33/-18) is also unrelated to the tutor persistence decoupling.

Summary

Area	Status
Architectural direction	✅ Correct
PR scope vs. description	❌ Significant mismatch
CI	❌ Quality Checks failing
Odin test deletions	❌ ~1,564 lines unexplained
Integration test failures	⚠️ 14 known failures need resolution/tracking
Security fixes (JWT, race condition)	✅ Good
Base branch (`staging`)	✅ Correct

Blockers before merge:

Fix CI failures
Explain and resolve the Odin test deletions (restore or document retirement)
Resolve the 14 integration test failures or add tracked skips with linked issues

🤖 Generated with Claude Code

Peleke and others added 26 commits January 23, 2026 09:47

This was referenced Feb 23, 2026

Decouple tutor tools from Supabase — extract TutorPersistence interface #259

Closed

test: Comprehensive AI layer test coverage — 12 untested routes, 6 critical gaps, E2E blind spots #268

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(tutor): decouple tools from Supabase via TutorPersistence#260

refactor(tutor): decouple tools from Supabase via TutorPersistence#260
Peleke wants to merge 26 commits intostagingfrom
feature/decouple-tutor-persistence

Peleke commented Feb 22, 2026 •

edited

Loading

Uh oh!

claude bot commented Feb 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

Peleke commented Feb 22, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Why

Test plan

Uh oh!

claude bot commented Feb 22, 2026

Code Review: refactor(tutor): decouple tools from Supabase via TutorPersistence

Scope Mismatch

CI Failure

Test Coverage Regression — Odin Subsystem

Integration Test Gaps

Positive Observations

Minor Issues

Summary

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Peleke commented Feb 22, 2026 •

edited

Loading