Conversation
📝 Walkthrough

This PR introduces a RAG (Retrieval Augmented Generation) quality evaluation framework, including semantic retrieval with Supabase vector store integration, embedding pipelines with fallback support, knowledge ingestion capabilities, and evaluation metrics (precision and faithfulness) to assess RAG response quality against defined thresholds.
Sequence Diagrams

```mermaid
sequenceDiagram
    participant User
    participant Runner as Eval Runner
    participant Pipeline as Message Pipeline
    participant LLM as ChatOpenAI (Judge)
    participant VectorStore as Vector Store
    User->>Runner: Run evaluation suite
    loop For each EVAL_DATASET case
        Runner->>Pipeline: Execute processMessagePipeline(agentState)
        Pipeline->>VectorStore: Semantic search for context
        VectorStore-->>Pipeline: Retrieved documents + confidence
        Pipeline-->>Runner: Final answer + context
        Runner->>Runner: Compute Precision (keyword match)
        Runner->>LLM: Judge faithfulness (context + answer)
        LLM-->>Runner: YES/NO response
        Runner->>Runner: Compute Faithfulness score
        Runner->>Runner: Accumulate metrics
    end
    Runner->>Runner: Compare avg Precision (≥0.8?), avg Faithfulness (≥0.95?)
    alt Thresholds pass
        Runner-->>User: Exit code 0 ✓
    else Thresholds fail
        Runner-->>User: Exit code 1 ✗
    end
```
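The threshold gate at the bottom of the evaluation-runner diagram is simple enough to sketch as a pure function. This is a hypothetical helper for illustration only (the actual runner accumulates scores inline and calls process.exit); the `completedCases` guard against zero cases is an assumption, not something the diagram shows.

```typescript
// Hypothetical sketch of the quality gate: average per-case scores and
// compare them against the fixed thresholds from the diagram.
interface EvalTotals {
  totalPrecision: number;
  totalFaithfulness: number;
  completedCases: number;
}

// Returns the process exit code: 0 when both averages pass, 1 otherwise.
function qualityGateExitCode(totals: EvalTotals): number {
  if (totals.completedCases === 0) return 1;
  const avgPrecision = totals.totalPrecision / totals.completedCases;
  const avgFaithfulness = totals.totalFaithfulness / totals.completedCases;
  return avgPrecision >= 0.8 && avgFaithfulness >= 0.95 ? 0 : 1;
}
```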
```mermaid
sequenceDiagram
    participant Source as IngestionSource
    participant Ingestion as ingestKnowledge()
    participant Embeddings as getEmbeddings()
    participant Database as Supabase DB
    participant VectorStore as Vector Store
    Source->>Ingestion: source (content, sourceType, title)
    Ingestion->>Ingestion: loadDocument(source)
    Ingestion->>Database: singleDocumentUpsert(doc)
    Database-->>Ingestion: stored document with id
    Ingestion->>Database: clearDocumentChunks(documentId)
    Ingestion->>Ingestion: RecursiveCharacterTextSplitter (1000/200)
    Ingestion->>Embeddings: getEmbeddings()
    Embeddings-->>Ingestion: embeddings instance
    Ingestion->>VectorStore: getVectorStore(embeddings)
    Ingestion->>VectorStore: addChunksToVectorStore(chunks, documentId)
    VectorStore->>Database: Store chunks + embeddings
    Ingestion-->>Source: Result {documentId, chunksAdded, version}
```
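The 1000/200 splitting step in the ingestion diagram can be approximated with a plain sliding window. This is a simplified stand-in for LangChain's RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries; only the size/overlap arithmetic is the same.

```typescript
// Naive character splitter: 1000-char windows with 200-char overlap.
function splitText(text: string, chunkSize = 1000, overlap = 200): string[] {
  const chunks: string[] = [];
  const step = chunkSize - overlap; // each window starts 800 chars after the last
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```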
```mermaid
sequenceDiagram
    participant Query as Query Input
    participant Retrieval as retrievalChain
    participant VectorStore as Semantic Search
    participant Confidence as confidenceChain
    participant Policy as policyChain
    Query->>Retrieval: normalizedInput
    Retrieval->>VectorStore: similaritySearch (threshold 0.7)
    VectorStore-->>Retrieval: [Document, score][] (top 4)
    Retrieval-->>Retrieval: Extract context + confidence score
    Retrieval->>Confidence: state {intent: RAG, retrievalConfidence}
    alt retrievalConfidence < 0.7
        Confidence-->>Confidence: confidence = 0.3 (fallback)
    else retrievalConfidence >= 0.7
        Confidence-->>Confidence: confidence = 0.9
    end
    Confidence->>Policy: Updated state
    alt confidence < 0.7 path triggered
        Policy-->>Policy: finalResponse = "I need clarification"
        Policy-->>Policy: intent = CLARIFICATION
    end
    Policy-->>Query: Updated state with safe response
```
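Taken together, the confidence and policy steps in the diagram above reduce to a small state transform. This is an illustrative sketch, not the actual chain code: the field names follow the diagram, and the combined function is an assumption (the real logic is split across confidence.ts and policy.ts).

```typescript
// Sketch of the routing shown in the diagram.
interface AgentStateSketch {
  intent: 'RAG' | 'CLARIFICATION';
  retrievalConfidence?: number;
  confidence?: number;
  finalResponse?: string;
}

function applyConfidenceAndPolicy(state: AgentStateSketch): AgentStateSketch {
  const next = { ...state };
  if (next.intent === 'RAG') {
    // Low retrieval confidence drops overall confidence to the 0.3 fallback.
    next.confidence = (next.retrievalConfidence ?? 0) < 0.7 ? 0.3 : 0.9;
  }
  if ((next.confidence ?? 1) < 0.7) {
    // The policy step converts low confidence into a safe clarification.
    next.finalResponse = 'I need clarification';
    next.intent = 'CLARIFICATION';
  }
  return next;
}
```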
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks: ✅ 1 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Actionable comments posted: 12
🧹 Nitpick comments (7)
packages/llm/src/rag/ingestion.ts (1)
17-33: `loadDocument` is unnecessarily async.

The function contains no `await` expressions. Removing `async` would simplify the signature and make the synchronous nature explicit.

♻️ Suggested simplification

```diff
-export async function loadDocument(source: IngestionSource) {
+export function loadDocument(source: IngestionSource) {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/rag/ingestion.ts` around lines 17 - 33, The function loadDocument is marked async but contains no await, so remove the async keyword and make it a synchronous function; update its signature (export function loadDocument(source: IngestionSource)) and ensure any callers expecting a Promise are adjusted to accept the plain return value (or wrap in Promise.resolve at callsites if needed), keeping the returned docs array and metadata construction unchanged.

packages/llm/src/langchain/types.ts (1)
13-15: Consider defining a stricter type for `citations`.

`Record<string, unknown>[]` is quite permissive. A dedicated `Citation` interface would improve type safety and self-documentation:

```ts
interface Citation {
  documentId: string;
  chunkId?: string;
  source: string;
  score?: number;
}
```

This would make the expected shape of citation objects explicit and catch invalid usages at compile time.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/langchain/types.ts` around lines 13 - 15, Add a stricter Citation type and use it for the citations field: create an exported interface named Citation (with at least documentId: string, source: string, and optional chunkId?: string and score?: number) and replace the loose type Record<string, unknown>[] on the citations property with Citation[] (update any usages/imports of citations to match the new interface name).

packages/llm/package.json (1)
14-23: Note: `pdf-parse` appears unused in the current implementation.

The `ingestion.ts` file mentions PDF handling in comments but only implements text-based loading. If PDF support is planned for a future iteration, consider deferring this dependency until needed to reduce bundle size.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/package.json` around lines 14 - 23, The package.json currently lists "pdf-parse" as a dependency but ingestion.ts doesn't use it (only text loaders are implemented); remove "pdf-parse" from the dependencies in package.json to avoid unnecessary bundle size or, if PDF support is intended soon, add a clear TODO in ingestion.ts referencing pdf-parse and move the dependency to an optional/dev or feature-specific install so it isn't shipped by default; update package.json to reflect the removal and ensure no import of pdf-parse exists in the codebase.

packages/llm/src/langchain/chains/__tests__/confidence.test.ts (1)
16-34: Add the exact 0.7 boundary case.

The production branch is `retrievalConfidence < 0.7`, so the equality case is the one most likely to regress silently. A dedicated 0.7 assertion here would lock the threshold behavior down.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/langchain/chains/__tests__/confidence.test.ts` around lines 16 - 34, Add a test case that verifies the boundary retrievalConfidence == 0.7 behavior: create an AgentState with intent 'RAG' and retrievalConfidence set to 0.7, call confidenceChain.invoke(state) and assert the returned result.confidence matches the expected branch (either 0.9 or 0.3 per production rule); place this alongside the existing tests that use confidenceChain.invoke and AgentState so the equality case is explicitly covered.

packages/llm/src/langchain/chains/__tests__/retrieval.test.ts (1)
26-56: Cover the `normalizedInput` short-circuit explicitly.

Both cases here go through semantic search. `packages/llm/src/langchain/chains/retrieval.ts:6-24` also has a guard that should skip `searchKnowledge` entirely and return empty defaults when `normalizedInput` is missing, and that branch is currently untested.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/langchain/chains/__tests__/retrieval.test.ts` around lines 26 - 56, Add a test that explicitly covers the short-circuit when normalizedInput is absent: create an AgentState with normalizedInput undefined or empty, call retrievalChain.invoke(state), assert that searchSpy (or searchKnowledge) was not called, and that the returned result has retrievedContext === '' and retrievalConfidence === 0; reference retrievalChain.invoke, searchSpy/searchKnowledge and AgentState to locate where to add the test.

packages/llm/src/rag/__tests__/ingestion.test.ts (1)
9-19: Use a sentinel vector store instead of asserting `undefined`.

With the current mock, this test still passes even if `ingestKnowledge` breaks the `getVectorStore(...)` → `addChunksToVectorStore(...)` hand-off entirely. Returning a sentinel object here makes that wiring observable.

Suggested change

```diff
+const mockVectorStore = { __brand: 'mock-vector-store' };
+
 // Mock the vectorstore module
 vi.mock('../vectorstore.js', () => ({
-  getVectorStore: vi.fn(),
+  getVectorStore: vi.fn().mockReturnValue(mockVectorStore),
   singleDocumentUpsert: vi.fn().mockResolvedValue({
     id: 'mock-doc-id',
@@
 expect(vectorstoreModule.addChunksToVectorStore).toHaveBeenCalledWith(
-  undefined, // getVectorStore is mocked to return undefined empty fn
+  mockVectorStore,
   expect.arrayContaining([
     expect.objectContaining({
       pageContent: source.content,
@@
 expect(vectorstoreModule.addChunksToVectorStore).toHaveBeenCalledWith(
-  undefined,
+  mockVectorStore,
   expect.arrayContaining([expect.objectContaining({ pageContent: expect.any(String) })]),
   'mock-doc-id',
 );
```

Also applies to: 57-65, 86-90
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/rag/__tests__/ingestion.test.ts` around lines 9 - 19, Replace the undefined-returning mocks with a sentinel vector-store object so the hand-off from getVectorStore(...) to addChunksToVectorStore(...) is observable: have getVectorStore return a simple sentinel object (e.g., { addChunksToVectorStore: vi.fn().mockResolvedValue('SENTINEL') }) and update any mock blocks that currently set clearDocumentChunks/addChunksToVectorStore to mockResolvedValue(undefined) (including the other similar vi.mock blocks) to use that sentinel method; keep singleDocumentUpsert as-is but assert the sentinel's addChunksToVectorStore was called to verify wiring between getVectorStore and addChunksToVectorStore.

packages/llm/src/langchain/chains/__tests__/policy.test.ts (1)
19-55: Add a combined unsafe + low-confidence RAG regression case.

These tests cover the safety path and the low-confidence RAG path separately, but `packages/llm/src/langchain/chains/policy.ts:10-24` evaluates them in sequence. A single case exercising both conditions would lock down the intended precedence and catch future regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@packages/llm/src/langchain/chains/__tests__/policy.test.ts` around lines 19 - 55, Add a new test in policy.test.ts that constructs an AgentState with intent 'RAG', a low retrievalConfidence (e.g. 0.5), and a composedResponse containing inappropriate content, then invoke policyChain.invoke and assert that safety takes precedence: expect result.isSafe toBe(false), result.finalResponse toBe('I cannot fulfill this request due to policy restrictions.'), and that the intent was not changed to 'CLARIFICATION' (e.g. expect(result.intent).toBe('RAG')) so the combined unsafe+low-confidence path is locked down; reference policyChain.invoke and AgentState when locating where to add the test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@packages/llm/src/evals/runner.ts`:
- Around line 41-75: The runner currently omits a correctness check comparing
the produced answer to the expected answer; add a correctness evaluation that
compares answer against testCase.expectedAnswer (either via a direct normalized
string comparison or, preferably, by asking the llm to judge correctness similar
to faithfulness). Create a correctnessPrompt (mirroring faithfulnessPrompt)
submitting CONTEXT, ANSWER, and EXPECTED_ANSWER, call
llm.invoke(correctnessPrompt), parse the judge response (expecting "YES"/"NO"),
compute a correctnessScore (1 or 0), add it to a new totalCorrectness
accumulator and include it in logs (e.g., logging "- Correctness Score: ..."),
and ensure final aggregated metrics incorporate this new correctness metric
alongside precision and faithfulness.
- Around line 78-86: The averages are incorrectly divided by EVAL_DATASET.length
even when some items are skipped after errors; update the logic to track the
number of successful evaluations (e.g., introduce a successfulCount incremented
inside the loop where totalPrecision and totalFaithfulness are accumulated) and
compute avgPrecision and avgFaithfulness by dividing by successfulCount instead
of EVAL_DATASET.length, guarding against successfulCount === 0 (return 0 or
NaN/appropriate default) to avoid divide-by-zero; adjust any variable names
(totalPrecision, totalFaithfulness, avgPrecision, avgFaithfulness, EVAL_DATASET)
in runner.ts to use the new successfulCount.
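As a sketch of what the correctness judge described above could look like: the helpers below are hypothetical, and the `expectedAnswer` field plus the exact prompt wording are assumptions that simply mirror the existing faithfulness check.

```typescript
// Hypothetical correctness-prompt builder mirroring faithfulnessPrompt;
// the llm.invoke call and YES/NO handling would match the existing judge.
function buildCorrectnessPrompt(context: string, answer: string, expectedAnswer: string): string {
  return `
You are an expert evaluator. Given the CONTEXT, judge whether the ANSWER
conveys the same information as the EXPECTED_ANSWER.

CONTEXT:
${context}

ANSWER:
${answer}

EXPECTED_ANSWER:
${expectedAnswer}

Does the answer match the expected answer? Reply ONLY with "YES" or "NO".
`;
}

// Parse the judge response into a 0/1 score, as the runner does for faithfulness.
function correctnessScore(judgeContent: string): number {
  return judgeContent.trim().toUpperCase() === 'YES' ? 1 : 0;
}
```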
In `@packages/llm/src/langchain/chains/confidence.ts`:
- Around line 14-20: The comment in the RAG branch is inconsistent with the
actual threshold: state.intent === 'RAG' checks state.retrievalConfidence < 0.7
but the comment says "Below 0.6 will route to clarification_path"; update either
the numeric threshold or the comment so they match. Locate the RAG branch where
retrievalConfidence is evaluated (state.retrievalConfidence) and either change
the comparison to < 0.6 to match the comment or revise the comment to state "<
0.7 will route to clarification_path" so the condition and explanation are
consistent.
In `@packages/llm/src/langchain/chains/policy.ts`:
- Around line 17-24: The low-confidence branch currently overwrites the safety
enforcement; ensure the safety block wins by checking isSafe before applying the
RAG low-confidence override or by returning immediately when the unsafe policy
response is set. Specifically, in the section that inspects state.intent ===
'RAG' and state.retrievalConfidence, only set finalResponse = 'I need
clarification' if isSafe is true (or move that block after an early return when
the unsafe response has been assigned), so the policy safety response (isSafe /
finalResponse) cannot be replaced by the low-confidence handler.
In `@packages/llm/src/langchain/chains/retrieval.ts`:
- Around line 15-29: Wrap the semantic retrieval call in a try/catch so
transient failures don't reject retrievalChain: around the
getVectorStore/getEmbeddings and searchKnowledge call (the block that reads
state.normalizedInput and calls getVectorStore(getEmbeddings()) then await
searchKnowledge(...)), catch any error, log it (use available logger or
console.error), and set highestConfidence = 0, retrievedContext = '', and
citations = [] so the pipeline continues with empty RAG context; keep the
existing successful-path behavior that maps results into highestConfidence,
retrievedContext, and citations when searchKnowledge returns results.
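The try/catch guard described above might be shaped like this; `searchKnowledge` is abstracted into a stubbed function type here, and only the degrade-to-empty behavior is the point of the sketch.

```typescript
// Sketch of the defensive retrieval step: any semantic-search failure
// degrades to empty RAG context instead of rejecting the whole chain.
type SearchFn = (query: string) => Promise<{ context: string; confidence: number }>;

async function safeRetrieve(query: string, search: SearchFn) {
  try {
    const result = await search(query);
    return { retrievedContext: result.context, retrievalConfidence: result.confidence };
  } catch (err) {
    console.error('retrieval failed, continuing without context:', err);
    return { retrievedContext: '', retrievalConfidence: 0 };
  }
}
```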
In `@packages/llm/src/rag/embeddings.ts`:
- Around line 9-52: The class PrimaryFallbackEmbeddings declares a fallback
provider but never uses it; remove the unused GoogleGenerativeAIEmbeddings
allocation and rename the class to OpenAIEmbeddingsWrapper (or similar) so
intent matches implementation: delete the fallback property and any comments
about dimension mismatch, keep only the primary (OpenAIEmbeddings)
initialization, and ensure embedDocuments and embedQuery call primary as they
currently do; update the class name and constructor accordingly (references:
PrimaryFallbackEmbeddings, primary, fallback, embedDocuments, embedQuery).
In `@packages/llm/src/rag/ingestion.ts`:
- Around line 46-52: singleDocumentUpsert is being called with a
non-deterministic source id `manual-${Date.now()}`, which breaks upsert
deduplication; replace that with a deterministic identifier (e.g., compute a
stable content-based hash of `source.content` such as SHA-256 or
require/validate `source.sourceUrl`) and pass that hash (or the validated
`sourceUrl`) as the `source` parameter so `(source, version)` remains a stable
conflict key when re-ingesting; update the call site that builds `source`
(referencing singleDocumentUpsert, source.sourceUrl, source.content, docVersion,
docMetadata) and ensure any hashing utility is imported/used consistently.
In `@packages/llm/src/rag/vectorstore.ts`:
- Around line 7-13: The module currently creates supabaseClient at import time
using supaUrl/supaKey which will throw if env vars are missing; instead
initialize the client lazily and validate config inside a getter. Remove the
top-level createClient call and implement a getSupabaseClient (or move logic
into getVectorStore) that: 1) reads process.env.SUPABASE_URL and
SUPABASE_SERVICE_ROLE_KEY, 2) throws a clear error if either is missing, and 3)
creates and caches a singleton via createClient (assigning to a module-scoped
supabaseClient) the first time it’s called; update getVectorStore to call this
getter. Reference symbols: supaUrl, supaKey, supabaseClient, createClient, and
getVectorStore.
- Around line 28-50: The upsert currently casts the Supabase row to
KnowledgeDocument which hides snake_case mismatches; in singleDocumentUpsert,
explicitly map the returned data fields to a KnowledgeDocument object instead of
using "as KnowledgeDocument": take data.id, data.source, data.title,
data.content, data.version, data.metadata and convert data.created_at ->
createdAt and data.updated_at -> updatedAt (and any other snake_case DB fields)
before returning; ensure types align with the KnowledgeDocument interface and
remove the unsafe cast.
- Around line 71-90: The addChunksToVectorStore implementation nests document_id
and chunk_index inside metadata but then calls SupabaseVectorStore.addDocuments,
which only writes content/embedding/metadata/id and will leave the required
top-level NOT NULL columns null, causing constraint violations; modify
addChunksToVectorStore to perform a direct insert into the knowledge_chunks
table using the Supabase client (instead of vectorStore.addDocuments) and supply
explicit column values for content, embedding (compute or reuse embeddings from
vectorStore if available), metadata, id, document_id, and chunk_index so the
database receives top-level document_id and chunk_index columns and respects the
unique constraint on (document_id, chunk_index).
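The lazy-client getter described in the first vectorstore finding can be sketched generically; the factory's return value below is a plain object standing in for the real createClient(url, key) result.

```typescript
// Generic lazy singleton: validate config and construct on first call,
// not at module import time.
function lazyClient<T>(factory: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = factory();
    }
    return instance;
  };
}

// Shaped like the suggested getSupabaseClient: env vars are checked when
// the client is first requested, so a missing config fails loudly at use
// time rather than crashing every import of the module.
const getClient = lazyClient(() => {
  const url = process.env.SUPABASE_URL;
  const key = process.env.SUPABASE_SERVICE_ROLE_KEY;
  if (!url || !key) {
    throw new Error('SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY must be set');
  }
  return { url, key }; // real code would return createClient(url, key)
});
```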
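The explicit snake_case-to-camelCase mapping asked for in the second vectorstore finding might look like this; the KnowledgeDocument fields are inferred from the ones named in the prompt.

```typescript
// Explicit row -> domain mapping instead of `as KnowledgeDocument`, so a
// snake_case/camelCase mismatch surfaces instead of being hidden by a cast.
interface KnowledgeDocument {
  id: string;
  source: string;
  title: string | null;
  content: string;
  version: number;
  metadata: Record<string, unknown>;
  createdAt: string;
  updatedAt: string;
}

function toKnowledgeDocument(row: Record<string, unknown>): KnowledgeDocument {
  return {
    id: row.id as string,
    source: row.source as string,
    title: (row.title as string | null) ?? null,
    content: row.content as string,
    version: row.version as number,
    metadata: (row.metadata as Record<string, unknown>) ?? {},
    createdAt: row.created_at as string, // snake_case DB column -> camelCase field
    updatedAt: row.updated_at as string,
  };
}
```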
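For the direct knowledge_chunks insert in the third vectorstore finding, the row shape can be built with a small pure helper (column names as given above; the embedding column is omitted here since it is produced by the embeddings model at insert time).

```typescript
// Build knowledge_chunks rows with document_id and chunk_index as top-level
// columns rather than nesting them inside metadata, matching the NOT NULL
// schema and the (document_id, chunk_index) unique constraint.
interface ChunkInput {
  pageContent: string;
  metadata?: Record<string, unknown>;
}

function toChunkRows(chunks: ChunkInput[], documentId: string) {
  return chunks.map((chunk, index) => ({
    content: chunk.pageContent,
    document_id: documentId,
    chunk_index: index,
    metadata: chunk.metadata ?? {},
  }));
}
```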
In `@packages/shared/src/rag.ts`:
- Around line 12-20: The exported type KnowledgeChunk is orphaned and its
camelCase fields (documentId, chunkIndex) don't match the DB snake_case schema;
either delete the KnowledgeChunk type if unused, or rename its properties to
document_id and chunk_index to match the database and add a brief comment above
the type stating it mirrors DB column names for schema alignment; search for
KnowledgeChunk to ensure no imports exist before removing, or update any uses to
the new snake_case field names if you choose to keep it.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 207af570-1946-43e6-a385-251ea9b53df5
⛔ Files ignored due to path filters (1)
`package-lock.json` is excluded by `!**/package-lock.json`
📒 Files selected for processing (20)
- packages/llm/package.json
- packages/llm/src/evals/dataset.ts
- packages/llm/src/evals/runner.ts
- packages/llm/src/langchain/__tests__/pipeline.test.ts
- packages/llm/src/langchain/chains/__tests__/composition.test.ts
- packages/llm/src/langchain/chains/__tests__/confidence.test.ts
- packages/llm/src/langchain/chains/__tests__/policy.test.ts
- packages/llm/src/langchain/chains/__tests__/retrieval.test.ts
- packages/llm/src/langchain/chains/composition.ts
- packages/llm/src/langchain/chains/confidence.ts
- packages/llm/src/langchain/chains/policy.ts
- packages/llm/src/langchain/chains/retrieval.ts
- packages/llm/src/langchain/types.ts
- packages/llm/src/rag/__tests__/ingestion.test.ts
- packages/llm/src/rag/__tests__/vectorstore.test.ts
- packages/llm/src/rag/embeddings.ts
- packages/llm/src/rag/ingestion.ts
- packages/llm/src/rag/vectorstore.ts
- packages/shared/src/index.ts
- packages/shared/src/rag.ts
```ts
// 1. Evaluate Precision (Did we retrieve chunks containing expected keywords?)
const contextLower = retrievedContext.toLowerCase();
const matchedKeywords = testCase.expectedContextKeywords.filter((kw) =>
  contextLower.includes(kw.toLowerCase()),
);

const precisionScore =
  testCase.expectedContextKeywords.length > 0
    ? matchedKeywords.length / testCase.expectedContextKeywords.length
    : 1.0;

totalPrecision += precisionScore;
console.log(
  `- Precision Score: ${(precisionScore * 100).toFixed(0)}% (${matchedKeywords.length}/${testCase.expectedContextKeywords.length} keywords)`,
);

// 2. Evaluate Faithfulness (LLM-as-a-judge check)
const faithfulnessPrompt = `
You are an expert evaluator. Evaluate if the ANSWER is strictly faithful to and grounded by the CONTEXT.
If the ANSWER contains claims unsupported by the CONTEXT, it is unfaithful.

CONTEXT:
${retrievedContext}

ANSWER:
${answer}

Is the answer strictly faithful to the context? Reply exact ONLY with "YES" or "NO".
`;
const judgeRes = await llm.invoke(faithfulnessPrompt);
const isFaithful = judgeRes.content.toString().trim().toUpperCase() === 'YES';
const faithfulnessScore = isFaithful ? 1 : 0;

totalFaithfulness += faithfulnessScore;
console.log(`- Faithfulness Score: ${faithfulnessScore * 100}%`);
```
The gate never checks answer correctness.
Right now this runner scores retrieval keyword overlap and context faithfulness only. A grounded but wrong or irrelevant answer can still pass, because nothing here compares answer to an expected answer.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/evals/runner.ts` around lines 41 - 75, The runner currently
omits a correctness check comparing the produced answer to the expected answer;
add a correctness evaluation that compares answer against
testCase.expectedAnswer (either via a direct normalized string comparison or,
preferably, by asking the llm to judge correctness similar to faithfulness).
Create a correctnessPrompt (mirroring faithfulnessPrompt) submitting CONTEXT,
ANSWER, and EXPECTED_ANSWER, call llm.invoke(correctnessPrompt), parse the judge
response (expecting "YES"/"NO"), compute a correctnessScore (1 or 0), add it to
a new totalCorrectness accumulator and include it in logs (e.g., logging "-
Correctness Score: ..."), and ensure final aggregated metrics incorporate this
new correctness metric alongside precision and faithfulness.
```ts
  } catch (err: unknown) {
    console.error(
      `- Error during evaluation pipeline: ${err instanceof Error ? err.message : String(err)}`,
    );
  }
}

const avgPrecision = totalPrecision / EVAL_DATASET.length;
const avgFaithfulness = totalFaithfulness / EVAL_DATASET.length;
```
Don't divide by the full dataset after skipped failures.
The loop logs pipeline/judge errors and continues, but the final averages still divide by EVAL_DATASET.length. One transient failure depresses both metrics and can fail the quality gate for infrastructure reasons instead of model quality.
Suggested fix

```diff
 let totalPrecision = 0;
 let totalFaithfulness = 0;
+let completedCases = 0;
@@
     totalPrecision += precisionScore;
@@
     totalFaithfulness += faithfulnessScore;
+    completedCases += 1;
   } catch (err: unknown) {
     console.error(
       `- Error during evaluation pipeline: ${err instanceof Error ? err.message : String(err)}`,
     );
   }
 }
-const avgPrecision = totalPrecision / EVAL_DATASET.length;
-const avgFaithfulness = totalFaithfulness / EVAL_DATASET.length;
+if (completedCases === 0) {
+  throw new Error('No evaluation cases completed successfully.');
+}
+
+const avgPrecision = totalPrecision / completedCases;
+const avgFaithfulness = totalFaithfulness / completedCases;
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/evals/runner.ts` around lines 78 - 86, The averages are
incorrectly divided by EVAL_DATASET.length even when some items are skipped
after errors; update the logic to track the number of successful evaluations
(e.g., introduce a successfulCount incremented inside the loop where
totalPrecision and totalFaithfulness are accumulated) and compute avgPrecision
and avgFaithfulness by dividing by successfulCount instead of
EVAL_DATASET.length, guarding against successfulCount === 0 (return 0 or
NaN/appropriate default) to avoid divide-by-zero; adjust any variable names
(totalPrecision, totalFaithfulness, avgPrecision, avgFaithfulness, EVAL_DATASET)
in runner.ts to use the new successfulCount.
```ts
CRITICAL: If you are answering based on context (RAG route), your answer must be STRICTLY GROUNDED in the provided context. Do not make up facts or claims that are not supported by the context.
If the provided context is insufficient to confidently answer the user's question, you must respond with exactly "I need clarification" and set the escalate_flag to true.`;
```
Clarification is being turned into human escalation.
StructuredOutputSchema describes escalate_flag as a human-handoff signal, but the prompt asks the model to set it when the answer should be "I need clarification". Because Line 61 maps that flag directly to intent = 'ESCALATION', low-context RAG requests bypass the CLARIFICATION path in packages/llm/src/langchain/chains/policy.ts. Use a separate clarification signal, or keep escalate_flag = false when the model is only asking the user for more detail.
Also applies to: 56-61
```ts
} else if (state.intent === 'RAG') {
  // Reject low-confidence retrieval and trigger fallback
  if (state.retrievalConfidence !== undefined && state.retrievalConfidence < 0.7) {
    confidence = 0.3; // Below 0.6 will route to clarification_path
  } else {
    confidence = 0.9;
  }
```
Misleading comment: threshold mismatch.
Line 17's comment states "Below 0.6 will route to clarification_path" but line 16 uses < 0.7 as the threshold. This discrepancy will confuse maintainers.
📝 Proposed fix

```diff
   // Reject low-confidence retrieval and trigger fallback
   if (state.retrievalConfidence !== undefined && state.retrievalConfidence < 0.7) {
-    confidence = 0.3; // Below 0.6 will route to clarification_path
+    confidence = 0.3; // Below routing threshold, will route to clarification_path
   } else {
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
} else if (state.intent === 'RAG') {
  // Reject low-confidence retrieval and trigger fallback
  if (state.retrievalConfidence !== undefined && state.retrievalConfidence < 0.7) {
    confidence = 0.3; // Below routing threshold, will route to clarification_path
  } else {
    confidence = 0.9;
  }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/langchain/chains/confidence.ts` around lines 14 - 20, The
comment in the RAG branch is inconsistent with the actual threshold:
state.intent === 'RAG' checks state.retrievalConfidence < 0.7 but the comment
says "Below 0.6 will route to clarification_path"; update either the numeric
threshold or the comment so they match. Locate the RAG branch where
retrievalConfidence is evaluated (state.retrievalConfidence) and either change
the comparison to < 0.6 to match the comment or revise the comment to state "<
0.7 will route to clarification_path" so the condition and explanation are
consistent.
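A third option, beyond editing either the number or the comment, is to name the threshold once so the two can never drift apart again. This is a hypothetical refactor sketch — `RETRIEVAL_CONFIDENCE_THRESHOLD` and `scoreRagConfidence` are illustrative names, not symbols from the PR:

```typescript
// Hypothetical refactor: a single named constant that both the condition
// and the routing comment refer to, so they cannot disagree.
const RETRIEVAL_CONFIDENCE_THRESHOLD = 0.7;

function scoreRagConfidence(retrievalConfidence?: number): number {
  // Below RETRIEVAL_CONFIDENCE_THRESHOLD, routing goes to clarification_path.
  if (
    retrievalConfidence !== undefined &&
    retrievalConfidence < RETRIEVAL_CONFIDENCE_THRESHOLD
  ) {
    return 0.3;
  }
  return 0.9;
}
```

With this shape, a future change to the threshold is a one-line edit and the comment stays accurate by construction.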
```ts
// Enforce grounded response on low RAG retrieval confidence
if (
  state.intent === 'RAG' &&
  state.retrievalConfidence !== undefined &&
  state.retrievalConfidence < 0.7
) {
  finalResponse = 'I need clarification';
}
```
Do not let low-confidence handling override the safety block.
If a RAG answer is both unsafe and low-confidence, Line 23 replaces the policy restriction response with "I need clarification". Keep the clarification branch behind isSafe, or return early after the unsafe case so policy enforcement always wins.
Suggested guard

```diff
 if (
+  isSafe &&
   state.intent === 'RAG' &&
   state.retrievalConfidence !== undefined &&
   state.retrievalConfidence < 0.7
 ) {
```

🤖 Prompt for AI Agents
) {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/langchain/chains/policy.ts` around lines 17 - 24, The
low-confidence branch currently overwrites the safety enforcement; ensure the
safety block wins by checking isSafe before applying the RAG low-confidence
override or by returning immediately when the unsafe policy response is set.
Specifically, in the section that inspects state.intent === 'RAG' and
state.retrievalConfidence, only set finalResponse = 'I need clarification' if
isSafe is true (or move that block after an early return when the unsafe
response has been assigned), so the policy safety response (isSafe /
finalResponse) cannot be replaced by the low-confidence handler.
```ts
const storedDoc = await singleDocumentUpsert({
  source: source.sourceUrl || `manual-${Date.now()}`,
  title: source.title || null,
  content: source.content,
  version: docVersion,
  metadata: docMetadata,
});
```
Non-idempotent source identifier will prevent upsert deduplication.
manual-${Date.now()} generates a unique source on every call, defeating the upsert logic which relies on (source, version) as the conflict key. Re-ingesting the same content without a sourceUrl will create duplicate documents instead of updating existing ones.
Consider using a content hash or requiring sourceUrl for upsert scenarios:
🐛 Proposed fix using content hash

```diff
+import { createHash } from 'crypto';
+
+function contentHash(content: string): string {
+  return createHash('sha256').update(content).digest('hex').slice(0, 16);
+}
+
 const storedDoc = await singleDocumentUpsert({
-  source: source.sourceUrl || `manual-${Date.now()}`,
+  source: source.sourceUrl || `manual-${contentHash(source.content)}`,
   title: source.title || null,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/rag/ingestion.ts` around lines 46 - 52, singleDocumentUpsert
is being called with a non-deterministic source id `manual-${Date.now()}`, which
breaks upsert deduplication; replace that with a deterministic identifier (e.g.,
compute a stable content-based hash of `source.content` such as SHA-256 or
require/validate `source.sourceUrl`) and pass that hash (or the validated
`sourceUrl`) as the `source` parameter so `(source, version)` remains a stable
conflict key when re-ingesting; update the call site that builds `source`
(referencing singleDocumentUpsert, source.sourceUrl, source.content, docVersion,
docMetadata) and ensure any hashing utility is imported/used consistently.
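The key property of the proposed fix is determinism: the same content must always produce the same fallback source id. A minimal self-contained sketch (the `fallbackSourceId` helper name is illustrative):

```typescript
import { createHash } from 'crypto';

// Sketch of a deterministic fallback source id: identical content always
// maps to the same key, so (source, version) remains a stable conflict
// key across repeated ingestions of the same document.
function fallbackSourceId(content: string): string {
  const digest = createHash('sha256').update(content).digest('hex');
  return `manual-${digest.slice(0, 16)}`;
}
```

Truncating to 16 hex characters (64 bits) keeps ids compact while leaving the collision probability negligible for realistic document counts; the full digest could be used if that trade-off is unwanted.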
```ts
const supaUrl = process.env.SUPABASE_URL as string;
const supaKey = process.env.SUPABASE_SERVICE_ROLE_KEY as string;

// Ensure we have a singleton supabase client for the vector store
export const supabaseClient = createClient(supaUrl, supaKey, {
  auth: { persistSession: false },
});
```
Validate the Supabase config before building the singleton.
This client is constructed at import time. If either env var is missing, importing packages/llm/src/rag/vectorstore.ts will fail before any request-level fallback can run. Fail fast with an explicit config check, or initialize the client lazily inside getVectorStore.
Suggested guard

```diff
-const supaUrl = process.env.SUPABASE_URL as string;
-const supaKey = process.env.SUPABASE_SERVICE_ROLE_KEY as string;
+const supaUrl = process.env.SUPABASE_URL;
+const supaKey = process.env.SUPABASE_SERVICE_ROLE_KEY;
+
+if (!supaUrl || !supaKey) {
+  throw new Error('SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY must be set');
+}
```

📝 Committable suggestion
```ts
const supaUrl = process.env.SUPABASE_URL;
const supaKey = process.env.SUPABASE_SERVICE_ROLE_KEY;

if (!supaUrl || !supaKey) {
  throw new Error('SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY must be set');
}

// Ensure we have a singleton supabase client for the vector store
export const supabaseClient = createClient(supaUrl, supaKey, {
  auth: { persistSession: false },
});
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/rag/vectorstore.ts` around lines 7 - 13, The module
currently creates supabaseClient at import time using supaUrl/supaKey which will
throw if env vars are missing; instead initialize the client lazily and validate
config inside a getter. Remove the top-level createClient call and implement a
getSupabaseClient (or move logic into getVectorStore) that: 1) reads
process.env.SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY, 2) throws a clear error
if either is missing, and 3) creates and caches a singleton via createClient
(assigning to a module-scoped supabaseClient) the first time it’s called; update
getVectorStore to call this getter. Reference symbols: supaUrl, supaKey,
supabaseClient, createClient, and getVectorStore.
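The lazy-initialization alternative mentioned above can be sketched as a generic helper. To stay self-contained, the Supabase dependency is replaced by an injected factory — in the real code the factory body would call `createClient` from `@supabase/supabase-js`; `lazySingleton` and `getClient` are hypothetical names:

```typescript
// Generic lazy-singleton sketch: the factory runs at most once, on
// first use, and its result is cached for every later call.
function lazySingleton<T>(create: () => T): () => T {
  let instance: T | undefined;
  return () => {
    if (instance === undefined) {
      instance = create();
    }
    return instance;
  };
}

// Validation lives inside the factory, so merely importing the module
// never throws — only the first actual use does if config is missing.
const getClient = lazySingleton(() => {
  const url = process.env.SUPABASE_URL;
  const key = process.env.SUPABASE_SERVICE_ROLE_KEY;
  if (!url || !key) {
    throw new Error('SUPABASE_URL and SUPABASE_SERVICE_ROLE_KEY must be set');
  }
  // Placeholder for createClient(url, key, { auth: { persistSession: false } })
  return { url, key };
});
```

`getVectorStore` would then call `getClient()` instead of reading a module-level `supabaseClient`, which also makes the module testable without real env vars.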
```ts
export async function singleDocumentUpsert(
  doc: Omit<KnowledgeDocument, 'id' | 'createdAt' | 'updatedAt'>,
): Promise<KnowledgeDocument> {
  const { data, error } = await supabaseClient
    .from('knowledge_documents')
    .upsert(
      {
        source: doc.source,
        title: doc.title,
        content: doc.content,
        version: doc.version,
        metadata: doc.metadata,
      },
      { onConflict: 'source,version', ignoreDuplicates: false },
    )
    .select()
    .single();

  if (error) {
    throw new Error(`Failed to upsert knowledge document: ${error.message}`);
  }

  return data as KnowledgeDocument;
```
🧩 Analysis chain
🌐 Web query:
Does `@supabase/supabase-js` v2 automatically convert snake_case Postgres columns like `created_at` and `updated_at` into camelCase properties on returned rows?
💡 Result:
No. @supabase/supabase-js v2 returns row objects with keys exactly as they come from PostgREST (so created_at stays created_at, not createdAt). Automatic snake_case ↔ camelCase conversion is a long-standing feature request rather than built-in behavior. [1], [2]
If you want camelCase keys in the response, you must explicitly alias columns in your select, e.g. select('createdAt:created_at, updatedAt:updated_at'). [3]
Sources: [1] [2] [3]
🏁 Script executed:

```shell
# Find and read the KnowledgeDocument type definition
fd -t f rag.ts | head -20
```

Repository: Ryanakml/ChatFuse

🏁 Script executed:

```shell
# Search for the type definition in the shared package
rg 'KnowledgeDocument' packages/shared/src/ -A 10 -B 2
```

🏁 Script executed:

```shell
# Check the full vectorstore.ts file for similar patterns
wc -l packages/llm/src/rag/vectorstore.ts
```

🏁 Script executed:

```shell
# Look at the vectorstore.ts file to see other similar patterns
cat -n packages/llm/src/rag/vectorstore.ts | head -100
```

🏁 Script executed:

```shell
# Check the rest of the file for other similar patterns
tail -20 packages/llm/src/rag/vectorstore.ts
```

🏁 Script executed:

```shell
# Search for other 'as KnowledgeDocument' or similar casts in the file
rg 'as KnowledgeDocument|as Document' packages/llm/src/rag/vectorstore.ts -n
```

🏁 Script executed:

```shell
# Check if there are other upsert or select patterns that might have the same issue
rg '\.select\(\)' packages/llm/src/rag/vectorstore.ts -B 5 -A 3
```
Map the returned row to KnowledgeDocument instead of casting it.
@supabase/supabase-js v2 does not automatically convert snake_case Postgres columns to camelCase. The returned object has created_at and updated_at, but the KnowledgeDocument type (from packages/shared/src/rag.ts) expects createdAt and updatedAt. The cast as KnowledgeDocument masks this mismatch at compile time while leaving the fields undefined at runtime.
Example mapping

```diff
-  return data as KnowledgeDocument;
+  return {
+    id: data.id,
+    source: data.source,
+    title: data.title,
+    content: data.content,
+    version: data.version,
+    metadata: data.metadata,
+    createdAt: data.created_at,
+    updatedAt: data.updated_at,
+  };
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/rag/vectorstore.ts` around lines 28 - 50, The upsert
currently casts the Supabase row to KnowledgeDocument which hides snake_case
mismatches; in singleDocumentUpsert, explicitly map the returned data fields to
a KnowledgeDocument object instead of using "as KnowledgeDocument": take
data.id, data.source, data.title, data.content, data.version, data.metadata and
convert data.created_at -> createdAt and data.updated_at -> updatedAt (and any
other snake_case DB fields) before returning; ensure types align with the
KnowledgeDocument interface and remove the unsafe cast.
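The explicit mapping can be isolated in a small helper so every query path shares it. This sketch assumes the field types shown below (`version: number`, timestamp strings), which are not confirmed by the PR; `DocumentRow` and `rowToKnowledgeDocument` are illustrative names:

```typescript
// Assumed shape of a knowledge_documents row as PostgREST returns it:
// snake_case keys, exactly as stored in the database.
type DocumentRow = {
  id: string;
  source: string;
  title: string | null;
  content: string;
  version: number; // assumption: numeric version column
  metadata: Record<string, unknown>;
  created_at: string;
  updated_at: string;
};

// Each column is mapped by hand, so the compiler checks the real shape
// instead of trusting an `as KnowledgeDocument` cast.
function rowToKnowledgeDocument(row: DocumentRow) {
  return {
    id: row.id,
    source: row.source,
    title: row.title,
    content: row.content,
    version: row.version,
    metadata: row.metadata,
    createdAt: row.created_at,
    updatedAt: row.updated_at,
  };
}
```

An alternative is aliasing in the query itself, e.g. `.select('id, source, title, content, version, metadata, createdAt:created_at, updatedAt:updated_at')`, which pushes the rename into PostgREST.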
```ts
export async function addChunksToVectorStore(
  vectorStore: SupabaseVectorStore,
  chunks: Document<Record<string, unknown>>[],
  documentId: string,
): Promise<void> {
  // LangChain SupabaseVectorStore allows providing IDs. We can let it auto-gen, but
  // our schema requires document_id and chunk_index explicitly map.
  // Let's format the documents to ensure metadata has what we need.
  const formattedDocs = chunks.map((chunk, index) => {
    return new Document({
      pageContent: chunk.pageContent,
      metadata: {
        ...chunk.metadata,
        document_id: documentId,
        chunk_index: index,
      },
    });
  });

  await vectorStore.addDocuments(formattedDocs);
```
🧩 Analysis chain
🌐 Web query:
In `@langchain/community` v1.1.22, what columns does `SupabaseVectorStore.addDocuments()` insert into the configured table? Does it populate arbitrary metadata keys such as `document_id` and `chunk_index` as top-level columns?
💡 Result:
In `@langchain/community@1.1.22`, `SupabaseVectorStore.addDocuments()` embeds each document and then calls `addVectors()`, which upserts only these top-level fields into your configured `tableName`:

- `content` (from `Document.pageContent`)
- `embedding` (the vector)
- `metadata` (the entire `Document.metadata` object, as a single JSON/JSONB value)
- `id`, only if you pass `options.ids` (it adds `{ id: options.ids[i], ...row }`)

It does not "spread" arbitrary metadata keys (e.g. `document_id`, `chunk_index`) into separate top-level table columns. Those keys remain nested inside the single `metadata` column/value.

Sources:
🏁 Script executed:

```shell
# Find migration files related to knowledge_chunks or database schema
find . -type f \( -name "*.sql" -o -name "*migration*" -o -name "*schema*" \) | grep -iE "(supabase|migration|schema|knowledge)" | head -20
```

🏁 Script executed:

```shell
# Search for knowledge_chunks table definition in the codebase
rg "knowledge_chunks" -A 10 -B 2 --type sql --type ts --type js
```

🏁 Script executed:

```shell
# Look for migration files in common locations
git ls-files | grep -iE "(migration|supabase|\.sql)" | head -20
```

🏁 Script executed:

```shell
# Search for RPC definition for match_knowledge_chunks
rg "match_knowledge_chunks" -A 20 --type sql
```

🏁 Script executed:

```shell
# Search for triggers on knowledge_chunks table
rg -i "trigger.*knowledge_chunks|knowledge_chunks.*trigger" -A 10 --type sql
```

🏁 Script executed:

```shell
# Check the full knowledge_chunks table definition and any surrounding context
cat -n ./supabase/migrations/202603061200_core_schema.sql | grep -A 15 -B 5 "knowledge_chunks"
```

🏁 Script executed:

```shell
# Check if there are any other migration files that might define RPCs or triggers
cat ./supabase/migrations/202603061230_d2_indexes_retention.sql
```
addDocuments() will fail on database constraint violation.
The schema requires document_id and chunk_index as NOT NULL top-level columns (line 88-89 of supabase/migrations/202603061200_core_schema.sql), enforced by a unique constraint on (document_id, chunk_index). However, SupabaseVectorStore.addDocuments() only inserts content, embedding, metadata (as JSONB), and id. It does not extract or spread metadata keys into separate top-level columns—your nested document_id and chunk_index inside metadata will remain there, leaving the required columns NULL. The insert will fail immediately.
Insert rows directly via supabaseClient.from('knowledge_chunks').insert() with explicit column values, or redesign to accept the LangChain schema (moving document_id and chunk_index into metadata only).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/llm/src/rag/vectorstore.ts` around lines 71 - 90, The
addChunksToVectorStore implementation nests document_id and chunk_index inside
metadata but then calls SupabaseVectorStore.addDocuments, which only writes
content/embedding/metadata/id and will leave the required top-level NOT NULL
columns null, causing constraint violations; modify addChunksToVectorStore to
perform a direct insert into the knowledge_chunks table using the Supabase
client (instead of vectorStore.addDocuments) and supply explicit column values
for content, embedding (compute or reuse embeddings from vectorStore if
available), metadata, id, document_id, and chunk_index so the database receives
top-level document_id and chunk_index columns and respects the unique constraint
on (document_id, chunk_index).
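The direct-insert route splits naturally into a pure row-building step and the actual database call. The sketch below covers the pure part; `buildChunkRows` and `ChunkInput` are hypothetical names, and the embeddings are assumed to be computed upstream and passed in:

```typescript
// Minimal chunk shape, mirroring the fields this code actually reads
// from a LangChain Document (pageContent, metadata).
type ChunkInput = { pageContent: string; metadata: Record<string, unknown> };

// Shape each chunk into a row whose document_id and chunk_index are
// top-level columns, matching the NOT NULL schema, instead of relying
// on addDocuments() to lift them out of metadata (it does not).
function buildChunkRows(
  chunks: ChunkInput[],
  embeddings: number[][],
  documentId: string,
) {
  return chunks.map((chunk, index) => ({
    document_id: documentId,
    chunk_index: index,
    content: chunk.pageContent,
    embedding: embeddings[index],
    metadata: chunk.metadata,
  }));
}

// The rows would then be written with the Supabase client, e.g.
// await supabaseClient.from('knowledge_chunks').insert(rows);
```

Keeping the row construction pure makes the (document_id, chunk_index) invariant easy to unit-test without a database connection.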
```ts
export type KnowledgeChunk = {
  id: string;
  documentId: string;
  chunkIndex: number;
  content: string;
  embedding: number[];
  metadata: Record<string, unknown>;
  createdAt: string;
};
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Search for how KnowledgeChunk fields are used in the codebase
rg -n "documentId|chunkIndex|document_id|chunk_index" --type=ts -C2
```

🏁 Script executed:

```shell
#!/bin/bash
# Search for where KnowledgeChunk is used with actual data retrieval
rg -n "KnowledgeChunk" --type=ts -C3
```

🏁 Script executed:

```shell
#!/bin/bash
# Search for database queries and transformations in vectorstore
rg -n "from.*knowledge_chunks|select|fetch" packages/llm/src/rag/vectorstore.ts --type=ts -C3
```

🏁 Script executed:

```shell
#!/bin/bash
# Search for select queries on knowledge_chunks and any transformation logic
rg -n "select|query|fetch.*knowledge" packages/llm/src/rag/ --type=ts -C5 | head -100
```

🏁 Script executed:

```shell
#!/bin/bash
# Search for any snake_case to camelCase transformation utilities
rg -n "camelCase|toCamel|snake_case|toSnake|transformCase" --type=ts | head -50
```

🏁 Script executed:

```shell
#!/bin/bash
# Search for where KnowledgeChunk is used as a return type or imported
rg -n ": KnowledgeChunk|KnowledgeChunk\[|return.*KnowledgeChunk|import.*KnowledgeChunk" --type=ts
```

🏁 Script executed:

```shell
#!/bin/bash
# Check the entire vectorstore.ts file to understand the data flow
cat -n packages/llm/src/rag/vectorstore.ts | head -120
```

🏁 Script executed:

```shell
#!/bin/bash
# Search entire codebase for any imports or usage of KnowledgeChunk
rg -n "KnowledgeChunk" --type=ts
```

🏁 Script executed:

```shell
#!/bin/bash
# Check what is exported from rag.ts
cat packages/shared/src/rag.ts
```
Remove or fix orphaned KnowledgeChunk type definition.
The KnowledgeChunk type uses camelCase fields (documentId, chunkIndex) but the database schema uses snake_case (document_id, chunk_index). Additionally, this type is exported but never imported or used anywhere in the codebase. Either remove it if not needed, or rename the fields to match the database column names and add a comment explaining the schema alignment.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/shared/src/rag.ts` around lines 12 - 20, The exported type
KnowledgeChunk is orphaned and its camelCase fields (documentId, chunkIndex)
don't match the DB snake_case schema; either delete the KnowledgeChunk type if
unused, or rename its properties to document_id and chunk_index to match the
database and add a brief comment above the type stating it mirrors DB column
names for schema alignment; search for KnowledgeChunk to ensure no imports exist
before removing, or update any uses to the new snake_case field names if you
choose to keep it.
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Tests
Chores