modified PDA with better results and working #237

Merged
Deodat-Lawson merged 1 commit into main from feature/predictive-document-analysis-improvement on Mar 5, 2026

Conversation

@Deodat-Lawson
Owner

No description provided.

@vercel

vercel Bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project    Deployment  Actions           Updated (UTC)
pdr-ai-v2  Building    Preview, Comment  Mar 5, 2026 9:23am

@Deodat-Lawson Deodat-Lawson merged commit 6a7954b into main Mar 5, 2026
3 of 4 checks passed

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7f624c40ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

import { sanitizeErrorMessage } from "~/app/api/agents/predictive-document-analysis/utils/logging";

const embeddingCache = new Map<string, number[]>();
const EMBEDDING_MODEL = "text-embedding-3-large";

P1 Badge Keep embedding dimensionality aligned with vector columns

Using text-embedding-3-large here produces 3072-dim vectors by default, while predictive search compares them against pgvector columns that are stored as 1536 dimensions (documentSections.embedding / documentRetrievalChunks.embedding). That mismatch causes ANN/vector SQL to fail at runtime, so contextual matching degrades to empty/fallback results instead of semantic retrieval. Keep the embedding output at 1536 (or migrate column dimensions and indexing together) to avoid breaking retrieval quality.

Useful? React with 👍 / 👎.
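One way to catch this class of bug early is a runtime guard that checks the embedding length against the column width before any vector SQL runs. A minimal sketch, assuming the 1536-dimension pgvector columns described above; `EXPECTED_DIM` and `assertEmbeddingDim` are illustrative names, not part of the PR:

```typescript
// Sketch of a dimension guard. EXPECTED_DIM mirrors the width of the
// documentSections.embedding / documentRetrievalChunks.embedding columns
// (1536, per the review above); the helper name is hypothetical.
const EXPECTED_DIM = 1536;

function assertEmbeddingDim(
  embedding: number[],
  expected: number = EXPECTED_DIM,
): number[] {
  if (embedding.length !== expected) {
    // Fail fast with an actionable message instead of letting the
    // pgvector comparison error out deep inside the search query.
    throw new Error(
      `Embedding has ${embedding.length} dims but the vector column expects ` +
        `${expected}; request ${expected}-dim output or migrate the column.`,
    );
  }
  return embedding;
}
```

For the large model specifically, the OpenAI embeddings API accepts a `dimensions` parameter that shortens the output, so requesting 1536-dim vectors is one way to satisfy this guard without a column migration.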

Comment on lines +144 to +146
eq(predictiveDocumentAnalysisResults.documentId, BigInt(documentId)),
eq(predictiveDocumentAnalysisResults.analysisType, typedAnalysisType),
gt(

P1 Badge Scope stream polling to the requested analysis variant

The SSE poll query only matches documentId + analysisType in a 5-minute window, so it can pick up a prior row (including a stale row after forceRefresh=true, or a run with different includeRelatedDocs) and emit that as the final result before the newly queued job finishes. This returns incorrect data nondeterministically under repeated/concurrent requests; the poll predicate needs at least includeRelatedDocs and ideally a persisted jobId.

Useful? React with 👍 / 👎.
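The stricter predicate can be expressed as a pure matcher that the poll loop applies to candidate rows. A minimal sketch, assuming a persisted `jobId` and an `includeRelatedDocs` flag on the results row as the review suggests; the `AnalysisRow`/`PollRequest` shapes are assumptions, not the PR's actual schema:

```typescript
// Hypothetical row/request shapes; field names are illustrative.
interface AnalysisRow {
  documentId: bigint;
  analysisType: string;
  includeRelatedDocs: boolean;
  jobId: string;
  createdAt: Date;
}

interface PollRequest {
  documentId: bigint;
  analysisType: string;
  includeRelatedDocs: boolean;
  jobId: string;
}

const WINDOW_MS = 5 * 60 * 1000; // the 5-minute window from the review

function matchesPoll(
  row: AnalysisRow,
  req: PollRequest,
  now: Date = new Date(),
): boolean {
  return (
    row.documentId === req.documentId &&
    row.analysisType === req.analysisType &&
    row.includeRelatedDocs === req.includeRelatedDocs &&
    // The jobId check is what rules out stale rows from a prior run,
    // including rows left behind after forceRefresh=true.
    row.jobId === req.jobId &&
    now.getTime() - row.createdAt.getTime() <= WINDOW_MS
  );
}
```

In the SSE route this would translate into extra `eq(...)` conditions on the Drizzle query rather than an in-memory filter, but the predicate logic is the same.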

Comment on lines +94 to +97
const chunkCount = await db
.select({ count: sql<number>`count(*)` })
.from(documentContextChunks)
.where(eq(documentContextChunks.documentId, BigInt(documentId)));

P2 Badge Fall back to legacy chunks during stream preflight

This preflight check counts only documentContextChunks, but many existing documents still live in legacy pdfChunks (which the synchronous route and Inngest job already handle with fallback). In that case the stream endpoint returns No chunks found and never dispatches analysis even though valid chunks exist. Mirror the same RLM-to-legacy fallback logic here to prevent false 404s.

Useful? React with 👍 / 👎.
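The fallback decision itself is small enough to isolate. A minimal sketch, assuming the preflight counts both tables; the counts would come from `documentContextChunks` and the legacy `pdfChunks` tables respectively, and `resolveChunkSource` is an illustrative name:

```typescript
// Sketch of the RLM-to-legacy fallback the review asks the stream
// preflight to mirror. Only the decision logic is shown; the two counts
// would come from separate Drizzle count queries.
type ChunkSource = "rlm" | "legacy" | "none";

function resolveChunkSource(
  rlmChunkCount: number,
  legacyChunkCount: number,
): ChunkSource {
  if (rlmChunkCount > 0) return "rlm"; // prefer documentContextChunks
  if (legacyChunkCount > 0) return "legacy"; // fall back to pdfChunks
  return "none"; // only now is a "No chunks found" 404 justified
}
```

With this in place the stream endpoint only short-circuits when both tables are empty, matching what the synchronous route and Inngest job already do.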
