Conversation

> Caution: Review failed. The pull request is closed.

Walkthrough
Version bumped to 1.6.1, with rate-limiting configuration adjusted for batch optimization (reduced concurrent requests, increased throughput caps). The document processor was refactored to support batch-based embedding generation alongside individual fallbacks, with runtime integration for batch API calls.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Processor as Document Processor
    participant Batcher as Batch Router
    participant Runtime as Runtime/Model Service
    participant Embedder as Embedding Generator
    Processor->>Batcher: generateEmbeddingsForChunks(chunks)
    rect rgb(200, 220, 255)
        Note over Batcher: Check Config
        Batcher->>Batcher: shouldUseBatchEmbeddings?
    end
    alt Batch Mode Enabled
        rect rgb(220, 240, 220)
            Batcher->>Runtime: generateEmbeddingsBatch(textArray)
            Runtime->>Runtime: useModel(batch embeddings path)
            Runtime-->>Batcher: embeddings[] or fallback
            Batcher->>Embedder: generateEmbeddingsIndividual(failed chunks)
            Embedder-->>Batcher: individual embeddings
        end
    else Batch Mode Disabled
        rect rgb(240, 220, 220)
            Batcher->>Embedder: generateEmbeddingsIndividual(chunks)
            Embedder->>Runtime: generateEmbeddingWithValidation per chunk
            Runtime-->>Embedder: embedding
            Embedder-->>Batcher: embeddings[]
        end
    end
    Batcher-->>Processor: results with tokens estimated & rate-limited
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
```ts
const chunk = batch[i];
const embedding = embeddings[i];

if (embedding && embedding.length > 0 && embedding[0] !== 0) {
```
Incorrect embedding validation rejects valid embeddings with a zero first element

The check `embedding[0] !== 0` incorrectly rejects valid embeddings whose first element happens to be zero. This is inconsistent with `generateEmbeddingWithValidation`, which only checks `!embedding || embedding.length === 0`. A valid embedding vector can legitimately have zero as its first component; if the intent is to detect a true zero vector, every element must be checked, not just the first one.
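A whole-vector check avoids this false negative. A minimal sketch (the helper name `isValidEmbedding` is illustrative, not the plugin's API):

```typescript
// Validate an embedding by inspecting the whole vector, not just
// its first element: reject only missing, empty, or all-zero vectors.
function isValidEmbedding(embedding: number[] | undefined): boolean {
  if (!embedding || embedding.length === 0) return false;
  // A true zero vector has every component equal to zero.
  return embedding.some((v) => v !== 0);
}
```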
```ts
        text: chunk.contextualizedText,
      });
    }
  }
```
Missing rate limiting in the batch embedding error fallback path

When batch embedding fails in `generateEmbeddingsBatch`, the fallback loop processes chunks individually via `generateEmbeddingWithValidation` without invoking the `rateLimiter`, which was called only once for the entire batch before the `try` block. When the batch fails, all individual requests fire in rapid succession, risking API rate-limit errors or service disruption.
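A minimal sketch of a fallback loop that waits on the limiter before each individual call. The `rateLimiter` and `embed` signatures are stand-ins injected for illustration, not the module's exact API:

```typescript
type Chunk = { contextualizedText: string };

// Fallback path that applies the rate limiter once PER chunk,
// rather than once for the whole failed batch.
async function fallbackWithRateLimit(
  batch: Chunk[],
  rateLimiter: (estimatedTokens: number) => Promise<void>,
  embed: (text: string) => Promise<number[]>
): Promise<number[][]> {
  const results: number[][] = [];
  for (const chunk of batch) {
    // Rough token estimate (~4 characters per token, an assumption).
    const estimatedTokens = Math.ceil(chunk.contextualizedText.length / 4);
    await rateLimiter(estimatedTokens); // wait before EACH individual call
    results.push(await embed(chunk.contextualizedText));
  }
  return results;
}
```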
```ts
    }
    return (result as { embedding: number[] })?.embedding || [];
  })
);
```
Concurrent fallback requests bypass rate limiting entirely

In `generateBatchEmbeddingsViaRuntime`, when the handler returns a single embedding instead of batch results, the fallback uses `Promise.all` to process all texts concurrently with no rate limiting. All individual embedding requests are sent simultaneously, which could overwhelm the API and trigger rate-limit errors, especially for batches of up to 100 texts.
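One way to address this is to bound the fallback's concurrency with a small worker pool instead of an unbounded `Promise.all`. This generic helper is a sketch under that assumption, not the plugin's code:

```typescript
// Map over items with at most `limit` in-flight promises at a time,
// preserving result order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // safe: JS is single-threaded between awaits
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from(
    { length: Math.min(limit, items.length) },
    () => worker()
  );
  await Promise.all(workers);
  return results;
}
```

Replacing `Promise.all(texts.map(...))` with `mapWithConcurrency(texts, limit, ...)` caps simultaneous requests while keeping the same result shape.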
```ts
// Fall back to individual processing for this batch
for (const chunk of batch) {
  try {
    const result = await generateEmbeddingWithValidation(runtime, chunk.contextualizedText);
```
Batch fallback lacks retry logic for rate-limit errors

The fallback path in `generateEmbeddingsBatch` calls `generateEmbeddingWithValidation` directly without wrapping it in `withRateLimitRetry`. This is inconsistent with `generateEmbeddingsIndividual`, which uses `withRateLimitRetry` to handle 429 errors with automatic retry. When batch processing fails and falls back to individual calls, any rate-limit error fails immediately instead of being retried, causing unnecessary chunk failures.
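A 429-aware retry wrapper in the spirit of `withRateLimitRetry` could look like the sketch below; the signature, error-detection heuristic, and backoff constants are assumptions, since the plugin's actual helper is not shown here:

```typescript
// Retry a call with exponential backoff when it fails with a
// rate-limit error (HTTP 429 or a "rate limit" message).
async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      const isRateLimit =
        err?.status === 429 || /rate limit/i.test(String(err?.message));
      if (!isRateLimit || attempt >= maxRetries) throw err;
      // Exponential backoff: base, 2x base, 4x base, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Wrapping the fallback's `generateEmbeddingWithValidation` call in such a helper would make the batch fallback behave like `generateEmbeddingsIndividual`.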
add batch embeddings
Note
Batch embeddings pipeline
- `document-processor.ts`: adds `EMBEDDING_BATCH_SIZE=100`, `shouldUseBatchEmbeddings`, `generateEmbeddingsBatch`, and `generateBatchEmbeddingsViaRuntime` (uses `runtime.useModel(ModelType.TEXT_EMBEDDING, { texts })`), with automatic fallback to per-chunk embedding.

Rate limiting and config tweaks
- `config.ts`: for batch mode, sets `MAX_CONCURRENT_REQUESTS=100`, `REQUESTS_PER_MINUTE=500`, `TOKENS_PER_MINUTE=1000000`, and clarifies comments; retains `BATCH_DELAY_MS`.

Misc
- Bumps the version to `1.6.1` in `package.json`.

Written by Cursor Bugbot for commit cbf3e1a. This will update automatically on new commits.
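The batch path the note describes can be sketched roughly as follows. The string model type stands in for `ModelType.TEXT_EMBEDDING`, the `Runtime` shape is assumed, and the fallback loops sequentially (per the rate-limiting concerns raised above) rather than via `Promise.all`:

```typescript
// Assumed minimal runtime surface for illustration.
type Runtime = {
  useModel: (
    modelType: string,
    params: { texts?: string[]; text?: string }
  ) => Promise<unknown>;
};

// Ask the runtime for batch embeddings; if the handler only returns a
// single embedding (no batch support), fall back to per-text calls.
async function generateBatchEmbeddingsViaRuntime(
  runtime: Runtime,
  texts: string[]
): Promise<number[][]> {
  const result = await runtime.useModel("TEXT_EMBEDDING", { texts });
  // Batch-aware handlers return number[][]; check the nested shape.
  if (Array.isArray(result) && Array.isArray(result[0])) {
    return result as number[][];
  }
  const out: number[][] = [];
  for (const text of texts) {
    const single = await runtime.useModel("TEXT_EMBEDDING", { text });
    out.push(Array.isArray(single) ? (single as number[]) : []);
  }
  return out;
}
```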
Summary by CodeRabbit

Chores
- Version bumped to 1.6.1.

Performance
- Batch-based embedding generation with individual fallbacks; rate-limiting configuration tuned for batch throughput.