
feat: add batch embeddings #50

Merged

0xbbjoker merged 1 commit into 1.x from feat/batch-embeddings on Dec 26, 2025

Conversation


@0xbbjoker 0xbbjoker commented Dec 26, 2025

add batch embeddings


Note

Batch embeddings pipeline

  • Adds batch embedding flow in document-processor.ts with EMBEDDING_BATCH_SIZE=100, shouldUseBatchEmbeddings, generateEmbeddingsBatch, and generateBatchEmbeddingsViaRuntime (uses runtime.useModel(ModelType.TEXT_EMBEDDING, { texts })), with automatic fallback to per-chunk embedding.
  • Embedding result handling standardized (zero-vector checks) and improved logging; failed chunks are pre-populated in results.
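The batching flow described above can be sketched roughly as follows. Only the names EMBEDDING_BATCH_SIZE and shouldUseBatchEmbeddings come from the PR; the threshold heuristic and the toBatches helper are assumptions for illustration, not the plugin's actual code.

```typescript
// Rough sketch of the batch routing described above (assumed logic).
const EMBEDDING_BATCH_SIZE = 100;

// Assumed heuristic: batching pays off whenever there is more than one chunk.
function shouldUseBatchEmbeddings(chunkCount: number): boolean {
  return chunkCount > 1;
}

// Split an array into batches of at most EMBEDDING_BATCH_SIZE items.
function toBatches<T>(items: T[], size: number = EMBEDDING_BATCH_SIZE): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

Each batch would then be sent through `runtime.useModel(ModelType.TEXT_EMBEDDING, { texts })`, falling back to per-chunk embedding when the batch call fails.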

Rate limiting and config tweaks

  • Updates defaults in config.ts for batch mode: MAX_CONCURRENT_REQUESTS=100, REQUESTS_PER_MINUTE=500, TOKENS_PER_MINUTE=1000000, and clarifies comments; retains BATCH_DELAY_MS.
  • Simplifies client-side rate limiter (actual limits handled by API headers) and adds clearer wait logging.
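In config form, the new defaults would look roughly like this. The values come from the PR summary; the object shape is illustrative, and the BATCH_DELAY_MS value is not shown in the summary, so the number below is a placeholder assumption.

```typescript
// Batch-oriented rate-limit defaults from the PR summary (shape assumed).
const RATE_LIMIT_DEFAULTS = {
  MAX_CONCURRENT_REQUESTS: 100, // lowered from 150: fewer, larger batch requests
  REQUESTS_PER_MINUTE: 500,     // raised from 300
  TOKENS_PER_MINUTE: 1_000_000, // raised from 750000
  BATCH_DELAY_MS: 1000,         // retained by the PR; value here is a placeholder
};
```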

Misc

  • Version bump to 1.6.1 in package.json.

Written by Cursor Bugbot for commit cbf3e1a.

Summary by CodeRabbit

  • Chores

    • Released version 1.6.1.
  • Performance

    • Updated rate-limiting configuration thresholds to optimize batch processing workflows.
    • Implemented batch-based embedding generation to reduce API call overhead and improve throughput.



coderabbitai bot commented Dec 26, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

Version bumped to 1.6.1 with rate-limiting configuration adjusted for batch optimization (reduced concurrent requests, increased throughput caps). Document processor refactored to support batch-based embedding generation alongside individual fallbacks, with runtime integration for batch API calls.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Version & Dependency Management**<br>`package.json` | Version incremented from 1.6.0 to 1.6.1. |
| **Rate-Limiting Configuration**<br>`src/config.ts` | Default rate-limit values updated for batch-oriented workloads: MAX_CONCURRENT_REQUESTS (150 → 100), REQUESTS_PER_MINUTE (300 → 500), TOKENS_PER_MINUTE (750000 → 1000000). Comments revised to emphasize batch embedding optimization. |
| **Batch Embedding Feature & Refactoring**<br>`src/document-processor.ts` | Introduced batch embedding support: new EMBEDDING_BATCH_SIZE, shouldUseBatchEmbeddings, generateEmbeddingsBatch, and generateBatchEmbeddingsViaRuntime functions. Refactored generateEmbeddingsForChunks to route between batch and individual paths, add token estimation, and synchronize rate limiting. Added generateEmbeddingsIndividual as the fallback linear path. Updated rate-limiter commentary; enhanced error handling and logging for batch operations. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Processor as Document Processor
    participant Batcher as Batch Router
    participant Runtime as Runtime/Model Service
    participant Embedder as Embedding Generator

    Processor->>Batcher: generateEmbeddingsForChunks(chunks)

    rect rgb(200, 220, 255)
    Note over Batcher: Check Config
    Batcher->>Batcher: shouldUseBatchEmbeddings?
    end

    alt Batch Mode Enabled
        rect rgb(220, 240, 220)
        Batcher->>Runtime: generateEmbeddingsBatch(textArray)
        Runtime->>Runtime: useModel(batch embeddings path)
        Runtime-->>Batcher: embeddings[] or fallback
        Batcher->>Embedder: generateEmbeddingsIndividual(failed chunks)
        Embedder-->>Batcher: individual embeddings
        end
    else Batch Mode Disabled
        rect rgb(240, 220, 220)
        Batcher->>Embedder: generateEmbeddingsIndividual(chunks)
        Embedder->>Runtime: generateEmbeddingWithValidation per chunk
        Runtime-->>Embedder: embedding
        Embedder-->>Batcher: embeddings[]
        end
    end

    Batcher-->>Processor: results with tokens estimated & rate-limited
```

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰 Batches bundled, speeds align,
A hundred texts in one design,
From chunks to streams, embeddings flow,
Where batch and fallback softly go!


📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 4d9f87d and cbf3e1a.

📒 Files selected for processing (3)
  • package.json
  • src/config.ts
  • src/document-processor.ts


@0xbbjoker 0xbbjoker merged commit ab5660a into 1.x Dec 26, 2025
1 of 2 checks passed
```typescript
const chunk = batch[i];
const embedding = embeddings[i];

if (embedding && embedding.length > 0 && embedding[0] !== 0) {
```

Incorrect embedding validation rejects valid embeddings with zero first element

The validation check embedding[0] !== 0 incorrectly rejects valid embeddings where the first element happens to be zero. This is inconsistent with generateEmbeddingWithValidation which only checks !embedding || embedding.length === 0. A valid embedding vector can legitimately have zero as its first component. If the intent is to detect a true zero vector, all elements would need to be checked, not just the first one.
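A check in the spirit of this comment would inspect every element rather than only the first. A minimal sketch, with `isValidEmbedding` as a hypothetical helper name:

```typescript
// Hypothetical helper implementing the fix this comment suggests:
// reject missing/empty vectors and true zero vectors, but accept any
// embedding with at least one non-zero component.
function isValidEmbedding(embedding: number[] | undefined | null): boolean {
  if (!embedding || embedding.length === 0) return false;
  return embedding.some((v) => v !== 0); // only an all-zero vector fails
}
```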


```typescript
text: chunk.contextualizedText,
});
}
}
```

Missing rate limiting in batch embedding error fallback path

When batch embedding fails in generateEmbeddingsBatch, the fallback loop processes chunks individually by calling generateEmbeddingWithValidation without invoking the rateLimiter. The rate limiter was only called once for the entire batch before the try block. This could lead to API rate limit errors or service disruption when the batch fails and all individual requests fire rapidly.
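The fix this comment points toward is throttling each per-chunk request in the fallback loop, not just the batch as a whole. A minimal sketch, where `wait` stands in for the plugin's rate limiter and `embedOne` for generateEmbeddingWithValidation (both real signatures may differ):

```typescript
type Chunk = { contextualizedText: string };

// Sketch of a rate-limited fallback loop: every individual request
// passes through the limiter before firing.
async function fallbackWithRateLimit(
  chunks: Chunk[],
  wait: () => Promise<void>,
  embedOne: (text: string) => Promise<number[]>
): Promise<number[][]> {
  const results: number[][] = [];
  for (const chunk of chunks) {
    await wait(); // throttle each individual request, not just the batch
    results.push(await embedOne(chunk.contextualizedText));
  }
  return results;
}
```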


```typescript
}
return (result as { embedding: number[] })?.embedding || [];
})
);
```

Concurrent fallback requests bypass rate limiting entirely

In generateBatchEmbeddingsViaRuntime, when the handler returns a single embedding instead of batch results, the fallback uses Promise.all to process all texts concurrently without any rate limiting. This sends all individual embedding requests simultaneously, which could overwhelm the API and trigger rate limit errors, especially for large batches of up to 100 texts.
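One way to avoid the unbounded Promise.all is to process the fallback texts sequentially through the limiter, so at most one request is in flight at a time. A sketch with illustrative names:

```typescript
// Sequential alternative to the unbounded Promise.all fallback flagged
// above: each request awaits the rate limiter and the previous request
// before firing. `wait` and `embedOne` are stand-in names.
async function embedSequentially(
  texts: string[],
  wait: () => Promise<void>,
  embedOne: (text: string) => Promise<number[]>
): Promise<number[][]> {
  const embeddings: number[][] = [];
  for (const text of texts) {
    await wait();
    embeddings.push(await embedOne(text));
  }
  return embeddings;
}
```

A bounded-concurrency pool would also work; the key property is that fallback requests no longer all fire simultaneously.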


```typescript
// Fall back to individual processing for this batch
for (const chunk of batch) {
try {
const result = await generateEmbeddingWithValidation(runtime, chunk.contextualizedText);
```

Batch fallback lacks retry logic for rate limit errors

The fallback path in generateEmbeddingsBatch calls generateEmbeddingWithValidation directly without wrapping it in withRateLimitRetry. This is inconsistent with generateEmbeddingsIndividual which uses withRateLimitRetry to handle 429 errors with automatic retry. When batch processing fails and falls back to individual calls, any rate limit errors will immediately fail rather than being retried, leading to unnecessary chunk failures.
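For reference, a 429-aware retry wrapper in the spirit of the withRateLimitRetry this comment mentions might look like the sketch below. The plugin's actual implementation is not shown on this page, so the backoff parameters and error shape are assumptions.

```typescript
// Sketch of a 429-aware retry wrapper (assumed backoff and error shape).
async function withRateLimitRetry<T>(
  fn: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 10
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err) {
      const status = (err as { status?: number })?.status;
      // Retry only rate-limit errors, up to maxRetries extra attempts.
      if (status !== 429 || attempt >= maxRetries) throw err;
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Wrapping the fallback's generateEmbeddingWithValidation calls in such a helper would make the batch and individual paths behave consistently under 429s.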


