Summary
Add an OpenAI embedding backend as an alternative to the local sentence-transformers model. This enables much faster bulk indexing at minimal cost.
Motivation
The current local embedding model (MiniLM-L6-v2) takes ~30 seconds per batch of 64 records on a small Fly.io instance. For 250k records (~3,900 batches), this translates to roughly 33 hours of indexing time.
OpenAI's embedding API can process the same dataset in minutes for under $1:
- 250k records × ~150 tokens = ~37.5M tokens
- text-embedding-3-small: $0.02/1M tokens = ~$0.75 total
- text-embedding-3-large: $0.13/1M tokens = ~$5 total
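The arithmetic above can be sanity-checked with a quick script (the per-record token count is the proposal's assumption; prices are OpenAI's published per-1M-token rates):

```python
# Back-of-envelope cost check for bulk embedding.
RECORDS = 250_000
TOKENS_PER_RECORD = 150  # assumed average per record
PRICE_PER_1M_TOKENS = {  # USD per 1M tokens
    "text-embedding-3-small": 0.02,
    "text-embedding-3-large": 0.13,
}

total_tokens = RECORDS * TOKENS_PER_RECORD  # 37,500,000
costs = {
    model: total_tokens / 1_000_000 * price
    for model, price in PRICE_PER_1M_TOKENS.items()
}
print(total_tokens, costs)
```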
Proposed Implementation
- New backend class: `OpenAIStorageBackend` in `osa/infrastructure/index/openai/`
- Config:

  ```python
  class OpenAIBackendConfig(BackendConfig):
      api_key: str  # or use OPENAI_API_KEY env var
      model: str = "text-embedding-3-small"
      batch_size: int = 2048  # OpenAI supports up to 2048 inputs per request
      dimensions: int | None = None  # optional dimensionality reduction
  ```
- Backend implementation:
  - Use the `openai` Python SDK (async client)
  - Batch requests (up to 2048 embeddings per API call)
  - Store vectors in ChromaDB (same as the current vector backend)
  - Handle rate limits with exponential backoff
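A minimal sketch of the batching-plus-backoff loop described above, with `embed_fn` standing in for the async SDK call (`client.embeddings.create`); the function name and signature are illustrative, not the actual backend interface:

```python
import asyncio
import random
from collections.abc import Awaitable, Callable

async def embed_in_batches(
    texts: list[str],
    embed_fn: Callable[[list[str]], Awaitable[list[list[float]]]],
    batch_size: int = 2048,  # OpenAI's per-request input limit
    max_retries: int = 5,
) -> list[list[float]]:
    """Embed `texts` in batches, retrying each batch with exponential backoff."""
    vectors: list[list[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(max_retries):
            try:
                vectors.extend(await embed_fn(batch))
                break
            # Real code would catch openai.RateLimitError specifically.
            except Exception:
                if attempt == max_retries - 1:
                    raise
                # Exponential backoff with jitter: ~1s, 2s, 4s, ...
                await asyncio.sleep(2 ** attempt + random.random())
    return vectors
```

Keeping the OpenAI call behind `embed_fn` also makes the batching and retry logic testable without network access.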
- Config selection: allow choosing the backend type in the index config:

  ```yaml
  indexes:
    vector:
      type: openai  # or "local" for sentence-transformers
      model: text-embedding-3-small
  ```
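A hypothetical sketch of how the `type` switch could be resolved; the `BackendChoice` shape, function name, and local default model are assumptions for illustration, not the real config-loading code:

```python
from dataclasses import dataclass

@dataclass
class BackendChoice:
    kind: str   # "openai" or "local"
    model: str

# Assumed default model per backend type.
_DEFAULT_MODELS = {
    "openai": "text-embedding-3-small",
    "local": "all-MiniLM-L6-v2",
}

def select_backend(index_cfg: dict) -> BackendChoice:
    """Map an index config entry's `type` field to a backend choice."""
    kind = index_cfg.get("type", "local")
    if kind not in _DEFAULT_MODELS:
        raise ValueError(f"unknown vector backend type: {kind!r}")
    return BackendChoice(kind=kind, model=index_cfg.get("model", _DEFAULT_MODELS[kind]))
```

Defaulting to `local` keeps existing configs working unchanged when the new backend ships.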
Alternatives Considered
- Voyage AI: Similar pricing, good quality, but OpenAI is more widely used
- Cohere: Slightly more expensive ($0.10/1M tokens)
- Larger Fly instance: More expensive than API costs for bulk indexing
- GPU instance: Overkill for this use case
Tasks
- Add `openai` to dependencies
- Add `OpenAIBackendConfig`
- Implement `OpenAIStorageBackend` with batching and rate limit handling