
embed: per-model dynamic token-limit detection #3

@RomneyDa


Follow-up to #2 (fixed in bb31e8f).

The fix shipped a hardcoded 24,000-rune cap in two layers:

  • MaxEmbeddingTextRunes in internal/store/embedding_tasks.go (primary; it participates in the content hash, so future cap changes force a re-embed).
  • maxEmbeddingInputRunes in internal/openai/client.go (defensive cap applied before each request).

The cap is sized for OpenAI's 8,192-token input limit at a conservative ~3 chars/token floor (8,192 × 3 ≈ 24,576, rounded down to 24,000). It works for every current OpenAI embedding model: text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 all share the 8,192-token limit.
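
A minimal sketch of how the rune-level cap is applied in both layers (the constant name is from the fix; the truncation helper and package layout are illustrative):

```go
package store

import "unicode/utf8"

// MaxEmbeddingTextRunes caps text queued for embedding. It participates in the
// content hash, so changing the cap forces affected documents to re-embed.
const MaxEmbeddingTextRunes = 24000 // ~8192 tokens * 3 chars/token, rounded down

// truncateToRunes is a hypothetical helper: it cuts on rune boundaries rather
// than bytes, so multi-byte UTF-8 sequences are never split mid-character.
func truncateToRunes(s string, max int) string {
	if utf8.RuneCountInString(s) <= max {
		return s
	}
	return string([]rune(s)[:max])
}
```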

Limitations of the constant approach:

  • New models with different limits need a code change.
  • Non-OpenAI providers behind GITCRAWL_OPENAI_BASE_URL may have different caps.
  • The conservative 3 chars/token ratio wastes roughly 25% of the available context on typical English text (~4 chars/token).

Options to consider:

  1. Static model → token_limit table. Cheapest option; one line per new model (see the sketch after this list).
  2. tiktoken-go for exact pre-flight token counting (still needs a model → limit table); adds a ~1.5 MB BPE table to the binary (see the counting sketch further below).
  3. Probe-and-cache from 400 error responses ("maximum input length is N tokens"). Adaptive, but relies on OpenAI's error string format.
  4. Opportunistically read context_length from /v1/models for compatible providers (LiteLLM, vLLM); fall back to the table otherwise.
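
A minimal sketch of option 1, assuming a lookup keyed on model name (the limits mirror OpenAI's published 8,192-token cap; the function names, package, and fallback default are illustrative):

```go
package openai

// embeddingTokenLimits maps model name to its maximum input tokens.
// Supporting a new model is a one-line addition here.
var embeddingTokenLimits = map[string]int{
	"text-embedding-3-small": 8192,
	"text-embedding-3-large": 8192,
	"text-embedding-ada-002": 8192,
}

// defaultEmbeddingTokenLimit is the conservative fallback for unknown models,
// matching the assumption behind the current 24,000-rune constant.
const defaultEmbeddingTokenLimit = 8192

func tokenLimitForModel(model string) int {
	if limit, ok := embeddingTokenLimits[model]; ok {
		return limit
	}
	return defaultEmbeddingTokenLimit
}

// runeCapForModel converts the token limit to a rune cap using the same
// ~3 chars/token floor the current constants assume.
func runeCapForModel(model string) int {
	return tokenLimitForModel(model) * 3
}
```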

OpenAI itself does not expose token limits via its API.
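
For option 2, a rough sketch of exact pre-flight counting. It assumes the pkoukk/tiktoken-go package and uses cl100k_base, the encoding shared by the current OpenAI embedding models; the exact package path and API are assumptions, not part of the fix:

```go
package main

import (
	"fmt"

	tiktoken "github.com/pkoukk/tiktoken-go"
)

// countEmbeddingTokens returns the exact token count for text under
// cl100k_base. A model -> limit table is still needed to know what
// count to compare against.
func countEmbeddingTokens(text string) (int, error) {
	enc, err := tiktoken.GetEncoding("cl100k_base")
	if err != nil {
		return 0, err
	}
	return len(enc.Encode(text, nil, nil)), nil
}

func main() {
	n, err := countEmbeddingTokens("some chunk of repository text")
	if err != nil {
		panic(err)
	}
	fmt.Printf("tokens: %d\n", n)
}
```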
