
embed: per-model dynamic token-limit detection #3

@RomneyDa


Follow-up to #2 (fixed in bb31e8f).

The fix shipped a hardcoded 24,000-rune cap in two layers:

  • MaxEmbeddingTextRunes in internal/store/embedding_tasks.go (primary; it participates in the content hash, so future cap changes force a re-embed).
  • maxEmbeddingInputRunes in internal/openai/client.go (defensive cap applied before each request).

The cap is sized for OpenAI's 8,192-token input limit at a conservative ~3 chars/token floor (8,192 × 3 ≈ 24,576, rounded down to 24,000). It works for every current OpenAI embedding model: text-embedding-3-small, text-embedding-3-large, and text-embedding-ada-002 all share the 8,192-token limit.
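
A minimal sketch of how the rune-level cap is applied in both layers (the constant name is from the fix; the truncation helper and package layout are illustrative):

```go
package store

import "unicode/utf8"

// MaxEmbeddingTextRunes caps text queued for embedding. It participates in the
// content hash, so changing the cap forces affected documents to re-embed.
const MaxEmbeddingTextRunes = 24000 // ~8192 tokens * 3 chars/token, rounded down

// truncateToRunes is a hypothetical helper: it cuts on rune boundaries rather
// than bytes, so multi-byte UTF-8 sequences are never split mid-character.
func truncateToRunes(s string, max int) string {
	if utf8.RuneCountInString(s) <= max {
		return s
	}
	return string([]rune(s)[:max])
}
```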

Limitations of the constant approach:

  • New models with different limits need a code change.
  • Non-OpenAI providers behind GITCRAWL_OPENAI_BASE_URL may have different caps.
  • The conservative 3 chars/token ratio wastes roughly 25% of the available context on typical English text (~4 chars/token).

Options to consider:

  1. Static model → token_limit table. Cheapest option; one line per new model (see the sketch after this list).
  2. tiktoken-go for exact pre-flight token counting (still needs a model → limit table); adds a ~1.5 MB BPE table to the binary (see the counting sketch further below).
  3. Probe-and-cache from 400 error responses ("maximum input length is N tokens"). Adaptive, but relies on OpenAI's error string format.
  4. Opportunistically read context_length from /v1/models for compatible providers (LiteLLM, vLLM); fall back to the table otherwise.
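
A minimal sketch of option 1, assuming a lookup keyed on model name (the limits mirror OpenAI's published 8,192-token cap; the function names, package, and fallback default are illustrative):

```go
package openai

// embeddingTokenLimits maps model name to its maximum input tokens.
// Supporting a new model is a one-line addition here.
var embeddingTokenLimits = map[string]int{
	"text-embedding-3-small": 8192,
	"text-embedding-3-large": 8192,
	"text-embedding-ada-002": 8192,
}

// defaultEmbeddingTokenLimit is the conservative fallback for unknown models,
// matching the assumption behind the current 24,000-rune constant.
const defaultEmbeddingTokenLimit = 8192

func tokenLimitForModel(model string) int {
	if limit, ok := embeddingTokenLimits[model]; ok {
		return limit
	}
	return defaultEmbeddingTokenLimit
}

// runeCapForModel converts the token limit to a rune cap using the same
// ~3 chars/token floor the current constants assume.
func runeCapForModel(model string) int {
	return tokenLimitForModel(model) * 3
}
```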

OpenAI itself does not expose token limits via its API.
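
For option 2, a rough sketch of exact pre-flight counting. It assumes the pkoukk/tiktoken-go package and uses cl100k_base, the encoding shared by the current OpenAI embedding models; the exact package path and API are assumptions, not part of the fix:

```go
package main

import (
	"fmt"

	tiktoken "github.com/pkoukk/tiktoken-go"
)

// countEmbeddingTokens returns the exact token count for text under
// cl100k_base. A model -> limit table is still needed to know what
// count to compare against.
func countEmbeddingTokens(text string) (int, error) {
	enc, err := tiktoken.GetEncoding("cl100k_base")
	if err != nil {
		return 0, err
	}
	return len(enc.Encode(text, nil, nil)), nil
}

func main() {
	n, err := countEmbeddingTokens("some chunk of repository text")
	if err != nil {
		panic(err)
	}
	fmt.Printf("tokens: %d\n", n)
}
```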
