FEATURE REQUEST: Fix EmbeddingService blocking the event loop #261

@vaishcodescape

Is your feature request related to a problem?

  • Yes, it is related to a problem

Describe the feature you'd like

🌟 Feature Description

Offload blocking embedding work from the async event loop so that EmbeddingService does not block other concurrent requests. The service should run CPU/GPU-bound model.encode() calls in a thread pool (e.g. via asyncio.to_thread()) while keeping the public API async.

🔍 Problem Statement

EmbeddingService exposes async methods (get_embedding, get_embeddings, summarize_user_profile, search_similar_profiles) but internally calls synchronous SentenceTransformer code:

  • self.model.encode(...) in get_embedding and get_embeddings is blocking. It runs on CPU/GPU and does not yield to the event loop.
  • While one request is generating embeddings, the event loop is blocked: other HTTP requests, agent tools, and background tasks running on that loop stall until the encode finishes.
  • The LLM call in summarize_user_profile correctly uses await self.llm.ainvoke(...) and is non-blocking; only the embedding step blocks.

This hurts latency and concurrency wherever the service is used (e.g. issue_processor.py, contributor_recommendation.py, user profiling).
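The stall described above can be reproduced with a minimal sketch. Everything here is hypothetical and only illustrates the pattern: fake_encode stands in for self.model.encode(...) by sleeping, and count_ticks plays the role of "other async work" sharing the same event loop.

```python
import asyncio
import time


def fake_encode(texts):
    """Hypothetical stand-in for model.encode(): blocks the calling thread."""
    time.sleep(0.2)
    return [[0.0, 0.0, 0.0] for _ in texts]


async def blocking_get_embeddings(texts):
    # Declared async, but it never awaits anything, so the event loop
    # cannot run other tasks until fake_encode() returns.
    return fake_encode(texts)


async def count_ticks(stop: asyncio.Event) -> int:
    """Background task that ticks every 10 ms while the loop is responsive."""
    ticks = 0
    while not stop.is_set():
        await asyncio.sleep(0.01)
        ticks += 1
    return ticks


async def ticks_during(coro_fn) -> int:
    """Run coro_fn and report how many ticks the background task managed."""
    stop = asyncio.Event()
    ticker = asyncio.create_task(count_ticks(stop))
    await asyncio.sleep(0)  # give the ticker a chance to start
    await coro_fn(["hello"])
    stop.set()
    return await ticker


if __name__ == "__main__":
    # A responsive loop would allow roughly 20 ticks in 200 ms;
    # here the ticker gets at most one.
    print(asyncio.run(ticks_during(blocking_get_embeddings)))
```

The ticker is starved for the full duration of the simulated encode, which is the same effect other requests would see while a real embedding call runs on the loop thread.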

🎯 Expected Outcome

  • Event loop stays responsive during embedding generation: other async work (API handlers, other embeddings, LLM calls) can run while model.encode() runs in a worker thread.
  • No change to the public API: callers keep using await embedding_service.get_embedding(...) and await embedding_service.get_embeddings(...).
  • Implementation approach: add a synchronous helper that performs model.encode() and tensor-to-list conversion; call it from get_embedding and get_embeddings via asyncio.to_thread() (available in Python 3.9+) or loop.run_in_executor() with a ThreadPoolExecutor.
  • Optional: run the lazy model load in a worker thread on first use, so that the first request does not block the event loop while the model loads.
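One possible shape for the optional lazy-load point is sketched below. It is an illustration under stated assumptions, not the project's actual code: FakeModel is a hypothetical stand-in for SentenceTransformer (its constructor sleeps to simulate loading weights), and LazyEmbeddingService, _get_model, and load_count are names invented for this example.

```python
import asyncio
import threading
import time


class FakeModel:
    """Hypothetical stand-in for SentenceTransformer; construction is slow."""

    def __init__(self):
        time.sleep(0.1)  # simulate loading model weights

    def encode(self, texts):
        # Returns plain lists here; a real model would return arrays or
        # tensors that need converting to lists of floats.
        return [[float(len(t)), 0.0] for t in texts]


class LazyEmbeddingService:
    def __init__(self, model_factory=FakeModel):
        self._model_factory = model_factory
        self._model = None
        # A threading lock, because the load happens in worker threads.
        self._load_lock = threading.Lock()
        self.load_count = 0  # only here to make the behaviour observable

    def _get_model(self):
        # Double-checked locking so concurrent first calls load the model once.
        if self._model is None:
            with self._load_lock:
                if self._model is None:
                    self._model = self._model_factory()
                    self.load_count += 1
        return self._model

    def _encode(self, texts):
        # Runs entirely in a worker thread: lazy load plus encode.
        return self._get_model().encode(texts)

    async def get_embeddings(self, texts):
        return await asyncio.to_thread(self._encode, texts)
```

Because both the load and the encode happen inside _encode, the event loop never waits on either; two first requests arriving together share a single model load.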

📷 Screenshots and Design Ideas

Before: One long embedding request blocks the event loop → other requests wait.

After: Embedding runs in a thread pool → event loop continues handling other requests; embedding call still awaited by the original request.

No UI changes; this is a backend concurrency fix.

📋 Additional Context

  • File to change: backend/app/services/embedding_service/service.py
  • Consumers: app/services/github/user/profiling.py, app/services/github/issue_processor.py, app/agents/devrel/github/tools/contributor_recommendation.py
  • Suggested steps:
    1. Add import asyncio.
    2. Add a sync helper method (e.g. _encode(texts)) that calls self.model.encode(...) and returns list(s) of floats.
    3. In get_embedding: replace direct model.encode with await asyncio.to_thread(self._encode, [text]), then return the single embedding list.
    4. In get_embeddings: replace direct model.encode with await asyncio.to_thread(self._encode, texts) and return the list of lists.
  • Verification: While one request is generating embeddings, trigger another (e.g. health check or simple async endpoint); the second should respond without waiting for the first.
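The suggested steps and the verification check can be sketched together as follows. This is a minimal illustration, not the contents of service.py: StubModel is a hypothetical stand-in for the SentenceTransformer model (it sleeps to simulate encode time), and health_check and verify are invented here to mimic a second request arriving while an embedding is in progress.

```python
import asyncio  # step 1
import time


class StubModel:
    """Hypothetical stand-in for the SentenceTransformer model."""

    def encode(self, texts):
        time.sleep(0.2)  # simulate blocking CPU/GPU work
        return [[float(len(t))] for t in texts]


class EmbeddingService:
    def __init__(self, model=None):
        self.model = model or StubModel()

    def _encode(self, texts):
        # Step 2: synchronous helper, intended to run in a worker thread.
        vectors = self.model.encode(texts)
        return [[float(x) for x in vector] for vector in vectors]

    async def get_embedding(self, text):
        # Step 3: offload, then return the single embedding list.
        embeddings = await asyncio.to_thread(self._encode, [text])
        return embeddings[0]

    async def get_embeddings(self, texts):
        # Step 4: offload and return the list of lists.
        return await asyncio.to_thread(self._encode, texts)


async def health_check():
    """Hypothetical lightweight endpoint used for verification."""
    return "ok"


async def verify():
    service = EmbeddingService()
    start = time.perf_counter()
    embed_task = asyncio.create_task(service.get_embeddings(["a", "bb"]))
    await asyncio.sleep(0)  # yield so the embedding task starts first
    status = await health_check()
    health_latency = time.perf_counter() - start
    embeddings = await embed_task
    return status, health_latency, embeddings


if __name__ == "__main__":
    status, latency, _ = asyncio.run(verify())
    print(status, f"health check answered after {latency:.3f}s")
```

If get_embeddings called self._encode directly instead of going through asyncio.to_thread, the health check in verify would only answer after the full simulated encode time; with the offload it answers while the encode is still running.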

Record

  • I agree to follow this project's Code of Conduct
  • I want to work on implementing this feature
