Bug Description
The vector_search and hybrid_search methods in ChunksHybridSearchRetriever are defined as async functions, but they invoke embedding_model.embed(query_text), which is a synchronous blocking call.
Since the underlying embed implementation performs network I/O (via litellm.embedding), calling this synchronously blocks the main asyncio event loop. This prevents the server from handling concurrent requests while waiting for the embedding API to respond, leading to significant performance degradation under load.
Would it make sense to wrap this blocking call with asyncio.to_thread(...) to avoid blocking the event loop?
SurfSense/surfsense_backend/app/retriever/chunks_hybrid_search.py (lines 124 to 163 in d970688):

```python
async def hybrid_search(
    self,
    query_text: str,
    top_k: int,
    search_space_id: int,
    document_type: str | None = None,
    start_date: datetime | None = None,
    end_date: datetime | None = None,
) -> list:
    """
    Hybrid search that returns **documents** (not individual chunks).

    Each returned item is a document-grouped dict that preserves real DB chunk IDs so
    downstream agents can cite with `[citation:<chunk_id>]`.

    Args:
        query_text: The search query text
        top_k: Number of documents to return
        search_space_id: The search space ID to search within
        document_type: Optional document type to filter results (e.g., "FILE", "CRAWLED_URL")
        start_date: Optional start date for filtering documents by updated_at
        end_date: Optional end date for filtering documents by updated_at

    Returns:
        List of dictionaries containing document data and relevance scores. Each dict contains:
        - chunk_id: a "primary" chunk id for compatibility (best-ranked chunk for the doc)
        - content: concatenated chunk content (useful for reranking)
        - chunks: list[{chunk_id, content}] for citation-aware prompting
        - document: {id, title, document_type, metadata}
    """
    from sqlalchemy import func, select, text
    from sqlalchemy.orm import joinedload

    from app.config import config
    from app.db import Chunk, Document, DocumentType

    # Get embedding for the query
    embedding_model = config.embedding_model_instance
    query_embedding = embedding_model.embed(query_text)
```
BaseEmbeddings.embed:

```python
@abstractmethod
def embed(self, text: str) -> np.ndarray:
    """Embed a text string into a vector representation.

    This method should be implemented for all embeddings models.

    Args:
        text (str): Text string to embed

    Returns:
        np.ndarray: Embedding vector for the text string
    """
    raise NotImplementedError
```
LiteLLMEmbeddings.embed:

```python
def embed(self, text: str) -> np.ndarray:
    """Get embedding for a single text.

    Args:
        text: Text string to embed

    Returns:
        np.ndarray: Embedding vector

    Raises:
        RuntimeError: If the API call fails after retries
    """
    try:
        kwargs = self._prepare_api_call_kwargs()
        response = litellm.embedding(  # type: ignore
            input=[text],
            **kwargs,
        )
        # Extract embedding from response
        embedding = response.data[0]["embedding"]
        return np.array(embedding, dtype=np.float32)
    except Exception as e:
        raise RuntimeError(f"LiteLLM API error during embedding: {e}") from e
```
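A minimal sketch of the suggested fix, using stand-in names: blocking_embed below mimics embedding_model.embed, with time.sleep standing in for the litellm network round-trip. asyncio.to_thread runs the synchronous call in a worker thread so the event loop stays free to serve other requests:

```python
import asyncio
import time


def blocking_embed(query_text: str) -> list[float]:
    # Stand-in for embedding_model.embed(): a synchronous call that
    # blocks for 0.2 s, like a network round-trip to the embedding API.
    time.sleep(0.2)
    return [0.1, 0.2, 0.3]


async def hybrid_search_blocking(query_text: str) -> list[float]:
    # Current behavior: the sync call runs on the event loop thread,
    # stalling every other coroutine for the duration of the call.
    return blocking_embed(query_text)


async def hybrid_search_offloaded(query_text: str) -> list[float]:
    # Proposed fix: offload the blocking call to a worker thread and
    # await the result, keeping the event loop responsive.
    return await asyncio.to_thread(blocking_embed, query_text)


async def main() -> None:
    t0 = time.perf_counter()
    await asyncio.gather(*(hybrid_search_blocking("q") for _ in range(5)))
    blocked = time.perf_counter() - t0  # ~1.0 s: calls run back-to-back

    t0 = time.perf_counter()
    await asyncio.gather(*(hybrid_search_offloaded("q") for _ in range(5)))
    offloaded = time.perf_counter() - t0  # ~0.2 s: calls overlap in threads

    print(f"blocking: {blocked:.2f}s, offloaded: {offloaded:.2f}s")


asyncio.run(main())
```

In the actual retriever the change would be a one-liner: query_embedding = await asyncio.to_thread(embedding_model.embed, query_text). The trade-off is one worker thread per in-flight embedding call (asyncio.to_thread uses the default ThreadPoolExecutor), which is usually acceptable for I/O-bound work like this.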
Expected Behavior
The embedding generation should not block the asyncio event loop. The server should be able to handle other concurrent requests while waiting for the embedding API response.
Actual Behavior
The entire event loop is blocked during the execution of embedding_model.embed(), causing all other pending tasks to wait until the synchronous network call returns.
Environment Information
- Browser: Chrome 144
- Operating System: MacOS 15
- SurfSense Version: Self-hosted (latest version)
Additional Environment Details (for Self-hosted only)
- Python Version: 3.12.11
- Node.js Version: 25.5.0
- Database: PostgreSQL 15
- Deployment Method: Docker