Bug Description
The vector_search and hybrid_search methods in ChunksHybridSearchRetriever are defined as async functions, but they invoke embedding_model.embed(query_text), which is a synchronous blocking call.
Since the underlying embed implementation performs network I/O (via litellm.embedding), calling this synchronously blocks the main asyncio event loop. This prevents the server from handling concurrent requests while waiting for the embedding API to respond, leading to significant performance degradation under load.
Would it make sense to wrap this blocking call with asyncio.to_thread(...) to avoid blocking the event loop?
SurfSense/surfsense_backend/app/retriever/chunks_hybrid_search.py (lines 124 to 163 in d970688):

```python
async def hybrid_search(
    self,
    query_text: str,
    top_k: int,
    search_space_id: int,
    document_type: str | None = None,
    start_date: datetime | None = None,
    end_date: datetime | None = None,
) -> list:
    """
    Hybrid search that returns **documents** (not individual chunks).

    Each returned item is a document-grouped dict that preserves real DB chunk IDs so
    downstream agents can cite with `[citation:<chunk_id>]`.

    Args:
        query_text: The search query text
        top_k: Number of documents to return
        search_space_id: The search space ID to search within
        document_type: Optional document type to filter results (e.g., "FILE", "CRAWLED_URL")
        start_date: Optional start date for filtering documents by updated_at
        end_date: Optional end date for filtering documents by updated_at

    Returns:
        List of dictionaries containing document data and relevance scores. Each dict contains:
        - chunk_id: a "primary" chunk id for compatibility (best-ranked chunk for the doc)
        - content: concatenated chunk content (useful for reranking)
        - chunks: list[{chunk_id, content}] for citation-aware prompting
        - document: {id, title, document_type, metadata}
    """
    from sqlalchemy import func, select, text
    from sqlalchemy.orm import joinedload

    from app.config import config
    from app.db import Chunk, Document, DocumentType

    # Get embedding for the query
    embedding_model = config.embedding_model_instance
    query_embedding = embedding_model.embed(query_text)
```
BaseEmbeddings.embed:

```python
@abstractmethod
def embed(self, text: str) -> np.ndarray:
    """Embed a text string into a vector representation.

    This method should be implemented for all embeddings models.

    Args:
        text (str): Text string to embed

    Returns:
        np.ndarray: Embedding vector for the text string
    """
    raise NotImplementedError
```
LiteLLMEmbeddings.embed:

```python
def embed(self, text: str) -> np.ndarray:
    """Get embedding for a single text.

    Args:
        text: Text string to embed

    Returns:
        np.ndarray: Embedding vector

    Raises:
        RuntimeError: If the API call fails after retries
    """
    try:
        kwargs = self._prepare_api_call_kwargs()
        response = litellm.embedding(  # type: ignore
            input=[text],
            **kwargs,
        )
        # Extract embedding from response
        embedding = response.data[0]["embedding"]
        return np.array(embedding, dtype=np.float32)
    except Exception as e:
        raise RuntimeError(f"LiteLLM API error during embedding: {e}") from e
```
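A minimal sketch of the suggested fix, using stand-in names: blocking_embed below mimics embedding_model.embed, with time.sleep standing in for the litellm network round-trip. asyncio.to_thread runs the synchronous call in a worker thread so the event loop stays free to serve other requests:

```python
import asyncio
import time


def blocking_embed(query_text: str) -> list[float]:
    # Stand-in for embedding_model.embed(): a synchronous call that
    # blocks for 0.2 s, like a network round-trip to the embedding API.
    time.sleep(0.2)
    return [0.1, 0.2, 0.3]


async def hybrid_search_blocking(query_text: str) -> list[float]:
    # Current behavior: the sync call runs on the event loop thread,
    # stalling every other coroutine for the duration of the call.
    return blocking_embed(query_text)


async def hybrid_search_offloaded(query_text: str) -> list[float]:
    # Proposed fix: offload the blocking call to a worker thread and
    # await the result, keeping the event loop responsive.
    return await asyncio.to_thread(blocking_embed, query_text)


async def main() -> None:
    t0 = time.perf_counter()
    await asyncio.gather(*(hybrid_search_blocking("q") for _ in range(5)))
    blocked = time.perf_counter() - t0  # ~1.0 s: calls run back-to-back

    t0 = time.perf_counter()
    await asyncio.gather(*(hybrid_search_offloaded("q") for _ in range(5)))
    offloaded = time.perf_counter() - t0  # ~0.2 s: calls overlap in threads

    print(f"blocking: {blocked:.2f}s, offloaded: {offloaded:.2f}s")


asyncio.run(main())
```

In the actual retriever the change would be a one-liner: query_embedding = await asyncio.to_thread(embedding_model.embed, query_text). The trade-off is one worker thread per in-flight embedding call (asyncio.to_thread uses the default ThreadPoolExecutor), which is usually acceptable for I/O-bound work like this.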
Expected Behavior
The embedding generation should not block the asyncio event loop. The server should be able to handle other concurrent requests while waiting for the embedding API response.
Actual Behavior
The entire event loop is blocked during the execution of embedding_model.embed(), causing all other pending tasks to wait until the synchronous network call returns.
Environment Information
- Browser: Chrome 144
- Operating System: MacOS 15
- SurfSense Version: Self-hosted (latest version)
Additional Environment Details (for Self-hosted only)
- Python Version: 3.12.11
- Node.js Version: 25.5.0
- Database: PostgreSQL 15
- Deployment Method: Docker