
Add Mistral as an alternative AI backend #1

Merged
ddulic merged 18 commits into main from feat/mistral-ai-backend
Mar 11, 2026

Conversation

@ddulic ddulic (Owner) commented Mar 10, 2026

Summary

  • Introduces an `AIService` abstract base class (`ocr_image`, `embed_text`, `generate_json`, `provider_name`) to decouple the processing pipeline from any specific AI provider
  • Adds `MistralService` implementing `AIService` — uses the dedicated `mistral-ocr-latest` OCR API, `mistral-embed` for embeddings, and `mistral-large-latest` with JSON mode for summaries
  • Refactors `GeminiService` to inherit `AIService` and accept model names at construction time, including a separate `SUPERNOTE_GEMINI_CHAT_MODEL` setting (fixes bug where chat/summary generation was accidentally using the OCR model)
  • Replaces `GeminiOcrModule`/`GeminiEmbeddingModule` with provider-agnostic `OcrModule`/`EmbeddingModule`
  • Provider is selected at startup: Mistral is used when `SUPERNOTE_MISTRAL_API_KEY` is set, otherwise falls back to Gemini
  • Changes default ports from 8080/8081 to 8000/8001 for the API and file server

New config options

| Env var | Default | Description |
| --- | --- | --- |
| `SUPERNOTE_MISTRAL_API_KEY` | (unset) | Enables the Mistral backend when set |
| `SUPERNOTE_MISTRAL_OCR_MODEL` | `mistral-ocr-latest` | Dedicated OCR model |
| `SUPERNOTE_MISTRAL_EMBEDDING_MODEL` | `mistral-embed` | Embedding model |
| `SUPERNOTE_MISTRAL_CHAT_MODEL` | `mistral-large-latest` | Chat model for summaries |
| `SUPERNOTE_MISTRAL_MAX_CONCURRENCY` | `5` | Max concurrent API calls (minimum 1) |
| `SUPERNOTE_GEMINI_CHAT_MODEL` | `gemini-2.0-flash` | Gemini chat model for summaries |

Port change

Default ports are now 8000 (API) and 8001 (file server), changed from 8080/8081. Update any existing configurations, reverse proxies, or firewall rules accordingly.

Robustness improvements

  • Semaphore deadlock prevention: Both `GeminiService` and `MistralService` clamp `max_concurrency` to a minimum of 1; invalid env var values (non-integer or `< 1`) log a warning instead of silently misconfiguring
  • Embedding validation: `GeminiService.embed_text()` raises `ValueError` on empty/missing embedding values; `EmbeddingModule` validates before persisting to the database
  • Search zero-norm guard: `SearchService` skips candidate embeddings with zero L2 norm to prevent NaN/inf cosine similarity scores
  • Mistral OCR robustness: `ocr_image()` uses safe `getattr` access for `response.pages` and skips pages with missing markdown, returning an empty string rather than raising
  • Compact JSON schema prompts: `generate_json` serializes the schema without indentation to reduce token usage
  • PNG chunk accumulation: OCR module uses list + `b"".join()` instead of `bytes +=` to avoid O(n²) memory copies for large pages
  • Valid JSON guarantee: `MistralService.generate_json()` uses `json.dumps` (not `str()`) for non-string SDK responses to always return parseable JSON
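The PNG chunk-accumulation fix in the list above can be sketched like this (a standalone helper for illustration; the real code lives inside the OCR module's async loop):

```python
def assemble_png(chunks: list[bytes]) -> bytes:
    # Collecting chunks in a list and joining once is O(n) overall.
    # Repeated `data += chunk` copies the whole buffer on every append,
    # which is O(n^2) and a hotspot for large pages.
    parts: list[bytes] = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)
```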

Important note on provider switching

Mistral embeddings are 1024-dimensional vs Gemini's 3072-dimensional. Switching providers invalidates all stored embeddings — notes will need to be re-processed after a provider change. Mixing embeddings from different providers in the same search index is not supported.
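For illustration, here is a pure-Python cosine similarity showing both failure modes this PR guards against: the dimension mismatch that mixing 1024-dim and 3072-dim embeddings would trip, and the zero-norm case that would otherwise produce NaN/inf. This is a hypothetical helper, not the project's actual `SearchService` code (which uses NumPy).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Embeddings from different providers have different lengths,
    # so mixing them fails immediately rather than mis-ranking silently.
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Zero-norm guard: return 0 instead of dividing by zero.
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
```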

Introduces an AIService abstraction layer so the server can use either
Google Gemini or Mistral AI for OCR, embeddings, and summary generation.
The active backend is selected at startup based on which API key is
configured (Mistral takes precedence when both are set).

- Add AIService abstract base class (ocr_image, embed_text, generate_json)
- Add MistralService implementing AIService via the mistralai SDK
- Refactor GeminiService to inherit AIService; accept model names at
  construction instead of per-call; add high-level AIService methods
- Replace GeminiOcrModule/GeminiEmbeddingModule with provider-agnostic
  OcrModule/EmbeddingModule accepting AIService
- Update SummaryModule and SearchService to use AIService
- Add Mistral config fields with env var support (SUPERNOTE_MISTRAL_API_KEY,
  SUPERNOTE_MISTRAL_OCR_MODEL, SUPERNOTE_MISTRAL_EMBEDDING_MODEL,
  SUPERNOTE_MISTRAL_CHAT_MODEL, SUPERNOTE_MISTRAL_MAX_CONCURRENCY)
- Add mistralai>=1.0.0 to server dependencies
- Update all tests to use the new AIService-based mocks

Note: switching AI providers invalidates stored embeddings; all notes
will need to be re-processed after a provider change.

Copilot AI left a comment


Pull request overview

This PR introduces a provider-agnostic AI abstraction for the server’s processing pipeline, adds a Mistral-backed implementation, and refactors existing Gemini integration + processor modules/tests to use the new interface.

Changes:

  • Add AIService ABC and implement MistralService (OCR, embeddings, JSON generation) alongside a refactored GeminiService implementing the same interface.
  • Refactor server modules (OcrModule, EmbeddingModule, SummaryModule, SearchService) to depend on AIService rather than Gemini-specific calls.
  • Add Mistral configuration options and startup-time provider selection (Mistral when SUPERNOTE_MISTRAL_API_KEY is set, otherwise Gemini), plus broad test updates/mocks.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| tests/server/services/test_search.py | Updates search tests to mock AIService.embed_text() instead of Gemini-specific embedding APIs. |
| tests/server/services/test_processor_modules.py | Updates orchestration tests to use provider-agnostic OcrModule/EmbeddingModule specs. |
| tests/server/services/test_processor.py | Updates Gemini service construction in concurrency test to include model parameters. |
| tests/server/services/processor_modules/test_summary_module.py | Updates summary module tests to use generate_json() and provider name for data_source. |
| tests/server/services/processor_modules/test_processor_integration.py | Updates integration test to mock AIService methods (ocr_image, embed_text, generate_json). |
| tests/server/services/processor_modules/test_ocr.py | Refactors OCR module tests to OcrModule and the AIService.ocr_image() contract. |
| tests/server/services/processor_modules/test_embedding.py | Refactors embedding module tests to EmbeddingModule and AIService.embed_text(). |
| tests/server/services/processor_modules/conftest.py | Introduces a shared mock_ai_service fixture (with backward-compatible alias). |
| tests/server/routes/test_extended.py | Updates route-level Gemini mocking to patch is_configured as a property and embed_text(). |
| tests/server/mcp/test_client.py | Updates MCP tests to patch Gemini's property + new embedding method. |
| supernote/server/utils/gemini_content.py | Removes Gemini-specific request-building helper (logic moved into provider-agnostic OCR module). |
| supernote/server/services/search.py | Refactors search to call AIService.embed_text() and removes Gemini-specific embedding response handling. |
| supernote/server/services/processor_modules/summary.py | Refactors summary generation to call AIService.generate_json() and record provider name. |
| supernote/server/services/processor_modules/ocr.py | Replaces Gemini OCR module with provider-agnostic OcrModule building a text prompt + calling ocr_image(). |
| supernote/server/services/processor_modules/embedding.py | Replaces Gemini embedding module with provider-agnostic EmbeddingModule calling embed_text(). |
| supernote/server/services/mistral.py | Adds MistralService implementation using the mistralai SDK (vision OCR, embeddings, JSON mode). |
| supernote/server/services/gemini.py | Refactors GeminiService to implement AIService and expose ocr_image(), embed_text(), generate_json(). |
| supernote/server/services/ai_service.py | Adds the AIService abstract interface for AI backends. |
| supernote/server/config.py | Adds Mistral-related configuration fields + env var loading. |
| supernote/server/app.py | Adds startup-time selection between Mistral and Gemini; wires AIService through search + processor modules. |
| pyproject.toml | Adds mistralai dependency to the server extras. |
Comments suppressed due to low confidence (3)

supernote/server/services/processor_modules/embedding.py:84

  • EmbeddingModule.process() persists whatever ai_service.embed_text() returns without checking for an empty embedding. Storing [] (or a wrong-length vector) will later cause search to fail/mis-rank results. Validate that the returned vector is non-empty (and ideally numeric / expected dimensionality) and raise/mark the task failed when it isn't.

supernote/server/services/processor_modules/ocr.py:112

  • PNG bytes are assembled via repeated png_data += chunk inside an async loop. This is quadratic for many chunks and can become a hotspot for larger pages. Prefer accumulating chunks in a list/bytearray and joining once at the end.

supernote/server/services/search.py:90

  • embed_text() results are used without validating that an embedding was actually returned. If embed_text() returns an empty list (or all zeros), query_norm becomes 0 and the cosine similarity computation will produce divide-by-zero / NaN and/or dimension errors later. Add a guard to treat empty/zero-norm embeddings as an error (log and return []) before computing similarities.

```python
        try:
            embedding_values = await self.ai_service.embed_text(query)
            query_embedding = np.array(embedding_values)
        except (ValueError, RuntimeError, TypeError) as e:
            logger.error(f"Failed to fetch or process query embedding: {e}")
            return []

        query_norm = np.linalg.norm(query_embedding)

        # 2. Fetch Candidates
```


Comment thread supernote/server/app.py Outdated
Comment thread supernote/server/services/gemini.py Outdated
Comment thread supernote/server/services/search.py Outdated
- Switch MistralService.ocr_image() from Pixtral chat completions to
  the dedicated mistral-ocr-latest API (client.ocr.process_async)
- Update default mistral_ocr_model config to mistral-ocr-latest
- Validate non-empty embedding before persisting in EmbeddingModule
- Accumulate PNG chunks in list then join once (avoid quadratic bytes concat)
- Guard zero-norm query embedding in SearchService to prevent NaN scores

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.



Comment thread supernote/server/config.py
Comment thread supernote/server/services/mistral.py Outdated
- Add gemini_chat_model config field (SUPERNOTE_GEMINI_CHAT_MODEL, default
  gemini-2.0-flash) and fix bug where gemini_ocr_model was incorrectly used
  for summary/chat generation
- Raise ValueError in GeminiService.embed_text() when values is empty/None,
  consistent with MistralService and the EmbeddingModule guard added earlier
- Remove unused config parameter from SearchService constructor and call sites
- Document in MistralService.ocr_image() why prompt is intentionally unused

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

supernote/server/services/processor_modules/summary.py:205

  • The exception handler around generate_json logs the error and returns, which causes ProcessorModule.run() to mark the task as COMPLETED even though summary generation failed. Re-raise the exception (or let it propagate) so the task is marked FAILED and can be retried/diagnosed correctly.
```python
        try:
            response_text = await self.ai_service.generate_json(
                prompt=prompt,
                schema=build_json_schema(SummaryResponse).to_dict(),
            )
        except Exception as e:
            logger.error(f"Failed to generate AI summary for file {file_id}: {e}")
            return
```
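The fix the reviewer suggests (and that a later commit in this PR adopts) could look like the sketch below: log, then re-raise, so the caller can mark the task FAILED. The `generate_summary` wrapper is a hypothetical stand-in for the surrounding module, which is not shown here.

```python
import logging

logger = logging.getLogger(__name__)


async def generate_summary(ai_service, prompt: str, schema: dict, file_id: str) -> str:
    try:
        return await ai_service.generate_json(prompt=prompt, schema=schema)
    except Exception:
        # Re-raising (instead of returning) lets ProcessorModule.run()
        # mark the task FAILED so it can be retried/diagnosed.
        logger.error("Failed to generate AI summary for file %s", file_id)
        raise
```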



Comment thread supernote/server/services/mistral.py Outdated
Comment thread tests/server/services/processor_modules/test_summary_module.py Outdated
Comment thread supernote/server/services/gemini.py
- Remove unused logger/logging import from GeminiService and MistralService
  (would fail ruff unused-variable check in CI)
- Fix test_summary_module assertion to use call_args.kwargs["prompt"] since
  generate_json is called with keyword arguments, making call_args.args empty

Copilot AI left a comment


Pull request overview

Copilot reviewed 21 out of 21 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

supernote/server/services/processor_modules/summary.py:204

  • In process(), failures from ai_service.generate_json(...) are caught and then the function returns. Because ProcessorModule.run() marks the task as COMPLETED whenever process() returns without raising, this will incorrectly mark the summary task successful even when the AI call failed. To keep task state accurate, either let the exception propagate (preferred) or re-raise a descriptive exception after logging.
```python
        try:
            response_text = await self.ai_service.generate_json(
                prompt=prompt,
                schema=build_json_schema(SummaryResponse).to_dict(),
            )
        except Exception as e:
            logger.error(f"Failed to generate AI summary for file {file_id}: {e}")
            return
```


Comment thread supernote/server/services/search.py
Comment thread supernote/server/services/mistral.py Outdated
…tent fallback

- Guard against zero-norm candidate embeddings in SearchService cosine
  similarity to prevent NaN/inf scores, mirroring the existing query guard
- Change generate_json non-string fallback from "" to str(content) so
  unexpected response shapes produce something inspectable rather than silent
  empty string
- Add unit tests for MistralService covering OCR, embeddings, JSON generation,
  error paths, and concurrency limiting via semaphore

Copilot AI left a comment


Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 1 comment.



Comment thread tests/server/services/test_mistral.py Outdated
ddulic added 2 commits March 10, 2026 20:33
- README: update tagline, pipeline description, and quick start to show
  both Gemini and Mistral as provider options
- server/README: update feature description, prerequisites, and expand AI
  configuration section with full env var tables for both providers,
  including the embedding dimension switching note
- note_processing_design: replace Gemini-specific references with
  provider-agnostic language

Copilot AI left a comment


Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

supernote/server/services/gemini.py:45

  • max_concurrency can be set to 0/negative (via config/env), which will create an asyncio.Semaphore that blocks forever and effectively disables the service. Please validate max_concurrency >= 1 (raise a clear error or coerce to 1) before constructing/using the semaphore.
```python
        max_concurrency: int = 5,
    ) -> None:
        self.api_key = api_key
        self._ocr_model = ocr_model
        self._embedding_model = embedding_model
        self._chat_model = chat_model
        self.max_concurrency = max_concurrency
        self._client: genai.Client | None = None
        self._semaphore: asyncio.Semaphore | None = None
        if self.api_key:
            self._client = genai.Client(
                api_key=self.api_key, http_options={"api_version": "v1alpha"}
            )

    @property
    def is_configured(self) -> bool:
        return self._client is not None

    @property
    def provider_name(self) -> str:
        return "GEMINI"

    def _get_semaphore(self) -> asyncio.Semaphore:
        """Lazy initialization of semaphore to ensure it's in the correct event loop."""
        if self._semaphore is None:
            self._semaphore = asyncio.Semaphore(self.max_concurrency)
        return self._semaphore
```


Comment thread supernote/server/config.py Outdated
Comment thread supernote/server/services/mistral.py
…lock

- Clamp max_concurrency to minimum 1 in GeminiService and MistralService
  constructors so a bad value can never create a blocking semaphore
- Log a warning and clamp to 1 when SUPERNOTE_GEMINI_MAX_CONCURRENCY or
  SUPERNOTE_MISTRAL_MAX_CONCURRENCY env vars are set to < 1
- Document the minimum value of 1 in server README config tables

Copilot AI left a comment


Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 1 comment.



Comment thread supernote/server/services/processor_modules/summary.py Outdated
Mirrors the earlier SearchService cleanup — config was assigned but never
read anywhere in the module after the Gemini-specific refactor. Removed from
the constructor, app.py call site, and both test fixtures.

Copilot AI left a comment


Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 4 comments.



Comment thread supernote/server/services/mistral.py Outdated
Comment thread supernote/server/services/mistral.py Outdated
Comment thread supernote/server/config.py Outdated
Comment thread supernote/server/config.py Outdated
…arnings

- Handle missing/empty pages in MistralService.ocr_image() with safe getattr
  access and per-page markdown guard, returning "" instead of raising TypeError
- Use compact json.dumps separators in generate_json() to reduce prompt token
  usage when sending the schema to the model
- Log a warning (instead of silently passing) when SUPERNOTE_GEMINI_MAX_CONCURRENCY
  or SUPERNOTE_MISTRAL_MAX_CONCURRENCY env vars contain non-integer values
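The compact-serialization change for schema prompts can be illustrated in isolation (the sample schema is made up):

```python
import json

schema = {"type": "object", "properties": {"summary": {"type": "string"}}}

pretty = json.dumps(schema, indent=2)
compact = json.dumps(schema, separators=(",", ":"))

# The compact form drops all insignificant whitespace,
# so the schema costs fewer prompt tokens when sent to the model.
assert len(compact) < len(pretty)
```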

Copilot AI left a comment


Pull request overview

Copilot reviewed 25 out of 25 changed files in this pull request and generated 2 comments.



Comment thread tests/server/services/processor_modules/test_processor_integration.py Outdated
Comment thread supernote/server/services/mistral.py Outdated
ddulic added 3 commits March 10, 2026 21:23
- Update import from 'from mistralai import Mistral' to
  'from mistralai.client import Mistral' for mistralai>=2.0.0 compatibility
- Bump pyproject.toml constraint to mistralai>=2.0.0
- Fix test_full_processing_pipeline_with_real_file: generate_json was mocked
  to return '{}' (no segments) but assertion expected summary content from OCR
  text — fix mock to return a proper segments JSON and assert against it

Copilot AI left a comment


Pull request overview

Copilot reviewed 25 out of 26 changed files in this pull request and generated no new comments.



…alidation, and search

- Add test_gemini.py: full coverage for GeminiService (was completely untested) —
  is_configured, provider_name, ocr_image, embed_text empty/missing values, generate_json,
  max_concurrency clamping, and concurrency limit enforcement
- test_mistral.py: add edge cases for empty/missing OCR pages, pages without markdown,
  generate_json with non-string content (json.dumps path), empty choices, and
  max_concurrency clamping to 1
- test_embedding.py: verify EmbeddingModule marks task FAILED when AI returns empty vector
- test_search.py: verify zero-norm candidate embeddings are skipped and don't produce NaN scores

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated no new comments.



ddulic added 2 commits March 10, 2026 21:59
- CONTRIBUTING.md: show both Gemini and Mistral API key options for local dev
- note_processing_design.md: replace GeminiAPIError example with generic ValueError
- PLAN.md: update Phase 5 to reflect AIService abstraction, MistralService,
  and renamed OcrModule/EmbeddingModule (were GeminiOcrModule/GeminiEmbeddingModule)
- Remove stale type: ignore[arg-type] comments in gemini.py and mistral.py
- Use proper UserMessage type for Mistral chat messages with cast to
  satisfy mypy list invariance requirement
- Handle Optional[List[float]] from Mistral embedding response
- Add method-assign to type: ignore comments on AsyncMock assignments
  in test_gemini.py and test_mistral.py
@ddulic ddulic force-pushed the feat/mistral-ai-backend branch from 227bd8f to ec2ec1f on March 11, 2026 at 12:55
Switch to more standard Python web server defaults (8000 for main server,
8001 for MCP server) to avoid conflicts with common port 8080/8081 usage.

Copilot AI left a comment


Pull request overview

Copilot reviewed 34 out of 35 changed files in this pull request and generated 2 comments.



Comment thread supernote/server/config.py
Comment thread supernote/server/config.py Outdated
- Re-raise exception in SummaryModule.process() after logging so the
  task is marked FAILED (not COMPLETED) when AI summary generation fails
- Update mistral_api_key docstring to mention summaries alongside OCR
  and embeddings
- Remove partial API key characters from log output for both Gemini and
  Mistral API keys to avoid inadvertent credential exposure
@ddulic ddulic merged commit 1672069 into main Mar 11, 2026