Conversation
- Embedding shifted to LiteLLM
- Quote util shifted to LiteLLM
- Minor fix to test quote utils
## Walkthrough
This PR swaps out the OpenAI client for the litellm library across embedding and LLM completion utilities, parameterizes all LLM API credentials and model selection via new environment variables, and updates configuration and test imports accordingly. It refactors report generation to use litellm completions and Directus API calls instead of DB queries and Anthropic client. Task actor priorities are adjusted. No changes to public APIs or control flow beyond these backend and config updates.
## Changes
| File(s) | Change Summary |
|------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|
| `echo/server/dembrane/config.py` | Added new env vars for SMALL, MEDIUM, LARGE LiteLLM tiers (model, API key, version, base) with assertions and debug logs; removed defaults for Whisper vars with assertions. |
| `echo/server/dembrane/embedding.py` | Replaced OpenAI embedding client with litellm embedding call; updated embedding dim from 1536 to 3072; adjusted response parsing. |
| `echo/server/dembrane/quote_utils.py` | Replaced OpenAI chat completion calls with litellm completion calls; parameterized model and API credentials; adjusted JSON parsing; removed unused imports. |
| `echo/server/dembrane/report_utils.py` | Refactored report content fetching to use Directus API and litellm token counting/completion; removed DB dependency and Anthropic client; added token limit logic and logging. |
| `echo/server/dembrane/api/project.py` | Removed DB parameter from `create_report` endpoint; updated internal call accordingly. |
| `echo/server/dembrane/tasks.py` | Added or adjusted dramatiq actor priorities; changed some task queues; replaced attribute access with dict access in one function; updated logging accordingly. |
| `echo/server/dembrane/reply_utils.py` | Added a FIXME comment before litellm completion call; no functional change. |
| `echo/server/tests/test_quote_utils.py` | Added imports for `generate_uuid` and `EMBEDDING_DIM` to support updated test setups. |
| `echo/server/prompt_templates/system_report.*.jinja` | Added instruction to all language variants forbidding `<br>` tags in output; instruct to use line breaks instead. |
## Sequence Diagram(s)
```mermaid
sequenceDiagram
participant User
participant quote_utils.py
participant config.py
participant litellm
User->>quote_utils.py: Call LLM-related function
quote_utils.py->>config.py: Fetch LLM config (model, API key, etc.)
quote_utils.py->>litellm: Call completion/embedding API with config
litellm-->>quote_utils.py: Return response
    quote_utils.py-->>User: Return processed result
```
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
- echo/server/dembrane/config.py (1 hunks)
- echo/server/dembrane/embedding.py (2 hunks)
- echo/server/dembrane/quote_utils.py (11 hunks)
- echo/server/tests/test_quote_utils.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
echo/server/tests/test_quote_utils.py (2)
echo/server/dembrane/utils.py (1)
generate_uuid (13-14)

echo/server/dembrane/database.py (4)
QuoteModel (447-490)
ProcessingStatusEnum (74-78)
ProjectAnalysisRunModel (168-200)
get_db (591-598)
echo/server/dembrane/quote_utils.py (3)
echo/server/dembrane/s3.py (1)
save_to_s3_from_url (86-116)

echo/server/dembrane/ner.py (1)
anonymize_sentence (25-48)

echo/server/dembrane/utils.py (2)
generate_uuid (13-14)
get_utc_timestamp (49-50)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-build-servers (dbr-echo-server, ./echo/server, Dockerfile, dbr-echo-server)
🔇 Additional comments (7)
echo/server/dembrane/config.py (1)
176-190: Environment variables for LiteLLM config added correctly

These new environment variables align with the LiteLLM integration pattern. Clean implementation of config validation with assertions and consistent debug logging.
echo/server/tests/test_quote_utils.py (1)
10-17: Import additions look solid

Adding these imports enables proper test setup with UUID generation and correct embedding dimensionality - critical for maintaining test consistency after the LiteLLM migration.
echo/server/dembrane/embedding.py (1)
5-7: Clean litellm import and OpenAI client removal

Nice swap of imports. Properly commented out the old OpenAI import instead of just deleting it - makes rollback easier if needed.
echo/server/dembrane/quote_utils.py (4)
10-28: Well-structured imports for LiteLLM migration

Great job updating the imports to reflect the new LiteLLM dependency while maintaining backwards compatibility with commented-out imports. The config imports follow the standard pattern.
49-50: Properly commented out unused initializations

Clean approach to deprecate the old embeddings setup. This follows good practice of commenting out rather than just deleting for easier troubleshooting/rollback.
99-105: Solid implementation of litellm in llm_split_text function

The migration to litellm.completion is implemented correctly with all required parameters.
467-473: Consistent implementation of litellm across LLM interaction points

The LiteLLM migration is consistently applied across all completion endpoints. Code follows the same parameter pattern throughout, making future maintenance easier.
Also applies to: 606-612, 751-757, 777-783, 936-942, 972-978, 998-1004
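The "same parameter pattern" the review praises can be sketched as a small helper. This is a hypothetical illustration, not code from the PR: `build_completion_kwargs` and the placeholder model/key values are assumptions; only the tiered `*_LITELLM_*` naming mirrors the config module described above.

```python
# Hypothetical helper mirroring the repeated parameter pattern at every
# litellm.completion call site (the function name and values are illustrative).
def build_completion_kwargs(model, api_key, api_version, api_base, messages, json_mode=False):
    """Assemble the keyword arguments passed to litellm.completion."""
    kwargs = {
        "model": model,
        "api_key": api_key,
        "api_version": api_version,
        "api_base": api_base,
        "messages": messages,
    }
    if json_mode:
        # structured JSON output, forwarded to providers that support it
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


kwargs = build_completion_kwargs(
    model="azure/gpt-4o-mini",  # placeholder, not the PR's actual model
    api_key="demo-key",
    api_version="2024-02-01",
    api_base="https://example.openai.azure.com",
    messages=[{"role": "user", "content": "Split this text."}],
    json_mode=True,
)
```

Centralizing the kwargs like this would make the "consistent pattern" the review notes a structural guarantee rather than a convention.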
echo/server/dembrane/embedding.py (outdated diff)

```diff
-    return (
-        client.embeddings.create(input=[text], model="text-embedding-3-small").data[0].embedding
+    response = litellm.embedding(
+        model="text-embedding-3-small",
```
Actionable comments posted: 2
🔭 Outside diff range comments (1)
echo/server/dembrane/reply_utils.py (1)
245-253: ⚠️ Potential issue: TODO comment without implementation change.
The "FIXME: reply" comment suggests this section needs further modification, but no actual changes were made to the LLM call. Model "anthropic/claude-3-5-sonnet-20240620" is still hardcoded rather than using the new LiteLLM configuration variables.
This hardcoded model name should be replaced with the appropriate LiteLLM configuration variable to complete the migration. Suggest implementing similar to other modules:
```diff
- # FIXME: reply
- response = await litellm.acompletion(
-     model="anthropic/claude-3-5-sonnet-20240620",
-     messages=[
-         {"role": "user", "content": prompt},
-         {"role": "assistant", "content": ""},
-     ],
-     stream=True,
- )
+ response = await litellm.acompletion(
+     model=LARGE_LITELLM_MODEL,
+     api_key=LARGE_LITELLM_API_KEY,
+     api_base=LARGE_LITELLM_API_BASE,
+     api_version=LARGE_LITELLM_API_VERSION,
+     messages=[
+         {"role": "user", "content": prompt},
+         {"role": "assistant", "content": ""},
+     ],
+     stream=True,
+ )
```
♻️ Duplicate comments (2)
echo/server/dembrane/embedding.py (1)
24-31: 🛠️ Refactor suggestion: Model name still hardcoded despite parameterization elsewhere.
The model name "azure/text-embedding-3-large" remains hardcoded while the rest of the system uses environment variables for model selection. This inconsistency needs attention.
```diff
- response = litellm.embedding(
-     api_key=str(LIGHTRAG_LITELLM_EMBEDDING_API_KEY),
-     api_base=str(LIGHTRAG_LITELLM_EMBEDDING_API_BASE),
-     api_version=str(LIGHTRAG_LITELLM_EMBEDDING_API_VERSION),
-     model="azure/text-embedding-3-large",
-     input=[text],
- )
+ response = litellm.embedding(
+     api_key=str(LIGHTRAG_LITELLM_EMBEDDING_API_KEY),
+     api_base=str(LIGHTRAG_LITELLM_EMBEDDING_API_BASE),
+     api_version=str(LIGHTRAG_LITELLM_EMBEDDING_API_VERSION),
+     model=str(LIGHTRAG_LITELLM_EMBEDDING_MODEL),
+     input=[text],
+ )
```

echo/server/dembrane/quote_utils.py (1)
1044-1050: 🧹 Nitpick (assertive): Remove redundant FIXME comment
The code has already been migrated to litellm, so the FIXME comment is now redundant.
```diff
- # FIXME: use litellm
  response = completion(
      model=SMALL_LITELLM_MODEL,
      messages=messages,
      api_key=SMALL_LITELLM_API_KEY,
      api_version=SMALL_LITELLM_API_VERSION,
      api_base=SMALL_LITELLM_API_BASE,
  )
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (12)
- echo/server/dembrane/api/project.py (1 hunks)
- echo/server/dembrane/config.py (1 hunks)
- echo/server/dembrane/embedding.py (2 hunks)
- echo/server/dembrane/quote_utils.py (10 hunks)
- echo/server/dembrane/reply_utils.py (1 hunks)
- echo/server/dembrane/report_utils.py (2 hunks)
- echo/server/dembrane/tasks.py (13 hunks)
- echo/server/prompt_templates/system_report.de.jinja (1 hunks)
- echo/server/prompt_templates/system_report.en.jinja (1 hunks)
- echo/server/prompt_templates/system_report.es.jinja (1 hunks)
- echo/server/prompt_templates/system_report.fr.jinja (1 hunks)
- echo/server/prompt_templates/system_report.nl.jinja (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
echo/server/dembrane/api/project.py (1)
echo/server/dembrane/report_utils.py (1)
get_report_content_for_project(36-137)
echo/server/dembrane/report_utils.py (3)
echo/server/dembrane/prompts.py (1)
render_prompt (55-88)

echo/server/dembrane/api/conversation.py (1)
get_conversation_transcript (292-322)

echo/server/dembrane/api/dependency_auth.py (1)
DirectusSession (13-22)
echo/server/dembrane/tasks.py (2)
echo/server/dembrane/quote_utils.py (1)
generate_quotes (118-258)

echo/server/dembrane/database.py (1)
QuoteModel (447-490)
🪛 Ruff (0.8.2)
echo/server/dembrane/report_utils.py
5-5: sqlalchemy.orm.Session imported but unused
Remove unused import: sqlalchemy.orm.Session
(F401)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (29)
echo/server/prompt_templates/system_report.fr.jinja (1)
51-52: Clean formatting instruction added for report generation.

Adding explicit guidance to avoid HTML break tags in favor of proper newlines - solid move for consistent output formatting across the LiteLLM integration pipeline. Formatting consistency is key for downstream processing.
echo/server/dembrane/config.py (3)
176-191: Structured environment var configuration for tiered LLM models.

The implementation follows the right patterns - strict assertions for required vars, explicit debug logging, and proper isolation of different tiers (SMALL, MEDIUM, LARGE). This parameterization approach is crucial for the LiteLLM migration.
192-222: Environment configs for MEDIUM and LARGE tiers follow consistent pattern.

The symmetrical implementation across all tiers shows solid design - each tier has the same four variables (MODEL, API_KEY, API_VERSION, API_BASE) with identical validation and logging. This will scale well as model selection evolves.
226-228: Updates to Whisper config settings for consistency.

Removed fallbacks in favor of explicit environment variable requirements - good alignment with the fail-fast pattern established for other LiteLLM configs.
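The fail-fast pattern praised in these comments (env var, assertion, debug log, repeated per tier) can be captured in one loop. A minimal sketch under stated assumptions: the `load_litellm_tier` helper and the demo values are hypothetical; only the `{TIER}_LITELLM_{SUFFIX}` naming comes from the PR.

```python
import logging
import os

logger = logging.getLogger("dembrane.config")


def load_litellm_tier(tier: str) -> dict:
    """Read MODEL/API_KEY/API_VERSION/API_BASE for one tier, failing fast when unset."""
    settings = {}
    for suffix in ("MODEL", "API_KEY", "API_VERSION", "API_BASE"):
        name = f"{tier}_LITELLM_{suffix}"
        value = os.environ.get(name)
        assert value is not None, f"{name} is required"  # fail-fast, like the PR's assertions
        logger.debug("loaded %s", name)
        settings[suffix] = value
    return settings


# demo values only; real deployments set these in the environment
for key, val in {
    "SMALL_LITELLM_MODEL": "azure/gpt-4o-mini",
    "SMALL_LITELLM_API_KEY": "demo-key",
    "SMALL_LITELLM_API_VERSION": "2024-02-01",
    "SMALL_LITELLM_API_BASE": "https://example.openai.azure.com",
}.items():
    os.environ.setdefault(key, val)

small = load_litellm_tier("SMALL")
```

The loop form trades the PR's explicit per-variable assertions for less repetition; either style satisfies the fail-fast goal.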
echo/server/dembrane/embedding.py (1)
14-14: Updated embedding dimension constant.

Bumped from 1536 to 3072 to match the new model capabilities - critical update for downstream vector operations.
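Since downstream vector operations depend on the new dimensionality, a defensive parse of the embedding response is worth sketching. Assumption: the response is treated as an OpenAI-style dict (`{"data": [{"embedding": [...]}]}`), which is the shape litellm mirrors; `parse_embedding` itself is a hypothetical helper, not code from the PR.

```python
EMBEDDING_DIM = 3072  # dimensionality of text-embedding-3-large output


def parse_embedding(response: dict) -> list:
    """Extract the vector from an OpenAI-style embedding response and validate its length."""
    vector = response["data"][0]["embedding"]
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} dims, got {len(vector)}")
    return vector


# fake response standing in for a real litellm.embedding() call
fake = {"data": [{"embedding": [0.0] * EMBEDDING_DIM}]}
vec = parse_embedding(fake)
```

A length check like this catches a stale 1536-dim index or a mis-set model name at the call site instead of deep inside vector search.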
echo/server/prompt_templates/system_report.en.jinja (1)
52-53: LGTM! Smart optimization on formatting instructions.

The addition of the explicit instruction to avoid `<br>` tags and use newlines instead standardizes the LLM output format across all language variants. This is a clean optimization that ensures consistent plain-text formatting in the generated reports.

echo/server/prompt_templates/system_report.nl.jinja (1)
51-52: LGTM! Consistent formatting standardization.

Dutch version mirrors the same formatting optimization found in other language templates. This ensures consistent output handling across different languages. Solid engineered consistency.
echo/server/prompt_templates/system_report.es.jinja (1)
51-52: LGTM! Perfect formatting constraint.

Spanish template updated with the same formatting instruction pattern as other language variants. This standardization simplifies downstream content processing. Ship it.
echo/server/prompt_templates/system_report.de.jinja (1)
51-52: LGTM! Formatting consistency achieved.

German template gets the same formatting instruction as all other language variants. This cross-language standardization is exactly what we need for clean output processing.
echo/server/dembrane/api/project.py (1)
377-380: LGTM! Smart API abstraction refactor.

Clean removal of the DB dependency from the endpoint. This change elegantly abstracts DB access away from the API layer, moving it to the service layer via the `directus` client in `get_report_content_for_project()`. The function signature is now more focused and follows separation of concerns.

echo/server/dembrane/quote_utils.py (6)
10-10: Solid move to litellm for completions

The switch to litellm provides a unified interface for multiple LLM providers. This is a strong architectural decision that will make future provider switching frictionless.
20-29: Config constants pattern is on point

Using environment-based config constants rather than hardcoded model names and API credentials is 100% the right move. This parameterization makes the codebase more maintainable and deployable across environments.
94-100: LGTM! Clean implementation of litellm

The migration from OpenAI client to litellm is implemented correctly. Using the SMALL model tier for this simple text splitting operation is resource-efficient.
462-469: Clean litellm implementation with JSON response handling

The replacement of OpenAI client with litellm is properly implemented, including response format specification for structured JSON output. This standardizes our LLM interaction pattern.
473-488: Improved error handling and logging

The enhanced debug logging and more robust error handling for parsed responses is a great improvement. The explicit JSON parsing from response content is more reliable than depending on the `.parsed` attribute.
604-610: Consistent migration pattern across all LLM calls

The systematic replacement of all OpenAI client calls with litellm.completion follows the same pattern throughout the file. This consistency is key for maintainability and ensures behavior remains predictable across different function calls.
Also applies to: 749-755, 775-781, 934-940, 970-976, 996-1002, 1044-1050
echo/server/dembrane/report_utils.py (8)
1-4: Clean imports and litellm integration

Good work importing the necessary modules from litellm. The token_counter is a nice addition for managing context size.
7-14: Well-organized config imports

The imports for litellm configuration are properly structured and follow the same pattern as in quote_utils.py. This consistency is key.
20-25: Smart dynamic context length management

Brilliant approach to dynamically set the context length based on model capabilities. The 4.1 model check with 700k tokens vs 128k tokens shows advanced knowledge of model constraints.
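The model-aware budget described here boils down to a one-branch function. A minimal sketch, assuming the check is a substring match on the model name as the comment implies; the function name and model strings are illustrative, not taken from the PR.

```python
def max_report_context_length(model_name: str) -> int:
    """Pick the report-context token budget from the model name.

    Assumption: mirrors the PR's check for a '4.1'-style model with a
    larger context window, falling back to a conservative 128k default.
    """
    if "4.1" in model_name:
        return 700_000
    return 128_000


large = max_report_context_length("azure/gpt-4.1")  # larger window
default = max_report_context_length("azure/gpt-4o")  # conservative default
```

Keeping this in one function means new model tiers only need one branch added, rather than edits at every call site.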
36-50: Refactored to use directus API instead of direct DB queries

The refactoring to use the directus API instead of DB queries is a clean architectural improvement that decouples the report generation from database implementation details.
52-80: Token-aware conversation processing

The implementation now properly tracks token counts and limits context length, which is critical for reliable LLM operations. Breaking early when the token limit is reached prevents wasted API calls.
81-99: Efficient transcript handling with token counting

This implementation intelligently accumulates transcripts while monitoring token count, ensuring we maximize context utilization without overflowing limits. The early break on limit exceeded is an optimization win.
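The accumulate-until-budget-exhausted loop praised in these two comments can be sketched independently of litellm. Assumptions: `accumulate_transcripts` is a hypothetical helper, and the whitespace tokenizer stands in for litellm's `token_counter` purely for illustration.

```python
def accumulate_transcripts(transcripts, budget, count_tokens):
    """Append transcripts until the next one would exceed the token budget, then stop."""
    kept, used = [], 0
    for text in transcripts:
        cost = count_tokens(text)
        if used + cost > budget:
            break  # early exit avoids building context the model cannot accept
        kept.append(text)
        used += cost
    return kept, used


# whitespace tokenizer as a crude stand-in for litellm's token_counter
naive_count = lambda text: len(text.split())

kept, used = accumulate_transcripts(
    ["alpha beta gamma", "delta epsilon", "zeta eta theta iota"],
    budget=6,
    count_tokens=naive_count,
)
```

Note the break drops all remaining transcripts once the budget is hit; a variant could instead skip only the oversized item and keep scanning, which is a policy choice the PR's early-break approach deliberately avoids.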
108-121: Clean litellm integration with simplified message format

The migration from the anthropic client to litellm is well-implemented. Removing the prefill message simplifies the code while maintaining functionality.
123-134: Robust content extraction

The pattern matching to extract content from article tags with a fallback to the raw content is a resilient approach that handles different model output formats.
echo/server/dembrane/tasks.py (5)
117-117: Task prioritization improvement

Setting explicit priority=30 for these actors ensures they have consistent execution priority. This is an important optimization for the task queue.
Also applies to: 139-139, 164-164, 181-181
334-334: Standardized priority for core processing tasks

Setting priority=50 for these critical processing tasks gives them appropriate precedence in the queue while still allowing urgent tasks to take priority. This balanced approach improves system responsiveness.
Also applies to: 446-446, 463-463, 490-490, 507-507, 524-524, 583-583, 651-651
651-651: Queue optimization for project library creation

Moving task_create_project_library from the "network" to the "cpu" queue is a smart resource allocation improvement, as this task is more CPU-bound than network-bound.
402-404: Updated data access pattern

Changed from attribute-style access to dictionary-style access to match the directus API response format. This is consistent with the broader migration to API-based data retrieval.
407-407: Consistent dictionary-style access

The consistent use of the dictionary-style access pattern throughout the function ensures compatibility with the directus API response structure.
Also applies to: 415-415, 423-423
```diff
 import backoff
+import litellm

-from dembrane.openai import client
+from dembrane.config import (
+    # FIXME: update to use dembrane embeddings
+    LIGHTRAG_LITELLM_EMBEDDING_API_KEY,
+    LIGHTRAG_LITELLM_EMBEDDING_API_BASE,
+    LIGHTRAG_LITELLM_EMBEDDING_API_VERSION,
+)
```
💡 Verification agent
🧩 Analysis chain
Inconsistent embedding configuration.
Using LIGHTRAG_LITELLM_EMBEDDING_* variables while the comment indicates these should be updated to use "dembrane embeddings". This appears to be a temporary implementation with a TODO marker.
🏁 Script executed:
```shell
#!/bin/bash
# Check if SMALL_LITELLM_* variables are used elsewhere for embeddings
rg "SMALL_LITELLM_" --type py | grep -v config.py
```

Length of output: 2457
🏁 Script executed:
```shell
#!/bin/bash
# List all embedding-related config variables to verify available dembrane-specific keys
rg "EMBEDDING" -n echo/server/dembrane/config.py
```

Length of output: 1402
Embeddings pipeline: wire up dembrane embedding config – LGTM pending fix
We’re still importing and using LIGHTRAG_LITELLM_EMBEDDING_* keys in
echo/server/dembrane/embedding.py (lines 4–12), but config.py only defines those. To complete the “FIXME: update to use dembrane embeddings”:
• In echo/server/dembrane/config.py (lines 329–353):
– Define new DEMBRANE_EMBEDDING_* variables alongside the existing LIGHTRAG_* ones, e.g.:
```python
DEMBRANE_EMBEDDING_MODEL = os.environ.get("DEMBRANE_EMBEDDING_MODEL")
DEMBRANE_EMBEDDING_API_BASE = os.environ.get("DEMBRANE_EMBEDDING_API_BASE")
DEMBRANE_EMBEDDING_API_KEY = os.environ.get("DEMBRANE_EMBEDDING_API_KEY")
DEMBRANE_EMBEDDING_API_VERSION = os.environ.get("DEMBRANE_EMBEDDING_API_VERSION")
```
• In echo/server/dembrane/embedding.py (lines 4–12):
– Replace the LIGHTRAG_LITELLM_* imports with the new DEMBRANE_EMBEDDING_* keys and drop the # FIXME:
```diff
-from dembrane.config import (
-    # FIXME: update to use dembrane embeddings
-    LIGHTRAG_LITELLM_EMBEDDING_API_KEY,
-    LIGHTRAG_LITELLM_EMBEDDING_API_BASE,
-    LIGHTRAG_LITELLM_EMBEDDING_API_VERSION,
-)
+from dembrane.config import (
+    DEMBRANE_EMBEDDING_API_KEY,
+    DEMBRANE_EMBEDDING_API_BASE,
+    DEMBRANE_EMBEDDING_API_VERSION,
+)
```
Once those variables are in place and referenced, the embedding module will be fully aligned with the dembrane standard.
🤖 Prompt for AI Agents
In echo/server/dembrane/embedding.py lines 4 to 12, replace the imported
LIGHTRAG_LITELLM_EMBEDDING_API_KEY, LIGHTRAG_LITELLM_EMBEDDING_API_BASE, and
LIGHTRAG_LITELLM_EMBEDDING_API_VERSION with the new DEMBRANE_EMBEDDING_API_KEY,
DEMBRANE_EMBEDDING_API_BASE, and DEMBRANE_EMBEDDING_API_VERSION variables
respectively, and remove the FIXME comment. Also, ensure these
DEMBRANE_EMBEDDING_* variables are defined in echo/server/dembrane/config.py
around lines 329 to 353 by adding environment variable retrievals for
DEMBRANE_EMBEDDING_MODEL, DEMBRANE_EMBEDDING_API_BASE,
DEMBRANE_EMBEDDING_API_KEY, and DEMBRANE_EMBEDDING_API_VERSION.
Actionable comments posted: 1
♻️ Duplicate comments (1)
echo/server/dembrane/report_utils.py (1)
4-5: Unused import removal - thanks for the cleanup!

The Session import from sqlalchemy.orm has been properly removed. Codebase hygiene++
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (1)
echo/server/dembrane/report_utils.py(2 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/report_utils.py (3)
echo/server/dembrane/prompts.py (1)
render_prompt(55-88)echo/server/dembrane/api/conversation.py (1)
get_conversation_transcript(292-322)echo/server/dembrane/api/dependency_auth.py (1)
DirectusSession(13-22)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (6)
echo/server/dembrane/report_utils.py (6)
1-15: Solid imports for LiteLLM migration 🚀

Cleanly swapped in LiteLLM dependencies and config imports. Nicely organized, no cruft.
17-24: Dynamically scaling context window based on model version - clutch!

Great optimization to leverage larger context windows (700k) when using GPT-4-1106 while falling back to 128k for other models. This is exactly the kind of model-aware scaling that separates 10x from 100x engineering.
35-46: Function signature cleanup + Directus migration LGTM

Smart move removing the db parameter since we're using the Directus API now. Query structure looks clean.
48-50: Proper debugging is production-ready engineering

Those debug logs will save someone's bacon during a 3am incident.
51-97: Token management game is strong

Solid implementation of token counting with the LiteLLM library. Love how you're checking the token count against MAX_REPORT_CONTEXT_LENGTH before adding each conversation. Early breaking on token limit is a performance win.
122-133: Robust article tag extraction

Smart fallback handling when article tags aren't found. This handles edge cases like a pro.
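The extract-with-fallback pattern this comment describes fits in a few lines of `re`. A sketch under stated assumptions: `extract_article` is a hypothetical name, and the non-greedy `DOTALL` regex is one reasonable way to implement the behavior the review summarizes, not the PR's literal code.

```python
import re


def extract_article(raw: str) -> str:
    """Return the text inside the first <article>...</article> pair,
    falling back to the raw model output when the tags are missing."""
    match = re.search(r"<article>(.*?)</article>", raw, flags=re.DOTALL)
    return match.group(1).strip() if match else raw.strip()


wrapped = "<article>\nQuarterly report body.\n</article>"
plain = "Model ignored the tag instruction."
```

The fallback matters because models occasionally drop the prefilled `<article>` wrapper; without it, a missing tag would turn a usable report into an empty string or an exception.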
```python
# Use litellm.completion instead of anthropic client
response = completion(
    model=MEDIUM_LITELLM_MODEL,
    api_key=MEDIUM_LITELLM_API_KEY,
    api_version=MEDIUM_LITELLM_API_VERSION,
    api_base=MEDIUM_LITELLM_API_BASE,
    # max tokens needed for "anthropic"
    # max_tokens=4096,
    messages=[
        {"role": "user", "content": prompt_message},
        # prefill message
        {"role": "assistant", "content": "<article>"},
        # prefill message only for "anthropic"
        # {"role": "assistant", "content": "<article>"},
    ],
)
```
🧹 Nitpick (assertive)
Clean LiteLLM client integration
Nice work converting from Anthropic to LiteLLM - kept the commented max_tokens and prefill stuff as documentation for future devs. 10/10.
Consider removing the comments if they're not needed after successful deployment.
🤖 Prompt for AI Agents
In echo/server/dembrane/report_utils.py around lines 107 to 120, the commented
lines related to max_tokens and prefill messages for the Anthropic client are no
longer needed after switching to LiteLLM. Remove these commented lines to clean
up the code and avoid confusion for future developers.
…ength of Report (#148)

* Shift to LiteLLM
  - Embedding shifted to LiteLLM
  - Quote util shifted to LiteLLM
  - Minor fix to test quote utils
* fix up bunch and also ECHO-224
* remove unused Session

---------

Co-authored-by: Sameer Pashikanti <sameer.pashikanti@gmail.com>
@spashii, requesting you to look into this and test thoroughly before completing the pull request.
All tests passing.
Summary by CodeRabbit

New Features

Refactor

Chores
- Prompt templates updated to forbid `<br>` tags and require line breaks instead.