Conversation
- Add retry logic (up to 3 times) for LLM calls
- Add exception handling for LLM calls
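A minimal sketch of what those two checklist items could look like together, assuming a generic awaitable `call` (the helper name and backoff policy are illustrative, not from the PR):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)


async def call_llm_with_retry(call, max_attempts=3, base_delay=1.0):
    """Retry an async LLM call up to `max_attempts` times, backing off
    exponentially between attempts and re-raising the final failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return await call()
        except Exception as e:
            logger.warning("LLM call failed (attempt %d/%d): %s", attempt, max_attempts, e)
            if attempt == max_attempts:
                raise  # exhausted retries: surface the error to the caller
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

The wrapper would sit around each `litellm.acompletion(...)` call site rather than inside it, so every LLM touchpoint gets the same policy.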
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: Adds LLM-driven auto-selection and follow-up detection to backend chat flows with multilingual prompts, updates conversation citation/selection utilities, and removes several frontend processing indicators/warnings while triggering a chat-context refetch when auto-select adds context.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
LGTM. Pre-merge checks and finishing touches: ❌ Failed checks (2 warnings), ✅ Passed checks (3 passed).
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (16)
- echo/frontend/src/components/chat/Sources.tsx (1 hunks)
- echo/frontend/src/components/conversation/AutoSelectConversations.tsx (0 hunks)
- echo/frontend/src/components/conversation/ConversationAccordion.tsx (0 hunks)
- echo/frontend/src/routes/project/chat/ProjectChatRoute.tsx (1 hunks)
- echo/server/dembrane/api/chat.py (6 hunks)
- echo/server/dembrane/chat_utils.py (5 hunks)
- echo/server/prompt_templates/auto_select_conversations.de.jinja (1 hunks)
- echo/server/prompt_templates/auto_select_conversations.en.jinja (1 hunks)
- echo/server/prompt_templates/auto_select_conversations.es.jinja (1 hunks)
- echo/server/prompt_templates/auto_select_conversations.fr.jinja (1 hunks)
- echo/server/prompt_templates/auto_select_conversations.nl.jinja (1 hunks)
- echo/server/prompt_templates/is_followup_question.de.jinja (1 hunks)
- echo/server/prompt_templates/is_followup_question.en.jinja (1 hunks)
- echo/server/prompt_templates/is_followup_question.es.jinja (1 hunks)
- echo/server/prompt_templates/is_followup_question.fr.jinja (1 hunks)
- echo/server/prompt_templates/is_followup_question.nl.jinja (1 hunks)
💤 Files with no reviewable changes (2)
- echo/frontend/src/components/conversation/AutoSelectConversations.tsx
- echo/frontend/src/components/conversation/ConversationAccordion.tsx
🧰 Additional context used
🧬 Code graph analysis (3)
echo/server/dembrane/api/chat.py (4)
- echo/server/dembrane/prompts.py (1): `render_prompt` (58-91)
- echo/server/dembrane/chat_utils.py (2): `auto_select_conversations` (256-375), `create_system_messages_for_chat` (100-166)
- echo/server/dembrane/audio_lightrag/utils/lightrag_utils.py (1): `get_project_id` (363-365)
- echo/server/dembrane/database.py (2): `ConversationModel` (340-386), `ProjectChatMessageModel` (256-283)

echo/frontend/src/routes/project/chat/ProjectChatRoute.tsx (1)
- echo/frontend/src/config.ts (1): `ENABLE_CHAT_AUTO_SELECT` (31-32)

echo/server/dembrane/chat_utils.py (4)
- echo/server/dembrane/database.py (1): `ConversationModel` (340-386)
- echo/server/dembrane/api/conversation.py (1): `get_conversation_transcript` (398-429)
- echo/server/dembrane/api/dependency_auth.py (1): `DirectusSession` (13-22)
- echo/server/dembrane/prompts.py (1): `render_prompt` (58-91)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci-check-server
🔇 Additional comments (19)
echo/frontend/src/components/chat/Sources.tsx (1)
21-21: LGTM! 🚀 Clean UI text update that properly reflects the auto-select behavior. Ships it.
echo/server/prompt_templates/is_followup_question.de.jinja (1)
1-21: LGTM! 🔥 Solid German template. Structure is clean, the JSON-only output requirement is crystal clear, examples are on point. This will ship.
echo/server/prompt_templates/auto_select_conversations.nl.jinja (1)
1-35: LGTM! 💯 Dutch template is fire. Structure mirrors the other language variants perfectly. The JSON-only constraint is explicit, relevance instructions are comprehensive, and the fallback to an empty list is handled. Ships it.
echo/server/prompt_templates/auto_select_conversations.es.jinja (1)
1-35: LGTM! ⚡ Spanish template is clean. Consistent structure, clear JSON output format, comprehensive relevance guidance. This will ship without issues.
echo/frontend/src/routes/project/chat/ProjectChatRoute.tsx (1)
124-126: LGTM with a minor thought! 🤔 The conditional refetch logic is clean and properly gated. One thing to verify: since this fires in `onResponse` (before `onFinish`), make sure the backend has actually committed the auto-selected conversations to the context by this point. If there's any async delay on the server side, you might be refetching too early and getting stale data. Looks solid though; should ship.
echo/server/prompt_templates/is_followup_question.fr.jinja (1)
1-21: LGTM! 🇫🇷 French template is solid. Structure is consistent, JSON output is strictly enforced, examples are clear. This will ship.
echo/server/prompt_templates/is_followup_question.es.jinja (1)
1-21: LGTM! 🌮 Spanish template is clean. Consistent structure across all language variants, JSON-only output is explicit, examples are helpful. Ships it.
echo/server/prompt_templates/is_followup_question.en.jinja (1)
1-20: LGTM! 🎯 English base template is fire. Clear instructions, the JSON-only output constraint is explicit, examples are on point. This is the template all the others are based on and it's solid. Ships it.
echo/server/prompt_templates/auto_select_conversations.de.jinja (1)
22-25: Consider tightening inclusion criteria.
Saying “include indirect relevance and anything potentially useful” invites the model to grab way too much context; we need the 80 % cap to stick. Tune the copy (e.g., preference order or max slots) to weigh direct matches first or give a hard selection budget so we don’t blow the window.
echo/server/prompt_templates/auto_select_conversations.en.jinja (1)
22-25: Dial back the looseness.
Same story: “partial or indirect relevance” + “if it might help, include it” is basically carte blanche. Tighten instructions so the assistant keeps us under the 80 % allotment—e.g., cap selections or require justification before opting in marginal threads.
echo/server/prompt_templates/auto_select_conversations.fr.jinja (1)
21-25: Limit the sprawl.
The French text also leaves too much leeway ("partial or indirect relevance" + "include it if it could help"). Rephrase the prompt to prioritize direct relevance and impose an explicit cap, otherwise the model will select indiscriminately.
echo/server/dembrane/chat_utils.py (8)
292-368: Batch fan-out nukes context compliance.
You kick off every batch via `asyncio.gather` with the same `batch_num`. If even two batches sneak through, each LLM call sees only ~20 conversations—no global limit—so the merged union happily reassembles 200 IDs and torpedoes the "≤80% of window" mandate from ECHO-497. We need either (a) sequential selection with a running token budget, or (b) global arbitration (e.g., take top-N scores) before flattening results. Right now the first model that says "sure, everything's relevant" wins.
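Option (a), sequential selection under a running token budget, might look like this sketch (helper and parameter names are illustrative; candidates are assumed pre-sorted by relevance):

```python
def select_within_budget(candidates, token_counts, budget):
    """Greedily keep candidate conversation IDs while the running token
    total stays under `budget`; preserves the incoming relevance order."""
    selected, used = [], 0
    for conv_id in candidates:
        cost = token_counts.get(conv_id, 0)
        if used + cost > budget:
            continue  # skip anything that would blow the global budget
        selected.append(conv_id)
        used += cost
    return selected, used
```

Because the budget is tracked globally rather than per batch, the merged result can never exceed the context allotment, no matter how permissive any single LLM call is.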
319-335: Parallelism = N concurrent LLM calls.
`asyncio.gather` launches the whole fleet at once. If a project has 500 convos, we're now spamming 25 simultaneous LLM hits—no backpressure, no semaphore, nothing. That's a self-inflicted DDoS against Azure GPT5 and will blow rate limits. Gate the concurrency (semaphore / worker pool) or throttle via a streaming pipeline.
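A semaphore gate over `asyncio.gather` is only a few lines; a sketch (the `limit` default is illustrative):

```python
import asyncio


async def gather_with_limit(coros, limit=5):
    """Run coroutines concurrently, but never more than `limit` at a time."""
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with sem:  # acquire a slot before awaiting the real work
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))
```

Swapping this in for the bare `asyncio.gather` caps concurrent LLM calls regardless of project size, while keeping result ordering identical.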
463-513: Recursive split drops observability & ordering.
When you recurse, both halves reuse the parent `batch_num`, so logs become useless—you can't tell which subbatch bombed. Worse, you recombine raw IDs without dedup priorities, so order becomes "whatever recursion returns," not stable by created_at or similarity. Track subbatch identifiers and enforce deterministic ordering (e.g., original index) before merging.
524-548: LLM output contract unchecked.
You `json.loads` and trust the shape without schema validation, then silently drop bogus IDs. At minimum run pydantic / jsonschema to fail fast, otherwise the selection path can degrade quietly (empty lists) with zero upstream signal. Even better: surface failures so the caller retries or falls back to KG search.
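A fail-fast check can be done with the stdlib alone (pydantic or jsonschema would be stricter); note the `selected_ids` key is an assumption about the payload shape, not confirmed from the diff:

```python
import json


def parse_selection(raw):
    """Parse and validate the LLM selection payload; raise ValueError on
    any contract violation instead of silently degrading to an empty list."""
    data = json.loads(raw)  # raises json.JSONDecodeError on non-JSON output
    if not isinstance(data, dict) or "selected_ids" not in data:
        raise ValueError(f"missing 'selected_ids' in LLM output: {raw[:200]}")
    ids = data["selected_ids"]
    if not isinstance(ids, list) or not all(isinstance(i, str) for i in ids):
        raise ValueError(f"'selected_ids' must be a list of strings: {raw[:200]}")
    return ids
```

Raising here gives the caller a concrete signal to retry the batch or fall back to KG search, instead of quietly returning nothing.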
530-538: ID filter slips integers.
`isinstance(id, (int, str))` lets integers like `1` through even though all conv IDs are UUID strings (see `ConversationModel.id`). If the model emits `1`, validation keeps it, the later `id in valid_ids` check rejects it, and we silently drop it—masking the prompt or parsing error. Tighten to strings and log hard failures so we can fix upstream prompt drift instead of papering over it.
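A tightened filter might look like this sketch (`filter_conversation_ids` is a hypothetical helper, not code from the PR):

```python
import logging
import uuid

logger = logging.getLogger(__name__)


def filter_conversation_ids(candidate_ids, valid_ids):
    """Keep only IDs that are UUID strings and known to the project;
    log anything dropped so prompt drift stays visible upstream."""
    kept = []
    for cid in candidate_ids:
        if not isinstance(cid, str):
            logger.error("LLM emitted non-string conversation id: %r", cid)
            continue
        try:
            uuid.UUID(cid)  # reject strings that aren't well-formed UUIDs
        except ValueError:
            logger.error("LLM emitted malformed conversation id: %r", cid)
            continue
        if cid in valid_ids:
            kept.append(cid)
    return kept
```

The hard `logger.error` calls are the point: silent drops become observable events you can alert on.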
553-568: Context overflow path dead-ends.
If one batch hits `ContextWindowExceededError`, we return `{error: "context_exceeded"}` but the caller just tallies failed_batches and moves on—no retry with a smaller chunk, no escalation. That contradicts the issue requirement to keep context ≤80%: we're simply discarding coverage. Handle this like the earlier recursive split (or fall back to KG) instead of dropping everything.
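A recursive halving fallback, sketched with a stand-in exception for litellm's `ContextWindowExceededError` (names illustrative):

```python
class ContextOverflow(Exception):
    """Stand-in for litellm's ContextWindowExceededError."""


def select_with_split(batch, run_selection, min_size=1):
    """Run selection on a batch; on context overflow, split it in half
    and recurse, merging results in the original order."""
    try:
        return run_selection(batch)
    except ContextOverflow:
        if len(batch) <= min_size:
            return []  # a single item that alone overflows gets skipped
        mid = len(batch) // 2
        return (
            select_with_split(batch[:mid], run_selection, min_size)
            + select_with_split(batch[mid:], run_selection, min_size)
        )
```

Because the halves are concatenated left-to-right, merged output stays stable by original index, which also addresses the ordering complaint above.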
560-571: Unrecoverable errors get swallowed.
After retries you return an empty `selected_ids`, and upstream happily ships an empty recommendations list. We need to bubble an exception or at least propagate `error` up so the API can alert the user/UI to rerun auto-select rather than pretending nothing matched.
296-374: Project loop assumes single ID.
Docstring says `project_id_list` "contains a single project ID," but the code loops over arbitrary length and aggregates independently. If callers ever pass >1 (the API already supports multi-project queries), we're doing multiple simultaneous `gather`s with additive load. Either enforce len==1 or design the scheduling to handle true multi-project selection.
Actionable comments posted: 2
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
echo/server/dembrane/api/chat.py (6 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
echo/server/dembrane/api/chat.py (6)
- echo/server/dembrane/prompts.py (1): `render_prompt` (58-91)
- echo/server/dembrane/chat_utils.py (2): `auto_select_conversations` (256-375), `create_system_messages_for_chat` (100-166)
- echo/server/dembrane/quote_utils.py (1): `count_tokens` (264-268)
- echo/server/dembrane/api/conversation.py (1): `get_conversation_token_count` (437-456)
- echo/server/dembrane/audio_lightrag/utils/lightrag_utils.py (1): `get_project_id` (363-365)
- echo/server/dembrane/database.py (2): `ConversationModel` (340-386), `ProjectChatMessageModel` (256-283)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: ci-check-server
🔇 Additional comments (5)
echo/server/dembrane/api/chat.py (5)
17-19: LGTM! Solid imports for the new follow-up detection and auto-select flow: `SMALL_LITELLM_*` config for lightweight calls, `render_prompt` for multilingual templates, `auto_select_conversations` for LLM-driven selection, `get_project_id` for project extraction. All checks out.
Also applies to: 26-26, 39-39, 45-45
503-531: LGTM. The follow-up detection gate is wired up correctly: you only check for follow-ups when locked conversations exist (line 524), call the detection function with the conversation history, and set the `should_reuse_locked` flag based on the result. Logging at lines 527 and 530 is clutch for debugging. No issues here.
560-632: Auto-select branch looks solid. The auto-select logic is clean: you call `auto_select_conversations` (line 565), extract the selected IDs from the nested result dict (lines 575-579), add them to `chat.used_conversations` with a duplicate check (lines 582-587), create a "dembrane" message to lock the newly selected conversations (lines 590-601), and then rebuild the context and formatted messages with the updated set (lines 604-621).

Token counting at lines 624-626 runs after building `formatted_messages`, and you throw a 400 with a clear error message ("Auto select returned too many conversations") if the context is too large. The error message here is more actionable than the follow-up branch message, which is good.

One subtle thing: line 585 checks `conversation not in chat.used_conversations` to avoid duplicates. This relies on SQLAlchemy's relationship membership check, which should work correctly. If you wanted to be extra paranoid, you could track IDs in a set, but the current implementation is fine.
634-690: References payload and streaming logic ship-ready. Lines 634-643 correctly build the references payload from only the newly added conversations (`conversations_added`), not all conversations in the context. This is the right behavior per your comment at line 636 and the AI summary. Each reference includes the conversation ID and participant name.

The streaming function (lines 647-690) emits the references payload as an "h:" frame (line 649), then streams the LLM response. The timeout adjustments at lines 661-662 (5-minute overall timeout, 3-minute stream timeout) are clutch for preventing hanging requests—this is a solid reliability improvement over having no explicit timeout.

Error handling at lines 673-684 is correct: you catch exceptions, log them, delete the in-flight user message from the DB, and yield a protocol-appropriate error payload. All good here.
691-784: Manual-select path LGTM. The manual-select flow (when auto-select is disabled) builds system messages from locked conversations (lines 692-694), filters the conversation history (lines 714-728), constructs the outgoing messages (lines 730-743), and streams the LLM response (lines 746-763).
The timeout adjustments at lines 753-754 (5-minute overall timeout, 3-minute stream timeout) match the auto-select path—good for consistency and reliability. Error handling at lines 764-774 is correct: you delete the in-flight user message and yield a protocol-appropriate error.
One observation: the manual-select path doesn't check token length before streaming. This could theoretically blow up if the user manually adds too many conversations, but this behavior existed before your changes, so it's not a regression. If you wanted to add a guard here, you could, but it's not critical.
```python
async def is_followup_question(
    conversation_history: List[Dict[str, str]], language: str = "en"
) -> bool:
    """
    Determine if the current question is a follow-up to previous messages.
    Uses a small LLM call to check semantic relationship.

    Returns:
        True if it's a follow-up question, False if it's a new independent question
    """
    if len(conversation_history) < 2:
        # No previous context, can't be a follow-up
        return False

    # Take last 4 messages for context (2 exchanges)
    recent_messages = conversation_history[-4:]

    # Format messages for the prompt
    previous_messages = [
        {"role": msg["role"], "content": msg["content"]} for msg in recent_messages[:-1]
    ]
    current_question = recent_messages[-1]["content"]

    prompt = render_prompt(
        "is_followup_question",
        language,
        {
            "previous_messages": previous_messages,
            "current_question": current_question,
        },
    )

    try:
        response = await litellm.acompletion(
            model=SMALL_LITELLM_MODEL,
            api_key=SMALL_LITELLM_API_KEY,
            api_base=SMALL_LITELLM_API_BASE,
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # Deterministic
            timeout=60,  # 1 minute timeout for quick decision
        )

        result_text = response.choices[0].message.content.strip()
        result = json.loads(result_text)
        is_followup = result.get("is_followup", False)

        logger.info(f"Follow-up detection: {is_followup} for query: {current_question[:50]}...")
        return is_followup
    except Exception as e:
        logger.warning(f"Follow-up detection failed: {e}. Defaulting to False (run auto-select)")
        return False
```
🧹 Nitpick | 🔵 Trivial
Ship it.
The follow-up detection logic is clean: grabs the last 4 messages, renders a prompt, hits the small LLM with temperature=0 for deterministic classification, parses the JSON response, and defaults to False on error. The 60-second timeout is generous but not problematic for a quick classification call.
One optional nitpick: lines 94-96 assume the LLM returns valid JSON. If the model hallucinates and emits prose instead of `{"is_followup": true}`, `json.loads` throws and you catch it (safe fallback). If you want to be extra paranoid, you could add explicit JSON validation or use structured outputs if the model supports it, but the current error handling is solid enough.
🤖 Prompt for AI Agents
In echo/server/dembrane/api/chat.py around lines 52 to 103, add defensive JSON
validation around the LLM response parsing: after getting
response.choices[0].message.content, attempt to safely parse JSON (e.g., use
json.JSONDecoder().raw_decode to find a JSON object at the start) and fall back
to treating it as non-followup if parsing fails; log the raw response when
parsing fails for debugging and keep the existing exception handler as a final
fallback.
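The defensive parse the prompt describes could use `json.JSONDecoder().raw_decode` to find the first JSON object inside a noisy reply (a sketch; `extract_followup_flag` is an illustrative name):

```python
import json


def extract_followup_flag(raw_text, default=False):
    """Pull the first JSON object out of an LLM reply that may contain
    extra prose, and read its 'is_followup' field; fall back to `default`."""
    decoder = json.JSONDecoder()
    idx = raw_text.find("{")
    while idx != -1:
        try:
            # raw_decode parses one JSON value starting at idx, ignoring trailing text
            obj, _ = decoder.raw_decode(raw_text, idx)
            if isinstance(obj, dict):
                return bool(obj.get("is_followup", default))
        except json.JSONDecodeError:
            pass
        idx = raw_text.find("{", idx + 1)
    return default
```

Logging `raw_text` on the fall-through path (omitted here for brevity) would give the debugging signal the prompt asks for.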
```python
if should_reuse_locked:
    # Reuse existing locked conversations for follow-up questions
    updated_conversation_id_list = locked_conversation_id_list

    system_messages = await create_system_messages_for_chat(
        updated_conversation_id_list, db, language, project_id
    )

    formatted_messages = []
    top_k = max(5, top_k - 10)
    query = filtered_messages[-1]["content"]
    conversation_history = filtered_messages
    rag_prompt = await get_lightrag_prompt_by_params(
        query=query,
        conversation_history=conversation_history,
        echo_conversation_ids=chat_context.conversation_id_list,
        echo_project_ids=[project_id],
        auto_select_bool=chat_context.auto_select_bool,
        get_transcripts=True,
        top_k=top_k,
    )

    if isinstance(system_messages, list):
        for msg in system_messages:
            formatted_messages.append({"role": "system", "content": msg["text"]})
        formatted_messages.extend(conversation_history)
    else:
        formatted_messages = [
            {"role": "system", "content": system_messages}
        ] + conversation_history

    # Check context length
    prompt_len = token_counter(
        model=LIGHTRAG_LITELLM_INFERENCE_MODEL, messages=formatted_messages
    )

    if prompt_len > MAX_CHAT_CONTEXT_LENGTH:
        raise HTTPException(
            status_code=400,
            detail="The conversation context with the new message exceeds the maximum context length.",
        )
```
🧹 Nitpick | 🔵 Trivial
Token guard is in, ship it.
The follow-up branch now has the token counter check (lines 551-559) that the past review flagged. You're counting tokens and throwing a 400 if the context blows past MAX_CHAT_CONTEXT_LENGTH. This plugs the regression where reused locked conversations could bypass the guard and cause LiteLLM to throw a generic streaming error.
One UX nitpick: the error message at line 558 is correct but generic. The original review suggested something more actionable like "Locked conversations exceed the maximum context length. Remove conversations or rerun auto-select." Your current message doesn't tell the user what to do. Consider making it more specific so users know they can remove conversations or rerun auto-select to fix the issue.
Also, you could fail-fast by checking token count before building formatted_messages, but the current order is fine—it's just a micro-optimization.
🤖 Prompt for AI Agents
In echo/server/dembrane/api/chat.py around lines 532 to 559, the token-length
check correctly prevents oversized contexts for reused locked conversations but
the HTTPException message is generic; update the raised exception to provide an
actionable message (e.g., "Locked conversations exceed the maximum context
length. Remove conversations or rerun auto-select.") so users know how to
resolve it; optionally, for micro-optimization, move the token-length estimate
to operate on a lightweight representation (or pre-check estimated lengths)
before building formatted_messages, but the primary fix is to replace the
generic detail string with the more specific, actionable guidance.
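The actionable guard could be factored like this sketch (stdlib only; `check_context_budget` is a hypothetical helper returning the detail string for the 400 response):

```python
def check_context_budget(prompt_len, max_len):
    """Return an actionable error message when the locked-conversation
    context exceeds the model budget, or None when it fits."""
    if prompt_len <= max_len:
        return None
    return (
        f"Locked conversations exceed the maximum context length "
        f"({prompt_len} > {max_len} tokens). "
        "Remove conversations or rerun auto-select."
    )
```

The caller would raise `HTTPException(status_code=400, detail=...)` with the returned string, so the user knows both what failed and how to fix it.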
```diff
 async def stream_response_async_autoselect() -> AsyncGenerator[str, None]:
-    conversation_references_yeild = f"h:{json.dumps(conversation_references)}\n"
+    # Send conversation references (selected conversations)
+    conversation_references_yeild = f"h:{json.dumps([conversation_references])}\n"
```
Bug: Chat Context Loss and Conversation Reference Issues
The post_chat endpoint has two issues: auto-selected conversations added to `chat.used_conversations` are never persisted, so they are lost after the request; and for follow-up questions that reuse locked conversations, the frontend never receives references to those conversations, hiding the active context from users.
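A sketch of the persistence half of the fix, assuming a SQLAlchemy-style session where a `commit()` is required before the session closes (`persist_auto_selected` and the stubbed types are hypothetical):

```python
def persist_auto_selected(chat, conversations, db):
    """Append newly auto-selected conversations to the chat's context and
    commit, so the additions survive beyond the current request."""
    added = []
    for conv in conversations:
        if conv not in chat.used_conversations:  # skip duplicates
            chat.used_conversations.append(conv)
            added.append(conv)
    if added:
        db.commit()  # without a commit, appended rows vanish when the session closes
    return added
```

Returning `added` also gives the caller exactly the set to surface in the references payload, covering the second half of the bug.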
Summary by CodeRabbit

New Features
- Automatic conversation selection for chat context with multilingual prompt support (EN/DE/ES/FR/NL).
- Smart follow-up detection to reuse existing context when appropriate.
- Chat context refreshes after responses and newly added conversations are surfaced to the user.

Changes
- Updated label to: "The following conversations were automatically added to the context."
- Streamlined chat UI by removing "Processing" indicators and in-progress warnings.
- Conversation selection labels now display more consistently.