
ECHO-226 Fix multilingual transcription with RunPod #157

Merged
spashii merged 9 commits into main from
feature/echo-226-bug-transcription-remains-in-english-eventhough-language
May 26, 2025
Conversation

@ArindamRoy23
Contributor

@ArindamRoy23 ArindamRoy23 commented May 16, 2025

  • Updated transcription logic in transcribe.py to detect language and translate if necessary, improving multi-language support.
  • Adjusted prompt formatting for better clarity in transcription requests.
  • Added langdetect version 1.0.9 to dependencies in pyproject.toml, requirements-dev.lock, and requirements.lock.
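Since `langdetect` is new to the stack, here is a minimal sketch of the detect-then-decide guard this PR adds (the function name and the injectable `detect_fn` are illustrative, not the PR's actual code):

```python
try:
    from langdetect import detect as _detect, DetectorFactory
    DetectorFactory.seed = 0  # langdetect is stochastic by default; pin for reproducibility
except ImportError:
    _detect = None  # library optional in this sketch; callers may inject a detector

def needs_translation(transcript, requested_language, detect_fn=None):
    """Decide whether a transcript should be routed through translation.

    Mirrors the PR's guard: 'multi' requests are never translated, and a
    detection failure (short or ambiguous text) falls back to 'no translation'.
    """
    if requested_language == "multi":
        return False
    detect_fn = detect_fn or _detect
    try:
        detected = detect_fn(transcript)
    except Exception:
        return False
    return detected.lower() != requested_language.lower()
```

The injectable `detect_fn` also makes the guard testable without shipping audio through the pipeline.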

Summary by CodeRabbit

  • New Features

    • Added support for audio transcription using the RunPod API, with configurable fallback to LiteLLM transcription.
    • Introduced scheduled and asynchronous processing for RunPod transcription results.
    • Added new fields to conversation chunks for tracking RunPod job status and request count.
    • Provided prompt templates for transcription and translation in multiple languages (English, German, Spanish, French, Dutch).
  • Configuration

    • Enabled environment-based configuration for selecting and managing transcription services.
  • Bug Fixes

    • Improved error handling and logging for transcription workflows.

@linear

linear bot commented May 16, 2025

@coderabbitai
Contributor

coderabbitai bot commented May 16, 2025

Walkthrough

This update introduces RunPod Whisper as a new transcription backend, configurable via environment variables, and integrates it alongside LiteLLM. The system now routes transcription requests based on configuration, manages job status and request counts, and includes scheduled and asynchronous tasks for polling and processing RunPod results. New prompt templates support multiple languages and translation scenarios.

Changes

File(s) Change Summary
echo/server/dembrane/transcribe.py Added RunPod Whisper transcription support, including job queuing, request threshold logic, dynamic prompt rendering, and conditional backend selection. Disabled previous chunk context logic. Refactored error handling and logging. Removed obsolete synchronous transcription function.
echo/server/dembrane/config.py Added config flags and environment variable assertions for enabling RunPod and LiteLLM Whisper transcription. Introduced validation and logging for new RunPod settings. Made LiteLLM config conditional. Minor formatting change for audio model config.
echo/server/dembrane/scheduler.py Registered a new scheduled job to trigger the polling of RunPod transcription responses every minute.
echo/server/dembrane/tasks.py Added two Dramatiq actors: one to process RunPod transcription job results and update Directus, and another to poll for pending jobs and dispatch processing tasks in parallel. Includes error handling and logging.
echo/directus/sync/snapshot/fields/conversation_chunk/runpod_job_status_link.json
echo/directus/sync/snapshot/fields/conversation_chunk/runpod_request_count.json
Added new fields to the conversation_chunk collection: runpod_job_status_link (text, stores job status URL) and runpod_request_count (integer, tracks number of RunPod requests for the chunk).
echo/server/prompt_templates/default_whisper_prompt.*.jinja Added default system prompt templates for Whisper transcription in German, English, Spanish, French, and Dutch.
echo/server/prompt_templates/translate_transcription.*.jinja Added translation prompt templates for DE, EN, ES, FR, and NL, instructing accurate translation of transcripts between detected and target languages with strict output requirements.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant TranscribeModule
    participant RunPodAPI
    participant Directus
    participant Scheduler
    participant DramatiqWorker

    Client->>TranscribeModule: Request transcription for chunk
    alt RunPod enabled and request count < threshold
        TranscribeModule->>RunPodAPI: Queue transcription job
        RunPodAPI-->>TranscribeModule: Return job status link
        TranscribeModule->>Directus: Update chunk with status link, increment count
    else LiteLLM enabled
        TranscribeModule->>LiteLLM: Transcribe audio
        LiteLLM-->>TranscribeModule: Transcript
        TranscribeModule->>Directus: Update chunk with transcript
    else Neither enabled
        TranscribeModule-->>Client: Raise TranscriptionError
    end

    Scheduler->>DramatiqWorker: Every minute, trigger update task
    DramatiqWorker->>Directus: Query for chunks with pending RunPod jobs
    loop For each pending chunk
        DramatiqWorker->>RunPodAPI: Fetch job status
        alt Success with transcript
            DramatiqWorker->>Directus: Update chunk with transcript, clear status link
        else Not ready
            DramatiqWorker->>TranscribeModule: Optionally re-queue transcription
        end
    end
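The scheduler half of the diagram is a periodic fan-out: query chunks whose RunPod status link is still set, then dispatch one processing task per chunk. A library-agnostic sketch with injected stand-ins for the Directus query and the Dramatiq actor:

```python
def poll_pending_runpod_jobs(fetch_pending_chunks, dispatch):
    """Run once per minute by the scheduler: find conversation chunks whose
    runpod_job_status_link is still set and enqueue a processing task for each.

    fetch_pending_chunks: () -> list[dict]   (stand-in for the Directus query)
    dispatch: (chunk_id) -> None             (stand-in for a Dramatiq actor .send())
    """
    pending = fetch_pending_chunks()
    for chunk in pending:
        dispatch(chunk["id"])
    return len(pending)
```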

Suggested reviewers

  • ussaama
  • spashii




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 674060d and 92bc9a5.

⛔ Files ignored due to path filters (2)
  • echo/server/requirements-dev.lock is excluded by !**/*.lock
  • echo/server/requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • echo/server/dembrane/transcribe.py (4 hunks)
  • echo/server/prompt_templates/translate_transcription.de.jinja (1 hunks)
  • echo/server/prompt_templates/translate_transcription.en.jinja (1 hunks)
  • echo/server/prompt_templates/translate_transcription.es.jinja (1 hunks)
  • echo/server/prompt_templates/translate_transcription.fr.jinja (1 hunks)
  • echo/server/prompt_templates/translate_transcription.nl.jinja (1 hunks)
  • echo/server/pyproject.toml (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/transcribe.py (1)
echo/server/dembrane/prompts.py (1)
  • render_prompt (55-88)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: ci-check-server
  • GitHub Check: ci-build-servers (dbr-echo-server, ./echo/server, Dockerfile, dbr-echo-server)
🔇 Additional comments (6)
echo/server/pyproject.toml (1)

89-89: Solid addition of the langdetect dependency! 🚀

Adding langdetect at version 1.0.9 is exactly what we need to implement language detection capabilities in the transcription pipeline. This choice demonstrates production-ready engineering.

echo/server/prompt_templates/translate_transcription.fr.jinja (1)

1-5: Elegant French prompt template implementation! ✨

This template is clean, efficient, and provides clear instructions for the translation model. The focus on preserving meaning, tone, and context while avoiding modifications demonstrates a deep understanding of translation quality requirements.

echo/server/prompt_templates/translate_transcription.de.jinja (1)

1-5: Excellent German prompt template construction! 💯

The German translation template follows the same robust pattern as the other language templates, maintaining consistency across the system. This modular design allows for seamless scaling to support additional languages in the future.

echo/server/prompt_templates/translate_transcription.nl.jinja (1)

1-5: Dutch prompt template perfectly crafted! 🔥

The Dutch translation template completes the set of language templates with the same high-quality approach. The consistent structure across all templates will ensure uniform translation behavior regardless of language pairs. Nice work!

echo/server/prompt_templates/translate_transcription.en.jinja (1)

1-5: Prompt reads well, ship it

Everything’s crisp; no action items.

echo/server/dembrane/transcribe.py (1)

83-87: Good call on explicit default prompts

Explicit language declarations reduce hallucinated code-switching – nicely done.

- Changed variable name from `response` to `llm_translation_response` for clarity in the transcription process.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
echo/server/dembrane/transcribe.py (1)

35-79: 🧹 Nitpick (assertive)

Consider adding metrics/logging for translation performance

Your transcription + translation pipeline is solid, but monitoring will be crucial as this rolls out. Consider adding:

  1. Performance metrics to track the time spent in transcription vs. translation
  2. Success rates for language detection
  3. Counters for how often translation is invoked vs. skipped

This would help identify bottlenecks and track the value-add of the new features.
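As one possible shape for those metrics, a small context-manager timer around each stage would do (the logger name and stage labels are illustrative):

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("transcribe.metrics")

@contextmanager
def timed_stage(stage):
    """Log the wall-clock duration of one pipeline stage
    (e.g. 'transcription', 'language_detection', 'translation')."""
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s took %.3fs", stage, time.perf_counter() - start)
```

Wrapping the transcription and translation calls separately would also give the invoked-vs-skipped counts for free, since a skipped translation simply never logs.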

♻️ Duplicate comments (1)
echo/server/dembrane/transcribe.py (1)

60-76: 🛠️ Refactor suggestion

Language detection + translation implementation needs hardening

Your approach for detecting and conditionally translating is solid conceptually, but needs some defensive programming:

  1. Missing try-except around detect() which can throw on short/ambiguous text
  2. Variable reuse - response for both Whisper output and completion response - creates potential debugging nightmares
  3. No temperature setting for deterministic translations
  4. Missing case normalization for language comparison
  5. No system message to improve translation quality

These are all small issues but add up to potential edge case failures.

-        detected_language = detect(response["text"])
-        if detected_language != language and language != "multi":
-            translation_prompt = render_prompt(
-                "translate_transcription",
-                str(language),
-                {"transcript": response["text"], "detected_language": detected_language, "desired_language": language}
-            )
-            llm_translation_response = completion(
-                model=SMALL_LITELLM_MODEL,
-                messages=[{"role": "user", "content": translation_prompt}],
-                api_key=SMALL_LITELLM_API_KEY,
-                api_base=SMALL_LITELLM_API_BASE,
-                api_version=SMALL_LITELLM_API_VERSION,
-            )
-            return llm_translation_response['choices'][0]['message']['content']
-        else: 
-            return response["text"]
+        transcript_text = response["text"]
+        
+        try:
+            detected_language = detect(transcript_text)
+        except Exception as e:
+            logger.warning(f"Language detection failed: {e}")
+            detected_language = language or "unknown"
+            
+        if detected_language and language and detected_language.lower() != language.lower() and language != "multi":
+            translation_prompt = render_prompt(
+                "translate_transcription",
+                str(language).lower(),
+                {"transcript": transcript_text, "detected_language": detected_language, "desired_language": language}
+            )
+            llm_translation_response = completion(
+                model=SMALL_LITELLM_MODEL,
+                messages=[
+                    {"role": "system", "content": "You are a professional translator."},
+                    {"role": "user", "content": translation_prompt}
+                ],
+                api_key=SMALL_LITELLM_API_KEY,
+                api_base=SMALL_LITELLM_API_BASE,
+                api_version=SMALL_LITELLM_API_VERSION,
+                temperature=0,
+            )
+            return llm_translation_response['choices'][0]['message']['content']
+        else: 
+            return transcript_text
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 92bc9a5 and 209a0ac.

📒 Files selected for processing (1)
  • echo/server/dembrane/transcribe.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/transcribe.py (2)
echo/server/dembrane/s3.py (1)
  • get_stream_from_s3 (176-180)
echo/server/dembrane/prompts.py (1)
  • render_prompt (55-88)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci-check-server
🔇 Additional comments (4)
echo/server/dembrane/transcribe.py (4)

83-87: LGTM on prompt improvements!

The explicit system prompts using clear language-specific instructions are much better than the previous informal ones. This will give the whisper model clearer direction, leading to more accurate transcriptions.


116-134: Disabling previous chunk context - intentional design change?

You've commented out the code that retrieves previous chunk transcripts. This is a significant change in how the system processes sequential audio. While it simplifies the flow, it might affect the context-awareness of transcriptions.

Is this an intentional design change rather than an accidental comment-out? If intentional, consider removing the code instead of commenting it out to keep the codebase clean.


172-173: Consistent with previous chunk retrieval disabling

This comment-out matches the earlier disabling of previous chunk retrieval. Makes sense for consistency.


169-169: LGTM on improved prompt structure

Adding the explicit "\n\nuser: Project prompt: \n\n" prefix creates a clearer separation in the prompt structure. This should help the model distinguish between instructions and content.

- Introduced environment variables for enabling RunPod Whisper transcription and managing API keys.
- Implemented `queue_transcribe_audio_runpod` function to handle audio transcription requests to RunPod.
- Updated `transcribe_conversation_chunk` to support both RunPod and LiteLLM transcription based on configuration.
- Added default whisper prompt templates for multiple languages (English, Spanish, French, German, Dutch).
- Created new directus field in conversation_chunk to hold runpod job id
- Improved error handling and logging for transcription processes.
@ArindamRoy23 ArindamRoy23 marked this pull request as draft May 20, 2025 16:23
auto-merge was automatically disabled May 20, 2025 16:23

Pull request was converted to draft

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (1)
echo/server/dembrane/transcribe.py (1)

98-129: 🛠️ Refactor suggestion

Language-detect flow still brittle – let’s bulletproof it
Previous feedback wasn’t fully incorporated. Two quick wins:

detect() explodes on short text – wrap it.
• Normalise casing + deterministic translation (temperature=0) with a system seed message for coherence.

-        detected_language = detect(response["text"])
-        if detected_language != language and language != "multi":
+        transcript_text = response["text"]
+
+        try:
+            detected_language = detect(transcript_text)
+        except Exception as exc:  # LangDetectException on short/ambiguous input
+            logger.debug(f"LangDetect failed -> default to requested lang: {exc}")
+            detected_language = language or "unknown"
+
+        if detected_language.lower() != (language or "").lower() and language != "multi":
             translation_prompt = render_prompt(
                 "translate_transcription",
-                str(language),
+                (language or "en").lower(),
                 {
-                    "transcript": response["text"],
+                    "transcript": transcript_text,
                     "detected_language": detected_language,
                     "desired_language": language,
                 },
             )
             llm_translation_response = completion(
                 model=SMALL_LITELLM_MODEL,
-                messages=[{"role": "user", "content": translation_prompt}],
+                messages=[
+                    {"role": "system", "content": "You are a professional translator."},
+                    {"role": "user", "content": translation_prompt},
+                ],
                 api_key=SMALL_LITELLM_API_KEY,
                 api_base=SMALL_LITELLM_API_BASE,
                 api_version=SMALL_LITELLM_API_VERSION,
+                temperature=0,
             )
             return llm_translation_response["choices"][0]["message"]["content"]
-        else:
-            return response["text"]
+        return transcript_text
📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 209a0ac and 33505d8.

⛔ Files ignored due to path filters (3)
  • echo/directus/sync/specs/item.graphql is excluded by !echo/directus/sync/specs/**
  • echo/directus/sync/specs/openapi.json is excluded by !echo/directus/sync/specs/**
  • echo/directus/sync/specs/system.graphql is excluded by !echo/directus/sync/specs/**
📒 Files selected for processing (8)
  • echo/directus/sync/snapshot/fields/conversation_chunk/runpod_job_id.json (1 hunks)
  • echo/server/dembrane/config.py (3 hunks)
  • echo/server/dembrane/transcribe.py (7 hunks)
  • echo/server/prompt_templates/default_whisper_prompt.de.jinja (1 hunks)
  • echo/server/prompt_templates/default_whisper_prompt.en.jinja (1 hunks)
  • echo/server/prompt_templates/default_whisper_prompt.es.jinja (1 hunks)
  • echo/server/prompt_templates/default_whisper_prompt.fr.jinja (1 hunks)
  • echo/server/prompt_templates/default_whisper_prompt.nl.jinja (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/transcribe.py (2)
echo/server/dembrane/s3.py (1)
  • get_signed_url (143-148)
echo/server/dembrane/prompts.py (1)
  • render_prompt (55-88)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci-check-server
🔇 Additional comments (8)
echo/server/prompt_templates/default_whisper_prompt.es.jinja (1)

1-1: Ship it! Solid Spanish prompt template.

The Spanish language prompt template follows the same format as other language templates and provides clear instructions to transcribe in Spanish.

echo/server/prompt_templates/default_whisper_prompt.en.jinja (1)

1-1: LGTM! Clean English prompt template.

Concise, well-structured English transcription instruction following the same pattern as other language templates.

echo/server/prompt_templates/default_whisper_prompt.fr.jinja (1)

1-1: Parfait! French template ready to ship.

The French language prompt template aligns perfectly with the project's multilingual approach and maintains consistency with other language templates.

echo/server/dembrane/config.py (2)

279-281: Clean code simplification, LGTM!

Nice refactoring of the LIGHTRAG_LITELLM_AUDIOMODEL_MODEL environment variable retrieval to a single line, making the code cleaner and more maintainable.


233-259: 🧹 Nitpick (assertive)

Missing newline after flag declaration.

Need a blank line after ENABLE_LITELLM_WHISPER_TRANSCRIPTION flag for consistency with the pattern used elsewhere in the file.

ENABLE_LITELLM_WHISPER_TRANSCRIPTION = os.environ.get(
    "ENABLE_LITELLM_WHISPER_TRANSCRIPTION", "false"
).lower() in ["true", "1"]
logger.debug(f"ENABLE_LITELLM_WHISPER_TRANSCRIPTION: {ENABLE_LITELLM_WHISPER_TRANSCRIPTION}")
+

LITELLM_WHISPER_API_KEY = os.environ.get("LITELLM_WHISPER_API_KEY")

Likely an incorrect or invalid review comment.

echo/server/prompt_templates/default_whisper_prompt.de.jinja (1)

1-1: LGTM – prompt is crisp and correct German
Nothing to tweak here; the wording is clear and forces Whisper to stay in target language.

echo/server/prompt_templates/default_whisper_prompt.nl.jinja (1)

1-1: LGTM – Dutch template reads well
The instruction is unambiguous and mirrors the pattern used in other languages. Ship it.

echo/server/dembrane/transcribe.py (1)

224-236: LGTM – graceful hand-off between RunPod and LiteLLM
The configuration gate is clean and the Directus update looks atomic.

- Introduced new environment variables for RunPod Whisper configuration, including priority URL and max request threshold.
- Added a new scheduled job to update RunPod transcription responses every 1 minute.
- Enhanced `queue_transcribe_audio_runpod` to support priority requests based on source and request count.
- Implemented `task_process_runpod_chunk_response` to handle chunk status updates and logging.
- Improved error handling and logging throughout the transcription process.
@ArindamRoy23 ArindamRoy23 marked this pull request as ready for review May 22, 2025 10:57
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 33505d8 and b887fbd.

⛔ Files ignored due to path filters (5)
  • echo/directus/sync/specs/item.graphql is excluded by !echo/directus/sync/specs/**
  • echo/directus/sync/specs/openapi.json is excluded by !echo/directus/sync/specs/**
  • echo/directus/sync/specs/system.graphql is excluded by !echo/directus/sync/specs/**
  • echo/server/requirements-dev.lock is excluded by !**/*.lock
  • echo/server/requirements.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • echo/directus/sync/snapshot/fields/conversation_chunk/runpod_job_status_link.json (1 hunks)
  • echo/directus/sync/snapshot/fields/conversation_chunk/runpod_request_count.json (1 hunks)
  • echo/server/dembrane/config.py (3 hunks)
  • echo/server/dembrane/scheduler.py (1 hunks)
  • echo/server/dembrane/tasks.py (3 hunks)
  • echo/server/dembrane/transcribe.py (7 hunks)
🔇 Additional comments (11)
echo/directus/sync/snapshot/fields/conversation_chunk/runpod_job_status_link.json (1)

1-43: Field looks clean for storing RunPod job status URLs.

The schema for the runpod_job_status_link text field is properly configured to store URLs that track the status of RunPod transcription jobs. Making it nullable is a smart design choice since this field will only be populated during active transcription jobs and cleared when complete.

echo/directus/sync/snapshot/fields/conversation_chunk/runpod_request_count.json (1)

1-43: Counter field for tracking RunPod API requests looks solid.

The integer field with default value of 0 will effectively track transcription request counts. This enables your system to implement retry logic and thresholds to prevent infinite retries for failed transcription attempts. The schema is well-structured and integrates nicely with the RunPod workflow.
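The retry budget this field enables can be expressed as a pair of small helpers (names are illustrative; in the PR the update goes through the Directus client rather than a plain dict):

```python
def should_requeue(chunk, max_requests):
    """True while the chunk is still under its RunPod request budget."""
    return chunk.get("runpod_request_count", 0) < max_requests

def on_requeue(chunk):
    """Increment the counter before each new RunPod job, as the PR describes."""
    updated = dict(chunk)
    updated["runpod_request_count"] = updated.get("runpod_request_count", 0) + 1
    return updated
```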

echo/server/dembrane/transcribe.py (7)

7-10: Imports look clean, imported functions now reused from shared modules

You're pulling in requests for direct API calls and importing both get_signed_url and get_stream_from_s3 from the s3 module. This avoids duplicating code logic - the sign of a senior dev! Kudos for reusing battle-tested utils.


93-93: Cleaner vertical spacing

Good improvement to vertical spacing in transcribe_audio_litellm function. Makes the error handling section more visually distinct.


178-178: Dynamic prompt rendering using templating system

Nice optimization using render_prompt instead of hardcoded prompts. This supports the multi-language capability mentioned in the PR objectives.


200-237: RunPod transcription path is well-implemented

The RunPod integration flow is solid:

  1. Retrieves current request count before queuing
  2. Implements priority routing based on source
  3. Updates Directus with job status link for tracking
  4. Implements threshold check to prevent infinite retries
  5. Clears job status if threshold is reached

This is solid engineering - the edge cases are handled.
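Step 3 of this list, building the status link stored in Directus, can be sketched with `urllib.parse.urljoin` (the base-URL handling here is an assumption about RunPod's endpoint layout, not the PR's exact code):

```python
from urllib.parse import urljoin

def build_status_link(base_url, job_id):
    """Build the runpod_job_status_link value stored on the chunk.

    urljoin replaces the last path segment unless the base ends with '/',
    so normalise the base before joining.
    """
    return urljoin(base_url.rstrip("/") + "/", f"status/{job_id}")
```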


238-254: LiteLLM path uses the correct variable name

You've renamed transcription to transcript to avoid shadowing the imported function, which was flagged in a previous review. The code flow is clean and straightforward. Well done!


255-256: Good fallback error handling

Nice to see explicit error handling when no transcription method is enabled. This will prevent silent failures.


134-153: ❓ Verification inconclusive

Previous chunk context retrieval is commented out

You've commented out the code that fetched the previous chunk's transcript. Is this intentional? The PR mentions enhancing transcription logic, but removing context from previous chunks might affect transcription quality for conversations that need continuity.


🏁 Script executed:

#!/bin/bash
# Search codebase for any new implementation that might replace this functionality
echo "Searching for alternative implementations of previous chunk context retrieval..."
rg -p "previous.*chunk" --glob "*.py" | grep -v "^#"

Length of output: 683


Missing previous-chunk context logic remains commented out
It looks like the code fetching previous_chunk_transcript (echo/server/dembrane/transcribe.py lines 134–153) and the prompt append (lines 193–194) is still completely disabled, and there’s no alternative implementation elsewhere. Without cross-chunk context, transcription continuity may suffer. Was this intentional? If not, please restore or replace this logic.

• echo/server/dembrane/transcribe.py: lines 134–153
• echo/server/dembrane/transcribe.py: lines 193–194


echo/server/dembrane/config.py (2)

243-269: LiteLLM assertions now properly conditional

Good job making the LiteLLM whisper assertions conditional. This follows the pattern you established for RunPod and ensures the server can start even if these variables aren't set, as long as the feature is disabled.
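The conditional-assertion pattern being praised can be sketched like this (only `LITELLM_WHISPER_API_KEY` appears in this PR; treating it as the sole required key is a simplification):

```python
def validate_whisper_config(env):
    """Assert required LiteLLM Whisper variables only when the flag is on,
    so the server can still boot with the feature disabled."""
    enabled = env.get("ENABLE_LITELLM_WHISPER_TRANSCRIPTION", "false").lower() in ["true", "1"]
    if not enabled:
        return  # feature off: no variables required
    for key in ("LITELLM_WHISPER_API_KEY",):
        assert env.get(key), f"{key} must be set when LiteLLM Whisper transcription is enabled"
```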


290-290: Cleaner variable assignment

The variable assignment for LIGHTRAG_LITELLM_AUDIOMODEL_MODEL is now more concise. Clean refactoring.

- Implemented a check for the job status of RunPod transcription, logging progress and returning None if the job is still in progress.
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b887fbd and 25edfb2.

📒 Files selected for processing (1)
  • echo/server/dembrane/transcribe.py (7 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci-check-server
🔇 Additional comments (4)
echo/server/dembrane/transcribe.py (4)

7-23: Imports look solid. LGTM.

The RunPod configuration imports and requests library are properly structured. The dual transcription backend support is cleanly separated through feature flags.


36-76: RunPod integration is rock solid. Ship it! 🚀

The error handling cascade, conditional payload construction, and priority queue routing are all dialed in. The 600s timeout is hefty but transcription jobs can be chunky. Past review concerns about raise_for_status() and URL construction are properly addressed.
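The queueing pattern endorsed here - a long timeout plus `raise_for_status()` - looks roughly like the following; the `/run` path, payload keys, and Bearer auth are assumptions based on RunPod's serverless API style, not the PR's exact code:

```python
try:
    import requests
except ImportError:  # sketch stays importable without the dependency
    requests = None

def queue_runpod_job(base_url, api_key, audio_url, language,
                     timeout=600.0, post=None):
    """POST a transcription job to a RunPod serverless endpoint, return its job id.

    `post` is injectable for testing; it defaults to requests.post.
    """
    post = post or requests.post
    resp = post(
        f"{base_url.rstrip('/')}/run",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"input": {"audio": audio_url, "language": language}},
        timeout=timeout,  # generous: the endpoint may cold-start before accepting
    )
    resp.raise_for_status()  # surface 4xx/5xx loudly instead of parsing an error body
    return resp.json()["id"]
```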


94-103: Config constants properly wired up. LGTM.

Clean migration to use the imported configuration constants while preserving the robust error handling.


178-191: Clean prompt templating implementation. LGTM.

The shift to render_prompt for language-specific defaults is solid. The "user:" prefix addition helps with prompt engineering. Note that previous chunk context is disabled - ensure this aligns with your transcription quality requirements.
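The language-keyed defaults could look roughly like this. The template strings and the helper shown are illustrative; the real `render_prompt` and its templates live in the PR's prompt files:

```python
# Illustrative default whisper prompts per language; the PR ships templates
# for English, German, Spanish, French, and Dutch.
DEFAULT_WHISPER_PROMPTS = {
    "en": "user: Transcribe the following audio in English.",
    "de": "user: Transkribiere das folgende Audio auf Deutsch.",
    "nl": "user: Transcribeer de volgende audio in het Nederlands.",
}

def render_prompt(language: str) -> str:
    # Fall back to English when no template exists for the language.
    return DEFAULT_WHISPER_PROMPTS.get(language, DEFAULT_WHISPER_PROMPTS["en"])
```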

collection_name="conversation_chunk",
item_id=conversation_chunk_id,
item_data={
"runpod_job_status_link": str(RUNPOD_WHISPER_BASE_URL) + "/status/" + job_id,

🧹 Nitpick (assertive)

Consider using urljoin for cleaner URL construction.

While the current string concatenation works, using urllib.parse.urljoin would be more robust.

+from urllib.parse import urljoin
...
-                    "runpod_job_status_link": str(RUNPOD_WHISPER_BASE_URL) + "/status/" + job_id,
+                    "runpod_job_status_link": urljoin(str(RUNPOD_WHISPER_BASE_URL), f"status/{job_id}"),
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
"runpod_job_status_link": str(RUNPOD_WHISPER_BASE_URL) + "/status/" + job_id,

# At the top of echo/server/dembrane/transcribe.py, add:
from urllib.parse import urljoin
# … other imports …

# Then, in the RunPod logic where you build the status link,
# replace the existing line:
"runpod_job_status_link": urljoin(
    str(RUNPOD_WHISPER_BASE_URL),
    f"status/{job_id}",
),
🤖 Prompt for AI Agents
In echo/server/dembrane/transcribe.py at line 238, replace the string
concatenation used to build the "runpod_job_status_link" URL with
urllib.parse.urljoin for more robust URL construction. Import urljoin from
urllib.parse if not already imported, then use it to join
RUNPOD_WHISPER_BASE_URL and "/status/" + job_id to form the complete URL.
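One caveat worth keeping in mind with `urljoin`: whether the base URL's last path segment survives depends on its trailing slash. A quick check with placeholder URLs (the endpoint name is made up):

```python
from urllib.parse import urljoin

# Without a trailing slash, urljoin replaces the base's last path segment:
assert urljoin("https://api.runpod.ai/v2/my-endpoint", "status/job-123") == \
    "https://api.runpod.ai/v2/status/job-123"

# With a trailing slash, the relative path is appended, which is what a
# .../status/<job_id> link needs:
assert urljoin("https://api.runpod.ai/v2/my-endpoint/", "status/job-123") == \
    "https://api.runpod.ai/v2/my-endpoint/status/job-123"
```

So if `RUNPOD_WHISPER_BASE_URL` does not end in a slash, the suggested change would need the base normalized first.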

Comment on lines +200 to +224
if ENABLE_RUNPOD_WHISPER_TRANSCRIPTION:
    directus_response = directus.get_items(
        "conversation_chunk",
        {
            "query": {
                "filter": {"id": {"_eq": conversation_chunk_id}},
                "fields": ["source", "runpod_job_status_link", "runpod_request_count"],
            },
        },
    )
    runpod_request_count = directus_response[0]["runpod_request_count"]
    source = directus_response[0]["source"]
    runpod_job_status_link = directus_response[0]["runpod_job_status_link"]

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {RUNPOD_WHISPER_API_KEY}",
    }

    if runpod_job_status_link:
        response = requests.get(runpod_job_status_link, headers=headers)
        job_status = response.json()["status"]
        logger.debug(f"job_status: {job_status}")
        if job_status == "IN_PROGRESS":
            logger.info(f"RunPod job {runpod_job_status_link} is in progress")
            return None


⚠️ Potential issue

Add defensive checks for API calls and null fields.

The RunPod status check needs hardening:

  • source could be None/missing and cause issues on line 226
  • The requests.get call on line 218 needs error handling

Apply this fix:

         source = (directus_response[0]["source"])
         runpod_job_status_link = (directus_response[0]["runpod_job_status_link"])
 
         headers = {
             "Content-Type": "application/json",
             "Authorization": f"Bearer {RUNPOD_WHISPER_API_KEY}",
         }
 
         if runpod_job_status_link:
-            response = requests.get(runpod_job_status_link, headers=headers)
-            job_status = response.json()['status']
+            try:
+                response = requests.get(runpod_job_status_link, headers=headers, timeout=30)
+                response.raise_for_status()
+                job_status = response.json().get('status')
+            except Exception as e:
+                logger.error(f"Failed to check RunPod job status: {e}")
+                job_status = None
             logger.debug(f"job_status: {job_status}")
             if job_status == "IN_PROGRESS":
                 logger.info(f"RunPod job {runpod_job_status_link} is in progress")
                 return None
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

if ENABLE_RUNPOD_WHISPER_TRANSCRIPTION:
    directus_response = directus.get_items(
        "conversation_chunk",
        {
            "query": {
                "filter": {"id": {"_eq": conversation_chunk_id}},
                "fields": ["source", "runpod_job_status_link", "runpod_request_count"],
            },
        },
    )
    runpod_request_count = directus_response[0]["runpod_request_count"]
    source = directus_response[0]["source"]
    runpod_job_status_link = directus_response[0]["runpod_job_status_link"]

    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {RUNPOD_WHISPER_API_KEY}",
    }

    if runpod_job_status_link:
        try:
            response = requests.get(runpod_job_status_link, headers=headers, timeout=30)
            response.raise_for_status()
            job_status = response.json().get("status")
        except Exception as e:
            logger.error(f"Failed to check RunPod job status: {e}")
            job_status = None
        logger.debug(f"job_status: {job_status}")
        if job_status == "IN_PROGRESS":
            logger.info(f"RunPod job {runpod_job_status_link} is in progress")
            return None
🤖 Prompt for AI Agents
In echo/server/dembrane/transcribe.py around lines 200 to 224, add defensive
checks to handle cases where 'source' might be None or missing after fetching
from Directus, and wrap the requests.get call in a try-except block to catch and
handle potential network or HTTP errors gracefully. Ensure that before using
'source' or other fields, you verify they are not None, and log or handle errors
from the API call to prevent crashes.

@spashii spashii changed the title Add langdetect dependency and enhance transcription logic ECHO-226 Fix multi-lingual transcription with RunPod May 26, 2025
@spashii spashii changed the title ECHO-226 Fix multi-lingual transcription with RunPod ECHO-226 Fix multilingual transcription with RunPod May 26, 2025
@spashii spashii added this pull request to the merge queue May 26, 2025
Merged via the queue into main with commit 5e7b290 May 26, 2025
7 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jun 6, 2025
spashii pushed a commit that referenced this pull request Nov 18, 2025
* Add langdetect dependency and enhance transcription logic

- Updated transcription logic in `transcribe.py` to detect language and translate if necessary, improving multi-language support.
- Adjusted prompt formatting for better clarity in transcription requests.
- Added `langdetect` version 1.0.9 to dependencies in `pyproject.toml`, `requirements-dev.lock`, and `requirements.lock`.

* Refactor variable naming in transcription logic

- Changed variable name from `response` to `llm_translation_response` for clarity in the transcription process.

* Add RunPod Whisper transcription support and enhance language handling

- Introduced environment variables for enabling RunPod Whisper transcription and managing API keys.
- Implemented `queue_transcribe_audio_runpod` function to handle audio transcription requests to RunPod.
- Updated `transcribe_conversation_chunk` to support both RunPod and LiteLLM transcription based on configuration.
- Added default whisper prompt templates for multiple languages (English, Spanish, French, German, Dutch).
- Created new directus field in conversation_chunk to hold runpod job id
- Improved error handling and logging for transcription processes.

* cosmetic formatting correction

* Add RunPod Whisper transcription enhancements and new scheduling task

- Introduced new environment variables for RunPod Whisper configuration, including priority URL and max request threshold.
- Added a new scheduled job to update RunPod transcription responses every 1 minute.
- Enhanced `queue_transcribe_audio_runpod` to support priority requests based on source and request count.
- Implemented `task_process_runpod_chunk_response` to handle chunk status updates and logging.
- Improved error handling and logging throughout the transcription process.

* directus updates for runpod

* Remove langdetect dependency and related code from transcription module

* Enhance transcription process by adding RunPod job status check

- Implemented a check for the job status of RunPod transcription, logging progress and returning None if the job is still in progress.
