Conversation
Walkthrough

Adds explicit transcription provider configuration and dispatch with new AssemblyAI and LiteLLM Whisper flags, implements the AssemblyAI transcription flow, updates conversation chunk updates to support diarization, removes the EventService and its events module, guards RunPod diarization with a feature flag, adds an S3 docstring note, and introduces AssemblyAI integration tests.

Changes
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs
Suggested labels
Pre-merge checks and finishing touches

❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests
Tip

👮 Agentic pre-merge checks are now available in preview! Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

Please see the documentation for more information. Example:

```yaml
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
```

Please share your feedback with us on this Discord post. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 10
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
echo/server/dembrane/service/conversation.py (1)
260-263: Critical: provided file_url is ignored, path stored as None.

If caller passes file_url (no upload), we still write path=None. That breaks downstream processing.

Minimal fix:

```diff
-    "path": file_url if needs_upload else None,
+    "path": file_url,
```

echo/server/dembrane/conversation_health.py (1)
216-221: Persist job status link on the chunk for observability.

Store runpod_job_status_link so operators/users can click through.

```diff
 job_data = _submit_diarization_job(audio_url, project_language)
 if not job_data:
     return None
 job_id, job_status_link = job_data
+# Persist status link for operators
+try:
+    directus.update_item(
+        "conversation_chunk",
+        chunk_id,
+        {"runpod_job_status_link": job_status_link},
+    )
+except Exception as e:
+    logger.warning(f"Failed to persist job status link for {chunk_id}: {e}")
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (8)
- echo/server/dembrane/config.py (3 hunks)
- echo/server/dembrane/conversation_health.py (16 hunks)
- echo/server/dembrane/s3.py (1 hunks)
- echo/server/dembrane/service/__init__.py (1 hunks)
- echo/server/dembrane/service/conversation.py (2 hunks)
- echo/server/dembrane/service/event.py (0 hunks)
- echo/server/dembrane/transcribe.py (6 hunks)
- echo/server/tests/test_transcribe_assembly.py (1 hunks)
💤 Files with no reviewable changes (1)
- echo/server/dembrane/service/event.py
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-05T19:39:05.113Z
Learnt from: ArindamRoy23
PR: Dembrane/echo#128
File: echo/server/dembrane/transcribe.py:45-55
Timestamp: 2025-05-05T19:39:05.113Z
Learning: LiteLLM transcription with Whisper models always returns a response with a "text" field, making additional KeyError handling unnecessary.
Applied to files:
echo/server/dembrane/config.py
🧬 Code graph analysis (3)
echo/server/dembrane/transcribe.py (3)
  echo/server/dembrane/s3.py (1)
    get_signed_url (143-151)
  echo/server/dembrane/prompts.py (1)
    render_prompt (58-91)
  echo/server/dembrane/service/conversation.py (1)
    update_chunk (277-340)

echo/server/tests/test_transcribe_assembly.py (3)
  echo/server/dembrane/s3.py (2)
    delete_from_s3 (186-188)
    save_to_s3_from_url (86-116)
  echo/server/dembrane/utils.py (2)
    get_utc_timestamp (49-50)
    get (67-79)
  echo/server/dembrane/transcribe.py (2)
    transcribe_audio_assemblyai (129-180)
    transcribe_conversation_chunk (342-398)

echo/server/dembrane/service/__init__.py (1)
  echo/server/dembrane/service/conversation.py (5)
    ConversationService (34-409)
    ConversationServiceException (18-19)
    ConversationNotFoundException (22-23)
    ConversationChunkNotFoundException (30-31)
    ConversationNotOpenForParticipationException (26-27)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Cursor Bugbot
🔇 Additional comments (13)
echo/server/dembrane/service/__init__.py (2)
28-33: Service wiring LGTM.

Initialization order and dependencies look clean.

18-26: EventService removal — clean sweep, no references found; LGTM.

Repo-wide, case-insensitive search for "EventService" (imports & class defs) returned no matches.

echo/server/dembrane/service/conversation.py (1)
281-299: Diarization field support LGTM.

Parameter order and _UNSET checks are consistent with the existing update pattern. Ensure Directus schema has a JSON/text field named “diarization”.
Do you already have the “diarization” field on conversation_chunk in Directus? If not, I can draft the migration.
echo/server/dembrane/config.py (3)
189-201: AssemblyAI defaults: confirm EU base URL as the right default.

Defaulting to https://api.eu.assemblyai.com is opinionated; many users expect the global endpoint. If EU is intentional (data residency), all good—otherwise flip the default.

228-254: LiteLLM Whisper flags LGTM.

Boolean gating + asserts are consistent and safe at import time.

468-489: RunPod diarization flag LGTM.

Good to finally gate on ENABLE_RUNPOD_DIARIZATION; conversation_health respects it.
echo/server/dembrane/conversation_health.py (3)
37-45: Directus field path LGTM; handle missing language defensively.

The field chain is right. If language is null, _should_skip_diarization will treat it as non‑en; that’s fine. Keep logs at debug.

197-200: Proper feature‑flag guard.

Early return when ENABLE_RUNPOD_DIARIZATION is off—nice.

343-348: Slice+flatten guard LGTM.

Defensive try/except is fine; consider logging response length to aid debugging, but not blocking.
echo/server/tests/test_transcribe_assembly.py (1)
14-22: LGTM! Clean provider isolation.

The function properly enforces AssemblyAI as the sole provider during tests while skipping when credentials are absent.
echo/server/dembrane/transcribe.py (3)
1-6: LGTM! Honest TODO acknowledging technical debt.

The comment correctly identifies the need for provider abstraction. Ship it now, refactor later.

373-376: LGTM! Storing raw words array for diarization.

Smart to preserve the full AssemblyAI response structure for future speaker identification features.

256-264: Provider selection logic is clean and explicit.

The precedence order (AssemblyAI > RunPod > LiteLLM) is clear. Good error when none configured.
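For illustration, the precedence described here can be sketched as a small pure function. The function name, flag parameters, and return labels below are hypothetical stand-ins, not the repo's actual identifiers:

```python
def select_transcriber(
    enable_assemblyai: bool,
    enable_runpod: bool,
    enable_litellm: bool,
) -> str:
    """Pick one provider by precedence: AssemblyAI > RunPod > LiteLLM.

    Raises RuntimeError when no provider is enabled, mirroring the
    "good error when none configured" behavior noted above.
    """
    if enable_assemblyai:
        return "assemblyai"
    if enable_runpod:
        return "runpod"
    if enable_litellm:
        return "litellm"
    raise RuntimeError(
        "No transcription provider configured: enable one of "
        "AssemblyAI, RunPod, or LiteLLM."
    )
```

Keeping the dispatch in one function like this makes the precedence order testable in isolation.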
```python
## ENABLE_ASSEMBLYAI_TRANSCRIPTION = os.environ.get(
#     "ENABLE_ASSEMBLYAI_TRANSCRIPTION", "false"
# ).lower() in ["true", "1"]
# This is a bad pattern for hygiene because it allows for multiple values to be set if you want it to be true/false


# This file inits twice for some reason...
```
🧹 Nitpick
Drop stale commented code.
This dead snippet adds noise and confuses the enablement story.
🤖 Prompt for AI Agents
In echo/server/dembrane/config.py around lines 9 to 15, remove the stale
commented-out environment variable block and the stray note ("This file inits
twice for some reason...") so the file contains only active configuration code;
if an enablement flag is required, replace the commented snippet with a single
canonical implementation elsewhere (or document its location) and run
linters/tests to ensure no references rely on the deleted comments.
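If a single canonical flag parser is wanted, one hygienic sketch could look like the following. The helper name `env_flag` and the accepted token sets are assumptions for illustration, not existing code in the repo:

```python
import os

_TRUTHY = {"true", "1", "yes", "on"}
_FALSY = {"false", "0", "no", "off"}


def env_flag(name: str, default: bool = False) -> bool:
    """Parse a boolean environment variable strictly.

    Raises on unrecognized values instead of silently treating them as
    False, which avoids the multi-value hygiene problem noted above.
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUTHY:
        return True
    if value in _FALSY:
        return False
    raise ValueError(f"{name} must be a boolean-like value, got {raw!r}")


# Usage (hypothetical):
# ENABLE_ASSEMBLYAI_TRANSCRIPTION = env_flag("ENABLE_ASSEMBLYAI_TRANSCRIPTION")
```

Centralizing the parse in one helper also makes the "file inits twice" concern less risky, since repeated imports re-read the same deterministic logic.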
```diff
 if status == "COMPLETED":
     dirz_response_data = response_data.get("output")
     if dirz_response_data:
-        logger.info(f"Diarization job {job_id} completed. Updating chunk {chunk_id} with results.")
+        logger.info(
+            f"Diarization job {job_id} completed. Updating chunk {chunk_id} with results."
+        )
         _update_chunk_with_results(chunk_id, dirz_response_data)
         return
     else:
-        logger.warning(f"Diarization job {job_id} completed but no output data received.")
+        logger.warning(
+            f"Diarization job {job_id} completed but no output data received."
+        )
         return
```
🧹 Nitpick
Handle failure states; don’t spin until timeout.
If RunPod returns FAILED/CANCELED, bail early and optionally annotate the chunk.
```diff
 if response_data:
     status = response_data.get("status")
     logger.debug(f"Job {job_id} status: {status}")
     if status == "COMPLETED":
         dirz_response_data = response_data.get("output")
         if dirz_response_data:
             logger.info(
                 f"Diarization job {job_id} completed. Updating chunk {chunk_id} with results."
             )
             _update_chunk_with_results(chunk_id, dirz_response_data)
             return
         else:
             logger.warning(
                 f"Diarization job {job_id} completed but no output data received."
             )
             return
+    elif status in {"FAILED", "CANCELED", "CANCELLED", "ERROR"}:
+        logger.warning(f"Diarization job {job_id} ended with status={status}")
+        return
```
🤖 Prompt for AI Agents
In echo/server/dembrane/conversation_health.py around lines 237 to 249, the code
only handles the "COMPLETED" case and does not bail early on RunPod failure
states; add explicit checks for status == "FAILED" and status == "CANCELED" that
log a warning or error, annotate the chunk to mark it as failed/canceled (use
the existing chunk update/annotation helper or add a small helper like
_mark_chunk_failed_with_reason), and then return immediately so we don't spin
until timeout; ensure the log includes job_id, chunk_id and the
response_data/error reason and that the chunk annotation persists so callers
know the diarization failed.
```python
"""
WARNING: this will also "get fake signed urls" for files that don't exist
"""
```
🧹 Nitpick
Don’t just warn—give callers a way to fail fast for missing objects.
Presigned URLs are generated without existence checks. Offer an opt‑in existence check to prevent queuing downstream work with dead links.
Apply within this function:
```diff
-def get_signed_url(file_name: str, expires_in_seconds: int = 3600) -> str:
+def get_signed_url(file_name: str, expires_in_seconds: int = 3600, verify_exists: bool = False) -> str:
     """
     WARNING: this will also "get fake signed urls" for files that don't exist
     """
-    return s3_client.generate_presigned_url(
+    key = get_sanitized_s3_key(file_name)
+    if verify_exists:
+        # HEAD check so callers can opt-in to strict behavior
+        s3_client.head_object(Bucket=STORAGE_S3_BUCKET, Key=key)
+    return s3_client.generate_presigned_url(
         "get_object",
-        Params={"Bucket": STORAGE_S3_BUCKET, "Key": get_sanitized_s3_key(file_name)},
+        Params={"Bucket": STORAGE_S3_BUCKET, "Key": key},
         ExpiresIn=expires_in_seconds,
     )
```

And add the import up top:

```diff
 from botocore.response import StreamingBody
+from botocore.exceptions import ClientError  # optional: catch/translate on HEAD failures
```

Would you like me to wire this stricter mode where we generate signed URLs for diarization/transcription?
🤖 Prompt for AI Agents
In echo/server/dembrane/s3.py around lines 144 to 146, the docstring warns but
the function still generates presigned URLs for objects that may not exist; add
an opt-in parameter (e.g., verify_exists: bool = False) to this function
signature, and when True perform an S3 existence check (head_object) before
generating a presigned URL; if the head_object call returns a 404/NotFound raise
a clear exception so callers can fail fast and avoid downstream dead links. Also
add the necessary import at the top (e.g., from botocore.exceptions import
ClientError) and handle ClientError to distinguish missing objects vs other
errors; ensure the new behavior is documented in the function docstring and unit
tests updated to cover both verify_exists True/False.
```python
exceptions = {
    "file": {
        "FileServiceException": FileServiceException,
    },
    "conversation": {
        "ConversationChunkNotFoundException": ConversationChunkNotFoundException,
        "ConversationNotFoundException": ConversationNotFoundException,
        "ConversationNotOpenForParticipationException": ConversationNotOpenForParticipationException,
        "ConversationServiceException": ConversationServiceException,
    },
    "project": {
        "ProjectNotFoundException": ProjectNotFoundException,
        "ProjectServiceException": ProjectServiceException,
    },
}
```
🧹 Nitpick
Public API polish: export surface explicitly.

Optional: define `__all__` so downstreams get a stable import surface for services and exceptions.

Example:

```diff
 exceptions = {
     ...
 }
+
+__all__ = [
+    "file_service", "project_service", "conversation_service",
+    "FileServiceException",
+    "ProjectService", "ProjectServiceException", "ProjectNotFoundException",
+    "ConversationService", "ConversationServiceException",
+    "ConversationNotFoundException", "ConversationChunkNotFoundException",
+    "ConversationNotOpenForParticipationException",
+    "exceptions",
+]
```
🤖 Prompt for AI Agents
In echo/server/dembrane/service/__init__.py around lines 35-49, add an explicit
module export surface by defining __all__ to include the publicly intended names
(at minimum the exceptions mapping and the top-level service classes); e.g. set
__all__ = ["exceptions", "FileService", "ConversationService", "ProjectService"]
so downstream code importing * gets a stable API surface; adjust the list to
match the actual service class names defined in this package.
```python
def transcribe_audio_assemblyai(
    audio_file_uri: str,
    language: Optional[str],  # pyright: ignore[reportUnusedParameter]
    hotwords: Optional[List[str]],
) -> tuple[str, dict[str, Any]]:
```
🧹 Nitpick
Parameter marked unused but actually used.

The language parameter has `# pyright: ignore[reportUnusedParameter]` but it's used in lines 150-154.

```diff
 def transcribe_audio_assemblyai(
     audio_file_uri: str,
-    language: Optional[str],  # pyright: ignore[reportUnusedParameter]
+    language: Optional[str],
     hotwords: Optional[List[str]],
 ) -> tuple[str, dict[str, Any]]:
```
🤖 Prompt for AI Agents
In echo/server/dembrane/transcribe.py around lines 129-133, the language
parameter is marked with "# pyright: ignore[reportUnusedParameter]" even though
it is referenced later (lines 150-154); remove the ignore comment so the linter
sees the parameter is used, and if pyright still complains add an explicit use
(e.g., pass it into the function/variable where needed or reference it before
conditional logic) or update the signature to match actual usage to eliminate
the false positive.
```python
# TODO: using webhooks will be ideal, but this is easy to impl and test for ;)
# we will be blocking some of our cheap "workers" here with time.sleep
while True:
    transcript = requests.get(polling_endpoint, headers=headers).json()
    if transcript["status"] == "completed":
        # return both to add the diarization response later...
        return transcript["text"], transcript
    elif transcript["status"] == "error":
        raise RuntimeError(f"Transcription failed: {transcript['error']}")
    else:
        time.sleep(3)
```
Polling without backoff or timeout is risky.

The infinite loop with fixed 3-second sleep could run forever if AssemblyAI never returns "completed" or "error". Add timeout and exponential backoff.

```diff
+max_wait_time = 600  # 10 minutes max
+start_time = time.time()
+poll_interval = 3
+
 # TODO: using webhooks will be ideal, but this is easy to impl and test for ;)
 # we will be blocking some of our cheap "workers" here with time.sleep
 while True:
+    if time.time() - start_time > max_wait_time:
+        raise TranscriptionError(f"Transcription timed out after {max_wait_time} seconds")
+
     transcript = requests.get(polling_endpoint, headers=headers).json()
     if transcript["status"] == "completed":
         # return both to add the diarization response later...
         return transcript["text"], transcript
     elif transcript["status"] == "error":
         raise RuntimeError(f"Transcription failed: {transcript['error']}")
     else:
-        time.sleep(3)
+        time.sleep(poll_interval)
+        poll_interval = min(poll_interval * 1.5, 30)  # Cap at 30 seconds
```

Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In echo/server/dembrane/transcribe.py around lines 166-176, the current polling
loop can run forever and uses a fixed 3s sleep; change it to stop after a
configurable overall timeout (e.g., max_wait_seconds) or max_retries, use
requests.get with a timeout parameter, and implement exponential backoff with
optional jitter between retries (e.g., sleep = min(base * 2**attempt,
max_backoff) ± small jitter). Track elapsed time or attempts and raise a
TimeoutError (or RuntimeError) if exceeded; keep handling of "completed" and
"error" statuses the same and ensure any exceptions from requests are caught and
retried up to the limit.
```python
def _build_hotwords(conversation: dict) -> Optional[List[str]]:
    """Build the hotwords from the conversation"""
    hotwords_str = conversation["project_id"].get("default_conversation_transcript_prompt")
    if hotwords_str:
        return [str(word.strip()) for word in hotwords_str.split(",")]
    return None
```
🛠️ Refactor suggestion
🧩 Analysis chain
Hotwords extraction logic doesn't match naming.
Function extracts from default_conversation_transcript_prompt which sounds like a prompt template, not a comma-separated hotword list. Either the field is misnamed or this is extracting from the wrong place.
🏁 Script executed:

```shell
#!/bin/bash
# Check how default_conversation_transcript_prompt is actually used across the codebase
rg -n "default_conversation_transcript_prompt" --type py -C 3
```

Length of output: 4240
🏁 Script executed:

```shell
#!/bin/bash
set -euo pipefail
echo "=== transcribe.py (1-400) ==="
nl -ba echo/server/dembrane/transcribe.py | sed -n '1,400p'
echo
echo "=== grep for _build_hotwords usages ==="
rg -n "_build_hotwords" || true
echo
echo "=== grep for 'hotword' occurrences ==="
rg -n "hotword" || true
echo
echo "=== reply_utils.py (1-420) ==="
nl -ba echo/server/dembrane/reply_utils.py | sed -n '1,420p'
echo
echo "=== database.py (1-240) ==="
nl -ba echo/server/dembrane/database.py | sed -n '1,240p'
```

Length of output: 70
Do not parse default_conversation_transcript_prompt as hotwords — use a dedicated field or rename.
default_conversation_transcript_prompt is a prompt template (stored as Text and appended to transcript prompts); splitting it on commas will corrupt prompts. Read hotwords from a dedicated field (e.g., default_conversation_hotwords) or rename/repurpose the DB field and update usages.
Refs: echo/server/dembrane/transcribe.py (≈241–244, 248–253), echo/server/dembrane/reply_utils.py (≈160–163, 311–314), echo/server/dembrane/database.py (≈130–132)
🤖 Prompt for AI Agents
In echo/server/dembrane/transcribe.py around lines 248–253, the function
currently splits
conversation["project_id"].get("default_conversation_transcript_prompt") into
hotwords, but that DB field is a prompt template and must not be parsed as
hotwords; change the code to read hotwords from a dedicated field (e.g.,
conversation["project_id"].get("default_conversation_hotwords")) instead, and
stop splitting or trimming the transcript prompt. Update usages and DB schema
accordingly (or rename/repurpose the DB field and update all references in
reply_utils.py and database.py), ensuring the transcript prompt remains intact
and hotwords come from the new dedicated field.
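A minimal sketch of the proposed fix, assuming a hypothetical dedicated `default_conversation_hotwords` comma-separated field (not currently in the schema), which leaves the prompt template untouched:

```python
from typing import List, Optional


def build_hotwords(project: dict) -> Optional[List[str]]:
    """Read hotwords from a dedicated comma-separated field.

    `default_conversation_hotwords` is an assumed field name; the existing
    `default_conversation_transcript_prompt` (a prompt template) is never
    split here.
    """
    raw = project.get("default_conversation_hotwords")
    if not raw:
        return None
    # Drop empty fragments produced by stray commas or whitespace
    words = [word.strip() for word in raw.split(",") if word.strip()]
    return words or None
```

Returning None for empty or whitespace-only input keeps the caller's "no hotwords" path unchanged.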
```python
path = save_to_s3_from_url(
    "https://github.com/runpod-workers/sample-inputs/raw/refs/heads/main/audio/Arthur.mp3",
    public=True,
)
```
🧹 Nitpick
🧩 Analysis chain
Verify S3 public access flag consistency.
The fixture uploads with public=True but generates a signed URL later in the transcription flow. This works but consider if public access is truly needed for test assets.
🏁 Script executed:

```shell
#!/bin/bash
# Check if test assets actually need public access or if signed URLs suffice
rg -n "save_to_s3_from_url.*public=True" --type py
```

Length of output: 492
Verify S3 public access flag consistency
rg found three occurrences of save_to_s3_from_url(public=True):
- echo/server/tests/test_transcribe_runpod.py:33,69 — test fixtures upload public=True; switch to public=False and use signed URLs or mocks unless public is required.
- echo/server/dembrane/image_utils.py:119 — image uploads are public; keep only if these assets must be publicly served (document reason).
🤖 Prompt for AI Agents
In echo/server/tests/test_transcribe_assembly.py around lines 43 to 46, the test
calls save_to_s3_from_url(..., public=True) which makes test artifacts publicly
accessible; change the call to public=False and update the test to either
generate/use a presigned (signed) URL for access or mock save_to_s3_from_url to
return a stable test URL/path so the test does not rely on public S3 objects;
also search the repo for the other reported occurrences
(echo/server/tests/test_transcribe_runpod.py and
echo/server/dembrane/image_utils.py) and switch test fixture uploads to
public=False (or document/justify keeping public=True for image assets that must
be publicly served) and update tests/mocks accordingly.
```python
logger.info("teardown")
directus.delete_item("conversation_chunk", cc["id"])
directus.delete_item("conversation", c["id"])
directus.delete_item("project", p["id"])
delete_from_s3(path)
```
🧹 Nitpick
Consider extracting common teardown logic.
Both fixtures have identical teardown sequences. A helper function would reduce duplication.
```python
def _teardown_test_data(cc_id: str, c_id: str, p_id: str, path: str):
    """Common teardown for test fixtures."""
    directus.delete_item("conversation_chunk", cc_id)
    directus.delete_item("conversation", c_id)
    directus.delete_item("project", p_id)
    delete_from_s3(path)
```
🤖 Prompt for AI Agents
In echo/server/tests/test_transcribe_assembly.py around lines 64-68 (and
similarly 111-115), the teardown sequence is duplicated; extract a helper
function named something like _teardown_test_data(cc_id: str, c_id: str, p_id:
str, path: str) that calls directus.delete_item for conversation_chunk,
conversation and project and then delete_from_s3(path), place it near other test
helpers or at module level, and update both fixtures to call this helper with
the appropriate ids/paths to remove the duplicated code.
```python
def test_transcribe_audio_assemblyai():
    transcript, response = transcribe_audio_assemblyai(
        audio_file_uri="https://github.com/runpod-workers/sample-inputs/raw/refs/heads/main/audio/Arthur.mp3",
        language="en",
        hotwords=["Arther"],
    )

    assert transcript is not None
    assert response is not None
    assert response.get("words") is not None
    assert response.get("words") is not None
```
🧹 Nitpick
Fix duplicate assertion and typo in hotwords.

Line 151 duplicates the assertion from line 150. Also, "Arther" should likely be "Arthur" to match the audio sample.

```diff
-    hotwords=["Arther"],
+    hotwords=["Arthur"],
 )
 assert transcript is not None
 assert response is not None
 assert response.get("words") is not None
-assert response.get("words") is not None
```
🤖 Prompt for AI Agents
In echo/server/tests/test_transcribe_assembly.py around lines 141 to 151, the
test contains a duplicated assertion (response.get("words") asserted twice) and
a typo in the hotwords list ("Arther"); remove the duplicate assertion so each
check is unique (keep one assert response.get("words") is not None) and correct
the hotword to "Arthur" to match the audio sample.
- add assembly ai provider
- actually use the ENABLE_RUNPOD_DIARIZATION variable

<!-- This is an auto-generated comment: release notes by coderabbit.ai -->

## Summary by CodeRabbit

- New Features
  - Added AssemblyAI transcription support with language detection and optional hotwords.
  - Enabled provider selection (AssemblyAI, RunPod, or LiteLLM) with unified error handling.
  - Saved diarization data alongside transcripts when available.
- Refactor
  - Consolidated transcription settings into a dedicated configuration section and removed legacy flags.
  - Simplified service dependencies by removing the internal event system.
- Tests
  - Added integration tests for AssemblyAI transcription, covering English and Dutch audio flows.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->