- Updated logging messages in run_etl.py to indicate progress through the ETL pipeline stages.
- Refactored AudioETLPipeline to separate processing of audio and non-audio files, improving clarity and functionality.
- Enhanced ContextualChunkETLPipeline to handle non-audio segments more effectively, including transcript management and error handling.
- Improved DirectusETLPipeline to ensure proper handling of missing audio paths by introducing a placeholder value.
- Streamlined ProcessTracker methods for better data management and clarity.
- Updated LiteLLM configuration documentation to include new settings for Redis lock management, including `REDIS_LOCK_PREFIX` and `REDIS_LOCK_EXPIRY`.
- Refactored `run_etl.py` to implement Redis locks, preventing concurrent processing of the same conversation ID and ensuring smoother ETL pipeline execution.
- Changed `AUDIO_LIGHTRAG_TIME_THRESHOLD_SECONDS` to `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` for clarity in audio processing settings.
- Improved error handling and logging in the ETL pipeline to track Redis lock status and processing flow.
- Cleaned up unused code in the audio ETL pipeline for better maintainability.
Walkthrough
This update introduces Redis-based locking to the ETL pipeline to prevent concurrent or repeated processing of the same conversation IDs within a set expiry window. The configuration for audio processing cool-off time and Redis locks is clarified and documented, with environment variable names updated accordingly. The ETL, audio, and contextual chunk pipelines are refactored to handle audio and non-audio files distinctly, ensuring proper transcript handling and LightRAG insertion. Documentation is updated to reflect these changes, and minor utility methods are cleaned up for clarity.
Sequence Diagram(s)
sequenceDiagram
participant User
participant ETL Runner
participant Redis
participant Directus
participant LightRAG
User->>ETL Runner: Trigger ETL with conv_id_list
ETL Runner->>Redis: Check for locks on conv_id_list
Redis-->>ETL Runner: Return locked/unlocked IDs
ETL Runner->>ETL Runner: Filter out locked IDs
alt IDs available
ETL Runner->>Redis: Set locks for processing IDs
ETL Runner->>Directus: Run Directus ETL pipeline
Directus-->>ETL Runner: Return processed data
ETL Runner->>LightRAG: Insert transcripts (audio/non-audio logic)
LightRAG-->>ETL Runner: Ack
ETL Runner->>Redis: Release locks on error
else All IDs locked
ETL Runner-->>User: Log and exit early
end
Actionable comments posted: 5
🔭 Outside diff range comments (1)
echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (1)
60-78: ⚠️ Potential issue
Potential infinite loop if `process_audio_files` returns unchanged list
If `process_audio_files` decides a file is too large or hits an internal guard, the `unprocessed_chunk_file_uri_li` could come back unchanged, which results in an infinite while-loop. Recommend breaking after N identical iterations or asserting the list shrinks each pass.
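A minimal sketch of the "break after N identical iterations" guard suggested above. The names `drain`, `max_stalls`, and the two stand-in processors are hypothetical and not taken from the PR; the real `process_audio_files` signature is not shown here, so the processor is modelled as a function that takes the pending list and returns what is still unprocessed.

```python
def drain(pending, process_batch, max_stalls=3):
    """Call process_batch until nothing is pending; abort if the list stops shrinking."""
    stalls = 0
    while pending:
        remaining = process_batch(pending)
        if len(remaining) >= len(pending):
            # No progress this pass: count it and give up after max_stalls in a row.
            stalls += 1
            if stalls >= max_stalls:
                raise RuntimeError(
                    f"no progress after {stalls} passes; {len(remaining)} items stuck"
                )
        else:
            stalls = 0
        pending = remaining
    return True


def healthy_processor(items):
    # Hypothetical stand-in: consumes one item per pass.
    return list(items[1:])


def stuck_processor(items):
    # Hypothetical stand-in: hits a guard and returns the list unchanged.
    return list(items)
```

Raising instead of silently breaking keeps the failure visible to whatever error handling wraps the pipeline run.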
♻️ Duplicate comments (1)
echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (1)
114-118: Same shadowing bug hits non-audio path
The non-audio branch repeats the overwrite issue. Apply the same fix to dodge attribute errors.
🧹 Nitpick comments (10)
echo/server/dembrane/config.py (1)
318-326: Expose lock settings per-env
Great to see locking baked in. One minor win: bubble `REDIS_LOCK_EXPIRY`/`REDIS_LOCK_PREFIX` through a `get_int()`/`get_str()` helper so tests can override them without env-munging. Totally optional, but it keeps CI slick.
echo/server/dembrane/audio_lightrag/pipelines/directus_etl_pipeline.py (2)
81-85: Use `Path().suffix` for safer format parsing
Splitting on `"."` works for names like `audio.backup.2024.mp3`, but it returns the whole string for a path with no extension and picks up dots in directory names. A pathlib-based read is more robust:
-from pathlib import Path
-
-conversation_df['format'] = conversation_df.path.apply(lambda x: x.split('.')[-1])
+from pathlib import Path
+
+conversation_df['format'] = conversation_df.path.apply(lambda p: Path(p).suffix.lstrip('.'))
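A quick comparison of the two parsing approaches, as a sketch. The helper names and sample paths are illustrative and not from the PR. Both approaches agree on ordinary and multi-dot file names; they differ when the path has no extension or a directory name contains a dot.

```python
from pathlib import Path


def ext_by_split(path: str) -> str:
    # Takes whatever follows the last dot anywhere in the string.
    return path.split(".")[-1]


def ext_by_suffix(path: str) -> str:
    # Looks only at the final path component; empty string if it has no suffix.
    return Path(path).suffix.lstrip(".")


samples = ["audio.backup.2024.mp3", "uploads/recording", "data.v2/recording"]
for sample in samples:
    print(sample, "->", repr(ext_by_split(sample)), "vs", repr(ext_by_suffix(sample)))
```

An empty result from `ext_by_suffix` is easy to test for downstream, whereas the split-based version returns a string that looks like a format but is not one.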
103-106: Vectorise the cool-off filter
`apply(lambda …)` is an O(N) Python-level loop. Pandas can do this in C:
-# take diff between current_timestamp and timestamp
-timestamp_diff = conversation_df['timestamp'].apply(lambda x: (run_timestamp - x).total_seconds())
-conversation_df = conversation_df[timestamp_diff > int(AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS)]
+# vectorised delta in seconds
+timestamp_diff = (run_timestamp - conversation_df['timestamp']).dt.total_seconds()
+conversation_df = conversation_df[timestamp_diff > AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS]
Improves large-batch performance noticeably.
echo/docs/litellm_config.md (2)
45-48: Tiny comma nit
“Files will not be processed if uploaded earlier than cooloff. Currently disabled…”
Add a comma after “Currently” to appease grammar bots.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~47-~47: A comma may be missing after the conjunctive/linking adverb ‘Currently’.
Context: ...essed if uploaded earlier than cooloff. Currently disabled, pass the current tz in run_et...(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
55-57: Good call-out on Redis locks - maybe link to config snippet
Consider adding a one-liner on how the key actually looks:
`<REDIS_LOCK_PREFIX><conversation_id>` - saves newcomers a dive into the source.
🧰 Tools
🪛 LanguageTool
[uncategorized] ~57-~57: Loose punctuation mark.
Context: ... the ETL pipeline. -REDIS_LOCK_EXPIRY: Time in seconds before a Redis lock exp...(UNLIKELY_OPENING_PUNCTUATION)
echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (1)
45-58: Derive unique (project, conversation) pairs without Python `set(zip())`
Pandas can hand you uniques directly and avoid the temporary Python objects:
-zip_unique_audio = list(
-    set(
-        zip(
-            transform_audio_process_tracker_df.project_id,
-            transform_audio_process_tracker_df.conversation_id,
-            strict=True
-        )
-    )
-)
+zip_unique_audio = (
+    transform_audio_process_tracker_df
+    .loc[:, ['project_id', 'conversation_id']]
+    .drop_duplicates()
+    .itertuples(index=False, name=None)
+)
Cleaner & faster.
echo/server/dembrane/audio_lightrag/main/run_etl.py (2)
42-44: Reuse your Redis connection; save the sockets for real work
You spin up `redis.from_url` here and again in the failure handler. One client per run is plenty; pass it around instead of reconnecting.
50-53: Negative TTL handling
`ttl` returns `-2`/`-1` for “no key / no expiry”. Dividing by 60 then rounding yields weird negatives in the logs. Guard the edge case to keep logs sane.
echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (2)
37-41: Stop recomputing the tracker DF twice per iteration
`self.process_tracker()` is called twice on the same line; that's two DataFrame materialisations every loop. Cache it locally once; your CPU will thank you.
-    for conversation_id in self.process_tracker().conversation_id.unique():
-        load_tracker = self.process_tracker()[self.process_tracker()['conversation_id'] == conversation_id]
+    tracker_df = self.process_tracker()
+    for conversation_id in tracker_df.conversation_id.unique():
+        load_tracker = tracker_df[tracker_df['conversation_id'] == conversation_id]
61-68: Missing stream close might leak sockets
`get_stream_from_s3(...).read()` leaves the stream open. Wrap it in a context manager or explicitly `close()` afterwards to free up the underlying HTTP connection.
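One way to apply the context-manager suggestion, sketched under the assumption that the object returned by `get_stream_from_s3` exposes `read()` and `close()`; its actual return type is not shown in this conversation. `read_all` is a hypothetical helper, and `io.BytesIO` stands in for the S3 stream.

```python
import io
from contextlib import closing


def read_all(open_stream):
    """Read a stream fully and close it, even if read() raises."""
    # closing() calls stream.close() on exit, so no __enter__/__exit__ is required.
    with closing(open_stream()) as stream:
        return stream.read()


# Stand-in for the S3 stream so the sketch runs without network access.
fake_stream = io.BytesIO(b"transcript bytes")
payload = read_all(lambda: fake_stream)
```

`contextlib.closing` is useful here because it works with any object that has a `close()` method, whether or not that object is itself a context manager.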
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
- echo/docs/litellm_config.md (1 hunks)
- echo/server/dembrane/audio_lightrag/main/run_etl.py (6 hunks)
- echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (3 hunks)
- echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (4 hunks)
- echo/server/dembrane/audio_lightrag/pipelines/directus_etl_pipeline.py (3 hunks)
- echo/server/dembrane/audio_lightrag/utils/process_tracker.py (0 hunks)
- echo/server/dembrane/config.py (2 hunks)
💤 Files with no reviewable changes (1)
- echo/server/dembrane/audio_lightrag/utils/process_tracker.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (2)
- echo/server/dembrane/audio_lightrag/utils/audio_utils.py (1): create_directus_segment (97-106)
- echo/server/dembrane/audio_lightrag/utils/process_tracker.py (1): update_value_for_chunk_id (34-35)
echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (3)
- echo/server/dembrane/s3.py (1): get_stream_from_s3 (175-179)
- echo/server/dembrane/api/stateless.py (2): insert_item (80-115), InsertRequest (70-73)
- echo/server/dembrane/api/dependency_auth.py (1): DirectusSession (13-22)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (3)
echo/server/dembrane/config.py (1)
280-287: Constant rename LGTM - confirm callers are updated
The switch to `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` is clean and the default of `60` keeps existing behaviour. Just grep the repo once to be 100% sure no module is still referencing the retired `AUDIO_LIGHTRAG_TIME_THRESHOLD_SECONDS`; otherwise you'll trip a `NameError` at runtime.
echo/server/dembrane/audio_lightrag/pipelines/directus_etl_pipeline.py (1)
7-7: Import rename is on point
No issues; matches the config change.
echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (1)
9-10: Import of `create_directus_segment` LGTM
Balances the new non-audio path nicely.
- Updated LiteLLM configuration documentation to rename `AUTO_SELECT_ENABLED` to `ENABLE_CHAT_AUTO_SELECT` for clarity.
- Improved Redis lock handling in `run_etl.py` to atomically acquire locks and provide informative logging for conversation processing.
- Enhanced `AudioETLPipeline` to streamline chunk processing by fetching transcripts in bulk and improving mapping logic for conversation segments.
- Refactored `ContextualChunkETLPipeline` to improve error handling and response validation during item insertion in Directus.
Actionable comments posted: 0
🧹 Nitpick comments (2)
echo/docs/litellm_config.md (1)
47-47: Cooloff param docs could use a grammar boost
The description is solid but needs a comma after "Currently". Also, "tz" is dev shorthand - spell out "timezone" for maximum clarity.
- `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current tz in run_etl to enable
+ `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current timezone in run_etl to enable
echo/server/dembrane/audio_lightrag/main/run_etl.py (1)
71-72: Fix indentation in run call for consistency
The indentation is a bit off. Let's clean it up.
- process_tracker = directus_pl.run(filtered_conv_ids,
- run_timestamp=None) # pass timestamp to avoid processing files uploaded earlier than cooloff
+ process_tracker = directus_pl.run(filtered_conv_ids,
+ run_timestamp=None) # pass timestamp to avoid processing files uploaded earlier than cooloff
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- echo/docs/litellm_config.md (1 hunks)
- echo/server/dembrane/audio_lightrag/main/run_etl.py (6 hunks)
- echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (3 hunks)
- echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py
- echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (7)
echo/docs/litellm_config.md (2)
53-53: Feature flag naming synced - LGTM
Solid fix syncing the feature flag name with what's in the code. No confusion = happy devs.
55-57: Redis lock documentation is 💯
Clear and comprehensive explanation of the new Redis lock params. Default values are explicitly called out, 10/10 docs.
echo/server/dembrane/audio_lightrag/main/run_etl.py (5)
4-7: Redis imports look sharp
Clean imports for Redis client and config constants. Exactly what's needed, nothing more.
28-28: Docstring update is on point
Concise explanation of the Redis lock behavior matching the 1-hour default from the config.
42-66: Redis lock implementation is fire 🔥
Atomic lock acquisition with `SET ... NX` is exactly what we needed to squash race conditions. Smart TTL conversion to minutes for human-readable logs, and early bailout if all IDs are locked. 10x engineering.
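A minimal sketch of the atomic acquire-and-filter pattern this comment describes, not the PR's actual code. `FakeRedis` is an in-memory stand-in that mimics the `set(name, value, ex=..., nx=...)` and `delete(*names)` calls of redis-py so the example runs without a server. The `"lock:"` prefix, the helper names, and the stored value `"1"` are assumptions; the 3600-second expiry follows the 1-hour default mentioned above.

```python
class FakeRedis:
    """In-memory stand-in for the two redis-py calls used below (expiry is ignored)."""

    def __init__(self):
        self.store = {}

    def set(self, name, value, ex=None, nx=False):
        if nx and name in self.store:
            return None  # redis-py returns None when NX blocks the write
        self.store[name] = value
        return True

    def delete(self, *names):
        return sum(1 for name in names if self.store.pop(name, None) is not None)


def acquire_locks(client, conv_ids, prefix="lock:", expiry_seconds=3600):
    """Return only the IDs whose lock was acquired by this caller."""
    acquired = []
    for conv_id in conv_ids:
        # SET with NX and EX is a single command, so check-and-set cannot race.
        if client.set(f"{prefix}{conv_id}", "1", nx=True, ex=expiry_seconds):
            acquired.append(conv_id)
    return acquired


def release_locks(client, conv_ids, prefix="lock:"):
    """Drop locks so a failed run can be retried before the expiry."""
    for conv_id in conv_ids:
        client.delete(f"{prefix}{conv_id}")
```

With a real client, `redis.from_url(...)` would replace `FakeRedis()`; an empty result from `acquire_locks` corresponds to the "all IDs locked, exit early" branch.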
73-73: Progress indicators in logs are a nice touch
Adding "1/3...", "2/3...", "3/3..." to the logs makes it much easier to track pipeline progress. Simple but effective UX improvement.
Also applies to: 82-82, 91-91
101-108: Lock cleanup on failure is crucial - LGTM
Solid error handling with proper lock cleanup on failure. Reinitializing the Redis client in the except block is a smart defensive move in case the original connection is borked.
… handling
- Updated LiteLLM configuration documentation to clarify the `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` setting and its dependency on the current timezone stamp.
- Renamed `REDIS_LOCK_PREFIX` to `AUDIO_LIGHTRAG_REDIS_LOCK_PREFIX` in the configuration to improve clarity and consistency.
- Adjusted `run_etl.py` to utilize the new `AUDIO_LIGHTRAG_REDIS_LOCK_PREFIX` for Redis lock management, ensuring proper handling of conversation IDs during processing.
…o-implement-lock-for-processing
Actionable comments posted: 0
♻️ Duplicate comments (1)
echo/server/dembrane/audio_lightrag/main/run_etl.py (1)
99-109: 🛠️ Refactor suggestion
Move filtered_conv_ids initialization earlier to prevent UnboundLocalError
The filtered_conv_ids initialization should happen before any code that might throw exceptions to ensure it's always defined in the exception handler.
 try:
     if not conv_id_list:
         logger.error("Empty conversation ID list provided")
         return None
+    filtered_conv_ids = []  # Initialize early to ensure it's defined for cleanup paths
     # Filter conversation IDs that are already being processed (via Redis locks)
     redis_client = redis.from_url(REDIS_URL)
-    filtered_conv_ids = []
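The failure mode behind this suggestion can be reproduced in isolation. The sketch below is illustrative only: `run_late_init`, `run_early_init`, and `failing_connect` are hypothetical stand-ins, with `failing_connect` playing the role of a Redis connection that raises before the list is assigned.

```python
def run_late_init(connect):
    try:
        client = connect()  # may raise before the next line runs
        filtered_conv_ids = []
        return True
    except Exception:
        # If connect() raised, filtered_conv_ids was never bound.
        return len(filtered_conv_ids)


def run_early_init(connect):
    filtered_conv_ids = []  # bound before anything that can raise
    try:
        client = connect()
        return True
    except Exception:
        return len(filtered_conv_ids)


def failing_connect():
    # Hypothetical stand-in for a connection attempt that fails.
    raise ConnectionError("redis unavailable")
```

In the late-init variant the cleanup path itself raises `UnboundLocalError`, which hides the original connection error from anyone reading only the last line of the traceback.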
🧹 Nitpick comments (2)
echo/server/dembrane/audio_lightrag/main/run_etl.py (1)
99-104: DRY up the Redis client initialization
You're initializing the Redis client twice: once in the main flow and once in the exception handler. Extract this to a single initialization at the beginning of the function to keep things DRY.
 try:
+    # Initialize Redis client once for both normal flow and exception handling
+    redis_client = redis.from_url(REDIS_URL)
+    filtered_conv_ids = []  # Initialize early to ensure it's defined for cleanup paths
+
     if not conv_id_list:
         logger.error("Empty conversation ID list provided")
         return None
     # Filter conversation IDs that are already being processed (via Redis locks)
-    redis_client = redis.from_url(REDIS_URL)
-    filtered_conv_ids = []
echo/docs/litellm_config.md (1)
47-47: Fix duplicate word in cool-off time documentation
You've got a word duplication: "current current" should just be "current".
- `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current current tz stamp of directus in run_etl to enable
+ `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current tz stamp of directus in run_etl to enable
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- echo/docs/litellm_config.md (1 hunks)
- echo/server/dembrane/audio_lightrag/main/run_etl.py (6 hunks)
- echo/server/dembrane/config.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- echo/server/dembrane/config.py
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (3)
echo/server/dembrane/audio_lightrag/main/run_etl.py (2)
42-67: LGTM! You've crushed the Redis lock implementation! 🚀
Solid implementation using atomic SET with NX and EX flags, which avoids race conditions like a boss. The TTL check for informative logging is a nice touch for observability.
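A small sketch of how a TTL-based log message could handle the special return values of the Redis `TTL` command, which reports `-2` for a missing key and `-1` for a key with no expiry. `describe_ttl` and its wording are hypothetical and not taken from `run_etl.py`.

```python
def describe_ttl(ttl_seconds: int) -> str:
    """Turn a Redis TTL reply into a log-friendly phrase."""
    if ttl_seconds == -2:
        return "lock no longer exists"
    if ttl_seconds == -1:
        return "lock has no expiry"
    # Never report "0 min" for a lock that is still held.
    minutes = max(1, round(ttl_seconds / 60))
    return f"lock expires in about {minutes} min"
```

Checking the sentinel values before dividing keeps negative minute counts out of the logs.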
71-73: LGTM! Progress indicators in logs = 💯
Adding progress indicators (1/3, 2/3, 3/3) to log messages is a killer observability improvement.
echo/docs/litellm_config.md (1)
55-57: LGTM! Redis lock config docs - critical for engineers to understand the lock mechanism
Solid documentation of the Redis lock configuration parameters. The expiry timing and lock prefix are well explained, making it clear how the ETL process avoids duplicate processing.
Made requested changes. OK to merge, I think, @spashii
- Renamed `REDIS_LOCK_EXPIRY` to `AUDIO_LIGHTRAG_REDIS_LOCK_EXPIRY` in the configuration for consistency.
- Adjusted `run_etl.py` to utilize the new `AUDIO_LIGHTRAG_REDIS_LOCK_EXPIRY` variable for Redis lock management, ensuring proper handling of conversation IDs during processing.
- Updated documentation to reflect the changes in Redis lock configuration.
Actionable comments posted: 0
🧹 Nitpick comments (1)
echo/docs/litellm_config.md (1)
47-47: Accurate renaming but minor text duplication
The renaming from TIME_THRESHOLD to COOL_OFF_TIME makes the parameter's purpose much clearer. However, there's a word duplication: "current current".
- `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current current tz stamp of directus in run_etl to enable
+ `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS`: Time threshold for audio processing in seconds (default: 60). Files will not be processed if uploaded earlier than cooloff. Currently disabled, pass the current tz stamp of directus in run_etl to enable
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- echo/docs/litellm_config.md (1 hunks)
- echo/server/dembrane/audio_lightrag/main/run_etl.py (6 hunks)
- echo/server/dembrane/config.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- echo/server/dembrane/config.py
🧰 Additional context used
🧬 Code Graph Analysis (1)
echo/server/dembrane/audio_lightrag/main/run_etl.py (3)
- echo/server/dembrane/audio_lightrag/pipelines/directus_etl_pipeline.py (2): DirectusETLPipeline (14-129), run (122-129)
- echo/server/dembrane/audio_lightrag/pipelines/audio_etl_pipeline.py (1): run (151-154)
- echo/server/dembrane/audio_lightrag/pipelines/contextual_chunk_etl_pipeline.py (1): run (123-126)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: ci-check-server
🔇 Additional comments (9)
echo/server/dembrane/audio_lightrag/main/run_etl.py (7)
7-11: Solid Redis config imports! 🚀
Clean separation of Redis config params with explicit naming conventions. This makes it clear these are specific to the audio_lightrag ETL pipeline and avoids namespace collisions.
32-32: Slick docstring update!
Accurate reflection of the new Redis locking capability makes for self-documenting code.
46-65: Atomic lock acquisition FTW! 💯
Excellent implementation of the Redis lock system. You've properly used Redis's atomic SET with NX and expiry options, crushing any race conditions. Great job adding TTL checking for informative logging too; that'll be a huge help during debugging.
66-70: Perfect early return pattern!
Smart check to bail out early if all conversation IDs are locked: no point in spinning up the ETL pipeline machinery if there's nothing to process.
75-77: Clean pipeline execution update!
Setting `run_timestamp=None` explicitly makes it clear this is an intentional choice rather than a forgotten parameter. The numbered logging (1/3) provides great visibility into the pipeline progress.
86-86: Consistent progress logging! 👌
Maintaining the X/3 format across all pipeline stages helps engineers tracking logs to understand exactly where something failed.
Also applies to: 95-95
105-112: Robust failure handling!
Great defensive programming: releasing locks on failure ensures the system can recover gracefully. The try/except around the release operation itself is particularly thorough.
One small optimization I'd consider: initialize `filtered_conv_ids` before the first try block to ensure it's always defined even if Redis initialization fails.
 def run_etl_pipeline(conv_id_list: list[str]) -> Optional[bool]:
     """ ... """
+    filtered_conv_ids: list[str] = []
     try:
         if not conv_id_list:
             logger.error("Empty conversation ID list provided")
             return None
         # Filter conversation IDs that are already being processed (via Redis locks)
         redis_client = redis.from_url(REDIS_URL)
-        filtered_conv_ids = []
echo/docs/litellm_config.md (2)
53-54: Flag renaming for consistency
Great, renamed the feature flag to match the naming pattern of other flags.
55-57: Redis lock docs looking sharp! 🔐
Clean documentation of the Redis lock config parameters. The explanations are clear and give users enough context to understand what these settings control without diving into the implementation.
…o-implement-lock-for-processing
…essing (#124)
* Enhance ETL pipeline logging and audio processing (ECHO-165)
  - Updated logging messages in run_etl.py to indicate progress through the ETL pipeline stages.
  - Refactored AudioETLPipeline to separate processing of audio and non-audio files, improving clarity and functionality.
  - Enhanced ContextualChunkETLPipeline to handle non-audio segments more effectively, including transcript management and error handling.
  - Improved DirectusETLPipeline to ensure proper handling of missing audio paths by introducing a placeholder value.
  - Streamlined ProcessTracker methods for better data management and clarity.
* Enhance LiteLLM configuration and ETL pipeline with Redis lock support
  - Updated LiteLLM configuration documentation to include new settings for Redis lock management, including `REDIS_LOCK_PREFIX` and `REDIS_LOCK_EXPIRY`.
  - Refactored `run_etl.py` to implement Redis locks, preventing concurrent processing of the same conversation ID and ensuring smoother ETL pipeline execution.
  - Changed `AUDIO_LIGHTRAG_TIME_THRESHOLD_SECONDS` to `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` for clarity in audio processing settings.
  - Improved error handling and logging in the ETL pipeline to track Redis lock status and processing flow.
  - Cleaned up unused code in the audio ETL pipeline for better maintainability.
* Refactor LiteLLM configuration and enhance ETL pipeline functionality
  - Updated LiteLLM configuration documentation to rename `AUTO_SELECT_ENABLED` to `ENABLE_CHAT_AUTO_SELECT` for clarity.
  - Improved Redis lock handling in `run_etl.py` to atomically acquire locks and provide informative logging for conversation processing.
  - Enhanced `AudioETLPipeline` to streamline chunk processing by fetching transcripts in bulk and improving mapping logic for conversation segments.
  - Refactored `ContextualChunkETLPipeline` to improve error handling and response validation during item insertion in Directus.
* Refactor LiteLLM configuration and update ETL pipeline for Redis lock handling
  - Updated LiteLLM configuration documentation to clarify the `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` setting and its dependency on the current timezone stamp.
  - Renamed `REDIS_LOCK_PREFIX` to `AUDIO_LIGHTRAG_REDIS_LOCK_PREFIX` in the configuration to improve clarity and consistency.
  - Adjusted `run_etl.py` to utilize the new `AUDIO_LIGHTRAG_REDIS_LOCK_PREFIX` for Redis lock management, ensuring proper handling of conversation IDs during processing.
* Update LiteLLM configuration for Redis lock handling
  - Renamed `REDIS_LOCK_EXPIRY` to `AUDIO_LIGHTRAG_REDIS_LOCK_EXPIRY` in the configuration for consistency.
  - Adjusted `run_etl.py` to utilize the new `AUDIO_LIGHTRAG_REDIS_LOCK_EXPIRY` variable for Redis lock management, ensuring proper handling of conversation IDs during processing.
  - Updated documentation to reflect the changes in Redis lock configuration.
---------
Co-authored-by: Sameer Pashikanti <63326129+spashii@users.noreply.github.com>
Pull Request: Enhance ETL Pipeline with Redis Lock Support and Improved Audio Processing
Description
This PR introduces Redis lock support for the ETL pipeline while enhancing audio processing capabilities and improving overall logging. The changes focus on preventing concurrent processing conflicts and streamlining the audio processing workflow.
Redis Lock Integration
- Implemented Redis locks in `run_etl.py` to prevent concurrent processing of the same conversation ID
- `REDIS_LOCK_PREFIX`: Prefix for Redis lock keys
- `REDIS_LOCK_EXPIRY`: Lock expiration time settings
Audio Processing Improvements
- Renamed `AUDIO_LIGHTRAG_TIME_THRESHOLD_SECONDS` to `AUDIO_LIGHTRAG_COOL_OFF_TIME_SECONDS` for better clarity
- Refactored `AudioETLPipeline` to separate audio and non-audio file processing
- Enhanced `ContextualChunkETLPipeline` for better handling of non-audio segments
- Improved handling of missing audio paths in `DirectusETLPipeline`
Logging and Code Quality
- Streamlined `ProcessTracker` methods for improved data management
Testing Focus Areas
Related Ticket
ECHO-165
This PR enhances the robustness and reliability of our ETL pipeline while improving code maintainability and observability.