add worker readiness/liveness checks#93

Merged
spashii merged 2 commits into main from fix/20250401-0
Apr 1, 2025


@spashii
Member

@spashii spashii commented Apr 1, 2025

  • Introduced sanitizeImageUrl function to ensure proper handling of image URLs during local development.
  • Updated AspectCard and ProjectLibraryAspect components to utilize the new sanitization function for image URLs.
  • Added scripts for worker liveness and readiness checks to enhance monitoring and reliability of Celery workers.
  • Modified task configurations to include a new CPU queue and improved task acknowledgment settings.

Summary by CodeRabbit

  • New Features

    • Enhanced image display by processing URLs securely, ensuring consistent rendering and appropriate fallback images.
    • Improved image storage with unique identification and dynamic file handling for better consistency.
    • Introduced enhanced background process monitoring and dedicated task routing for smoother, more reliable performance.
    • Added scripts for liveness and readiness checks to monitor worker status effectively.
  • Chores

    • Updated system configurations and added operational scripts to streamline worker management and readiness checks.

spashii added 2 commits April 1, 2025 10:42
…ecks

- Introduced `sanitizeImageUrl` function to ensure proper handling of image URLs during local development.
- Updated `AspectCard` and `ProjectLibraryAspect` components to utilize the new sanitization function for image URLs.
- Added scripts for worker liveness and readiness checks to enhance monitoring and reliability of Celery workers.
- Modified task configurations to include a new CPU queue and improved task acknowledgment settings.
- Updated method parameters in the LivenessProbe class to use underscore-prefixed names for unused variables, improving code readability.
- Added type ignore comments for Celery signal imports to suppress type checking warnings.
@coderabbitai
Contributor

coderabbitai bot commented Apr 1, 2025

Walkthrough

This pull request introduces enhanced image URL handling in the frontend and more robust image processing in the backend. It adds a new utility to sanitize URLs, ensuring proper redirection for local development. The image generation functions now use dynamically generated UUIDs and dynamically determined file extensions. Additionally, the Celery worker configuration is updated with new task queues, a liveness probe for heartbeat monitoring, and readiness checks, alongside several new scripts that launch and monitor dedicated workers.

Changes

File(s) | Change Summary
echo/frontend/.../AspectCard.tsx, echo/frontend/.../ProjectLibraryAspect.tsx, echo/frontend/.../utils.ts | Updated image sources to use sanitizeImageUrl for proper URL sanitization; added new utility function to transform URLs for local development.
echo/server/dembrane/image_utils.py, echo/server/dembrane/quote_utils.py | Enhanced image generation: added UUID-based unique S3 paths, public flag, error handling, and dynamic file extension extraction.
echo/server/dembrane/tasks.py, echo/server/dembrane/tasks_config.py | Introduced LivenessProbe for worker heartbeat monitoring, added signal handlers for readiness, and revised task decorators to use a dedicated cpu queue with updated Celery configurations.
echo/server/prod-worker-cpu.sh, echo/server/prod-worker-liveness.py, echo/server/prod-worker-readiness.py, echo/server/prod-worker.sh, echo/server/run-worker.sh | Added and modified scripts to launch and monitor dedicated Celery workers (including a CPU-specific worker), and adjusted worker naming conventions.

Sequence Diagram(s)

sequenceDiagram
    participant Component as UI Component
    participant Util as sanitizeImageUrl
    Component->>Util: Call sanitizeImageUrl(rawUrl)
    alt URL starts with "http://minio:9000"
        Util-->>Component: Returns transformed URL ("http://localhost:9000...")
    else URL does not match
        Util-->>Component: Returns original URL
    end
    Component->>Component: Set <img> src with sanitized URL
sequenceDiagram
    participant Worker as Celery Worker
    participant Probe as LivenessProbe
    participant HB as Heartbeat File
    participant RD as Readiness File
    Worker->>Probe: Start()
    Probe->>HB: Create & update heartbeat file periodically
    Worker->>Probe: Emit worker_ready signal
    Probe->>RD: Create readiness file
    Note over Probe,Worker: Workers execute tasks...
    Worker->>Probe: Initiate Shutdown
    Probe->>HB: Stop heartbeat & remove file
    Probe->>RD: Remove readiness file

Possibly related PRs

  • Build/ci #73: Related because both PRs introduce and use the same sanitizeImageUrl function for processing image URLs; this PR applies it in the AspectCard component.

Poem

In our code, a spark ignites the night,
Images get cleansed with pristine might.
Workers pulse with heartbeats, true and bold,
Ready, alive—new stories told.
With each commit, our dreams take flight 🚀,
LGTM—code shining ever so bright!


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fa04395 and 54d8825.

📒 Files selected for processing (1)
  • echo/server/dembrane/tasks.py (8 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • echo/server/dembrane/tasks.py

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (3)
echo/server/prod-worker-liveness.py (1)

1-18: Bulletproof liveness check implementation! 💯

Killer implementation of the worker heartbeat checking. The 60-second threshold for staleness is a reasonable default. Using stat().st_mtime is the right approach for timestamp comparison.

One minor typo in your output message that should be fixed:

-    print("Celery Worker liveness file timestamp DOES NOT matches the given constraint.")
+    print("Celery Worker liveness file timestamp DOES NOT match the given constraint.")
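
For reference, the staleness check described above can be sketched roughly as follows. This is not the contents of prod-worker-liveness.py; the heartbeat path and the function names are assumptions made for illustration, and only the 60-second threshold and the use of stat().st_mtime come from the review.

```python
import time
from pathlib import Path
from typing import Optional

# Hypothetical location: the review only says the probe files live under /tmp.
HEARTBEAT_FILE = Path("/tmp/worker_heartbeat")
MAX_AGE_SECONDS = 60  # staleness threshold mentioned in the review


def heartbeat_is_fresh(
    path: Path,
    max_age: float = MAX_AGE_SECONDS,
    now: Optional[float] = None,
) -> bool:
    """Return True if `path` exists and was modified within `max_age` seconds."""
    if not path.is_file():
        return False
    current = time.time() if now is None else now
    return (current - path.stat().st_mtime) <= max_age


def main() -> int:
    """Return a process exit code: 0 when the worker looks alive, 1 otherwise."""
    if heartbeat_is_fresh(HEARTBEAT_FILE):
        print("Celery Worker liveness file timestamp matches the given constraint.")
        return 0
    print("Celery Worker liveness file timestamp DOES NOT match the given constraint.")
    return 1
```

A probe script would typically end with `sys.exit(main())` so the orchestrator sees the exit code.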
echo/server/dembrane/tasks.py (2)

63-64: Unused argument “worker”.
stop(self, worker) doesn’t use worker. Consider removing or referencing it if you need worker context.

-def stop(self, worker):
+def stop(self, _worker):
🧰 Tools
🪛 Ruff (0.8.2)

63-63: Unused method argument: worker

(ARG002)


66-67: Unused argument “worker”.
Same note as above. Either rename to _worker or remove if it’s never utilized.

-def update_heartbeat_file(self, worker):
+def update_heartbeat_file(self, _worker):
🧰 Tools
🪛 Ruff (0.8.2)

66-66: Unused method argument: worker

(ARG002)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4196011 and fa04395.

📒 Files selected for processing (12)
  • echo/frontend/src/components/aspect/AspectCard.tsx (2 hunks)
  • echo/frontend/src/lib/utils.ts (1 hunks)
  • echo/frontend/src/routes/project/library/ProjectLibraryAspect.tsx (2 hunks)
  • echo/server/dembrane/image_utils.py (2 hunks)
  • echo/server/dembrane/quote_utils.py (1 hunks)
  • echo/server/dembrane/tasks.py (8 hunks)
  • echo/server/dembrane/tasks_config.py (1 hunks)
  • echo/server/prod-worker-cpu.sh (1 hunks)
  • echo/server/prod-worker-liveness.py (1 hunks)
  • echo/server/prod-worker-readiness.py (1 hunks)
  • echo/server/prod-worker.sh (1 hunks)
  • echo/server/run-worker.sh (1 hunks)
🧰 Additional context used
🧬 Code Definitions (4)
echo/frontend/src/routes/project/library/ProjectLibraryAspect.tsx (1)
echo/frontend/src/lib/utils.ts (1)
  • sanitizeImageUrl (37-43)
echo/server/dembrane/image_utils.py (2)
echo/server/dembrane/utils.py (1)
  • generate_uuid (13-14)
echo/server/dembrane/s3.py (1)
  • save_to_s3_from_url (84-114)
echo/server/dembrane/quote_utils.py (2)
echo/server/dembrane/s3.py (1)
  • save_to_s3_from_url (84-114)
echo/server/dembrane/utils.py (1)
  • generate_uuid (13-14)
echo/frontend/src/components/aspect/AspectCard.tsx (1)
echo/frontend/src/lib/utils.ts (1)
  • sanitizeImageUrl (37-43)
🪛 Ruff (0.8.2)
echo/server/dembrane/tasks.py

63-63: Unused method argument: worker

(ARG002)


66-66: Unused method argument: worker

(ARG002)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: ci-check-server
🔇 Additional comments (30)
echo/frontend/src/lib/utils.ts (1)

37-43: Solid URL sanitization solution 🚀

This function elegantly handles the container-to-localhost redirection needed for local dev environments. Clean, focused implementation with proper typing.
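
The rewrite rule shown in the sequence diagram can be illustrated with a small sketch. The real sanitizeImageUrl is a TypeScript utility in utils.ts whose body is not shown in this conversation; the Python rendering below only mirrors the described logic, and the constant and function names are assumptions.

```python
# Origins taken from the sequence diagram in the walkthrough.
MINIO_INTERNAL_ORIGIN = "http://minio:9000"
LOCALHOST_ORIGIN = "http://localhost:9000"


def sanitize_image_url(url: str) -> str:
    """Rewrite container-internal MinIO URLs so a local browser can load them."""
    if url.startswith(MINIO_INTERNAL_ORIGIN):
        # Keep the path and query untouched; swap only the origin prefix.
        return LOCALHOST_ORIGIN + url[len(MINIO_INTERNAL_ORIGIN):]
    # Any other URL is returned unchanged.
    return url
```

Replacing only the prefix leaves bucket names, object keys, and query strings intact.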

echo/frontend/src/routes/project/library/ProjectLibraryAspect.tsx (2)

19-19: Clean import addition

Import looks good. Nice modular approach to leverage the utility function.


82-82: URL sanitization properly applied

Excellent implementation of the sanitization function while preserving the fallback placeholder. This ensures consistent image loading behavior across environments.

echo/frontend/src/components/aspect/AspectCard.tsx (2)

2-2: Import looks good

Clean import statement extension to include the sanitizeImageUrl utility.


50-50: URL sanitization properly applied

Good implementation mirroring the pattern used in ProjectLibraryAspect. Consistent approach across components.

echo/server/dembrane/image_utils.py (2)

5-5: UUID import added

Appropriate import to support the dynamic UUID generation for image paths.


119-119: Improved S3 storage pattern

Awesome enhancement to the image storage mechanism. Using a UUID in the path creates unique identifiers, preventing collisions, and the explicit public=True flag ensures proper ACL settings for image accessibility.

-image_url = save_to_s3_from_url(image_url)
+image_url = save_to_s3_from_url(image_url, "images/" + generate_uuid(), public=True)
echo/server/dembrane/quote_utils.py (2)

806-811: Solid error handling for extension extraction 🚀

Adding robust error handling for image extension extraction is a killer improvement. Using a try-except block to gracefully fall back to "png" when parsing fails is exactly what a production-ready system needs.


814-816: Sweet dynamic file naming convention 🔥

Love the dynamic approach to file naming using UUIDs and the extracted extension. This makes the S3 storage much more organized and the explicit public=True flag ensures proper accessibility.
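
A rough sketch of the pattern praised in the two comments above: extract the extension, fall back to "png" on failure, and build a UUID-based key. The helper names and the exact key layout are assumptions; the PR's own code calls generate_uuid and save_to_s3_from_url, whose bodies are not shown here, so uuid.uuid4 stands in for the former.

```python
import uuid
from pathlib import PurePosixPath
from urllib.parse import urlparse

DEFAULT_EXTENSION = "png"  # fallback named in the review


def extract_extension(image_url: str) -> str:
    """Return the lowercase file extension of a URL path, or "png" on failure."""
    try:
        suffix = PurePosixPath(urlparse(image_url).path).suffix
        extension = suffix.lstrip(".").lower()
        if not extension.isalnum():
            # Covers a missing extension as well as odd characters.
            raise ValueError(f"no usable extension in {image_url!r}")
        return extension
    except ValueError:
        # urlparse can also raise ValueError for malformed URLs.
        return DEFAULT_EXTENSION


def build_image_key(image_url: str) -> str:
    """Build a unique storage key; the "images/<uuid>.<ext>" layout is assumed."""
    return f"images/{uuid.uuid4()}.{extract_extension(image_url)}"
```

Because the key is derived from a fresh UUID, two uploads of the same source URL do not overwrite each other.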

echo/server/prod-worker.sh (1)

5-5: Worker name refactored, clean and consistent naming convention! 🚀

Love the standardized worker naming pattern - removing "normal" from the name while keeping the queue type separate is a solid architecture decision. Less redundancy in naming = more maintainable system. Matches perfectly with your new CPU worker implementation.

echo/server/prod-worker-cpu.sh (1)

1-5: Rock-solid CPU worker implementation! LGTM.

Excellent work creating a dedicated CPU worker script! The CPU-intensive task isolation pattern is exactly what we need for proper resource optimization. Script follows the same clean pattern as the normal worker, maintaining consistency across your worker fleet.

echo/server/prod-worker-readiness.py (1)

1-10: Clean k8s readiness probe implementation, nice!

Solid pythonic approach for the readiness probe. Using Path from pathlib is definitely the way to go for modern Python file operations - much cleaner than the old os.path approach. The script does exactly what it needs to do without overcomplicating things.
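
A minimal sketch of such a readiness check, assuming the worker creates a marker file once it is ready. The file path and function names are hypothetical; the actual prod-worker-readiness.py is not reproduced in this conversation.

```python
from pathlib import Path

# Hypothetical location: the review only says the probe files live under /tmp.
READINESS_FILE = Path("/tmp/worker_ready")


def worker_is_ready(path: Path = READINESS_FILE) -> bool:
    """The worker counts as ready once its readiness marker file exists."""
    return path.is_file()


def main() -> int:
    """Return a process exit code: 0 when ready, 1 otherwise."""
    return 0 if worker_is_ready() else 1
```

An orchestrator readiness probe would run the script and treat a non-zero exit code as "not ready yet".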

echo/server/run-worker.sh (1)

32-32: LGTM! Adding the CPU worker is consistent with the new CPU queue.
This is in line with the PR’s goal of segregating CPU-bound tasks onto a dedicated queue. Keep crushing it!

echo/server/dembrane/tasks_config.py (6)

6-6: Confirmed best practice for CPU-bound tasks.
Setting worker_prefetch_multiplier = 1 can help ensure tasks are distributed more fairly among workers, especially for CPU-intensive tasks. Looks good.


10-10: Queue for CPU-bound tasks looks legit.
Introducing "cpu" queue aligns with the new worker. Great separation of concerns.


17-17: Check for performance overhead when storing results.
Switching to task_ignore_result = False means task results are now stored. Awesome for debugging, but keep an eye on resource usage.


19-20: Late ack & requeue can prevent task loss.
task_acks_late = True and task_reject_on_worker_lost = True ensure tasks rerun if a worker crashes mid-execution. But as you know, it can create duplicates if partial work is done. Use it wisely.


22-26: Robust broker connection settings.
Reconnection and retry settings help keep tasks flowing in ephemeral or shaky network environments. Stay unstoppable.


27-30: Visibility timeout & socket keepalive optimize reliability.
Ensure that your tasks won’t vanish if a worker times out and that connections remain stable. Very SF 100x approach.
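
Pulling the settings from the comments above into one place, a configuration along these lines is plausible. The values for prefetch, result storage, late acks, and reject-on-lost come from the review; the retry count, visibility timeout, and the assumption of a Redis broker are placeholders, not values taken from tasks_config.py.

```python
def build_worker_settings() -> dict:
    """Celery setting names with values from the review plus labeled placeholders."""
    return {
        # Fetch one task at a time so long CPU-bound tasks are spread fairly.
        "worker_prefetch_multiplier": 1,
        # Results are stored; watch result backend usage.
        "task_ignore_result": False,
        # Acknowledge after execution and requeue if the worker process dies.
        # Tasks can run more than once, so they should be safe to repeat.
        "task_acks_late": True,
        "task_reject_on_worker_lost": True,
        # Reconnect to the broker on failure (retry count is a placeholder).
        "broker_connection_retry": True,
        "broker_connection_retry_on_startup": True,
        "broker_connection_max_retries": 100,
        # Transport options, assuming a Redis broker (values are placeholders).
        "broker_transport_options": {
            "visibility_timeout": 3600,
            "socket_keepalive": True,
        },
    }
```

With a Celery app object, these could be applied via `app.conf.update(build_worker_settings())`. The visibility timeout should exceed the longest expected task runtime, otherwise a late-acked task may be redelivered while it is still running.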

echo/server/dembrane/tasks.py (11)

3-3: Nice import expansions for liveness checks.
Pulling in Path, bootsteps, and the new signals sets up the readiness and liveness logic. LGTM.

Also applies to: 5-5, 7-7


40-45: Using local files for readiness & heartbeat checks.
Writing these files to /tmp is handy. Double-check that ephemeral storage meets your container orchestration environment’s needs.


47-62: The LivenessProbe is a killer addition.
Bootsteps with a timer is a very clean approach to track the worker’s heartbeat. Perfect for monitoring.


88-88: Seamless addition of LivenessProbe to worker steps.
Brilliant. The worker will adopt these readiness and liveness checks automatically.


91-93: Creating readiness file on worker startup.
That’s exactly how you make sure orchestration can detect readiness instantly.


96-98: Cleaning up readiness file on shutdown.
Super thorough. Ensures no stale readiness indicators remain.
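
The file lifecycle these comments describe (heartbeat updates, readiness marker on startup, cleanup on shutdown) can be modeled without Celery. In the PR the steps are driven by a bootstep timer and worker signals; the class below is only a standard-library sketch of the file handling, and its name and method names are hypothetical.

```python
from pathlib import Path


class ProbeFiles:
    """Models the heartbeat and readiness marker files a worker maintains."""

    def __init__(self, heartbeat: Path, readiness: Path) -> None:
        self.heartbeat = heartbeat
        self.readiness = readiness

    def beat(self) -> None:
        # Called periodically; touching the file refreshes its mtime.
        self.heartbeat.touch()

    def mark_ready(self) -> None:
        # Called once when the worker reports it is ready.
        self.readiness.touch()

    def shutdown(self) -> None:
        # Remove both markers so no stale indicators survive a restart.
        self.heartbeat.unlink(missing_ok=True)
        self.readiness.unlink(missing_ok=True)
```

Using `unlink(missing_ok=True)` keeps shutdown safe to call even if a marker was never created.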


153-153: Propagating ignore_result for chunk transcription.
Allowing the chunk transcription tasks to forgo storing results is often wise for large-volume workloads. Nice.


177-177: Routing to “cpu” queue.
Smart to isolate audio-splitting on the CPU queue if it’s heavier processing.


389-389: Targeting “cpu” queue for insight initialization.
Insights often involve heavy-lifting computations. Good call.


494-494: Aspect centroid assigned on CPU queue.
Centroid calculations can be resource-intensive. Good planning.


512-512: Clustering quotes on dedicated CPU queue.
Perfect for heavy clustering logic. Nicely done.

@spashii spashii merged commit 159cbcd into main Apr 1, 2025
7 checks passed
@spashii spashii deleted the fix/20250401-0 branch October 30, 2025 12:03
spashii added a commit that referenced this pull request Nov 18, 2025
* Implement image URL sanitization and add worker readiness/liveness checks

- Introduced `sanitizeImageUrl` function to ensure proper handling of image URLs during local development.
- Updated `AspectCard` and `ProjectLibraryAspect` components to utilize the new sanitization function for image URLs.
- Added scripts for worker liveness and readiness checks to enhance monitoring and reliability of Celery workers.
- Modified task configurations to include a new CPU queue and improved task acknowledgment settings.

* Refactor liveness probe task methods for clarity

- Updated method parameters in the LivenessProbe class to use underscore-prefixed names for unused variables, improving code readability.
- Added type ignore comments for Celery signal imports to suppress type checking warnings.