feat(ui): batch worker failure notifications #372

Merged
geodro merged 1 commit into main from feat/batch-worker-failure-notifications on May 18, 2026
@geodro geodro commented May 18, 2026

Summary

A systemd cascade, such as start-limit-hit tripping half a dozen queue workers at once or an OOM kill taking out every worker for a site in the same tick, used to fire one push per unit and spam the user with five or six near-identical "Worker failed on …" notifications back to back. The watcher checks for new failures every five seconds, so each burst arrived tightly clustered and there was no way to avoid it on the receiving side.

The watcher now hands its new-failure delta to a small batcher that buffers by unit and arms a single five-second flush timer. Failures arriving later in the same window join the in-flight batch without resetting the timer, so the grouped push lands at most five seconds after the first failure, even under a sustained cascade. A single isolated failure still goes out with the existing per-unit shape (deep link to the affected site, per-unit tag for browser dedupe). Two or more collapse into one "N workers failed" payload that lists every worker@site and is tagged lerd-workers-group, so a later grouped push supersedes the earlier one in the notification tray instead of stacking.

Added matching notify_worker_failed_group_title and _body keys to all eight locale message files, with the Turkish strings translated and the rest falling back to English the same way the existing worker_failed entries do.
@geodro geodro merged commit c9b7e0a into main May 18, 2026
3 checks passed
@geodro geodro mentioned this pull request May 18, 2026
geodro added a commit that referenced this pull request May 18, 2026
First beta of the 1.21.0 line. The headline is desktop notifications via Web Push (#353), with a per-category settings page polished alongside a dashboard health row (#354). The PHP-FPM image grows a real shell environment, zsh plus starship plus eza, bat, fzf, zoxide, isolated from the host (#358), then loses around 800 MB of build toolchain in a multi-stage split that drops the image from 1.36 GB to 535 MB without losing any of its 68 PHP modules (#364). A new on-demand commands feature surfaces one-shot framework actions across the dashboard, the lerd run CLI, the command palette, and four new MCP tools, all backed by a generalised Dropdown component that replaces every native select in the UI (#363). The site detail header gets a browser-style address bar with the favicon, TLS lock, LAN-share chip, and worktrees promoted from a dropdown to tabs (#365), an Env tab joins Overview, Tinker, and Dumps to show the project .env verbatim (#366), and the tray menu picks up Dump bridge and Notifications toggles that update live via a new KindDumpsStatus event (#373). Postgres grows 17 and 18 alternates alongside a new MySQL 9.7 LTS line, all gated by a canonical-version pin so flipping the yaml canonical no longer silently major-jumps existing installs (#361). Türkçe joins the dashboard languages (#355), a public_dir override lands in .lerd.yaml for projects with a non-standard document root (#370), every git invocation in the tree now flows through internal/git (#356), and worker-failure pushes are batched so a systemd cascade no longer fires six near-identical notifications back to back (#372). Plus the post-1.20.2 fix queue covers the worktree-manager button rendering on non-git sites (#357), TLS certs not refreshing when a secured site's domain set changed (#367), streamed worktree install and a wave of audit follow-ups (#368), and tinker swallowing bare-expression results when the dump bridge was on (#371).