Move RestartDelayedEventServlet to workers#19207
Conversation
9511351 to
5677215
Compare
To allow delayed event timeouts to be extended even if the main process is currently restarting (for instance, during a pod resize).
5677215 to
d3eaadf
Compare
| "^/_matrix/client/(api/v1|r0|v3|unstable)/keys/upload", | ||
| "^/_matrix/client/(api/v1|r0|v3|unstable)/keys/device_signing/upload$", | ||
| "^/_matrix/client/(api/v1|r0|v3|unstable)/keys/signatures/upload$", | ||
| "^/_matrix/client/unstable/org.matrix.msc4140/delayed_events/.*/restart$", |
There was a problem hiding this comment.
This routes the endpoint to workers when running the relevant Complement tests.
| # Workers don't need to wait for delayed events queued in the DB to be | ||
| # sent as that will be handled by the main process. | ||
| # | ||
| # If a worker attempts to schedule a timer for the next delayed event | ||
| # while the main process is pulling delayed events from the DB, the two | ||
| # will not race, as the main process will mark the delayed events as | ||
| # processed in the DB immediately before sending them. | ||
| self._initialized_from_db = defer.succeed(None) |
There was a problem hiding this comment.
I think the changes here are okay, but I'm mostly taking you at your word that this comment is true.
Would you be able to provide a more thorough explanation / links to the relevant code which validates this?
There was a problem hiding this comment.
Ah sorry. See the code just below where self._initialized_from_db is set. I failed to realise that someone reading this from top-to-bottom wouldn't yet know what self._initialized_from_db was.
Essentially _initialized_from_db is a deferred that completes once all pending delayed events in the DB have been scheduled. The main process will manage that. The restart handler will update the timeout in the DB, so is not dependent
It's probably simpler if await make_deferred_yieldable(self._initialized_from_db) below is only called on the main process. I've made that visual change in 6f6227c.
We're effectively letting the restart handler code on workers and _schedule_db_events on the main process run in parallel. I had a think about whether this would result in a race condition.
The restart handler calls restart_delayed_event, which updates a delayed event's timeout as long as it hasn't been marked as processed (is_processed) yet.
is_processed means "sending of this event has begun". Once a delayed event has been successfully sent, it is deleted from the delayed_events table.
I don't believe there's a point between the queries in _schedule_db_events where running UPDATE delayed_events SET send_ts = ? + delay WHERE delay_id = ? AND NOT is_processed would be detrimental.
The only thing that stands out to me is between self._store.unprocess_delayed_events and self._store.process_timeout_delayed_events, when all events marked as "processed" in the DB are set to unprocessed again (because processing may have failed partway during homeserver shutdown). But in that case, the event in question will just have its send_ts successfully extended and self._store.process_timeout_delayed_events won't send it yet, which sounds like what the client intended anyhow.
And then given that, if the restart code can't race with the initialisation code anyhow, we could just remove await make_deferred_yieldable(self._initialized_from_db)fromrestart` altogether, even when running on the main process.
cc @AndrewFerr who originally wrote this code.
There was a problem hiding this comment.
The only thing that stands out to me is between
self._store.unprocess_delayed_eventsandself._store.process_timeout_delayed_events, when all events marked as "processed" in the DB are set to unprocessed again (because processing may have failed partway during homeserver shutdown). But in that case, the event in question will just have itssend_tssuccessfully extended andself._store.process_timeout_delayed_eventswon't send it yet, which sounds like what the client intended anyhow.
There is a slight amount of "destructiveness" here. The startup also handles sending delayed events that should have been sent while the homeserver was offline, i.e. delayed events with an effective send time in the past. Calling restart on such a delayed event during this startup would prevent it from being sent as it ought to be.
There was a problem hiding this comment.
Here's an idea: instead of the startup calling unprocess_delayed_events before process_timeout_delayed_events (so that the latter will retry "processed" sends that failed to persist), process_timeout_delayed_events could simply have a flag for whether it should skip over "processed" events or not (which the startup would set to "not skip").
The scenario in my last post is possible only because calling unprocess_delayed_events on startup makes "past" delayed events eligible targets of restart. This issue would go away if the startup were to retry past delayed events without having to alter the "processed" state of delayed events in the DB.
GET is the only method on `/delayed_events$`, so it's fine to put it into the above section. Thanks Ben BZ for the suggestion.
So that we test the GET `/delayed_events$` endpoint on workers as well.
Rework the code to be clearer. There should be no functional difference in this commit.
| if self._is_master: | ||
| # Wait for the processing of existing delayed events from the DB to | ||
| # complete before accepting any modifications. | ||
| await make_deferred_yieldable(self._initialized_from_db) | ||
|
|
||
| next_send_ts = await self._store.restart_delayed_event( |
There was a problem hiding this comment.
If you want, a worker handling restart may skip calling _schedule_next_at and instead only update the DB with the new send time of the restarted delayed event. The main process will notice the new send time on its next scheduled _send_on_timeout.
Currently, if a worker schedules calls of _send_on_timeout, those calls will often end up being scheduled at the same time as the main process' calls of it, i.e. the worker & main process will often both try to send the same delayed event. Only one of them will "win" at a time, so the work of actually sending a delayed event won't be repeated between them, but there isn't any explicit load balancing done to try to spread that work evenly. So maybe it's better to keep the main process as the sole manager of scheduled sends.
There was a problem hiding this comment.
That certainly sounds easier to reason about. I walked through an example to convince myself that this would still result in events being sent on time:
- User says they want to send an event at timestamp 10.
- At timestamp 8, the user pushes the send time to 20 on a worker.
- The main process still tries to send the event at 10. It's no longer there. Now schedules a timer for 20.
- Main process sends the event at timestamp 20.
Since we're always pushing the event forwards, and calling _send_on_timeout without any events to send only results in some small overhead, I see no issue with your proposal. Updated in 66f026e.
The main process won't miss sending the event. When the original timeout is hit, the main process will simply pick up the new timeout and schedule a timer for that moment instead.
per @AndrewFerr's suggestion in #19207 (comment)
devonh
left a comment
There was a problem hiding this comment.
Awesome, this looks a lot better now with easier to understand rationale for why we can rely on certain behaviour. :)
…ayed_events_restart_workers
The team has decided to deprecate and stop publishing python wheels for MacOS. Synapse docker images will continue to work on MacOS, as will building Synapse from source (though note this requires a Rust compiler). Admins using the unstable [MSC2666](matrix-org/matrix-spec-proposals#2666) endpoint (`/_matrix/client/unstable/uk.half-shot.msc2666/user/mutual_rooms`), please check [the relevant section in the upgrade notes](https://github.com/element-hq/synapse/blob/develop/docs/upgrade.md#upgrading-to-v11440) as this release contains changes that disable that endpoint by default. No significant changes since 1.144.0rc1. Admins using the unstable [MSC2666](matrix-org/matrix-spec-proposals#2666) endpoint (`/_matrix/client/unstable/uk.half-shot.msc2666/user/mutual_rooms`), please check [the relevant section in the upgrade notes](https://github.com/element-hq/synapse/blob/develop/docs/upgrade.md#upgrading-to-v11440) as this release contains changes that disable that endpoint by default. - Add experimentatal implememntation of [MSC4380](matrix-org/matrix-spec-proposals#4380) (invite blocking). ([\#19203](#19203)) - Allow restarting delayed event timeouts on workers. ([\#19207](#19207)) - Fix a bug in the database function for fetching state deltas that could result in unnecessarily long query times. ([\#18960](#18960)) - Fix v12 rooms when running with `use_frozen_dicts: True`. ([\#19235](#19235)) - Fix bug where invalid `canonical_alias` content would return 500 instead of 400. ([\#19240](#19240)) - Fix bug where `Duration` was logged incorrectly. ([\#19267](#19267)) - Document in the `--config-path` help how multiple files are merged - by merging them shallowly. ([\#19243](#19243)) - Stop building release wheels for MacOS. ([\#19225](#19225)) - Improve event filtering for Simplified Sliding Sync. ([\#17782](#17782)) - Export `SYNAPSE_SUPPORTED_COMPLEMENT_TEST_PACKAGES` environment variable from `scripts-dev/complement.sh`. ([\#19208](#19208)) - Refactor `scripts-dev/complement.sh` logic to avoid `exit` to facilitate being able to source it from other scripts (composable). ([\#19209](#19209)) - Expire sliding sync connections that are too old or have too much pending data. ([\#19211](#19211)) - Require an experimental feature flag to be enabled in order for the unstable [MSC2666](matrix-org/matrix-spec-proposals#2666) endpoint (`/_matrix/client/unstable/uk.half-shot.msc2666/user/mutual_rooms`) to be available. ([\#19219](#19219)) - Prevent changelog check CI running on @dependabot's PRs even when a human has modified the branch. ([\#19220](#19220)) - Auto-fix trailing spaces in multi-line strings and comments when running the lint script. ([\#19221](#19221)) - Move towards using a dedicated `Duration` type. ([\#19223](#19223), [\#19229](#19229)) - Improve robustness of the SQL schema linting in CI. ([\#19224](#19224)) - Add log to determine whether clients are using `/messages` as expected. ([\#19226](#19226)) - Simplify README and add ESS Getting started section. ([\#19228](#19228), [\#19259](#19259)) - Add a unit test for ensuring associated refresh tokens are erased when a device is deleted. ([\#19230](#19230)) - Prompt user to consider adding future deprecations to the changelog in release script. ([\#19239](#19239)) - Fix check of the Rust compiled code being outdated when using source checkout and `.egg-info`. ([\#19251](#19251)) - Stop building macos wheels in CI pipeline. ([\#19263](#19263)) * Bump Swatinem/rust-cache from 2.8.1 to 2.8.2. ([\#19244](#19244)) * Bump actions/checkout from 5.0.0 to 6.0.0. ([\#19213](#19213)) * Bump actions/setup-go from 6.0.0 to 6.1.0. ([\#19214](#19214)) * Bump actions/setup-python from 6.0.0 to 6.1.0. ([\#19245](#19245)) * Bump attrs from 25.3.0 to 25.4.0. ([\#19215](#19215)) * Bump docker/metadata-action from 5.9.0 to 5.10.0. ([\#19246](#19246)) * Bump http from 1.3.1 to 1.4.0. ([\#19249](#19249)) * Bump pydantic from 2.12.4 to 2.12.5. ([\#19250](#19250)) * Bump pyopenssl from 25.1.0 to 25.3.0. ([\#19248](#19248)) * Bump rpds-py from 0.28.0 to 0.29.0. ([\#19216](#19216)) * Bump rpds-py from 0.29.0 to 0.30.0. ([\#19247](#19247)) * Bump sentry-sdk from 2.44.0 to 2.46.0. ([\#19218](#19218)) * Bump types-bleach from 6.2.0.20250809 to 6.3.0.20251115. ([\#19217](#19217)) * Bump types-jsonschema from 4.25.1.20250822 to 4.25.1.20251009. ([\#19252](#19252))
To allow delayed event timeouts to be extended even if the main process is currently restarting (for instance, during a pod resize). Addresses the
/restartportion of #19141.Note that this does not move the older, general
POST /org.matrix.msc4140/delayed_events/{delayID}endpoint to workers. That endpoint had the caller specify{restart,send,cancel}as a JSON body parameter. As such, clients will need to ensure they're using/restartto get the benefit of worker request handling.Part of MSC4140
Pull Request Checklist
EventStoretoEventWorkerStore.".code blocks.