Limit to-device EDU sizes by erikjohnston · Pull Request #19617 · element-hq/synapse

erikjohnston · 2026-03-27T15:45:04Z

This is based on #18416, which got reverted (#19614) due to it incorrectly rejecting to-device messages to users with many devices (and thus breaking message sending).

Fix #17035

A to-device message content looks like:

{
  "@user:domain": {"device1": {...}, "device2": {...}},
  ...
}

The previous PR would split the content into multiple EDUs, each with a subset of the users. However, if a single user's entry was too large, it would not split that entry any further and would instead error out.

The main change in this PR is to allow splitting up a single user into multiple EDUs.
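
As a rough illustration of splitting a single user's entry, the sketch below greedily packs one user's devices into several EDU contents. It is not the code from this PR: `split_user_devices` and `encoded_len` are hypothetical names, and `json.dumps` with sorted keys and compact separators stands in for `encode_canonical_json`.

```python
import json
from collections.abc import Iterator
from typing import Any


def encoded_len(obj: Any) -> int:
    """Assumed stand-in for len(encode_canonical_json(obj))."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def split_user_devices(
    user_id: str, by_device: dict[str, Any], soft_max: int
) -> Iterator[dict[str, dict[str, Any]]]:
    """Yield `{user_id: {device_id: content}}` chunks that aim to fit in `soft_max` bytes.

    A chunk holding one oversized device is still yielded, so the limit is soft.
    """
    chunk: dict[str, Any] = {}
    for device_id, content in by_device.items():
        candidate = {**chunk, device_id: content}
        if chunk and encoded_len({user_id: candidate}) > soft_max:
            # Adding this device would overflow: emit what we have and start again.
            yield {user_id: chunk}
            chunk = {device_id: content}
        else:
            chunk = candidate
    if chunk:
        yield {user_id: chunk}
```

Each yielded dict has the same shape as the to-device content above, so it could be wrapped in its own EDU. This naive version re-encodes the growing chunk on every step, which keeps it short but is not the cheapest approach.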

Other changes:

Rename to SOFT_MAX_EDU_SIZE to indicate that we sometimes send EDUs larger than that, and it's more of a target than a hard limit.
Check early if any to-device message (to a specific device) is too large to send, even if we're not going to send it over federation. This ensures that we catch cases where clients try to send to-device messages that are too large.

This still means that if a client sends a large individual to-device message it will fail, but I don't believe we ever send such large to-device messages (normally they're in the range of a few KB).
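
A minimal sketch of what such an early per-device check could look like, assuming a plain size comparison on each individual message. `check_to_device_sizes`, `MessageTooLargeError` and the `max_size` parameter are hypothetical; the PR raises its own error type and uses its own constant.

```python
import json
from typing import Any


class MessageTooLargeError(Exception):
    """Hypothetical stand-in for the error raised by the handler."""


def encoded_len(obj: Any) -> int:
    """Assumed stand-in for len(encode_canonical_json(obj))."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def check_to_device_sizes(
    messages: dict[str, dict[str, Any]], max_size: int
) -> None:
    """Reject the request if any single (user, device) message is too large.

    This runs before any local/remote routing, so oversized messages are
    caught even when nothing would be sent over federation.
    """
    for user_id, by_device in messages.items():
        for device_id, content in by_device.items():
            if encoded_len(content) > max_size:
                raise MessageTooLargeError(
                    f"To-device message for {user_id}:{device_id} is too large to send"
                )
```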

I ended up changing the implementation a bunch to make it easy to reuse the code to split up dictionaries. Instead of repeatedly splitting up the EDU until each bit fits into the size, we record the size of each entry in the dict and split based on cumulative size. This means we call encode_canonical_json on each entry rather than once on the entire struct, but it's not significantly slower to do so.
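
The cumulative-size idea can be sketched roughly as follows. This is an illustration under assumptions, not the helper from the PR: `split_by_cumulative_size` is a hypothetical name, `json.dumps` stands in for `encode_canonical_json`, and the size bookkeeping assumes compact JSON where an object costs two bytes for the braces, one colon per entry and one comma between entries.

```python
import json
from collections.abc import Iterator
from typing import Any


def encoded_len(obj: Any) -> int:
    """Assumed stand-in for len(encode_canonical_json(obj))."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def split_by_cumulative_size(
    original: dict[str, Any], soft_max_size: int, wrapping_object_size: int = 2
) -> Iterator[tuple[dict[str, Any], int]]:
    """Yield (subset, estimated_size) pairs, encoding each entry only once."""
    subset: dict[str, Any] = {}
    size = wrapping_object_size  # cost of the enclosing object, "{}" by default
    for key, value in original.items():
        entry = encoded_len(key) + 1 + encoded_len(value)  # "key":value
        added = entry + (1 if subset else 0)  # comma if not the first entry
        if subset and size + added > soft_max_size:
            yield subset, size
            subset, size, added = {}, wrapping_object_size, entry
        subset[key] = value
        size += added
    if subset:
        # May exceed soft_max_size if a single entry is larger than the limit.
        yield subset, size
```

With the default `wrapping_object_size` the running total equals the encoded length of each subset, so no subset ever needs to be re-encoded to learn its size.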

--

cc @MatMaul @MadLittleMods

If a set of messages exceeds this limit, the messages are split across several EDUs.

Fix #17035 (should)

There is currently [no official specced limit for EDUs](matrix-org/matrix-spec#807), but the consensus seems to be that it would be useful to have one to avoid this bug by bounding the transaction size.

As a side effect it also limits the size of a single to-device message to a bit less than 65536.

This should probably be added to the spec similarly to the [message size limit.](https://spec.matrix.org/v1.14/client-server-api/#size-limits)

Spec PR: matrix-org/matrix-spec#2340

---------

Co-authored-by: mcalinghee <mcalinghee.dev@gmail.com>
Co-authored-by: Eric Eastwood <madlittlemods@gmail.com>

This is based on #18416, which got reverted due to it rejecting to-device messages to users with many devices. The main change here is that if a to-device EDU for a single user is too large, then we split it up into multiple EDUs.

It's not that much more costly to just keep re-encoding the values.

Copilot

Pull request overview

This PR updates Synapse’s handling of outbound to-device messages so that outgoing m.direct_to_device EDUs are kept to a target size by splitting payloads more granularly (including splitting a single recipient’s devices across multiple EDUs), while also rejecting individual to-device messages that are too large.

Changes:

Introduce a generic dict-splitting helper (split_dict_to_fit_to_size) and unit tests for its behavior.
Refactor to-device sending to split remote to-device payloads across multiple EDUs (including per-user device splitting) and queue them separately for federation.
Add constants for EDU size targets / transaction EDU budgeting and expand tests to cover splitting and size-limit failures.

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tests/util/test_split_dict.py | Adds unit tests for the new dict-splitting utility. |
| tests/rest/client/test_sendtodevice.py | Adds tests for to-device size rejection and EDU/transaction splitting behavior. |
| tests/handlers/test_appservice.py | Updates test setup to use the renamed datastore method for local to-device inbox seeding. |
| synapse/util/__init__.py | Adds `split_dict_to_fit_to_size` and helper state/size-calculation logic. |
| synapse/storage/databases/main/deviceinbox.py | Splits the old “add to-device inbox” method into separate local vs remote queueing APIs. |
| synapse/handlers/devicemessage.py | Enforces per-message size checking and implements splitting remote to-device messages into multiple EDUs. |
| synapse/federation/sender/per_destination_queue.py | Uses constants for EDU transaction limits and reserved EDU slots. |
| synapse/api/constants.py | Introduces `SOFT_MAX_EDU_SIZE` and transaction EDU constants. |
| changelog.d/19617.bugfix | Documents the bugfix around large to-device queues blocking outbound federation. |

Copilot · 2026-04-29T12:13:32Z

+    ):
+        # The returned subset might be larger than the soft max size if it
+        # contains a single entry that is larger than the soft max size.
+        if estimated_size <= SOFT_MAX_EDU_SIZE - base_edu_size:


estimated_size returned by split_dict_to_fit_to_size(..., wrapping_object_size=base_edu_size) already includes base_edu_size, so comparing it to SOFT_MAX_EDU_SIZE - base_edu_size double-subtracts the header and will force unnecessary splitting (and can incorrectly treat fitting subsets as oversized). Compare estimated_size directly to SOFT_MAX_EDU_SIZE (or adjust what estimated_size represents) to make the fit/overflow decision consistent.

Suggested change

-        if estimated_size <= SOFT_MAX_EDU_SIZE - base_edu_size:
+        if estimated_size <= SOFT_MAX_EDU_SIZE:
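
To make the double subtraction concrete, here is a small worked example. The numbers are purely illustrative (not Synapse's real constants), the function names are hypothetical, and it assumes, as the comment states, that `estimated_size` already includes `base_edu_size`.

```python
def fits_with_double_subtraction(
    estimated_size: int, soft_max: int, base_edu_size: int
) -> bool:
    """The comparison as written: the base size is accounted for twice."""
    return estimated_size <= soft_max - base_edu_size


def fits_direct(estimated_size: int, soft_max: int) -> bool:
    """The suggested comparison: estimated_size already includes the base."""
    return estimated_size <= soft_max


# Illustrative values only: a 100-byte wrapper plus 850 bytes of entries.
soft_max = 1000
base_edu_size = 100
estimated_size = base_edu_size + 850  # 950, under the 1000-byte target

too_strict = fits_with_double_subtraction(estimated_size, soft_max, base_edu_size)
as_intended = fits_direct(estimated_size, soft_max)
```

Here `too_strict` is false even though the subset is within the target, while `as_intended` is true. Any subset whose size falls in the top `base_edu_size` bytes below the limit would be affected in the same way.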

Copilot · 2026-04-29T12:13:33Z

+def split_dict_to_fit_to_size(
+    original_dict: dict[str, Any],
+    *,
+    soft_max_size: int,
+    wrapping_object_size: int = 2,
+) -> Iterator[tuple[dict[str, JsonDict], int]]:
+    """Splits a dict up into a list of dicts, each of which is small enough to


The return type of split_dict_to_fit_to_size is too specific: it currently declares Iterator[tuple[dict[str, JsonDict], int]], but this helper is used (and tested) with non-dict values (e.g. strings) and should preserve the original value type. Consider making it generic over the value type (TypeVar) and returning dict[str, V] (or dict[str, Any]) to avoid mypy errors and misleading annotations.
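
One way the generic annotation could look, as a sketch only: `split_dict_generic` is a hypothetical name, the body is a simplified greedy splitter rather than the PR's implementation, and `json.dumps` stands in for `encode_canonical_json`. The point is that the value type `V` flows from the input dict to the yielded subsets.

```python
import json
from collections.abc import Iterator
from typing import TypeVar

V = TypeVar("V")


def _size(subset: dict[str, V], wrapping_object_size: int) -> int:
    """Encoded size of `subset`, swapping the 2-byte "{}" for the wrapper size."""
    text = json.dumps(subset, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8")) - 2 + wrapping_object_size


def split_dict_generic(
    original_dict: dict[str, V],
    *,
    soft_max_size: int,
    wrapping_object_size: int = 2,
) -> Iterator[tuple[dict[str, V], int]]:
    """Generator yielding (subset, estimated_size); values keep their type V."""
    subset: dict[str, V] = {}
    for key, value in original_dict.items():
        candidate = {**subset, key: value}
        if subset and _size(candidate, wrapping_object_size) > soft_max_size:
            yield subset, _size(subset, wrapping_object_size)
            candidate = {key: value}
        subset = candidate
    if subset:
        yield subset, _size(subset, wrapping_object_size)
```

Called with a `dict[str, str]`, a type checker would infer `dict[str, str]` subsets rather than `dict[str, JsonDict]`.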

Copilot · 2026-04-29T12:13:33Z

+    for key, payload in original_dict.items():
+        current_payload.subset[key] = payload
+        current_size = _len_with_wrapping_object(
+            current_payload.subset, wrapping_object_size
+        )


split_dict_to_fit_to_size recomputes encode_canonical_json(current_payload.subset) on every iteration as the subset grows, making the algorithm O(n^2) in the number of entries. For large to-device payloads (many devices/users), this can become a noticeable CPU cost; consider precomputing per-entry sizes once and tracking cumulative size (including commas/quotes/keys) to split without repeatedly re-encoding the whole subset.
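
The cost difference can be illustrated by counting how many bytes each strategy encodes. This is a simplified model with hypothetical function names: it uses `json.dumps` as a stand-in for `encode_canonical_json` and ignores the fact that the real helper starts a fresh subset whenever it emits one, which bounds the quadratic growth to a single chunk.

```python
import json
from typing import Any


def encoded_len(obj: Any) -> int:
    """Assumed stand-in for len(encode_canonical_json(obj))."""
    text = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def bytes_encoded_incrementally(d: dict[str, Any]) -> int:
    """Total bytes encoded when the growing subset is re-encoded each step."""
    total = 0
    subset: dict[str, Any] = {}
    for key, value in d.items():
        subset[key] = value
        total += encoded_len(subset)  # re-encodes every earlier entry too
    return total


def bytes_encoded_per_entry(d: dict[str, Any]) -> int:
    """Total bytes encoded when each key and value is encoded exactly once."""
    return sum(encoded_len(key) + encoded_len(value) for key, value in d.items())
```

For a dict of n similar-sized entries the first total grows roughly with n squared and the second with n, which is the gap the comment is pointing at.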

Copilot · 2026-04-29T12:13:33Z

+import logging
+from typing import TypeVar
+
+from canonicaljson import encode_canonical_json
+
+from synapse.util import split_dict_to_fit_to_size
+
+from tests.unittest import TestCase
+
+logger = logging.getLogger(__name__)
+
+T = TypeVar("T")
+
+


This new test module has unused symbols (logging, logger, and the TypeVar T). With the repo's ruff/pyflakes configuration, these will raise unused-import/unused-variable violations and fail CI; please remove them or use them.

Suggested change

-import logging
-from typing import TypeVar
-
-from canonicaljson import encode_canonical_json
-
-from synapse.util import split_dict_to_fit_to_size
-
-from tests.unittest import TestCase
-
-logger = logging.getLogger(__name__)
-
-T = TypeVar("T")
+from canonicaljson import encode_canonical_json
+
+from synapse.util import split_dict_to_fit_to_size
+
+from tests.unittest import TestCase

Copilot · 2026-04-29T12:13:34Z

+    This function takes many to-device messages and fits/splits them into several EDUs
+    as necessary. We split the messages up as the overall request can overrun the
+    `max_request_body_size` and prevent outbound federation traffic because of the size
+    of the transaction (cf. `MAX_EDU_SIZE`).


The docstring references MAX_EDU_SIZE, but the constant in use is now SOFT_MAX_EDU_SIZE (and MAX_EDU_SIZE doesn't appear to exist anymore). Updating the reference will avoid confusion for future readers.

Suggested change

-    of the transaction (cf. `MAX_EDU_SIZE`).
+    of the transaction (cf. `SOFT_MAX_EDU_SIZE`).

anoadragon453

Overall looks great. Just some minor things below.

anoadragon453 · 2026-04-29T14:24:22Z

+                    # https://github.com/matrix-org/matrix-spec/pull/2340 tracks adding
+                    # this to the spec
+                    raise EventSizeError(
+                        f"To-device message for {user_id}:{device_id} is too large to send",


Could we add the computed message size to the log line to make this slightly more useful for debugging?

anoadragon453 · 2026-04-29T14:32:36Z

+        original_dict: The dict to split. soft_max_size: The maximum size of
+            each dict when encoded as JSON.


Suggested change

-        original_dict: The dict to split. soft_max_size: The maximum size of
-            each dict when encoded as JSON.
+        original_dict: The dict to split.
+        soft_max_size: The maximum size of each dict when encoded as JSON.

anoadragon453 · 2026-04-29T14:33:55Z

+def split_dict_to_fit_to_size(
+    original_dict: dict[str, Any],
+    *,
+    soft_max_size: int,
+    wrapping_object_size: int = 2,
+) -> Iterator[tuple[dict[str, JsonDict], int]]:


We should note that this is a generator in the docstring.

anoadragon453 · 2026-04-29T14:40:26Z

+    for key, payload in original_dict.items():
+        current_payload.subset[key] = payload
+        current_size = _len_with_wrapping_object(
+            current_payload.subset, wrapping_object_size
+        )


The PR description mentions calling encode_canonical_json on each entry. But Copilot is correct; we are technically calling encode_canonical_json multiple times on each entry in current_payload.subset as we build it up.

I presume you'd have to store the length for each entry in subset to avoid this.

anoadragon453 · 2026-04-29T15:46:16Z

+    ):
+        # The returned subset might be larger than the soft max size if it
+        # contains a single entry that is larger than the soft max size.
+        if estimated_size <= SOFT_MAX_EDU_SIZE - base_edu_size:


I suspect you would need a unit test which tries to send a message that's just under SOFT_MAX_EDU_SIZE in order to stress this edge case.

anoadragon453 · 2026-04-29T15:47:37Z

+import logging
+from typing import TypeVar
+
+from canonicaljson import encode_canonical_json
+
+from synapse.util import split_dict_to_fit_to_size
+
+from tests.unittest import TestCase
+
+logger = logging.getLogger(__name__)
+
+T = TypeVar("T")
+
+


And yet all our linting passed 🤔

MatMaul and others added 3 commits March 27, 2026 12:01

Limit to-device EDU sizes

b98769a

This is based on #18416, which got reverted due to it rejecting to-device messages to users with many devices. The main change here is that if a to-device EDU for a single user is too large, then we split it up into multiple EDUs.

Newsfile

8c5b122

MadLittleMods added the A-Federation label Mar 27, 2026

MadLittleMods reviewed Mar 27, 2026

Comment thread synapse/handlers/devicemessage.py Outdated

Comment thread changelog.d/19617.bugfix

erikjohnston added 2 commits March 31, 2026 11:57

Avoid, where possible, using magic values to calculate json size

7afc60e

It's not that much more costly to just keep re-encoding the values.

Add fast path

ded7a66

MadLittleMods reviewed Mar 31, 2026

Comment thread synapse/handlers/devicemessage.py Outdated

Comment thread synapse/handlers/devicemessage.py

erikjohnston added 2 commits April 1, 2026 11:40

Add note about 413 not specced

32134a5

split_device_messages doesn't raise an error

448a17a

erikjohnston marked this pull request as ready for review April 1, 2026 10:55

erikjohnston requested a review from a team as a code owner April 1, 2026 10:55

anoadragon453 requested a review from Copilot April 29, 2026 12:07

Copilot started reviewing on behalf of anoadragon453 April 29, 2026 12:07

Copilot AI reviewed Apr 29, 2026

anoadragon453 approved these changes Apr 29, 2026

MadLittleMods mentioned this pull request May 11, 2026

A long queue of to-device messages can prevent outgoing federation working #17035

Open

erikjohnston commented Mar 27, 2026 • edited by MadLittleMods
