
Prevent updating display name causing time outs on large accounts #17074

Closed
neilisfragile wants to merge 9 commits into develop from neilj/speed_up_profile_updates

@neilisfragile (Member) commented Apr 11, 2024

Fixes #1297

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Code style is correct (run the linters)

@CLAassistant commented Apr 11, 2024

CLA assistant check
All committers have signed the CLA.

@neilisfragile force-pushed the neilj/speed_up_profile_updates branch from 1c058ff to aa147ed on August 28, 2025 21:20
```python
# This allows us to verify that set_displayname doesn't block on room updates
from twisted.internet.defer import ensureDeferred

displayname_deferred = ensureDeferred(
```

@neilisfragile (Member, Author):

Is this the correct way to test that the method returns immediately? Using onSuccess caused the reactor to tick forwards.

Contributor:

The problem is that "immediately" is relative. We expect it to return after the small amount of work needed to schedule a background task, but not after all of the work of updating every room, as it did before.

Currently, this doesn't do what you want and we don't even wait for the work to be scheduled.

What we should probably do is just call `self.get_success(self.handler.set_displayname(...))`, since we expect this function to return before doing the room updates; then check that no rooms have been updated yet, and `self.pump()` until we see some updates.

Also see tests/util/test_task_scheduler.py and tests/rest/admin/test_room.py where we wait for the task scheduler to complete things.
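The test shape suggested above (call the handler, assert that no room has changed, then pump until updates appear) can be illustrated without Synapse at all. This is a standard-library toy, not Synapse's test harness: `ManualScheduler` and `ToyProfileHandler` are hypothetical stand-ins for the task scheduler and the profile handler, `asyncio.run` plays the part of `get_success`, and `ManualScheduler.pump` plays the part of `self.pump()`.

```python
import asyncio
from collections import deque
from typing import Callable, Deque, Dict


class ManualScheduler:
    """Hypothetical stand-in for a background task scheduler: queued jobs only
    run when the test explicitly pumps them."""

    def __init__(self) -> None:
        self._jobs: Deque[Callable[[], None]] = deque()

    def schedule(self, job: Callable[[], None]) -> None:
        self._jobs.append(job)

    def pump(self, steps: int = 1) -> None:
        # Run at most `steps` queued jobs, oldest first.
        for _ in range(min(steps, len(self._jobs))):
            self._jobs.popleft()()


class ToyProfileHandler:
    """Hypothetical handler: set_displayname schedules one membership update per
    joined room instead of doing the updates inline."""

    def __init__(self, scheduler: ManualScheduler, rooms: Dict[str, str]) -> None:
        self.scheduler = scheduler
        self.rooms = rooms  # room_id -> display name in the member event

    async def set_displayname(self, new_name: str) -> None:
        for room_id in sorted(self.rooms):
            # Bind room_id now so each job updates its own room.
            self.scheduler.schedule(lambda r=room_id: self._update(r, new_name))

    def _update(self, room_id: str, new_name: str) -> None:
        self.rooms[room_id] = new_name


scheduler = ManualScheduler()
handler = ToyProfileHandler(scheduler, {"!a:example.org": "old", "!b:example.org": "old"})

# Like get_success(): the call completes without touching any room.
asyncio.run(handler.set_displayname("new"))
assert set(handler.rooms.values()) == {"old"}

# Like pump(): updates appear only as the background work is driven forward.
scheduler.pump()
assert handler.rooms == {"!a:example.org": "new", "!b:example.org": "old"}
scheduler.pump()
assert set(handler.rooms.values()) == {"new"}
```

The point of the toy is the order of the assertions: checking state between the call and the first pump is what shows that `set_displayname` did not do the room updates inline.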

```python
if deactivation:
    # During deactivation, run profile updates synchronously to ensure
    # they complete before room forgetting logic runs
    await self._update_join_states_direct(requester, target_user)
```

@neilisfragile (Member, Author):

I need to do this because the deactivate-account flow expects profiles and avatars to have been removed before deactivation proceeds. I could not see another way to make the flow wait for the background profile updates to complete, so I opted for two versions of the update method.

Is there a better way?

Contributor:

To make this clearer, we could have a separate `update_in_background` argument.

(better name welcome)
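One possible shape for the suggested `update_in_background` argument, sketched with `asyncio` rather than Synapse's own background-process machinery. `ToyJoinStateUpdater` and its methods are hypothetical; the only idea taken from the thread is a single method with a keyword-only flag that either hands the work off or awaits it inline.

```python
import asyncio
from typing import Dict, List, Set


class ToyJoinStateUpdater:
    """Hypothetical sketch: one update method with an explicit keyword-only
    flag, instead of two separately named methods."""

    def __init__(self, rooms_for_user: Dict[str, List[str]]) -> None:
        self.rooms_for_user = rooms_for_user
        self.updated: Set[str] = set()
        # Keep references so background tasks are not garbage-collected early.
        self.pending: Set["asyncio.Task[None]"] = set()

    async def _update_rooms(self, user_id: str) -> None:
        for room_id in self.rooms_for_user.get(user_id, []):
            await asyncio.sleep(0)  # stand-in for sending one membership event
            self.updated.add(room_id)

    async def update_join_states(self, user_id: str, *, update_in_background: bool) -> None:
        if update_in_background:
            # Ordinary profile change: hand the work off and return quickly.
            task = asyncio.create_task(self._update_rooms(user_id))
            self.pending.add(task)
            task.add_done_callback(self.pending.discard)
        else:
            # Deactivation: the caller relies on every room being updated first.
            await self._update_rooms(user_id)


async def demo() -> None:
    user = "@alice:example.org"
    rooms = {user: ["!a:example.org", "!b:example.org"]}

    background = ToyJoinStateUpdater(rooms)
    await background.update_join_states(user, update_in_background=True)
    assert background.updated == set()  # scheduled, but not run yet
    await asyncio.gather(*background.pending)
    assert background.updated == set(rooms[user])

    blocking = ToyJoinStateUpdater(rooms)
    await blocking.update_join_states(user, update_in_background=False)
    assert blocking.updated == set(rooms[user])  # finished before returning


asyncio.run(demo())
```

Making the flag keyword-only means the deactivation call site has to spell out `update_in_background=False`, which documents why that path blocks.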

@neilisfragile (Member, Author):

The PR is now ready for review. There are two specific points I would like feedback on; I have left comments inline.

Disclaimer: I used Claude Code in a consultative fashion to generate parts of this PR.

@neilisfragile marked this pull request as ready for review August 29, 2025 09:16
@neilisfragile requested a review from a team as a code owner August 29, 2025 09:16
```sql
LEFT JOIN receipts_linearized rl ON (
    rl.room_id = cse.room_id
    AND rl.user_id = ?
    AND rl.receipt_type = 'm.read'
```

Contributor:

We should probably also take into account the other read receipt types, `m.read.private` and `m.fully_read`.

Both seem applicable for sorting by last activity.
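A toy SQLite version of this suggestion, ordering joined rooms by the latest receipt of any of several types. The two tables are cut down to the columns the query touches, the data is made up, and `rooms_by_latest_receipt` is a hypothetical helper. Whether every listed type, in particular `m.fully_read`, is actually stored in `receipts_linearized` is not established here and would need checking against Synapse's schema.

```python
import sqlite3
from typing import List, Sequence

# Hypothetical, cut-down versions of the two tables used by the query.
SCHEMA = """
CREATE TABLE current_state_events (
    room_id TEXT, type TEXT, state_key TEXT, membership TEXT
);
CREATE TABLE receipts_linearized (
    room_id TEXT, receipt_type TEXT, user_id TEXT, stream_id INTEGER
);
"""


def rooms_by_latest_receipt(
    conn: sqlite3.Connection, user_id: str, receipt_types: Sequence[str]
) -> List[str]:
    """Rooms user_id is joined to, most recent receipt of any given type first.

    Rooms with no matching receipt have MAX(stream_id) = NULL and sort last,
    ordered by room ID.
    """
    # Only "?" placeholders are formatted into the SQL; the values are bound.
    placeholders = ", ".join("?" for _ in receipt_types)
    sql = f"""
        SELECT cse.room_id
        FROM current_state_events AS cse
        LEFT JOIN receipts_linearized AS rl ON (
            rl.room_id = cse.room_id
            AND rl.user_id = ?
            AND rl.receipt_type IN ({placeholders})
        )
        WHERE cse.type = 'm.room.member'
            AND cse.membership = 'join'
            AND cse.state_key = ?
        GROUP BY cse.room_id
        ORDER BY MAX(rl.stream_id) IS NULL, MAX(rl.stream_id) DESC, cse.room_id
    """
    rows = conn.execute(sql, (user_id, *receipt_types, user_id)).fetchall()
    return [room_id for (room_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
me = "@alice:x"
conn.executemany(
    "INSERT INTO current_state_events VALUES (?, 'm.room.member', ?, 'join')",
    [("!a:x", me), ("!b:x", me), ("!c:x", me)],
)
conn.executemany(
    "INSERT INTO receipts_linearized VALUES (?, ?, ?, ?)",
    [("!a:x", "m.read", me, 5), ("!b:x", "m.read.private", me, 9)],
)
print(rooms_by_latest_receipt(conn, me, ["m.read"]))  # ['!a:x', '!b:x', '!c:x']
print(rooms_by_latest_receipt(conn, me, ["m.read", "m.read.private"]))  # ['!b:x', '!a:x', '!c:x']
```

With only `m.read` considered, a room the user last marked with a private receipt looks as stale as one never opened; widening the `IN` list moves it to the front.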

```sql
LEFT JOIN receipts_linearized rl ON (
    rl.room_id = cse.room_id
    AND rl.user_id = ?
    AND rl.receipt_type = 'm.read'
```

Contributor:

Suggested change:

```diff
-AND rl.receipt_type = 'm.read'
+AND rl.receipt_type = ?
```

To avoid typos, we can pass our constants (such as `ReceiptTypes.READ`) in as query arguments.
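Why passing constants as arguments helps, in a small SQLite example: a typo inside an SQL string literal is silently accepted and just matches no rows, whereas a typo in a constant's attribute name raises immediately. `ReceiptTypes` below is a hypothetical stand-in defining only two values, not an import of Synapse's class.

```python
import sqlite3


class ReceiptTypes:
    """Hypothetical stand-in for a constants class; only two values defined."""

    READ = "m.read"
    READ_PRIVATE = "m.read.private"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE receipts (room_id TEXT, receipt_type TEXT, user_id TEXT)")
conn.execute("INSERT INTO receipts VALUES ('!a:x', 'm.read', '@alice:x')")

# A typo inside the SQL literal is not an error: the query just matches nothing.
typo_rows = conn.execute(
    "SELECT room_id FROM receipts WHERE receipt_type = 'm.raed' AND user_id = ?",
    ("@alice:x",),
).fetchall()
print(typo_rows)  # []

# Passing the constant as a bound argument keeps the value in one place...
rows = conn.execute(
    "SELECT room_id FROM receipts WHERE receipt_type = ? AND user_id = ?",
    (ReceiptTypes.READ, "@alice:x"),
).fetchall()
print(rows)  # [('!a:x',)]

# ...and a typo in the constant's name fails loudly instead of silently.
try:
    ReceiptTypes.RAED
except AttributeError as exc:
    print("caught:", exc)
```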

Comment on lines +790 to +792
```sql
    rl.room_id = cse.room_id
    AND rl.user_id = ?
    AND rl.receipt_type = 'm.read'
```

Contributor:

To confirm, we do have some indexes that I think will cover this (`receipts_linearized_unique_index`/`receipts_linearized_uniqueness_thread`):

```
$ psql synapse
> \d+ receipts_linearized
[...]
  Indexes:
      "receipts_linearized_event_id" btree (room_id, event_id)
      "receipts_linearized_id" btree (stream_id)
      "receipts_linearized_room_stream" btree (room_id, stream_id)
      "receipts_linearized_unique_index" UNIQUE, btree (room_id, receipt_type, user_id) WHERE thread_id IS NULL
      "receipts_linearized_uniqueness_thread" UNIQUE CONSTRAINT, btree (room_id, receipt_type, user_id, thread_id)
      "receipts_linearized_user" btree (user_id)
```
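The planner can confirm that kind of index coverage directly. The following is a SQLite analogue (on Postgres, running `EXPLAIN` on the query in `psql` plays the same role); the table is trimmed to the relevant columns and the index is created without the real index's `WHERE thread_id IS NULL` predicate, both simplifications.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down table: only the columns the join condition touches, plus stream_id.
conn.execute(
    "CREATE TABLE receipts_linearized ("
    " room_id TEXT, receipt_type TEXT, user_id TEXT, thread_id TEXT, stream_id INTEGER)"
)
# Simplified: the real index is partial (WHERE thread_id IS NULL); omitted here.
conn.execute(
    "CREATE UNIQUE INDEX receipts_linearized_unique_index"
    " ON receipts_linearized (room_id, receipt_type, user_id)"
)

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT stream_id FROM receipts_linearized"
    " WHERE room_id = ? AND user_id = ? AND receipt_type = ?",
    ("!a:x", "@alice:x", "m.read"),
).fetchall()

# The last column of each plan row is the human-readable detail, which names
# the index the planner chose (if any).
details = [row[-1] for row in plan]
print(details)
```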

```sql
)
WHERE cse.type = 'm.room.member'
AND cse.membership = 'join'
AND cse.state_key = ?
```

Contributor:

Suggested change (delete this line):

```diff
-AND cse.state_key = ?
```

I don't think we need to check `state_key` since `membership` will be the source of truth here.

You can see an example where we don't check it here:

```python
async def get_users_in_room(self, room_id: str) -> Sequence[str]:
    """Returns a list of users in the room.

    Will return inaccurate results for rooms with partial state, since the state for
    the forward extremities of those rooms will exclude most members. We may also
    calculate room state incorrectly for such rooms and believe that a member is or
    is not in the room when the opposite is true.

    Note: If you only care about users in the room local to the homeserver, use
    `get_local_users_in_room(...)` instead which will be more performant.
    """
    return await self.db_pool.simple_select_onecol(
        table="current_state_events",
        keyvalues={
            "type": EventTypes.Member,
            "room_id": room_id,
            "membership": Membership.JOIN,
        },
        retcol="state_key",
        desc="get_users_in_room",
    )
```

And here is where we insert the data, using the `room_memberships` table to populate the `membership` column of `current_state_events`:

```python
# We include the membership in the current state table, hence we do
# a lookup when we insert. This assumes that all events have already
# been inserted into room_memberships.
txn.execute_batch(
    """INSERT INTO current_state_events
        (room_id, type, state_key, event_id, membership, event_stream_ordering)
        VALUES (
            ?, ?, ?, ?,
            (SELECT membership FROM room_memberships WHERE event_id = ?),
            (SELECT stream_ordering FROM events WHERE event_id = ?)
        )
    """,
    [
        (room_id, key[0], key[1], ev_id, ev_id, ev_id)
        for key, ev_id in to_insert.items()
    ],
)
```
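A toy replay of that batch insert in SQLite, showing that the `membership` column ends up holding whatever `room_memberships` records for each event, so a `get_users_in_room`-style lookup can filter on it directly. `executemany` stands in for `txn.execute_batch`, the tables are trimmed to the touched columns, and the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed-down tables: only the columns the INSERT touches.
conn.executescript(
    """
    CREATE TABLE events (event_id TEXT PRIMARY KEY, stream_ordering INTEGER);
    CREATE TABLE room_memberships (event_id TEXT PRIMARY KEY, membership TEXT);
    CREATE TABLE current_state_events (
        room_id TEXT, type TEXT, state_key TEXT, event_id TEXT,
        membership TEXT, event_stream_ordering INTEGER
    );
    """
)
conn.executemany("INSERT INTO events VALUES (?, ?)", [("$e1", 1), ("$e2", 2)])
conn.executemany(
    "INSERT INTO room_memberships VALUES (?, ?)", [("$e1", "join"), ("$e2", "leave")]
)

# (type, state_key) -> event_id, mirroring the shape of `to_insert`.
room_id = "!r:x"
to_insert = {
    ("m.room.member", "@alice:x"): "$e1",
    ("m.room.member", "@bob:x"): "$e2",
}
conn.executemany(
    """INSERT INTO current_state_events
        (room_id, type, state_key, event_id, membership, event_stream_ordering)
    VALUES (
        ?, ?, ?, ?,
        (SELECT membership FROM room_memberships WHERE event_id = ?),
        (SELECT stream_ordering FROM events WHERE event_id = ?)
    )""",
    [(room_id, key[0], key[1], ev_id, ev_id, ev_id) for key, ev_id in to_insert.items()],
)

# The membership column now mirrors room_memberships, so filtering on it alone
# is enough to find the joined members.
joined = conn.execute(
    "SELECT state_key FROM current_state_events"
    " WHERE type = ? AND room_id = ? AND membership = ?",
    ("m.room.member", room_id, "join"),
).fetchall()
print(joined)  # [('@alice:x',)]
```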

```sql
    AND rl.receipt_type = 'm.read'
)
WHERE cse.type = 'm.room.member'
AND cse.membership = 'join'
```

Contributor:

Suggested change:

```diff
-AND cse.membership = 'join'
+AND cse.membership = ?
```

(passing `Membership.JOIN` as the argument)

Comment on lines +651 to +652
Iterates through all rooms where the target user is currently joined and
updates their membership event to reflect the new profile information.

Contributor:

This uses a lot of words to say pretty much nothing more than above.

Comment on lines +662 to +667
Implementation Details:
- Rooms are processed in read receipt order (most recently viewed first)
- Each membership update is treated as a "join" event with new profile data
- Rate limiting is disabled for these updates to hide that they're not atomic
- Failures in individual rooms are logged but don't affect other rooms
- Assumes target_user is not a guest (guests can't set profile data)
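The "failures in individual rooms are logged but don't affect other rooms" point above amounts to a per-room `try`/`except` inside the loop. A synchronous sketch follows; the real code is async, and `update_rooms_isolated` and `flaky_update` are hypothetical names.

```python
import logging
from typing import Callable, Iterable, List

logger = logging.getLogger(__name__)


def update_rooms_isolated(
    room_ids: Iterable[str], update_one: Callable[[str], None]
) -> List[str]:
    """Apply update_one to each room in the order given (for example, most
    recently viewed first). An exception in one room is logged and that room is
    skipped, so the remaining rooms are still processed. Returns the failures."""
    failed: List[str] = []
    for room_id in room_ids:
        try:
            update_one(room_id)
        except Exception:
            # logger.exception logs at ERROR level and includes the traceback.
            logger.exception("Failed to update membership in %s", room_id)
            failed.append(room_id)
    return failed


updated: List[str] = []


def flaky_update(room_id: str) -> None:
    if room_id == "!broken:x":
        raise RuntimeError("simulated failure")
    updated.append(room_id)


failed = update_rooms_isolated(["!recent:x", "!broken:x", "!old:x"], flaky_update)
print(updated, failed)  # ['!recent:x', '!old:x'] ['!broken:x']
```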

Contributor:

The spec says they can:

> The following API endpoints are allowed to be accessed by guest accounts for their own account maintenance:
>
> [...]
>
> -- https://spec.matrix.org/v1.14/client-server-api/#guest-access

Comment on lines +669 to +670
Note:
This stomps over any custom display name or avatar URL in member events.

Contributor:

It's odd to have the note below the description above.


@@ -236,7 +252,18 @@ async def set_displayname(
)

Contributor:

There are so many arguments for this function, several of the same type right next to each other, that they could easily be confused if passed without keywords.

We should force keyword arguments on `set_displayname` and `set_avatar_url` with `*`. See example here.
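A keyword-only marker (`*`) turns swapped positional arguments into a `TypeError` at the call site. The signature below is a hypothetical, synchronous stand-in: `deactivation` comes from the code discussed above, while the other flags and all of the types are illustrative and not copied from Synapse's real `set_displayname`.

```python
from typing import Dict


class KeywordOnlyProfileHandler:
    """Hypothetical, synchronous stand-in; not Synapse's real signature."""

    def set_displayname(
        self,
        target_user: str,
        requester: str,
        new_displayname: str,
        *,  # everything after this must be passed by keyword
        by_admin: bool = False,
        deactivation: bool = False,
        propagate: bool = True,
    ) -> Dict[str, object]:
        # Echo the arguments back so the effect of each call is visible.
        return {
            "target_user": target_user,
            "new_displayname": new_displayname,
            "by_admin": by_admin,
            "deactivation": deactivation,
            "propagate": propagate,
        }


handler = KeywordOnlyProfileHandler()

# Explicit keywords make it obvious which flag is being set.
print(handler.set_displayname("@alice:x", "@admin:x", "Alice", by_admin=True)["by_admin"])  # True

# Positional booleans are rejected instead of silently landing in the wrong slot.
try:
    handler.set_displayname("@alice:x", "@admin:x", "Alice", True, False)
except TypeError as exc:
    print("rejected:", exc)
```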

@anoadragon453 (Member):

Superseded by #19311.

@neilisfragile (Member, Author):

Well that will teach me to sit on a PR for months!

@MadLittleMods (Contributor):

@neilisfragile You could still open a new PR with the sorting changes you made.

Here is how I described the benefit:

> Since we're now processing rooms in the order of last read receipt, it will magically feel better because the rooms that the user looks at will be updated and hide the fact that we're still working on things in the background.
>
> -- @MadLittleMods, #17074 (comment)

@anoadragon453 (Member):

> @neilisfragile You could still open a new PR with the sorting changes you made.

This will be slightly tricky, as the new background task processes rooms in alphabetical order of room ID:

```python
room_ids = sorted(await self.store.get_rooms_for_user(target_user.to_string()))
last_room_id = task.result.get("last_room_id", None) if task.result else None
if last_room_id:
    # Filter out room IDs that have already been handled
    # by finding the first room ID greater than the last handled room ID
    # and slicing the list from that point onwards.
    room_ids = room_ids[bisect_right(room_ids, last_room_id) :]
```

This was chosen so that the work can easily be resumed after a crash or pause in the background task process, and it is relatively stable (other than for rooms that are joined partway through).

In contrast, ordering by latest activity is far less stable. Doing so may require storing the ordering of rooms in a temporary table at the beginning of the profile update in order to allow resumption after a homeserver crash/restart.
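The resumption property described above can be exercised in isolation. The sketch mirrors the sort-then-`bisect_right` logic quoted from the background task; `process_rooms_resumably` and its `save_progress` callback (standing in for recording `last_room_id` in the task result) are hypothetical. The second call shows the caveat: a room joined part-way through that sorts before the checkpoint is skipped.

```python
from bisect import bisect_right
from typing import Callable, Dict, Iterable, List, Optional


def process_rooms_resumably(
    joined_room_ids: Iterable[str],
    last_room_id: Optional[str],
    update_room: Callable[[str], None],
    save_progress: Callable[[str], None],
) -> None:
    """Update rooms in room-ID order, skipping every room up to and including
    last_room_id, the checkpoint saved before a crash or pause."""
    room_ids = sorted(joined_room_ids)
    if last_room_id:
        # Index of the first room ID that sorts strictly after the checkpoint.
        room_ids = room_ids[bisect_right(room_ids, last_room_id) :]
    for room_id in room_ids:
        update_room(room_id)
        save_progress(room_id)  # hypothetical: record last_room_id durably


done: List[str] = []
checkpoint: Dict[str, str] = {}


def save(room_id: str) -> None:
    checkpoint["last_room_id"] = room_id


# "!a:x" and "!b:x" were handled before the restart.
process_rooms_resumably(["!c:x", "!a:x", "!d:x", "!b:x"], "!b:x", done.append, save)
print(done, checkpoint)  # ['!c:x', '!d:x'] {'last_room_id': '!d:x'}

# A room joined part-way through that sorts before the checkpoint is skipped.
done.clear()
process_rooms_resumably(["!a:x", "!aa:x", "!b:x", "!c:x"], "!b:x", done.append, save)
print(done)  # ['!c:x']
```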

@neilisfragile (Member, Author):

Okay, I'll wait for it to land on matrix.org and then try it with my account (a pretty large one) to get a sense of the perceived experience.

Another approach could be to take the top 20 rooms (say), apply the update to those, then work through the main list, skipping those 20 as you come to them. If there is a restart, the worst that happens is that the record of the top 20 is lost and the rename is applied to them twice (which is probably a no-op).
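A rough sketch of the "top 20 first" idea, assuming a most-recent-first list of rooms is available (how to compute it, e.g. from read receipts, is left open) and that re-applying the rename to a room is harmless, as suggested above. `rooms_in_update_order` is hypothetical, and it assumes `last_room_id` is only recorded during the alphabetical phase.

```python
from bisect import bisect_right
from typing import List, Optional, Sequence


def rooms_in_update_order(
    joined_room_ids: Sequence[str],
    most_recent_first: Sequence[str],
    last_room_id: Optional[str],
    top_n: int = 20,
) -> List[str]:
    """Hypothetical ordering: the top_n most recently active joined rooms first,
    then every other joined room in room-ID order. Only the second phase is
    checkpointed via last_room_id; after a restart the priority rooms are simply
    renamed again, assumed harmless since a repeated rename should be a no-op."""
    joined = set(joined_room_ids)
    priority = [r for r in most_recent_first if r in joined][:top_n]
    skip = set(priority)
    rest = sorted(r for r in joined if r not in skip)
    if last_room_id is not None:
        # Resume the alphabetical phase just after the saved checkpoint.
        rest = rest[bisect_right(rest, last_room_id) :]
    return priority + rest


joined = ["!d:x", "!a:x", "!c:x", "!b:x", "!e:x"]
recent = ["!d:x", "!b:x"]  # assumed input, e.g. derived from read receipts
print(rooms_in_update_order(joined, recent, None, top_n=2))
# ['!d:x', '!b:x', '!a:x', '!c:x', '!e:x']
print(rooms_in_update_order(joined, recent, "!c:x", top_n=2))
# ['!d:x', '!b:x', '!e:x']
```

Because the priority rooms are recomputed rather than persisted, nothing beyond the existing `last_room_id` checkpoint needs to be stored; the cost of a restart is redoing at most `top_n` renames.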

@anoadragon453 (Member):

@neilisfragile the change has landed on matrix.org as of 13:00 UK today. Please give it a try and see what you think!

> Another approach could just be to take the top 20 (say), apply that, then work through the main list and filter those 20 as you come from them. If there is a restart, then the worst that happens is that the top 20 records are lost and the rename is applied twice (and is probably a no-op)

Yes, that sounds like it could work.

Development

Successfully merging this pull request may close these issues.

Changing displayname is catastrophically slow. (SYN-311)

4 participants