This repository was archived by the owner on Apr 26, 2024. It is now read-only.
Separated Room Statistics #5847
Closed
30 commits
259014b Schema delta for separated statistics (reivilibre)
5216299 Use `threading.Lock` to prevent concurrent incremental position updates (reivilibre)
4cb921c Fix type signature in `get_current_state_deltas` (reivilibre)
5ca4cd5 Track more stats positions (reivilibre)
cf9f7ae Introduce `total_events` tracking and rework statistics tracking. (reivilibre)
4a45fb5 Adapt state delta handling to match the new interface (reivilibre)
a59da51 Update public room visibility change handler (reivilibre)
1c9732d Update incremental processor to use new interfaces and track total_ev… (reivilibre)
7f2ec2e Handle user registration and ensure they start accruing statistics (reivilibre)
4a3fec1 Fix tests (reivilibre)
b384445 Introduce `get_room_state`; a way to get state for a single room (reivilibre)
6923407 Remove obsolete functions for updating stats absolutely. (reivilibre)
9ee50cc Fix stats_separated SQL (reivilibre)
9064f28 Add initial batch of stats tests (reivilibre)
d54ae71 Move back to `defer.inlineCallbacks` from `async` as it makes stats (reivilibre)
314567d Split out partial indices from the schema delta, thus supporting SQLite. (reivilibre)
182cdcb Docstrings in `storage.stats`. (reivilibre)
1e0bd9a Fix stats tests and their expectations of the number of events in fresh (reivilibre)
fd184f6 Fix generality of query (reivilibre)
5b54411 Add SQLite support by working around missing syntax (reivilibre)
20ec969 Fix issue with not selecting a needed column (reivilibre)
16e2ffd Initial room and user statistics documentation (reivilibre)
96fa239 Fix up `stats_separated1.sql` (reivilibre)
2da4b41 Remove obsolete function (reivilibre)
0e6f700 Remove clean-up handler and replace with no-op as not currently needed. (reivilibre)
de6b266 Linting (reivilibre)
f9f551f Clarify docstrings in `storage.stats` (reivilibre)
95a3025 Newsfile (reivilibre)
703f9ff Merge branch 'develop' into rei/room_stats_separated (reivilibre)
c964677 Remove non-ASCII-representable characters to fix py35-old tests. (reivilibre)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,2 @@ | ||
| Rework room and user statistics to separate current & historical rows, as well | ||
| as track stats correctly. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,146 @@ | ||
| Room and User Statistics | ||
| ======================== | ||
|
|
||
| Synapse maintains room and user statistics (as well as a cache of room state) | ||
| in various tables. | ||
|
|
||
| These can be used for administrative purposes but are also used when generating | ||
| the public room directory. If these tables get stale or out of sync (possibly | ||
| after database corruption), you may wish to regenerate them. | ||
|
|
||
|
|
||
| # Synapse Administrator Documentation | ||
|
|
||
| ## Various SQL scripts that you may find useful | ||
|
|
||
| ### Delete stats, including historical stats | ||
|
|
||
| ```sql | ||
| DELETE FROM room_stats_current; | ||
| DELETE FROM room_stats_historical; | ||
| DELETE FROM user_stats_current; | ||
| DELETE FROM user_stats_historical; | ||
| ``` | ||
|
|
||
| ### Regenerate stats (all subjects) | ||
|
|
||
| ```sql | ||
| BEGIN; | ||
| DELETE FROM stats_incremental_position; | ||
| INSERT INTO stats_incremental_position ( | ||
| state_delta_stream_id, | ||
| total_events_min_stream_ordering, | ||
| total_events_max_stream_ordering, | ||
| is_background_contract | ||
| ) VALUES (NULL, NULL, NULL, FALSE), (NULL, NULL, NULL, TRUE); | ||
| COMMIT; | ||
|
|
||
| DELETE FROM room_stats_current; | ||
| DELETE FROM user_stats_current; | ||
| ``` | ||
|
|
||
| then follow the steps below for **'Regenerate stats (missing subjects only)'** | ||
|
|
||
| ### Regenerate stats (missing subjects only) | ||
|
|
||
| ```sql | ||
| -- Set up staging tables | ||
| -- we depend on current_state_events_membership because this is used | ||
| -- in our counting. | ||
| INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES | ||
| ('populate_stats_prepare', '{}', 'current_state_events_membership'); | ||
|
|
||
| -- Run through each room and update stats | ||
| INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES | ||
| ('populate_stats_process_rooms', '{}', 'populate_stats_prepare'); | ||
|
|
||
| -- Run through each user and update stats. | ||
| INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES | ||
| ('populate_stats_process_users', '{}', 'populate_stats_process_rooms'); | ||
|
|
||
| -- Clean up staging tables | ||
| INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES | ||
| ('populate_stats_cleanup', '{}', 'populate_stats_process_users'); | ||
| ``` | ||
|
|
||
| then **restart Synapse**. | ||
|
|
||
|
|
||
| # Synapse Developer Documentation | ||
|
|
||
| ## High-Level Concepts | ||
|
|
||
| ### Definitions | ||
|
|
||
| * **subject**: Something we are tracking stats about – currently a room or user. | ||
| * **current row**: An entry for a subject in the appropriate current statistics | ||
| table. Each subject can have only one. | ||
| * **historical row**: An entry for a subject in the appropriate historical | ||
| statistics table. Each subject can have any number of these. | ||
|
|
||
| ### Overview | ||
|
|
||
| Stats are maintained as time series. There are two kinds of column: | ||
|
|
||
| * absolute columns – where the value is correct for the time given by `end_ts` | ||
| in the stats row. (Imagine a line graph for these values) | ||
| * per-slice columns – where the value counts the occurrences that fell within | ||
| the time slice given by `(end_ts − bucket_size)…end_ts` | ||
| or `start_ts…end_ts`. (Imagine a histogram for these values) | ||
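
Reviewer sketch (not part of the diff): the difference between the two kinds of column can be illustrated with a few event timestamps. The function names, the tuple layout and the choice of half-open slices `[end_ts - bucket_size, end_ts)` are assumptions made for this example only; the documentation above does not say which end of the slice is inclusive.

```python
from typing import Dict, List, Tuple


def slice_end(ts: int, bucket_size: int) -> int:
    """Return the end_ts of the slice containing ts.

    Assumption: slices are half-open, [end_ts - bucket_size, end_ts).
    """
    return (ts // bucket_size + 1) * bucket_size


def summarise(event_ts: List[int], bucket_size: int) -> List[Tuple[int, int, int]]:
    """Return (end_ts, absolute_total, per_slice_count) per non-empty slice."""
    per_slice: Dict[int, int] = {}
    for ts in event_ts:
        end = slice_end(ts, bucket_size)
        per_slice[end] = per_slice.get(end, 0) + 1

    rows = []
    running_total = 0
    for end in sorted(per_slice):
        running_total += per_slice[end]  # absolute: the value as of end_ts
        rows.append((end, running_total, per_slice[end]))  # per-slice: count in slice
    return rows


rows = summarise([1, 2, 15, 31], bucket_size=10)
```

Here the second element of each tuple behaves like a point on a line graph and the third like a histogram bar.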
|
|
||
| Currently, only absolute columns are in use. | ||
|
|
||
| For each subject type, stats are maintained in two tables: current and historical. | ||
|
|
||
| Current stats correspond to the present values. Each subject can only have one | ||
| entry. | ||
|
|
||
| Historical stats correspond to values in the past. Subjects may have multiple | ||
| entries. | ||
|
|
||
| ## Concepts around the management of stats | ||
|
|
||
| ### current rows | ||
|
|
||
| #### dirty current rows | ||
|
|
||
| Current rows can be **dirty**, which means that they have changed since the | ||
| latest historical row for the same subject. | ||
| **Dirty** current rows possess an end timestamp, `end_ts`. | ||
|
|
||
| #### old current rows and old collection | ||
|
|
||
| When a (necessarily dirty) current row has an `end_ts` in the past, it is said | ||
| to be **old**. | ||
| Old current rows must be copied into a historical row, and cleared of their dirty | ||
| status, before further statistics can be tracked for that subject. | ||
| The process which does this is referred to as **old collection**. | ||
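
Reviewer sketch (not part of the diff): old collection as described above, modelled in memory. `CurrentRow`, `is_old` and `old_collect` are hypothetical names rather than Synapse functions, and treating "in the past" as `end_ts < now_ts` is an assumption. Incomplete rows are deliberately not modelled here.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CurrentRow:
    """Hypothetical stand-in for a row in a *_stats_current table."""
    fields: Dict[str, int]
    end_ts: Optional[int] = None  # only set while the row is dirty


def is_old(row: CurrentRow, now_ts: int) -> bool:
    # A row is old when it is dirty and its end_ts lies in the past.
    return row.end_ts is not None and row.end_ts < now_ts


def old_collect(
    subject_id: str,
    row: CurrentRow,
    historical: List[Dict[str, object]],
    now_ts: int,
) -> bool:
    """Copy an old current row into the historical list and clear its dirty status.

    Returns True if the row was collected, False if it was not old.
    """
    if not is_old(row, now_ts):
        return False
    historical.append({"subject_id": subject_id, "end_ts": row.end_ts, **row.fields})
    row.end_ts = None  # no longer dirty; further stats can now be tracked
    return True


history: List[Dict[str, object]] = []
room_row = CurrentRow(fields={"total_events": 5}, end_ts=100)
collected = old_collect("!room:example.org", room_row, history, now_ts=150)
```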
|
|
||
| #### incomplete current rows | ||
|
|
||
| There are also **incomplete** current rows, which are current rows that do not | ||
| contain a full count yet – this is because they are waiting for the regeneration | ||
| process to give them an initial count. Incomplete current rows DO NOT contain | ||
| correct and up-to-date values. As such, *incomplete rows are not old-collected*. | ||
| Instead, old incomplete rows will be extended so they are no longer old. | ||
|
|
||
| ### historical rows | ||
|
|
||
| Historical rows can always be considered to be valid for the time slice and | ||
| end time specified. (This, of course, assumes a lack of defects in the code | ||
| to track the statistics, and assumes integrity of the database). | ||
|
|
||
| Even so, there are two considerations that we may need to bear in mind: | ||
|
|
||
| * historical rows will not exist for every time slice – they will be omitted | ||
| if there were no changes. In this case, the following assumptions can be | ||
| made to interpolate/recreate missing rows: | ||
| - absolute fields have the same values as in the preceding row | ||
| - per-slice fields are zero (`0`) | ||
| * historical rows will not be retained forever – rows older than a configurable | ||
| time will be purged. | ||
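
Reviewer sketch (not part of the diff): the interpolation rule for omitted historical rows can be written down as follows. It assumes that rows are plain dicts sorted by `end_ts` and that consecutive slices are exactly `bucket_size` apart; `fill_missing_rows` and the `new_events` field are hypothetical and exist only for this illustration.

```python
from typing import Dict, List, Sequence


def fill_missing_rows(
    rows: List[Dict[str, int]],
    bucket_size: int,
    absolute_fields: Sequence[str],
    per_slice_fields: Sequence[str],
) -> List[Dict[str, int]]:
    """Recreate rows for time slices that were omitted because nothing changed."""
    filled: List[Dict[str, int]] = []
    for row in rows:
        if filled:
            prev = filled[-1]
            ts = prev["end_ts"] + bucket_size
            while ts < row["end_ts"]:
                gap = {"end_ts": ts}
                for name in absolute_fields:
                    gap[name] = prev[name]  # absolute: same as the preceding row
                for name in per_slice_fields:
                    gap[name] = 0  # per-slice: nothing happened in this slice
                filled.append(gap)
                ts += bucket_size
        filled.append(dict(row))
    return filled


stored = [
    {"end_ts": 10, "total_events": 2, "new_events": 2},
    {"end_ts": 40, "total_events": 4, "new_events": 1},
]
series = fill_missing_rows(stored, 10, ["total_events"], ["new_events"])
```

Rows that have already been purged cannot be recreated this way, since there is no preceding row to copy absolute values from.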
|
|
||
| #### purge | ||
|
|
||
| The purging of historical rows is not yet implemented. | ||
|
|
one line for this please, btw