Separated Statistics [6/7ish] #5924
This makes the rows 'completed' so that the stats regenerator need not touch them.
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
    current_token = self._get_max_stream_id_in_current_state_deltas_txn(txn)
    self._update_stats_delta_txn(
        txn, now, "user", user_id, {}, complete_with_stream_id=current_token
    )
My concern here is that registration may happen on a different worker than the stats loop. Can we not just insert a new row for a user if we haven't seen them before?
It will already do that – but this empty delta here is required to mark the row as completed so the stats regenerator doesn't pick it up and we can start collecting historical stats.
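To make the "empty delta" concrete, here is a minimal sketch of how such an upsert might behave. It is not the Synapse implementation: the `update_stats_delta` helper, the single `joined_rooms` counter, and the table layout are assumptions made for illustration, using an in-memory SQLite database.

```python
import sqlite3


def update_stats_delta(conn, user_id, deltas, complete_with_stream_id=None):
    """Apply additive deltas to a user's stats row, creating it if needed.

    If complete_with_stream_id is given, the row is also marked as completed.
    Hypothetical helper: names and schema are illustrative assumptions.
    """
    joined = deltas.get("joined_rooms", 0)
    cur = conn.execute(
        "UPDATE user_stats_current SET joined_rooms = joined_rooms + ?, "
        "completed_delta_stream_id = COALESCE(?, completed_delta_stream_id) "
        "WHERE user_id = ?",
        (joined, complete_with_stream_id, user_id),
    )
    if cur.rowcount == 0:
        # No existing row: insert one, completed only if a stream ID was given.
        conn.execute(
            "INSERT INTO user_stats_current "
            "(user_id, joined_rooms, completed_delta_stream_id) VALUES (?, ?, ?)",
            (user_id, joined, complete_with_stream_id),
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE user_stats_current ("
    "user_id TEXT PRIMARY KEY, joined_rooms INTEGER NOT NULL, "
    "completed_delta_stream_id INTEGER)"
)

# Registration: an empty delta that only marks the new row as completed.
update_stats_delta(conn, "@alice:example.org", {}, complete_with_stream_id=42)
# A later incremental delta must not clear the completed marker.
update_stats_delta(conn, "@alice:example.org", {"joined_rooms": 1})
# A user first seen by the incremental processor stays incomplete.
update_stats_delta(conn, "@bob:example.org", {"joined_rooms": 2})
```

In this sketch, a row created without `complete_with_stream_id` keeps a NULL marker, which is what a regenerator would look for.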
Hmm, can we do the reverse and mark all existing users as needing stats regen?
That sounds like it could be painful to do. Much like we mark a room as completed on either its stats regen or receiving its creation event (witnessing its creation), the plan was to do the same here — only mark as complete when we witness the user's creation or do a stats regen on that user.
What issue do you see with this current approach?
It should just be a case of adding a bg update that adds all existing users to the table with completed set to false?
The issue is that now you're writing to the stats table from multiple places and multiple workers, which is probably fine but means that logic is no longer "this gets updated in one place and that's during the processing loop".
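For comparison, the alternative suggested here (a background update that adds every existing user with an incomplete row) could look roughly like the sketch below. The `backfill_incomplete_user_rows` function, the `users` table with a `name` column, and the stats schema are assumptions for illustration only, not the actual background update machinery.

```python
import sqlite3


def backfill_incomplete_user_rows(conn, batch_size=100):
    """Insert an incomplete stats row for users that do not have one yet.

    Returns the number of rows inserted in this batch (hypothetical helper).
    """
    cur = conn.execute(
        "INSERT INTO user_stats_current "
        "(user_id, joined_rooms, completed_delta_stream_id) "
        "SELECT u.name, 0, NULL FROM users AS u "
        "WHERE NOT EXISTS ("
        "  SELECT 1 FROM user_stats_current AS s WHERE s.user_id = u.name"
        ") ORDER BY u.name LIMIT ?",
        (batch_size,),
    )
    return cur.rowcount


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE user_stats_current ("
    "user_id TEXT PRIMARY KEY, joined_rooms INTEGER NOT NULL, "
    "completed_delta_stream_id INTEGER)"
)
conn.executemany(
    "INSERT INTO users (name) VALUES (?)",
    [("@alice:example.org",), ("@bob:example.org",), ("@carol:example.org",)],
)
# Alice already has a completed row and must be left alone.
conn.execute("INSERT INTO user_stats_current VALUES ('@alice:example.org', 3, 42)")

# Run the backfill in small batches until there is nothing left to insert.
inserted_total = 0
while True:
    inserted = backfill_incomplete_user_rows(conn, batch_size=1)
    if inserted == 0:
        break
    inserted_total += inserted
```

Because the query skips users that already have a row, re-running a batch is harmless in this sketch, and rows that are already completed are never reset.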
That can be done, but it needs consideration — I'll look into it in a bit.
Writes to the stats table will already happen in multiple places (after all, the stats regenerator writes to it too). The upsert logic must ensure that everything is kept consistent -- indeed, I have been writing with this in mind, so if it doesn't do that, then that's an issue that should be addressed itself.
Assuming that something is complete, just because it's the first time the incremental processor has seen it, could easily lead to a bug if stats regenerations are performed.
I was imagining that the stats re-generator ran before we started the normal writing?
The stats regenerator will run at the same time as the normal writing because blocking normal writing on its completion would lead to a big debt that must be dealt with painfully.
This is why we have 'complete' and 'incomplete' current stats rows.
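As a rough illustration of the complete/incomplete split, the sketch below shows a regenerator pass that only touches rows without a completion marker. Everything here is assumed for the example (the `regenerate_incomplete` function, the `memberships` table, the schema), and it runs single-threaded, so it does not address how a real regenerator would reconcile deltas that arrive while it is working.

```python
import sqlite3


def regenerate_incomplete(conn, current_stream_id, batch_size=100):
    """Recompute stats for up to batch_size incomplete rows and mark them completed.

    Returns the number of rows processed (hypothetical helper).
    """
    pending = conn.execute(
        "SELECT user_id FROM user_stats_current "
        "WHERE completed_delta_stream_id IS NULL ORDER BY user_id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for (user_id,) in pending:
        # Recompute the absolute value from the (assumed) source-of-truth table.
        (joined,) = conn.execute(
            "SELECT COUNT(*) FROM memberships WHERE user_id = ?", (user_id,)
        ).fetchone()
        conn.execute(
            "UPDATE user_stats_current SET joined_rooms = ?, "
            "completed_delta_stream_id = ? WHERE user_id = ?",
            (joined, current_stream_id, user_id),
        )
    return len(pending)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memberships (user_id TEXT, room_id TEXT)")
conn.execute(
    "CREATE TABLE user_stats_current ("
    "user_id TEXT PRIMARY KEY, joined_rooms INTEGER NOT NULL, "
    "completed_delta_stream_id INTEGER)"
)
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?)",
    [
        ("@alice:example.org", "!a:example.org"),
        ("@bob:example.org", "!a:example.org"),
        ("@bob:example.org", "!b:example.org"),
    ],
)
# Alice was completed at registration; Bob predates stats and is incomplete.
conn.execute("INSERT INTO user_stats_current VALUES ('@alice:example.org', 1, 42)")
conn.execute("INSERT INTO user_stats_current VALUES ('@bob:example.org', 0, NULL)")

first_pass = regenerate_incomplete(conn, current_stream_id=50)
second_pass = regenerate_incomplete(conn, current_stream_id=51)
```

The second pass finds nothing to do, which is the property the completed marker is meant to provide: rows completed at creation time never need regenerating.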
Hmm, OK. Well, I guess it's fine for now and then we can see how the regen works in a later PR.
Sytest failure is:
That perplexed me for a short while, but then I realised it's because Thanks for SyTesting it for me :)
Track new users in user statistics.
This makes the rows 'completed' so that the stats regenerator need not
touch them.