Skip to content
This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Separated Statistics [6/7ish]#5924

Merged
reivilibre merged 7 commits into
rei/rss_targetfrom
rei/rss_inc6
Aug 30, 2019
Merged

Separated Statistics [6/7ish]#5924
reivilibre merged 7 commits into
rei/rss_targetfrom
rei/rss_inc6

Conversation

@reivilibre
Copy link
Copy Markdown
Contributor

Track new users in user statistics.

This makes the rows 'completed' so that the stats regenerator need not
touch them.

@reivilibre reivilibre requested a review from a team August 28, 2019 19:19
Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
current_token = self._get_max_stream_id_in_current_state_deltas_txn(txn)
self._update_stats_delta_txn(
txn, now, "user", user_id, {}, complete_with_stream_id=current_token
)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that registration may happen on a different worker than the stats loop. Can we not just insert a new row for a user if we haven't seen them before?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will already do that – but this empty delta here is required to mark the row as completed so the stats regenerator doesn't pick it up and we can start collecting historical stats.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, can we do the reverse and mark all existing users as needing stats regen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds like it could be painful to do. Much like we mark a room as completed on either its stats regen or receiving its creation event (witnessing its creation), the plan was to do the same here — only mark as complete when we witness the user's creation or do a stats regen on that user.

What issue do you see with this current approach?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be a case of adding a bg update that adds all existing users to the table with completed set to false?

The issue is that now you're writing to the stats table from multiple places and multiple workers, which is probably fine but means that logic is no longer "this gets updated in one place and that's during the processing loop".

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That can be done, but it needs consideration — I'll look into it in a bit.

Writes to the stats table will already happen in multiple places (after all, the stats regenerator writes to it too). The upsert logic must ensure that everything is kept consistent ­-- indeed, I have been writing with this in mind, so if it doesn't do that, then that's an issue that should be addressed itself.

Assuming that something is complete, just because it's the first time the incremental processor has seen it, could easily lead to a bug if stats regenerations are performed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was imagining that the stats re-generator ran before we started the normal writing?

Copy link
Copy Markdown
Contributor Author

@reivilibre reivilibre Aug 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stats regenerator will run at the same time as the normal writing because blocking normal writing on its completion would lead to a big debt that must be dealt with painfully.

This is why we have 'complete' and 'incomplete' current stats rows.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, OK. Well I guess its fine for now and then we can see how the regen works in a later PR.

@erikjohnston
Copy link
Copy Markdown
Member

Sytest failure is:

2019-08-30 13:12:10,056 - synapse.http.server - 108 - ERROR - POST-962 - Failed handle request via 'RegisterRestServlet': <SynapseRequest at 0x7f227cbce9b0 method='POST' uri='/_matrix/client/r0/register' clientproto='HTTP/1.1' site=8800>
Traceback (most recent call last):
  File "/src/synapse/http/server.py", line 76, in wrapped_request_handler
    await h(self, request)
  File "/src/synapse/http/server.py", line 315, in _async_render
    callback_return = await callback_return
  File "/venv/lib/python3.5/site-packages/twisted/internet/defer.py", line 654, in _runCallbacks
    current.result = callback(current.result, *args, **kw)
  File "/src/synapse/rest/client/v2_alpha/_base.py", line 94, in _catch_incomplete_interactive_auth
    f.trap(InteractiveAuthIncompleteError)
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 460, in trap
    self.raiseException()
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 488, in raiseException
    raise self.value.with_traceback(self.tb)
  File "/venv/lib/python3.5/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/src/synapse/rest/client/v2_alpha/register.py", line 459, in on_POST
    address=client_addr,
  File "/venv/lib/python3.5/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/src/synapse/handlers/register.py", line 213, in register_user
    address=address,
  File "/venv/lib/python3.5/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/src/synapse/storage/_base.py", line 506, in runInteraction
    **kwargs
  File "/venv/lib/python3.5/site-packages/twisted/internet/defer.py", line 1416, in _inlineCallbacks
    result = result.throwExceptionIntoGenerator(g)
  File "/venv/lib/python3.5/site-packages/twisted/python/failure.py", line 512, in throwExceptionIntoGenerator
    return g.throw(self.type, self.value, self.tb)
  File "/src/synapse/storage/_base.py", line 554, in runWithConnection
    result = yield self._db_pool.runWithConnection(inner_func, *args, **kwargs)
  File "/venv/lib/python3.5/site-packages/twisted/python/threadpool.py", line 250, in inContext
    result = inContext.theWork()
  File "/venv/lib/python3.5/site-packages/twisted/python/threadpool.py", line 266, in <lambda>
    inContext.theWork = lambda: context.call(ctx, func, *args, **kw)
  File "/venv/lib/python3.5/site-packages/twisted/python/context.py", line 122, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/venv/lib/python3.5/site-packages/twisted/python/context.py", line 85, in callWithContext
    return func(*args,**kw)
  File "/venv/lib/python3.5/site-packages/twisted/enterprise/adbapi.py", line 306, in _runWithConnection
    compat.reraise(excValue, excTraceback)
  File "/venv/lib/python3.5/site-packages/twisted/python/compat.py", line 464, in reraise
    raise exception.with_traceback(traceback)
  File "/venv/lib/python3.5/site-packages/twisted/enterprise/adbapi.py", line 297, in _runWithConnection
    result = func(conn, *args, **kw)
  File "/src/synapse/storage/_base.py", line 551, in inner_func
    return func(conn, *args, **kwargs)
  File "/src/synapse/storage/_base.py", line 423, in _new_transaction
    r = func(txn, *args, **kwargs)
  File "/src/synapse/storage/registration.py", line 880, in _register_user
    txn, now, "user", user_id, {}, complete_with_stream_id=current_token
  File "/src/synapse/storage/stats.py", line 309, in _update_stats_delta_txn
    additive_relatives=deltas_of_absolute_fields,
  File "/src/synapse/storage/stats.py", line 393, in _upsert_with_additive_relatives_txn
    current_row[key] += val
KeyError: 'public_rooms'

@reivilibre
Copy link
Copy Markdown
Contributor Author

reivilibre commented Aug 30, 2019

That perplexed me for a short while, but then I realised it's because chain is only once-iterable:

>>> import itertools
>>> l1 = [1,2,3]
>>> l2 = [4,5,6]
>>> ls = itertools.chain(l1, l2)
>>> list(ls)
[1, 2, 3, 4, 5, 6]
>>> list(ls)
[]

Thanks for SyTesting it for me :)

Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
@reivilibre reivilibre merged commit 8c02602 into rei/rss_target Aug 30, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants