fix: Fix a couple of unit test timeout problems when run concurrently with coverage by jason-famedly · Pull Request #39 · famedly/synapse

jason-famedly · 2025-05-21T10:29:33Z

No description provided.

codecov · 2025-05-21T10:44:04Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.66%. Comparing base (d9a117c) to head (e9d8966).
Report is 1 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master      #39      +/-   ##
==========================================
- Coverage   78.66%   78.66%   -0.01%     
==========================================
  Files         472      472              
  Lines       67661    67661              
  Branches    10419    10419              
==========================================
- Hits        53227    53223       -4     
- Misses      10918    10921       +3     
- Partials     3516     3517       +1

see 5 files with indirect coverage changes

Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9a117c...e9d8966. Read the comment docs.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

jason-famedly · 2025-05-21T13:03:40Z

Traceback in Github CI

Traceback (most recent call last):
  File "/home/runner/work/synapse/synapse/tests/handlers/test_worker_lock.py", line 61, in test_lock_contention
    self.assertEqual(self.get_success(d), nb_locks)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 692, in get_success
    self.pump(by=by)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 688, in pump
    self.reactor.pump([by] * 100)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 838, in pump
    self.advance(amount)
  File "/home/runner/work/synapse/synapse/tests/server.py", line 642, in advance
    super().advance(0)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 826, in advance
    self._sortCalls()
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 790, in _sortCalls
    self.calls.sort(key=lambda a: a.getTime())
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 790, in <lambda>
    self.calls.sort(key=lambda a: a.getTime())
  File "/home/runner/work/synapse/synapse/tests/utils.py", line 424, in handle_timeout
    raise TestTimeout(self.error_message)
tests.utils.TestTimeout: Test timed out after 5s

tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention

jason-famedly · 2025-05-21T13:05:21Z

(one example of) Traceback on my machine with twisted runners at -j48 (my machine is 8/16). I was sometimes successful with -j36

[FAIL]
Traceback (most recent call last):
  File "/home/jason/dev/famedly/synapse/tests/handlers/test_worker_lock.py", line 61, in test_lock_contention
    self.assertEqual(self.get_success(d), nb_locks)
  File "/home/jason/dev/famedly/synapse/tests/unittest.py", line 693, in get_success
    return self.successResultOf(deferred)
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/trial/_synctest.py", line 732, in successResultOf
    self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x78cfe16fd450 current result: None>, found failure result instead:
Traceback (most recent call last):
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 873, in callback
    self._startRunCallbacks(result)
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 980, in _startRunCallbacks
    self._runCallbacks()
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1074, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1960, in _gotResultInlineCallbacks
    _inlineCallbacks(r, gen, status, context)
--- <exception caught here> ---
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 2014, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/jason/dev/famedly/synapse/tests/handlers/test_worker_lock.py", line 71, in _take_locks
    async with lock:
  File "/home/jason/dev/famedly/synapse/synapse/handlers/worker_lock.py", line 229, in __aenter__
    lock = await self.store.try_acquire_lock(
  File "/home/jason/dev/famedly/synapse/synapse/storage/databases/main/lock.py", line 135, in try_acquire_lock
    return await self._try_acquire_lock(lock_name, lock_key)
  File "/home/jason/dev/famedly/synapse/synapse/storage/databases/main/lock.py", line 188, in _try_acquire_lock
    did_lock = await self.db_pool.runInteraction(
  File "/home/jason/dev/famedly/synapse/synapse/storage/database.py", line 953, in runInteraction
    return await delay_cancellation(_runInteraction())
  File "/home/jason/dev/famedly/synapse/synapse/util/async_helpers.py", line 892, in delay_cancellation
    def delay_cancellation(awaitable: Awaitable[T]) -> Awaitable[T]:
  File "/home/jason/dev/famedly/synapse/tests/utils.py", line 424, in handle_timeout
    raise TestTimeout(self.error_message)
tests.utils.TestTimeout: Test timed out after 5s


tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/trial/_asynctest.py", line 209, in deferRunCleanups
    yield func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse/tests/test_utils/__init__.py", line 88, in cleanup
    raise exc
  File "/usr/lib/python3.13/warnings.py", line 692, in _warn_unawaited_coroutine
    warn(msg, category=RuntimeWarning, stacklevel=2, source=coro)
builtins.RuntimeWarning: coroutine 'DatabasePool.runInteraction.<locals>._runInteraction' was never awaited

tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention
------------------------------------------------------------------------

(note: other runs at -j36 however would collapse the entire sqlite database with missing tables, so 🤷‍♂️ )

jason-famedly · 2025-05-21T13:40:16Z

Largely my conclusion is that this is what was reported to me: a concurrency problem. I see two solutions: increase the timeout for that particular test or reduce the number of jobs being run in Github CI.

Based on some(really loose) math, it looks like a timeout of 30 seconds allows all the database calls to write the lock and then handle all the sorted deferreds and database reads in the main reactor threadpool(see reactor.callFromThread())

EDIT: Found the pull request this came from: element-hq/synapse#16840
Evidently, this test used to take 30 second to complete. Even with my timeout adjustment it seems to take no more than 2 seconds locally. I imagine it has to do with the call_after adjustments that were made being the improvement, and then using the signal module to judge time passing. Well you don't want to do that in unit tests where time is not real, this isn't a benchmark

jason-famedly · 2025-05-21T16:39:16Z

Looks like other than the worker lock tests, there was another test for redactions that was using 'real time' to judge for it's timeout. Since the unit tests don't operate on real time, but on the fake reactor time, that can clash when under heavier concurrency. Swapping it to the fake time on the test reactor seems to sort that out.

…ran concurrently

Contains a partial revert of #39 where in test_lock_contention() of test_worker_locks.py the timeout of the test was bumped to 15 seconds to allow testing with 36 trial runner jobs. I just will not run 36 jobs any more instead

jason-famedly force-pushed the jason/speedup-tests-fix branch from 4606d06 to a34507e Compare May 21, 2025 13:39

jason-famedly force-pushed the jason/speedup-tests-fix branch 4 times, most recently from 34a7928 to 1b7b96e Compare May 21, 2025 16:20

jason-famedly changed the title ~~Try and fix the unit test failures~~ fix: Fix a couple of unit test timeout problems when run concurrently with coverage May 22, 2025

jason-famedly marked this pull request as ready for review May 22, 2025 10:00

jason-famedly requested a review from a team as a code owner May 22, 2025 10:00

nico-famedly reviewed May 22, 2025

View reviewed changes

Comment thread .ci/scripts/calculate_jobs.py Outdated

jason-famedly force-pushed the jason/speedup-tests-fix branch 3 times, most recently from b3b690b to dfa1b0f Compare May 22, 2025 18:43

nico-famedly approved these changes May 23, 2025

View reviewed changes

fix: Adjust some tests to account for higher than expected load when …

e9d8966

…ran concurrently

jason-famedly force-pushed the jason/speedup-tests-fix branch from dfa1b0f to e9d8966 Compare May 23, 2025 14:12

jason-famedly merged commit e9d8966 into master May 23, 2025
9 checks passed

jason-famedly added a commit that referenced this pull request Jul 17, 2025

Maintain timeout increase for test_lock_contention() from #39

8fe99b0

jason-famedly deleted the jason/speedup-tests-fix branch August 8, 2025 10:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Fix a couple of unit test timeout problems when run concurrently with coverage#39

fix: Fix a couple of unit test timeout problems when run concurrently with coverage#39
jason-famedly merged 1 commit into
masterfrom
jason/speedup-tests-fix

jason-famedly commented May 21, 2025

Uh oh!

codecov Bot commented May 21, 2025 •

edited

Loading

Uh oh!

jason-famedly commented May 21, 2025

Uh oh!

jason-famedly commented May 21, 2025 •

edited

Loading

Uh oh!

jason-famedly commented May 21, 2025 •

edited

Loading

Uh oh!

jason-famedly commented May 21, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jason-famedly commented May 21, 2025

Uh oh!

codecov Bot commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

jason-famedly commented May 21, 2025

Uh oh!

jason-famedly commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jason-famedly commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jason-famedly commented May 21, 2025

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov Bot commented May 21, 2025 •

edited

Loading

jason-famedly commented May 21, 2025 •

edited

Loading

jason-famedly commented May 21, 2025 •

edited

Loading