Skip to content

fix: Fix a couple of unit test timeout problems when run concurrently with coverage#39

Merged
jason-famedly merged 1 commit into
masterfrom
jason/speedup-tests-fix
May 23, 2025
Merged

fix: Fix a couple of unit test timeout problems when run concurrently with coverage#39
jason-famedly merged 1 commit into
masterfrom
jason/speedup-tests-fix

Conversation

@jason-famedly
Copy link
Copy Markdown
Member

No description provided.

@codecov
Copy link
Copy Markdown

codecov Bot commented May 21, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.66%. Comparing base (d9a117c) to head (e9d8966).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master      #39      +/-   ##
==========================================
- Coverage   78.66%   78.66%   -0.01%     
==========================================
  Files         472      472              
  Lines       67661    67661              
  Branches    10419    10419              
==========================================
- Hits        53227    53223       -4     
- Misses      10918    10921       +3     
- Partials     3516     3517       +1     

see 5 files with indirect coverage changes


Continue to review full report in Codecov by Sentry.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d9a117c...e9d8966. Read the comment docs.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@jason-famedly
Copy link
Copy Markdown
Member Author

Traceback in Github CI

Traceback (most recent call last):
  File "/home/runner/work/synapse/synapse/tests/handlers/test_worker_lock.py", line 61, in test_lock_contention
    self.assertEqual(self.get_success(d), nb_locks)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 692, in get_success
    self.pump(by=by)
  File "/home/runner/work/synapse/synapse/tests/unittest.py", line 688, in pump
    self.reactor.pump([by] * 100)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 838, in pump
    self.advance(amount)
  File "/home/runner/work/synapse/synapse/tests/server.py", line 642, in advance
    super().advance(0)
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 826, in advance
    self._sortCalls()
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 790, in _sortCalls
    self.calls.sort(key=lambda a: a.getTime())
  File "/home/runner/.cache/pypoetry/virtualenvs/matrix-synapse-pswDeSvb-py3.13/lib/python3.13/site-packages/twisted/internet/task.py", line 790, in <lambda>
    self.calls.sort(key=lambda a: a.getTime())
  File "/home/runner/work/synapse/synapse/tests/utils.py", line 424, in handle_timeout
    raise TestTimeout(self.error_message)
tests.utils.TestTimeout: Test timed out after 5s

tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention

@jason-famedly
Copy link
Copy Markdown
Member Author

jason-famedly commented May 21, 2025

(one example of) Traceback on my machine with twisted runners at -j48 (my machine is 8/16). I was sometimes successful with -j36

[FAIL]
Traceback (most recent call last):
  File "/home/jason/dev/famedly/synapse/tests/handlers/test_worker_lock.py", line 61, in test_lock_contention
    self.assertEqual(self.get_success(d), nb_locks)
  File "/home/jason/dev/famedly/synapse/tests/unittest.py", line 693, in get_success
    return self.successResultOf(deferred)
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/trial/_synctest.py", line 732, in successResultOf
    self.fail(
twisted.trial.unittest.FailTest: Success result expected on <Deferred at 0x78cfe16fd450 current result: None>, found failure result instead:
Traceback (most recent call last):
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 873, in callback
    self._startRunCallbacks(result)
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 980, in _startRunCallbacks
    self._runCallbacks()
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1074, in _runCallbacks
    current.result = callback(  # type: ignore[misc]
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 1960, in _gotResultInlineCallbacks
    _inlineCallbacks(r, gen, status, context)
--- <exception caught here> ---
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/internet/defer.py", line 2014, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/home/jason/dev/famedly/synapse/tests/handlers/test_worker_lock.py", line 71, in _take_locks
    async with lock:
  File "/home/jason/dev/famedly/synapse/synapse/handlers/worker_lock.py", line 229, in __aenter__
    lock = await self.store.try_acquire_lock(
  File "/home/jason/dev/famedly/synapse/synapse/storage/databases/main/lock.py", line 135, in try_acquire_lock
    return await self._try_acquire_lock(lock_name, lock_key)
  File "/home/jason/dev/famedly/synapse/synapse/storage/databases/main/lock.py", line 188, in _try_acquire_lock
    did_lock = await self.db_pool.runInteraction(
  File "/home/jason/dev/famedly/synapse/synapse/storage/database.py", line 953, in runInteraction
    return await delay_cancellation(_runInteraction())
  File "/home/jason/dev/famedly/synapse/synapse/util/async_helpers.py", line 892, in delay_cancellation
    def delay_cancellation(awaitable: Awaitable[T]) -> Awaitable[T]:
  File "/home/jason/dev/famedly/synapse/tests/utils.py", line 424, in handle_timeout
    raise TestTimeout(self.error_message)
tests.utils.TestTimeout: Test timed out after 5s


tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention
===============================================================================
[ERROR]
Traceback (most recent call last):
  File "/home/jason/.cache/pypoetry/virtualenvs/matrix-synapse-XNW4s4O2-py3.13/lib/python3.13/site-packages/twisted/trial/_asynctest.py", line 209, in deferRunCleanups
    yield func(*args, **kwargs)
  File "/home/jason/dev/famedly/synapse/tests/test_utils/__init__.py", line 88, in cleanup
    raise exc
  File "/usr/lib/python3.13/warnings.py", line 692, in _warn_unawaited_coroutine
    warn(msg, category=RuntimeWarning, stacklevel=2, source=coro)
builtins.RuntimeWarning: coroutine 'DatabasePool.runInteraction.<locals>._runInteraction' was never awaited

tests.handlers.test_worker_lock.WorkerLockTestCase.test_lock_contention
------------------------------------------------------------------------

(note: other runs at -j36 however would collapse the entire sqlite database with missing tables, so 🤷‍♂️ )

@jason-famedly jason-famedly force-pushed the jason/speedup-tests-fix branch from 4606d06 to a34507e Compare May 21, 2025 13:39
@jason-famedly
Copy link
Copy Markdown
Member Author

jason-famedly commented May 21, 2025

Largely my conclusion is that this is what was reported to me: a concurrency problem. I see two solutions: increase the timeout for that particular test or reduce the number of jobs being run in Github CI.

Based on some(really loose) math, it looks like a timeout of 30 seconds allows all the database calls to write the lock and then handle all the sorted deferreds and database reads in the main reactor threadpool(see reactor.callFromThread())

EDIT: Found the pull request this came from: element-hq/synapse#16840
Evidently, this test used to take 30 second to complete. Even with my timeout adjustment it seems to take no more than 2 seconds locally. I imagine it has to do with the call_after adjustments that were made being the improvement, and then using the signal module to judge time passing. Well you don't want to do that in unit tests where time is not real, this isn't a benchmark

@jason-famedly jason-famedly force-pushed the jason/speedup-tests-fix branch 4 times, most recently from 34a7928 to 1b7b96e Compare May 21, 2025 16:20
@jason-famedly
Copy link
Copy Markdown
Member Author

Looks like other than the worker lock tests, there was another test for redactions that was using 'real time' to judge for it's timeout. Since the unit tests don't operate on real time, but on the fake reactor time, that can clash when under heavier concurrency. Swapping it to the fake time on the test reactor seems to sort that out.

@jason-famedly jason-famedly changed the title Try and fix the unit test failures fix: Fix a couple of unit test timeout problems when run concurrently with coverage May 22, 2025
@jason-famedly jason-famedly marked this pull request as ready for review May 22, 2025 10:00
@jason-famedly jason-famedly requested a review from a team as a code owner May 22, 2025 10:00
Comment thread .ci/scripts/calculate_jobs.py Outdated
@jason-famedly jason-famedly force-pushed the jason/speedup-tests-fix branch 3 times, most recently from b3b690b to dfa1b0f Compare May 22, 2025 18:43
@jason-famedly jason-famedly force-pushed the jason/speedup-tests-fix branch from dfa1b0f to e9d8966 Compare May 23, 2025 14:12
@jason-famedly jason-famedly merged commit e9d8966 into master May 23, 2025
9 checks passed
jason-famedly added a commit that referenced this pull request Jul 17, 2025
Contains a partial revert of #39 where in test_lock_contention() of
test_worker_locks.py the timeout of the test was bumped to 15 seconds to
allow testing with 36 trial runner jobs. I just will not run 36 jobs any
more instead
@jason-famedly jason-famedly deleted the jason/speedup-tests-fix branch August 8, 2025 10:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants