Skip to content

Conversation

@allrob23
Copy link
Contributor

Scope and purpose

Fixes #12439

This PR improves the performance of the LogPublisher class by replacing the list used for storing observers with a set. This change optimizes the lookup operation from O(N) to O(1) when checking if an observer is already registered. The scope of this PR is limited to this performance improvement and does not introduce any behavioral changes to the class.

Contributor Checklist:

This process applies to all pull requests - no matter how small.
Have a look at our developer documentation before submitting your Pull Request.

Below is a non-exhaustive list (as a reminder):

  • The title of the PR should describe the changes and starts with the associated issue number, like “#9782 Remove twisted.news. #1234 Brief description”.
  • A release notes news fragment file was create in src/twisted/newsfragments/ (see: Release notes fragments docs.)
  • The automated tests were updated.
  • Once all checks are green, request a review by leaving a comment that contains exactly the string please review.
    Our bot will trigger the review process, by applying the pending review label
    and requesting a review from the Twisted dev team.

@codspeed-hq
Copy link

codspeed-hq bot commented Mar 25, 2025

CodSpeed Performance Report

Merging #12440 will degrade performances by 6.61%

Comparing allrob23:trunk (a4caa4b) with trunk (aedee3e)

Summary

⚡ 2 improvements
❌ 1 regressions
✅ 33 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark BASE HEAD Change
test_log_publisher_add_remove 137.2 ms 2.4 ms ×57
test_log_publisher_call_dispatch 1.3 ms 1.4 ms -6.61%
test_http11_server_chunked_request 1.8 ms 1.8 ms +2.31%

Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @allrob23 !

A couple of notes:

  • obviously the fact that this breaks the test suite needs to be addressed :)
  • please make a dedicated branch for this feature rather than using your fork's trunk branch for it
  • can you do this in 2 PRs, where the first adds a benchmark, and the second improves the benchmark? if this is a performance-oriented fix, given that we are lucky enough to have a performance regression suite on https://codspeed.io, we should use that suite to make sure it actually makes the difference that we think it does.

@glyph
Copy link
Member

glyph commented Mar 25, 2025

And, let's not forget the most important part: thank you for taking the time to contribute to Twisted.

@allrob23
Copy link
Contributor Author

Hi @glyph, do you have any insights on this test failure? Could it be a flaky test? It’s passing in my environment:

Test: test_trace
File: src/twisted/logger/test/test_observer.py

image

@adiroiban
Copy link
Member

adiroiban commented Mar 25, 2025

@allrob23 The twisted/twisted project is not using pytest to run the tests.

The tests are executed using twisted trial tool. Some docs is here

You can use trial directly or use tox

@glyph
Copy link
Member

glyph commented Mar 25, 2025

Running the test in isolation, I also see it pass, but only once: if I do trial -u twisted.logger.test.test_observer it quickly fails (on the second observer). see here:
image

glyph@gale:~/Projects/Twisted  🏰 Twisted  🔀 allrob23/trunk  🐚 🔑   🐳 🫛  ⮐
  ★ trial -u twisted.logger.test.test_observer
 ↩ Tue Mar 25 14:32:01 PDT 2025
Test Pass 1
twisted.logger.test.test_observer
  LogPublisherTests
    test_addObserver ...                                                   [OK]
    test_addObserverNotCallable ...                                        [OK]
    test_fanOut ...                                                        [OK]
    test_interface ...                                                     [OK]
    test_observerRaises ...                                                [OK]
    test_observerRaisesAndLoggerHatesMe ...                                [OK]
    test_observers ...                                                     [OK]
    test_removeObserver ...                                                [OK]
    test_removeObserverNotRegistered ...                                   [OK]
    test_trace ...                                                         [OK]

-------------------------------------------------------------------------------
Ran 10 tests in 0.002s

PASSED (successes=10)
Test Pass 2
twisted.logger.test.test_observer
  LogPublisherTests
    test_addObserver ...                                                   [OK]
    test_addObserverNotCallable ...                                        [OK]
    test_fanOut ...                                                        [OK]
    test_interface ...                                                     [OK]
    test_observerRaises ...                                                [OK]
    test_observerRaisesAndLoggerHatesMe ...                                [OK]
    test_observers ...                                                     [OK]
    test_removeObserver ...                                                [OK]
    test_removeObserverNotRegistered ...                                   [OK]
    test_trace ...                                                       [FAIL]

===============================================================================
[FAIL]
Traceback (most recent call last):
  File "/Users/glyph/Projects/Twisted/src/twisted/logger/test/test_observer.py", line 191, in test_trace
    self.assertEqual(traces[1], ((publisher, o1),))
  File "/Users/glyph/Projects/Twisted/src/twisted/trial/_synctest.py", line 444, in assertEqual
    super().assertEqual(first, second, msg)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/case.py", line 907, in assertEqual
    assertion_func(first, second, msg=msg)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/case.py", line 1124, in assertTupleEqual
    self.assertSequenceEqual(tuple1, tuple2, msg, seq_type=tuple)
  File "/Library/Frameworks/Python.framework/Versions/3.13/lib/python3.13/unittest/case.py", line 1095, in assertSequenceEqual
    self.fail(msg)
twisted.trial.unittest.FailTest: Tuples differ: ((<tw[104 chars]ls>.o2 at 0x104fe2980>), (<twisted.logger._obs[106 chars]60>)) != ((<tw[104 chars]ls>.o1 at 0x104fe2b60>),)

First differing element 0:
(<twi[63 chars]ction LogPublisherTests.test_trace.<locals>.o2 at 0x104fe2980>)
(<twi[63 chars]ction LogPublisherTests.test_trace.<locals>.o1 at 0x104fe2b60>)

First tuple contains 1 additional elements.
First extra element 1:
(<twisted.logger._observer.LogPublisher object at 0x104f7d9b0>, <function LogPublisherTests.test_trace.<locals>.o1 at 0x104fe2b60>)

  ((<twisted.logger._observer.LogPublisher object at 0x104f7d9b0>,
-   <function LogPublisherTests.test_trace.<locals>.o2 at 0x104fe2980>),
-  (<twisted.logger._observer.LogPublisher object at 0x104f7d9b0>,
-   <function LogPublisherTests.test_trace.<locals>.o1 at 0x104fe2b60>))
+   <function LogPublisherTests.test_trace.<locals>.o1 at 0x104fe2b60>),)
?                                                                      +


twisted.logger.test.test_observer.LogPublisherTests.test_trace
-------------------------------------------------------------------------------
Ran 10 tests in 0.002s

FAILED (failures=1, successes=9)

The behavior here is pretty predictable, which is that by moving observers into a set(), their previously deterministic order of execution is now non-deterministic. This seems like a potential problem for unit tests in other parts of Twisted as well; I don't think you can make this change exactly as-is without potentially making large numbers of other tests unreliable. Perhaps we need both a list and a set()? But then of course it's more work to add and remove observers.

All the more reason that we should really start performance work with a benchmark, since maintaining reliability here adds a potentially complex tradeoff.

@allrob23
Copy link
Contributor Author

I understand, thanks for the explanation, maybe adding a sorted() in the test assertion of the tests will solve the problem of reliability?

@allrob23
Copy link
Contributor Author

I understand the overall message. I'm short on time right now, but I'll take care of it as soon as I’m available.

@glyph
Copy link
Member

glyph commented Mar 26, 2025

I understand, thanks for the explanation, maybe adding a sorted() in the test assertion of the tests will solve the problem of reliability?

It's possible that code was relying upon the ordering of log observers previously, and it's not entirely clear to me that they would have been wrong to do so. If we want to make the ordering non-deterministic, we should probably have some tooling to allow developers to test with an intentionally randomized or at least intentionally-different order (perhaps reversed) before unilaterally changing the behavior.

@allrob23
Copy link
Contributor Author

I ran this benchmark separately in PR #12445, and on my machine the results were encouraging.

After the changes, the observer add/remove performance improved significantly, especially for larger sets.

Before:
image

After:
image

@allrob23
Copy link
Contributor Author

please review

@chevah-robot chevah-robot requested a review from a team April 30, 2025 00:32
adiroiban added a commit that referenced this pull request May 2, 2025
Copy link
Member

@adiroiban adiroiban left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR.

I see that a performance test is failing, this needs to be fixed before merging.

I have never used the Twisted logger in production so I am not familiar with it's use cases.

In the documentation I see

Furthermore, no guarantees are made as to the order in which observers are called, so the effect of such modifications on other observers may be non-deterministic.


self.assertEqual(traces[1], ((publisher, o1),))
self.assertEqual(traces[2], ((publisher, o1), (publisher, o2)))
self.assertIn((publisher, o1), traces[1])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just asking. Can we use the stdlib assertCountEqual here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't know if it would work to use assertCountEqual here

@allrob23
Copy link
Contributor Author

allrob23 commented May 2, 2025

About the performance check:

The change in test_log_publisher_call_dispatch (from 1.3 ms to 1.4 ms, -6.71%) justifies the significant performance gains in other benchmarks. Notably, test_log_publisher_add_remove improved dramatically from 137.2 ms to 2.4 ms (57x faster), and test_lineReceiver[64] gained 3.25% (8.5 ms to 8.2 ms).

BTW, on my machine, the test_log_publisher_call_dispatch slowdown does not occur.

Further highlight the overall positive impact. @adiroiban what do you think?

@adiroiban
Copy link
Member

Sorry for the delay.
I don't know what to say here.

This PR has some improvemtns, but at the same time it contains a regression.

I don't have much experience with writing/handling performance tests.

Maybe there is a defect in the way test_log_publisher_call_dispatch is written.

I think that log publishing is much more important than add/remove of observers... and it matters that we have a performance issue here.

My suggestion is try to get more eyes for this PR by asking for feedback over the mailing list https://mail.python.org/mailman3/lists/twisted.python.org/

@glyph
Copy link
Member

glyph commented May 3, 2025

I think that log publishing is much more important than add/remove of observers... and it matters that we have a performance issue here.

I am also inclined to think that this is the more significant code path in most applications, but obviously I'd prefer it if both were fast.

@allrob23 — I guess I'm curious for a more in-depth explanation of the motivating use-case for #12439 — is there a type of application where you want to add and remove log observers in a tight loop, and/or is there a place where Twisted is doing this itself leading to some other, higher-level slowdown?

Thanks!

@allrob23
Copy link
Contributor Author

allrob23 commented May 4, 2025

Thank you for the responses! I quickly noticed that a benchmark test was incorrect, and the DummyObserver was taking up some time. I opened another PR (#12446) to fix this. I'm a bit busy with other tasks right now, but I can dive into the details of how we spotted these issues later. Please review PR #12446, and we can revisit this discussion once it's merged.

@chevah-robot chevah-robot requested a review from a team May 4, 2025 15:36
Copy link
Member

@glyph glyph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we are blocked on #12465 to get this to the point where we could even reasonably ack any perf regression, so I am taking this out of the queue for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Optimize LogPublisher.addObserver by using a set instead of a list

4 participants