KAFKA-8557: system tests - add support for (optional) interbroker listener with the same security protocol as client listeners#6938
Conversation
|
@ewencp @rajinisivaram @brianbushree @jeffhuang26 @jonchiu @cmccabe @rnpridgeon @C0urante |
rajinisivaram
left a comment
There was a problem hiding this comment.
@stan-confluent Thanks for the PR. Left a few comments. I dont think this change requires more than one or atmost two additional tests. There seems to be a large number of new test combinations which would add a lot of time to a full test run.
| authorizer_class_name=None, topics=None, version=DEV_BRANCH, jmx_object_names=None, | ||
| jmx_attributes=None, zk_connect_timeout=5000, zk_session_timeout=6000, server_prop_overides=None, zk_chroot=None): | ||
| jmx_attributes=None, zk_connect_timeout=5000, zk_session_timeout=6000, server_prop_overides=None, zk_chroot=None, | ||
| use_separate_interbroker_listener=False): |
There was a problem hiding this comment.
Have we considered making this a bit more flexible? For example, would it be better to allow tests to specify client_listener_name and interbroker_listener_name rather than a flag. There is a lot of tech debt here since this code was written before listener names were introduced. I was wondering if this could be an opportunity to clear some of that. If you have already considered that and found it a lot of work, then we merge this and consider that later.
There was a problem hiding this comment.
Yes, all good points. I originally intended to do a bigger change, encapsulating all listener-related stuff (security protocol, port, sasl config) in a separate Listener class, and instead of pre-defined dict with ports we could actually add and remove listener objects to/from the dict. But that's a massive and not particularly backwards compatible change.
As for 'client_listener_name' and 'interbroker_listener_name' - I originally wrote a PR like that (only for interbroker one though, client change, again, is much bigger and harder to keep backwards compatible) - but went for boolean for practical reasons, since in the tests that we envision so far we don't care about listener name, just the fact that there are two listeners on the same security protocol - so effectively I was just calling constructor with interbroker_listener_name=KafkaService.INTERBROKER_LISTENER_NAME.
I can send a follow-up PR to enable custom name interbroker listener or update this one, whichever you prefer. I would suggest making changes to client listeners in separate PRs.
There was a problem hiding this comment.
I played a bit with 'interbroker_listener_name' and its tricky to marry two concepts - existing one and custom named one, without having quite a messy code. I'd suggest keeping this PR as is, with boolean value, since the tests are passing and send a follow up with custom names - this way I can unblock several people who depend on separate interbroker listener, but don't care about custom names. Wdyt?
| self.set_authorizer_and_bounce(security_protocol, security_protocol) | ||
|
|
||
| @cluster(num_nodes=9) | ||
| @matrix(broker_protocol=[SecurityConfig.SSL, SecurityConfig.SASL_PLAINTEXT, SecurityConfig.SASL_SSL], |
There was a problem hiding this comment.
Do we need all these combinations of tests? Also, it is unrelated to upgrade, so could be in a different test class?
There was a problem hiding this comment.
Good point, I don't think we need all combinations here - we're interested in the following cases:
- client and broker on the different protocols
- client and broker on the same protocol, different sasl mechanisms
- client and broker on the same protocol, same sasl mechanism.
I'll update the PR with this change, please let me know if I'm missing something here.
There was a problem hiding this comment.
Changed the tests as you suggested, moving out interbroker listener tests into a separate test class and reducing their number.
|
|
||
| @cluster(num_nodes=9) | ||
| @matrix(broker_protocol=[SecurityConfig.SSL, SecurityConfig.SASL_PLAINTEXT, SecurityConfig.SASL_SSL], | ||
| client_protocol=[SecurityConfig.SSL, SecurityConfig.SASL_PLAINTEXT, SecurityConfig.SASL_SSL]) |
There was a problem hiding this comment.
Again, not sure why we have all thse combinations, and why the test is with upgrade tests.
There was a problem hiding this comment.
Agreed, will move it out.
|
@stan-confluent Also, since this change can affect several tests, can you start a system test run on your branch in the system test branch builder? If you have already done that, can you include the results here? Thanks. |
…d reduced number of test cases significantly.
|
This is a full system test run on the original PR, as you can see there's one test failure directly related to my changes. I'm investigating it and will update the PR with the fix if any. |
|
@rajinisivaram I cannot reproduce the issue on this PR anymore. Might have been fixed when I moved the tests out. I pushed a change to specifically test what was broken (client on SSL, broker on SASL_SSL) and it passes. Rerunning that specific test on jenkins now. |
rajinisivaram
left a comment
There was a problem hiding this comment.
@stan-confluent Thanks for the update. Left another comment about reducing the number of tests for this change. Apart from that looks good.
|
|
||
| # run produce/consume/validate loop while enabling a separate interbroker listener via rolling restart | ||
| self.run_produce_consume_validate( | ||
| self.roll_in_interbroker_listener, broker_protocol, broker_sasl_mechanism, True) |
There was a problem hiding this comment.
Is there value in an upgrade test for this? Listener names are internal to the broker and as long as security protocols match, everything should just work. That is the reason why we didn't need to add upgrade tests when listener names were added all those releases ago. I think it is useful to have a test with listener names, but a single test, rather than all these combinations - there are six tests, each with rolling restarts in this file, adding a lot of time to the system tests. I would suggest limiting to one combination for each test and without rolling restart. We could include this in tests/kafkatest/tests/core/security_test.py if it was just one test, but keeping it separate as it is now is fine too.
There was a problem hiding this comment.
@rajinisivaram I was going to argue that we do need to cover a use case where two listeners have the same security protocol since this is new functionality for this pr, but then this bit is covered in newly added test_rolling_upgrade_phase_two_separate_interbroker_listener() and two test_rolling_upgrade_sasl_mechanism... methods in security_rolling_upgrade_test.py so I might get rid of this file altogether - I can just add a single case for removing a separate listener with rolling restart (I think there is some value in this, more for testing KafkaService itself, to ensure it closes the port correctly) and that's it. Wdyt?
There was a problem hiding this comment.
@rajinisivaram per your advice, reduced a number of tests to two and both live in security_rolling_upgrade_test now.
…I missed initially and Brian pointed out. Simplified tests a bit more. Got rid of automatic \ opening of interbroker port in the setup method since it was inconsistent with how client port behaves
|
@brianbushree @rajinisivaram - I pushed another small but important update, here's what's changed:
|
|
The latest update leads to two failing tests, investigating |
…er's sasl.enabled.mechanisms
|
aaaand rolled back the change where we have per listener sasl.enabled.mechanisms. Its still the right thing to do long term, but it breaks couple of tests that rely on a shared list of sasl mechanisms specifically. Until we support multiple sasl mechanisms per listener in KafkaService and SecurityConfig (which is doable, but quite a big change), we should keep a shared list. |
|
Test result for security_rolling_upgrade_test.py on the latest version (spoiler - all passed): |
rajinisivaram
left a comment
There was a problem hiding this comment.
@stan-confluent Thanks for the updates, LGTM
|
@stan-confluent Thanks for the PR, merging to trunk |
…tener with the same security protocol as client listeners (apache#6938) Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
…tener with the same security protocol as client listeners (#6938) Reviewers: Brian Bushree <bbushree@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This PR adds an option to
kafka.pyto start a separate interbroker listener using the same security protocol as one of the predefined listeners in 'port_mappings' dict.This is a first baby-step to properly support named listeners in system tests.
Kafka currently supports named listeners, where you can have two or more listeners with the same security protocol but different names. Current
KafkaServiceimplementation, however, wouldn't allow that, since listeners inport_mappingsare keyed bysecurity_protocol, so there's 1-1 relationship. Clients usebootstrap_serversmethod which also accepts security_protocol.This PR adds a parameter
use_separate_interbroker_listenerto KafkaService constructor as well as adds asetup_interbroker_listener()method, which, if provided/invoked, will enable a separate listener inport_mappingsmap, keyed byINTERBROKER_LISTENER_NAME. This way client implementations don't need to change and can still pick a port based on security protocol, but underneath there will be an additional port open used exclusively for broker-to-broker communication. This opens up additional scenarios for testing.This change should be backwards compatible - default value for
use_separate_interbroker_listenerisFalse, and I tried to make sure that behavior ofKafkaServicedoesn't change when it'sFalse. Even when it isTrue, it should not affect behavior of clients, which, as described above, would continue to pick ports based on security_protocol field.I intentionally didn't change behavior more than this. We'll work on adding wider support for named listeners, but its better to make smaller incremental changes rather than trying to boil the ocean.
Testing:
Updated
security_rolling_upgrade_testto include scenarios when separate interbroker listener is present. Ran these subset of these scenarios locally (didn't run all security_protocol permutations, just some of them) usingducker-ak.Ran updated
security_rolling_upgrade_testin branch builder, all passed:https://jenkins.confluent.io/job/system-test-kafka-branch-builder/2713/
Committer Checklist (excluded from commit message)