Skip to content

MINOR: System tests for Raft-based metadata quorums#10105

Merged
cmccabe merged 38 commits intoapache:trunkfrom
rondagostino:kip500_full
Feb 22, 2021
Merged

MINOR: System tests for Raft-based metadata quorums#10105
cmccabe merged 38 commits intoapache:trunkfrom
rondagostino:kip500_full

Conversation

@rondagostino
Copy link
Copy Markdown
Contributor

@rondagostino rondagostino commented Feb 10, 2021

Adds a new sanity_checks/bounce_test.py system test for a simple produce/bounce/produce series of events. Augments this and other sanity_checks tests so they run all metadata quorum types: ZooKeeper, remote Raft, and co-located Raft.

Augments several tests/client and tests/core system tests to run for remote Raft-based metadata quorums in addition to ZooKeeper. Co-located Raft controllers are not tested here as this configuration is presumed to work assuming sanity_checks as well as any unit/integration tests pass.

Augments a simple set of tests in tests/connect, tests/streams, and tests/tools to run for remote Raft-based metadata quorums in addition to ZooKeeper.

Adds missing @cluster annotations for dozens of system tests that were missing them -- this was causing these tests to grab the entire cluster of nodes and negatively impact the parallelism of the system test run.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@rondagostino
Copy link
Copy Markdown
Contributor Author

There are 6 failing tests across these 3 test classes:

kafka.server.KafkaConfigTest

KafkaConfigTest > testFromPropsInvalid() FAILED
    org.opentest4j.AssertionFailedError: Expected exception for property `broker.heartbeat.interval.ms` with invalid value `-1` was not thrown ==> Expected java.lang.Exception to be thrown, but nothing was thrown.

AutoTopicCreationManagerTest

AutoTopicCreationManagerTest > testCreateTxnTopic() FAILED
    Wanted but not invoked:
    brokerToControllerChannelManager.sendRequest(
        CreateTopicsRequestData(topics=[CreatableTopic(name='__transaction_state', numPartitions=2, replicationFactor=2, assignments=[], configs=[])], timeoutMs=100, validateOnly=false),
        <any kafka.server.ControllerRequestCompletionHandler>
    );
    -> at kafka.server.AutoTopicCreationManagerTest.testCreateTopic(AutoTopicCreationManagerTest.scala:119)
    Actually, there were zero interactions with this mock.

AutoTopicCreationManagerTest > testCreateOffsetTopic() FAILED
    Wanted but not invoked:
    brokerToControllerChannelManager.sendRequest(
        CreateTopicsRequestData(topics=[CreatableTopic(name='__consumer_offsets', numPartitions=2, replicationFactor=2, assignments=[], configs=[])], timeoutMs=100, validateOnly=false),
        <any kafka.server.ControllerRequestCompletionHandler>
    );
    -> at kafka.server.AutoTopicCreationManagerTest.testCreateTopic(AutoTopicCreationManagerTest.scala:119)
    Actually, there were zero interactions with this mock.

AutoTopicCreationManagerTest > testCreateNonInternalTopic() FAILED
    Wanted but not invoked:
    brokerToControllerChannelManager.sendRequest(
        CreateTopicsRequestData(topics=[CreatableTopic(name='topic', numPartitions=1, replicationFactor=1, assignments=[], configs=[])], timeoutMs=100, validateOnly=false),
        <any kafka.server.ControllerRequestCompletionHandler>
    );
    -> at kafka.server.AutoTopicCreationManagerTest.testCreateTopic(AutoTopicCreationManagerTest.scala:119)
    Actually, there were zero interactions with this mock.

MetadataRequestWithForwardingTest

MetadataRequestWithForwardingTest > testAutoCreateOfCollidingTopics() FAILED
    org.opentest4j.AssertionFailedError: expected: <Set(LEADER_NOT_AVAILABLE, INVALID_TOPIC_EXCEPTION)> but was: <Set(UNKNOWN_TOPIC_OR_PARTITION)>

MetadataRequestWithForwardingTest > testAutoTopicCreation() FAILED
    org.opentest4j.AssertionFailedError: expected: <LEADER_NOT_AVAILABLE> but was: <UNKNOWN_TOPIC_OR_PARTITION>

cmccabe and others added 10 commits February 17, 2021 12:44
No need to invoke dynamicConfig.initialize() at all when using Raft

Fix formatting and make variables private as per review

Revert optional nodeId in SocketServer: aliasing makes it unnecessary

Exclude Raft controller nodes from Produce magic version determination

Fix responsibility for startup/shutdown of alterisr/forwarding managers

Fix BrokerServer log ident

Invoke replicaManager.endMetadataChangeDeferral() before starting up

Fix compile error

Return random active broker ID for Raft-based controller ID

Ignore reserved.broker.max.id for Raft-based cases

Cleanup KafkaConfig and tests related to {broker,node}.id

Fix bug in initializeLogDirs()

Add NetworkChannel.updateEndpoint(int id, RaftConfig.InetAddressSpec address)

fix compile error

revert old forwarding code

Refactor envelope handling logic into a common place

Add EnvelopeUtils

Add ControllerServer
Comment thread build.gradle
compile project(':clients')
compile libs.jacksonDatabind
compile libs.jacksonJDK8Datatypes
compile libs.metrics
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's file a JIRA to discuss whether the metadata module should use Kafka metrics or Yammer metrics.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

file a JIRA to discuss whether the metadata module should use Kafka metrics or Yammer metrics.

https://issues.apache.org/jira/browse/KAFKA-12348

cc: @cmccabe

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. I think this will be a post-2.8 thing

@rondagostino rondagostino changed the title Kip500 full MINOR: System tests for Raft-based metadata quorums Feb 20, 2021
@rondagostino rondagostino marked this pull request as ready for review February 20, 2021 02:46
@rondagostino
Copy link
Copy Markdown
Contributor Author

http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2021-02-20--001.1613812911--rondagostino--kip500_full--f59453faa/report.html

Tests|Passes|Failures|Ignored|Time
940|558|185|197|359 minutes

Some observations. First, adding the missing @cluster annotations really helped. Running almost 750 tests in under 6 hours is a dramatic improvement. Here’s the stats from a run 2 days ago:

Tests|Passes|Failures|Ignored|Time
729|427|105|197|388 minutes

We are running 743/532 = 40% more tests, but we are able to finish in 10% less time. So that’s a big win.

We have 130 more tests passing and 80 more tests failing. It is VERY difficult to find the signal amid so much noise. A quick browse through our results shows lots of test failures that are not augmented for Raft at all, and many test that are augmented are failing for both ZK and Raft cases.

Comment thread tests/kafkatest/services/console_consumer.py
Comment thread tests/kafkatest/services/performance/end_to_end_latency.py Outdated
Comment thread tests/kafkatest/tests/client/client_compatibility_features_test.py Outdated
Comment thread tests/kafkatest/tests/client/client_compatibility_features_test.py
Comment thread tests/kafkatest/tests/client/client_compatibility_produce_consume_test.py Outdated
Comment thread tests/kafkatest/tests/core/round_trip_fault_test.py
@cmccabe
Copy link
Copy Markdown
Contributor

cmccabe commented Feb 22, 2021

Good work spotting the missing @cluster annotations! That is quite a performance win. I wonder if we should disallow tests without this annotation in the future.

@rondagostino
Copy link
Copy Markdown
Contributor Author

I wonder if we should disallow tests without [the @cluster] annotation in the future.

I think perhaps yes. Is there a need to allow it? A simple oversight generates a significant parallelism hit at this point -- and they compound quickly as evidenced here.

Copy link
Copy Markdown
Contributor

@cmccabe cmccabe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks, @rondagostino !

@cmccabe cmccabe merged commit 0711d15 into apache:trunk Feb 22, 2021
cmccabe pushed a commit that referenced this pull request Feb 22, 2021
Add the necessary test annotations to test the new KIP-500 quorum broker mode
in many of our ducktape tests. This mode is tested in addition to the classic
Apache ZooKeeper mode.

This PR also adds a new sanity_checks/bounce_test.py system test that runs
through a simple produce/bounce/produce series of events.

Finally, this PR adds @cluster annotations to dozens of system tests that were
missing them. The lack of this annotation was causing these tests to grab the
entire cluster of nodes.  Adding the @cluster annotation dramatically reduced
the time needed to run these tests.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>

Conflicts:
	tests/kafkatest/tests/core/get_offset_shell_test.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants