Merge remote-tracking branch 'ak-upstream/trunk' into AK_CCS_Mar_07_22#673

Merged
Vikas Singh (soondenana) merged 29 commits into confluentinc:master from soondenana:AK_CCS_Mar_07_22
Mar 8, 2022

@soondenana
Member

Conflict in Jenkinsfile from AK commit: bbb2dc54a0f45bc5455f22a0671adde206dcfa29 from PR: 11833

I have dropped the change as it doesn't pertain to CCS.

All other files merged cleanly.

…#11806)

KRaft authorizer should support AclOperation.ALL correctly.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
List all the named topologies that have been added to this client

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…pache#11810)

We should be able to change the topologies while still in the CREATED state. We already allow adding them, but this should include removing them as well

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
…apache#11807)

Use `InetAddress.getHostAddress` in `StandardAuthorizer` instead of `InetAddress.getHostName`.

Reviewers: Colin Patrick McCabe <cmccabe@confluent.io>
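
As a rough illustration of why the distinction matters: `InetAddress.getHostAddress` returns the textual IP address, while `InetAddress.getHostName` may perform a reverse lookup and return a name instead. The sketch below is not the `StandardAuthorizer` code; the class and method names are hypothetical and only show the JDK calls involved.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of Kafka: shows which InetAddress accessor
// yields a stable, lookup-free string for comparing against an IP-based host rule.
final class HostAddressSketch {

    // Converts raw address bytes to the textual IP form.
    // getByAddress only validates the length; it does not contact a name service.
    static String hostForAclMatching(byte[] rawAddress) {
        try {
            return InetAddress.getByAddress(rawAddress).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("address must be 4 or 16 bytes long", e);
        }
    }

    public static void main(String[] args) {
        // Prints the numeric form of the address.
        System.out.println(hostForAclMatching(new byte[] {10, 0, 0, 7}));
        // Calling getHostName() on the same object could instead return a resolved
        // name, which would not equal a rule written as an IP address.
    }
}
```
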
Add rebalance reason in Kafka Streams.

Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>
…pache#7082)

Make SetSchemaMetadata SMT ignore records with null value and valueSchema or key and keySchema.

The transform has been unit tested for handling null values gracefully while still providing the necessary validation for non-null values.

Reviewers: Konstantine Karantasis<konstantine@confluent.io>, Bill Bejeck <bbejeck@apache.org>
…pache#11814)

Issue:
Imagine a scenario where two threads T1 and T2 are inside UnifiedLog.flush() concurrently:

KafkaScheduler thread T1 -> The periodic work calls LogManager.flushDirtyLogs(), which in turn calls UnifiedLog.flush(). For example, this can happen due to log.flush.scheduler.interval.ms.
KafkaScheduler thread T2 -> A UnifiedLog.flush() call is triggered asynchronously during segment roll.
If thread T1 advances the recovery point beyond the flush offset of thread T2, this can trip the check within LogSegments.values() for thread T2 when it is called from LocalLog.flush(). The exception causes the KafkaScheduler thread to die, which is not desirable.

Fix:
We fix this by ensuring that LocalLog.flush() is immune to the case where the recoveryPoint advances beyond the flush offset.

Reviewers: Jun Rao <junrao@gmail.com>
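
The idea behind the fix can be sketched roughly as follows. This is a simplified stand-in, not the actual `LocalLog` code: the class `FlushSketch`, its fields, and `flushRange` are hypothetical. The point it illustrates is reading the recovery point once and treating "already flushed past this offset" as a no-op rather than an error.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified model of a log whose flush() tolerates a concurrent
// flush that has already advanced the recovery point past the requested offset.
final class FlushSketch {
    private final AtomicLong recoveryPoint = new AtomicLong(0L);

    long recoveryPoint() {
        return recoveryPoint.get();
    }

    // Returns true if this call flushed anything, false if it was skipped.
    boolean flush(long offset) {
        // Read once so the guard and the range below use the same value.
        long current = recoveryPoint.get();
        if (current > offset) {
            // Another thread already flushed beyond this offset; nothing to do.
            return false;
        }
        flushRange(current, offset);
        // Only ever move the recovery point forward.
        recoveryPoint.accumulateAndGet(offset, Math::max);
        return true;
    }

    // Stand-in for selecting and syncing segments; rejects inverted ranges,
    // which is the kind of check the unguarded code could trip.
    private void flushRange(long from, long to) {
        if (from > to) {
            throw new IllegalArgumentException("invalid range: " + from + " > " + to);
        }
    }

    public static void main(String[] args) {
        FlushSketch log = new FlushSketch();
        System.out.println(log.flush(100L));      // true
        System.out.println(log.flush(50L));       // false, skipped without an exception
        System.out.println(log.recoveryPoint());  // 100
    }
}
```
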
… of PR apache#11787 (apache#11812)

This PR addresses the remaining nits from the final review of apache#11787

It also deletes two integration test classes which had only one test in them, and moves the tests to another test class file to save on the time to bring up an entire embedded kafka cluster just for a single run

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
…ogyIntegrationTest tests (apache#11824)

I've seen a few of the newly added tests fail on PR builds lately with:

"java.lang.AssertionError: Expected all streams instances in [org.apache.kafka.streams.processor.internals.namedtopology.KafkaStreamsNamedTopologyWrapper@7fb3e6b0] to be RUNNING within 30000 ms"

We already had some tests using the 30s timeout while others were bumped all the way up to 60s. I figured we should try out a default timeout of 45s, and if we still see failures in specific tests we can go from there.
Reviewer: Bill Bejeck <bbejeck@apache.org>
…or Streams tests (apache#11823)

Pretty much any time we have an integration test failure that's flaky or only exposed when running on Jenkins through the PR builds, it's impossible to debug if it cannot be reproduced locally as the logs attached to the test results have truncated the entire useful part of the logs. This is due to the logs being flooded at the beginning of the test when the Kafka cluster is coming up, eating up all of the allotted characters before we even get to the actual Streams test. Setting log4j.logger.kafka to ERROR greatly improves the situation and cuts down on most of the excessive logging in my local runs. To improve things even more and have some hope of getting the part of the logs we actually need, I also set the loggers for all of the Config objects to ERROR, as these print out the value of every single config (of which there are a lot) and are not useful as we can easily figure out what the configs were if necessary by just inspecting the test locally.

Reviewers:  Luke Chen <showuon@confluent.io>,  Guozhang Wang <guozhang@confluent.io>
Create KafkaConfigSchema to encapsulate the concept of determining the types of configuration keys.
This is useful in the controller because we can't import KafkaConfig, which is part of core. Also
introduce the TimelineObject class, which is a more generic version of TimelineInteger /
TimelineLong.

Reviewers: David Arthur <mumrah@gmail.com>
This PR is part of KIP-708 and adds rack aware standby task assignment logic.

Reviewer: Bruno Cadonna <cadonna@apache.org>, Luke Chen <showuon@gmail.com>, Vladimir Sitnikov <vladimirsitnikov.apache.org>
…he#11777)

* KAFKA-10000: Add producer fencing API to admin client

Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>
Disable idempotence when conflicting config values for acks, retries
and max.in.flight.requests.per.connection are set by the user. For the
former two configs, we log at info level when we disable idempotence
due to conflicting configs. For the latter, we log at warn level since
it's due to an implementation detail that is likely to be surprising.

This mitigates compatibility impact of enabling idempotence by default.

Added unit tests to verify the change in behavior.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
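
A minimal sketch of the described behavior, assuming the user has not set `enable.idempotence` explicitly. This is not the producer's actual configuration code; the class and method are hypothetical. The config keys are the real producer config names, and the conditions used (acks must be `all`/`-1`, retries must be greater than 0, at most 5 in-flight requests per connection) are the documented requirements for idempotence.

```java
import java.util.Properties;

// Hypothetical illustration of "disable idempotence on conflicting user configs".
final class IdempotenceSketch {

    // Returns whether idempotence would stay enabled for the given user-supplied configs.
    static boolean idempotenceEnabled(Properties userProps) {
        String acks = userProps.getProperty("acks");
        String retries = userProps.getProperty("retries");
        String inFlight = userProps.getProperty("max.in.flight.requests.per.connection");

        if (acks != null && !acks.equals("all") && !acks.equals("-1")) {
            System.out.println("INFO: idempotence disabled, acks=" + acks + " conflicts");
            return false;
        }
        if (retries != null && Integer.parseInt(retries) == 0) {
            System.out.println("INFO: idempotence disabled, retries=0 conflicts");
            return false;
        }
        if (inFlight != null && Integer.parseInt(inFlight) > 5) {
            // Logged at a higher level because this limit is an implementation detail.
            System.out.println("WARN: idempotence disabled, max.in.flight.requests.per.connection="
                    + inFlight + " conflicts");
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("acks", "1");
        System.out.println(idempotenceEnabled(props)); // false
    }
}
```
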
…secured (apache#11811)

Reviewers:  Luke Chen <showuon@confluent.io>, Jun Rao <junrao@gmail.com>
…r.sh (apache#11517)

Delete unused config batch.size in kafka-console-producer.sh

Reviewer: Andrew Eugene Choi <andrew.choi@uwaterloo.ca>, Luke Chen <showuon@gmail.com>
…is in CREATED (apache#11813)

Currently the #add/removeNamedTopology APIs behave a little wonky when the application is still in CREATED. Since adding and removing topologies runs some validation steps, there is a valid reason to want to add or remove a topology on a dummy app that you don't plan to start, or a real app that you haven't started yet. But to actually check the results of the validation you need to call get() on the future, so we need to make sure that get() won't block forever in the case of no failure, as is currently the case.

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
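
One way to picture the intended behavior is sketched below using the JDK's `CompletableFuture` rather than the Streams API. Everything here is hypothetical (class, enum, and method names); it only shows the pattern of completing the returned future immediately when no thread exists that could complete it later, while still surfacing validation failures through the future.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical model: a topology change returns a future that must not hang
// when the application has not been started.
final class CreatedStateFutureSketch {
    enum State { CREATED, RUNNING }

    private final State state;

    CreatedStateFutureSketch(State state) {
        this.state = state;
    }

    CompletableFuture<Void> removeTopology(String name) {
        CompletableFuture<Void> result = new CompletableFuture<>();
        // Validation failures are reported through the future in every state.
        if (name == null || name.isEmpty()) {
            result.completeExceptionally(new IllegalArgumentException("topology name is required"));
            return result;
        }
        if (state == State.CREATED) {
            // No threads are running, so nobody would ever acknowledge the change:
            // complete right away so that get() returns instead of blocking forever.
            result.complete(null);
        }
        // In RUNNING, a (hypothetical) processing thread would complete the future later.
        return result;
    }

    public static void main(String[] args) {
        CreatedStateFutureSketch app = new CreatedStateFutureSketch(State.CREATED);
        System.out.println(app.removeTopology("orders").isDone()); // true
    }
}
```
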
…cessing (apache#11827)

This test has been failing somewhat regularly due to going into the ERROR state before reaching RUNNING during the startup phase. The problem is that we are reusing the DELAYED_INPUT_STREAM topics, which had previously been assumed to be uniquely owned by a particular test. We should make sure to delete and re-create these topics for any test that uses them.
…rd fails on brokers. (apache#11830)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
Differentiate between unused and unknown configs during log output.

Reviewer: Luke Chen <showuon@gmail.com>
…tter balance load between threads (apache#11493)

Balance standby and active stateful tasks evenly across threads

Reviewer: Luke Chen <showuon@gmail.com>
Reviewers: David Arthur <mumrah@gmail.com>
…nd removing a topology (apache#11847)

While debugging the flaky NamedTopologyIntegrationTest.shouldRemoveOneNamedTopologyWhileAnotherContinuesProcessing test, I did discover one real bug. The problem was that we update the TopologyMetadata's builders map (with the known topologies) inside the #removeNamedTopology call directly, whereas the StreamThread may not yet have reached the poll() in the loop, and in case of an offset reset we get an NPE.
I changed the NPE to just log a warning for now. Going forward, I think we should try to tackle some tech debt by keeping the processing tasks and the TopologyMetadata in sync.

Also includes a quick fix on the side where we were re-adding the topology waiter/KafkaFuture for a thread being shut down

Reviewers: Guozhang Wang <guozhang@confluent.io>, Walker Carlson <wcarlson@confluent.io>
Conflict in Jenkinsfile from AK commit: bbb2dc5, PR: 11833

I have dropped the change as it doesn't pertain to CCS
Member

@ijuma Ismael Juma (ijuma) left a comment

LGTM if Jenkins is green.

@soondenana
Member Author

After the merge, the following 3 tests are failing:

kafka.admin.LeaderElectionCommandTest.[1] Type=Raft, Name=testElectionResultOutput, Security=PLAINTEXT
kafka.api.PlaintextProducerSendTest.testSendWithInvalidCreateTime()
kafka.server.ProduceRequestTest.testProduceWithInvalidTimestamp()

I will take a look at this tomorrow.

@soondenana
Member Author

We will need to merge again as AK trunk is broken. We need apache#11853; I will wait for that to get merged.

@soondenana
Member Author

Only one test failed after recently pulling AK trunk:

kafka.admin.DescribeConsumerGroupTest.testDescribeOffsetsOfExistingGroupWithNoMembers()

I ran it locally and it passes, so I'm going to treat it as flaky and merge the PR.

@soondenana Vikas Singh (soondenana) merged commit 2ed563c into confluentinc:master Mar 8, 2022