Skip to content

KAFKA-7660: fix parentSensors memory leak#5953

Merged
guozhangwang merged 4 commits intoapache:trunkfrom
vvcephei:parentsensors-memory-leak
Nov 29, 2018
Merged

KAFKA-7660: fix parentSensors memory leak#5953
guozhangwang merged 4 commits intoapache:trunkfrom
vvcephei:parentsensors-memory-leak

Conversation

@vvcephei
Copy link
Copy Markdown
Contributor

In StreamsMetricsImpl, the parentSensors map was keeping references to Sensors after the sensors themselves had been removed.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@vvcephei
Copy link
Copy Markdown
Contributor Author

@guozhangwang @bbejeck do you mind taking a look at this if you have the chance?

@mjsax mjsax added the streams label Nov 27, 2018
Copy link
Copy Markdown
Member

@mjsax mjsax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

}
}

/** visible for testing */
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: /** ... */ -> /* */

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually avoid the get prefix in cases like this

metrics.removeSensor(sensor.name());

final Sensor parent = parentSensors.get(sensor);
final Sensor parent = parentSensors.remove(sensor);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice catch!

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @vvcephei LGMT

@bbejeck
Copy link
Copy Markdown
Member

bbejeck commented Nov 28, 2018

failures unrelated

retest this please

@mjsax
Copy link
Copy Markdown
Member

mjsax commented Nov 28, 2018

@vvcephei Build failed with checkstyle error.

@vvcephei
Copy link
Copy Markdown
Contributor Author

Hi all, I've addressed the comments and fixed the error.

I found an additional leak in removeAllNodeLevelSensors that left an empty Deque for each node that had been unregistered.

I took the opportunity to refactor the other (xLevelSensor) tracking mechanisms we use and added a test to make sure the metrics really do get cleaned up and idempotently registered during migrations. The refactored code is simpler and should be more resilient to mistakes like this.

I don't plan to tack on anything else to this PR.

Do you mind taking another look?

Thanks,
-John

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vvcephei Thanks for the updated patch! I just had a minor question on unit test and lgtm otherwise.


assertThat(registry.metrics().size(), equalTo(numberOfTaskMetrics));

final Sensor parent2 = metrics.taskLevelSensor(taskName, operation, Sensor.RecordingLevel.DEBUG);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this used for? Sensor parent1 and parent2 should be referring to exactly the same object right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They should be, but in reality the same parent sensor is (re)registered each time a child sensor is created, so I wanted to (idempotently) create a second parent sensor to make sure there's no extra references.

Copy link
Copy Markdown
Member

@bbejeck bbejeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM

@vvcephei
Copy link
Copy Markdown
Contributor Author

The test failures are unrelated:

kafka.api.ClientIdQuotaTest.testThrottledProducerConsumer
kafka.api.DelegationTokenEndToEndAuthorizationTest.testNoConsumeWithDescribeAclViaAssign
kafka.api.UserQuotaTest.testThrottledProducerConsumer
kafka.server.DynamicBrokerReconfigurationTest.testAddRemoveSaslListeners

Retest this, please.

@guozhangwang guozhangwang merged commit bfbc32d into apache:trunk Nov 29, 2018
@vvcephei vvcephei deleted the parentsensors-memory-leak branch November 29, 2018 23:24
}

@Test
public void testMutiLevelSensorRemoval() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: testMu[l]tiLevelSensorRemoval() {

pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
In StreamsMetricsImpl, the parentSensors map was keeping references to Sensors after the sensors themselves had been removed.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants