Skip to content

KAFKA-7660: Fix child sensor memory leak#5974

Merged
guozhangwang merged 2 commits intoapache:trunkfrom
vvcephei:fix-child-sensor-memory-leak
Nov 30, 2018
Merged

KAFKA-7660: Fix child sensor memory leak#5974
guozhangwang merged 2 commits intoapache:trunkfrom
vvcephei:fix-child-sensor-memory-leak

Conversation

@vvcephei
Copy link
Copy Markdown
Contributor

A heap dump provided by Patrik Kleindl in
https://issues.apache.org/jira/browse/KAFKA-7660
identifies the childrenSensors map in Metrics as keeping
references to sensors alive after they have been removed.

This PR fixes it and adds a test to be sure.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

return this.name;
}

List<Sensor> parents() {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add package-private methods without a KIP?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine.

@vvcephei
Copy link
Copy Markdown
Contributor Author

@guozhangwang do you mind taking a look at this?
I'm not sure who else to ask for reviews...

Copy link
Copy Markdown
Contributor

@guozhangwang guozhangwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Waiting for unit test to pass.

return this.name;
}

List<Sensor> parents() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be fine.

log.debug("Removed sensor with name {}", name);
childSensors = childrenSensors.remove(sensor);
for (final Sensor parent : sensor.parents()) {
childrenSensors.getOrDefault(parent, emptyList()).remove(sensor);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's really weird that we store a reference to the parent sensors inside the Sensor object, but children sensors are kept in a separate map. 😕

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I frowned upon it as well, and for the history the parent stored on sensor was there since day one, while the children map is added a year later. I think it's actually better to keep both the parents and children inside a single Sensor, but we can do that in another cleanup PR.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I fought the urge to refactor it into a more traditional bidirectionally linked graph structure.

Maybe later; it doesn't seem worth opening that can of worms right now.

Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, left a couple minor comments.


final Sensor parent = metrics.sensor("parent");
metrics.sensor("child", parent);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe worth a sanity assertion before removal of the child sensor? Something like this:

assertTrue(metrics.childrenSensors().get(parent).contains(child));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good idea.

removeMetric(metric.metricName());
log.debug("Removed sensor with name {}", name);
childSensors = childrenSensors.remove(sensor);
for (final Sensor parent : sensor.parents()) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm.. On sensor creation, we have a check if parents is null. Should we have that check here as well or is it unneeded in the creation logic?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow: since we check and avoid nulls at the creation time already, parents() call is hence always safe to return not-null, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ashamed I didn't think of this. I think it's not needed here, since the sensor's constructor ensures that the parents field is not null. If the constructor arg is null, the field is initialized as an empty array.

Does that seem legit?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, we do have a null check in the constructor. Good enough for me.

@vvcephei
Copy link
Copy Markdown
Contributor Author

Ok @hachikuji , I spruced up the test a bit.

@mjsax mjsax added the streams label Nov 30, 2018
Copy link
Copy Markdown
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@guozhangwang guozhangwang merged commit b7d95da into apache:trunk Nov 30, 2018
@vvcephei vvcephei deleted the fix-child-sensor-memory-leak branch November 30, 2018 20:36
pengxiaolong pushed a commit to pengxiaolong/kafka that referenced this pull request Jun 14, 2019
A heap dump provided by Patrik Kleindl in https://issues.apache.org/jira/browse/KAFKA-7660 identifies the childrenSensors map in Metrics as keeping references to sensors alive after they have been removed.

This PR fixes it and adds a test to be sure.

Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants