explicitly unmap hydrant files when abandonSegment to recycle mmap memory #4341

Merged
leventov merged 4 commits into apache:master from kaijianding:kafka
Jun 1, 2017

Conversation

@kaijianding
Contributor

@kaijianding kaijianding commented May 30, 2017

I noticed that RSS memory grew to an extremely large number on a realtime node in one of my environments. Memory usage grows after an index merge and never goes down after handoff, unlike in my other environments.
Usually FileUtils.deleteDirectory(target) recycles the mmapped memory, but in some environments it doesn't work.
We should explicitly unmap hydrant files in abandonSegment to recycle the mmapped memory.
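
To illustrate the underlying behavior, here is a minimal standalone sketch (hypothetical class name, not part of this PR): deleting a file only unlinks it, so the mapped pages keep counting against RSS until the MappedByteBuffer itself is unmapped.

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Standalone demonstration: deleting a mapped file does not release its memory.
public class MmapSurvivesDelete
{
  public static void main(String[] args) throws IOException
  {
    Path file = Files.createTempFile("segment", ".bin");
    Files.write(file, new byte[]{1, 2, 3, 4});

    MappedByteBuffer buffer;
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, 4);
    }

    // Analogous to FileUtils.deleteDirectory(target): the file is gone from disk ...
    Files.delete(file);

    // ... but the mapping still works, and its memory is released only once the
    // buffer is unmapped (explicitly, or eventually when it is garbage collected).
    System.out.println(buffer.get(0)); // prints 1
  }
}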

This PR also fixes a test in TestKafkaExtractionCluster to use a random port, like KafkaSupervisorTest does. The test failed if Kafka was already running in the environment (my environment is not as clean as the Travis environment):

serverProperties.put("zookeeper.connect", zkTestServer.getConnectString() + zkKafkaPath);
serverProperties.put("zookeeper.session.timeout.ms", "10000");
serverProperties.put("zookeeper.sync.time.ms", "200");
serverProperties.put("port", String.valueOf(new Random().nextInt(9999) + 10000));
Member

Please use ThreadLocalRandom.current()
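
A sketch of the suggested replacement, assuming serverProperties is the same properties object as in the snippet above:

// Same port range as new Random().nextInt(9999) + 10000, without allocating a throwaway Random instance.
serverProperties.put("port", String.valueOf(ThreadLocalRandom.current().nextInt(9999) + 10000));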

mergedFile,
sink.getSegment().withDimensions(Lists.newArrayList(index.getAvailableDimensions()))
);
index.close();
Member

It doesn't seem that creating the index is even needed here; IndexMerger.getMergedDimensions(indexes) could be used instead.

);
for (FireHydrant hydrant : sink) {
  cache.close(SinkQuerySegmentWalker.makeHydrantCacheIdentifier(hydrant));
  hydrant.getSegment().close();
Member

Despite its documentation, abandonSegment() is called not only from mergeExecutor, so races between persistAndMerge() and abandonSegment() are possible. This should be resolved before merging this change, because it may lead to JVM crashes.
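
For illustration only (this is not Druid's actual code), one way such a race is typically resolved is to serialize the unmap against in-flight readers, for example with a per-segment lock or a reference count:

// Hypothetical sketch: if close()/unmap can run concurrently with a reader,
// the reader may touch an unmapped buffer and crash the JVM, so both paths
// must coordinate on the same lock.
class GuardedHydrantSegment implements java.io.Closeable
{
  private final Object lock = new Object();
  private boolean closed = false;

  void readForMerge()
  {
    synchronized (lock) {
      if (closed) {
        throw new IllegalStateException("segment already unmapped");
      }
      // ... read the mmapped buffers ...
    }
  }

  @Override
  public void close()
  {
    synchronized (lock) {
      if (!closed) {
        // ... unmap the buffers ...
        closed = true;
      }
    }
  }
}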

Member

@kaijianding could you address this?

Contributor Author

Could you explain more about the race condition? @leventov

Member

abandonSegment() is called in FlushingPlumber, not from mergeExecutor. Well, it probably doesn't lead to a race, because in the context of FlushingPlumber the mergeExecutor is not used at all, but it's a dangerous situation. Could you please refactor RealtimePlumber/FlushingPlumber by extracting the logic and fields used by both into a superclass, and make RealtimePlumber and FlushingPlumber subclasses of that class, so that FlushingPlumber doesn't carry unused Executor fields?
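
A rough outline of the proposed refactoring (the class and member names below are illustrative only, not Druid's actual code):

// Shared state and persist logic would move into a common superclass ...
abstract class AbstractPlumberSketch
{
  // sinks, persistExecutor, scheduled cleanup, etc.
}

// ... while the merge/handoff executors and logic stay only where they are used.
class RealtimePlumberSketch extends AbstractPlumberSketch
{
  // mergeExecutor and handoff notification live only here
}

class FlushingPlumberSketch extends AbstractPlumberSketch
{
  // flush-and-discard logic only; no unused merge/handoff Executor fields
}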

Contributor Author

I get your point. You'd like to move the merge and handoff code out of the parent class so that it lives only in RealtimePlumber, which is the one that really needs to care about merge and handoff.
I tried to refactor as you described, but the diff is huge and I think it needs to be carefully tested. It's a big risk to do it in this PR.
Since FlushingPlumber doesn't actually start the mergeExecutor, there is no race condition here. Should we do the refactoring in another PR? @leventov

Contributor

IMO, another PR is OK. Or maybe we could even skip it altogether and instead migrate users of Plumbers to Appenderator (which is meant to be an improved replacement).

The flushing plumber isn't really meant to be used in production anyway (I think it's not even documented). It was meant to be a way to set up some realtime demos that just throw away data after a period of time.

Contributor

@gianm gianm left a comment

@kaijianding, could you make a similar change in AppenderatorImpl as well, in the "abandonSegment" method?

I think there's no need to refactor flushing/realtime plumber, since:

  1. Plumbers could be rewritten in the future in terms of Appenderators, or alternatively replaced with Appenderators, which are more flexible (see original description of #2220)
  2. The flushing plumber is not expected to be used in prod anyway.
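
A rough sketch of what that "similar change" might look like inside AppenderatorImpl's abandonSegment (the surrounding code and field names are assumed here, mirroring the plumber diff above):

for (FireHydrant hydrant : sink) {
  // drop cached results for this hydrant, then unmap its segment
  // before the segment directory is deleted
  cache.close(SinkQuerySegmentWalker.makeHydrantCacheIdentifier(hydrant));
  hydrant.getSegment().close();
}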

// We are materializing the list for performance reasons. Lists.transform
// only creates a "view" of the original list, meaning the function gets
// applied every time you access an element.
return Lists.transform(
Contributor

The behavior is different here. The list used to be materialized (Lists.newArrayList) but now it's a view (Lists.transform). The comment says it should be materialized, so I think the old code was right.
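
For reference, a small standalone example (hypothetical class name) of the difference between the lazy view and the materialized copy:

import com.google.common.base.Function;
import com.google.common.collect.Lists;
import java.util.List;

public class TransformVsMaterialize
{
  public static void main(String[] args)
  {
    List<Integer> source = Lists.newArrayList(1, 2, 3);
    Function<Integer, Integer> doubler = input -> input * 2;

    // Lazy view: doubler is re-applied on every access.
    List<Integer> view = Lists.transform(source, doubler);

    // Materialized copy: doubler is applied once per element up front,
    // which is what the original Lists.newArrayList-based code did.
    List<Integer> materialized = Lists.newArrayList(view);

    System.out.println(view.get(0) + " " + materialized.get(0)); // 2 2
  }
}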

Member

Also, it's better to use the Stream API for new code instead of Guava.
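
For new code, the eager Stream-based equivalent would look roughly like this:

import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class StreamInsteadOfGuava
{
  public static void main(String[] args)
  {
    List<Integer> source = Arrays.asList(1, 2, 3);

    // map + collect materializes the result eagerly, with no Guava dependency
    List<Integer> doubled = source.stream()
                                  .map(i -> i * 2)
                                  .collect(Collectors.toList());

    System.out.println(doubled); // [2, 4, 6]
  }
}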

Contributor

@gianm gianm left a comment

LGTM

@gianm
Contributor

gianm commented Jun 1, 2017

Got an error from the TeamCity build:

[00:03:17]Failed to load profile from '/mnt/agent/work/68c78b7e1c77925/.idea/inspectionProfiles/Druid.xml'
[00:03:19]Process exited with code 1
[00:03:19]Inspection output
[00:03:19][Inspection output] Starting up IntelliJ IDEA 2017.1.2 (build IU-171.4249.39) ...done.
[00:03:19][Inspection output] Opening project...done.
[00:03:19][Inspection output] Failed to load profile from '/mnt/agent/work/68c78b7e1c77925/.idea/inspectionProfiles/Druid.xml'
[00:03:19][Inspection output] Initializing project...

I clicked "Run" in its UI to see if it will work when run again.

@leventov
Member

leventov commented Jun 1, 2017

@gianm this failure could be ignored until #4348 is merged

@gianm
Contributor

gianm commented Jun 1, 2017

Ah, got it.

@leventov leventov closed this Jun 1, 2017
@leventov leventov reopened this Jun 1, 2017
@leventov leventov merged commit 0efd182 into apache:master Jun 1, 2017
@kaijianding kaijianding deleted the kafka branch June 2, 2017 02:03
@gianm gianm added this to the 0.10.1 milestone Jun 6, 2017