KAFKA-7834: Extend collected logs in system test services to include heap dumps#6158
KAFKA-7834: Extend collected logs in system test services to include heap dumps#6158kkonstantine wants to merge 4 commits intoapache:trunkfrom
Conversation
There was a problem hiding this comment.
are these all under PERSISTENT_ROOT? if yes, can we just handle PERSISTENT_ROOT and not worry about individual files?
There was a problem hiding this comment.
the other_files are not necessarily under PERSISTENT_ROOT
What is under it for sure is:
self.CONFIG_FILE, self.LOG4J_CONFIG_FILE, self.PID_FILE, self.LOG_FILE, self.STDOUT_FILE, self.STDERR_FILE, self.EXTERNAL_CONFIGS_FILE which I removed from explicit listing and I'm removing with PERSISTENT_ROOT altogether.
There was a problem hiding this comment.
ok, I get that though I think that's just tech debt. for any test related files, we really shouldn't be using anything other than PERSISTENT_ROOT so that we can at least attempt to ensure each test/service gets a clean workspace
There was a problem hiding this comment.
we seem to duplicate this pattern of setting constants, is there any opportunity to relocate and reduce to one copy of these constants (given they should be same across all our java services)?
There was a problem hiding this comment.
With each service setting its security configs in KAFKA_OPTS using their own way, this seems difficult at the moment. I'd prefer if I could avoid to extend this PR to a general refactoring at this point.
There was a problem hiding this comment.
what is the logic for this, including the different ways env_opts is being encoded?
There was a problem hiding this comment.
I ended up adding the heap jvm arguments in KAFKA_OPTS instead of KAFKA_HEAP_OPTS because the latter has defaults for sizes if it's empty. However, if I needed to add values to an env variable, I can't just append them to it's current value, because the value is wrapped with \"
That's why I'm adding this method - unused here - that can be used by tests that inherit this class. Let me know if you think I should remove.
There was a problem hiding this comment.
yeah, in general the construction of commands could be cleaned up quite a bit for extensibility -- i think we tend to forget about the importance/value of doing this when writing tests in the Kafka repo, but these classes should be reusable and extensible. this approach seems pretty error-prone, although at least it should consistently fail if the quoting gets messed up since you'll have invalid extra args on the command line.
There was a problem hiding this comment.
what's happening with the various strip('\"') calls? why do we need this process? can we refactor to something that makes it clearer what's happening?
There was a problem hiding this comment.
As I mention above, we often wrap the value of an env var in escaped double quotes. To append contents, we first need to remove those. If they don't exist, strip is a no-op.
There was a problem hiding this comment.
this seems unrelated to this PR? and the bucket probably shouldn't change unless we're changing the target bucket -- we want only 1, permanent bucket if possible
There was a problem hiding this comment.
It is temporary. Was added with the commit named: TEMP commit to unblock system test branch builder in order to fix an issue with the jenkins builder and be able to run the tests. Should be removed before merging
38d4110 to
7775110
Compare
7775110 to
b964036
Compare
ewencp
left a comment
There was a problem hiding this comment.
LGTM. The JIRA is marked to be backported to 1.0, but it looks like this only backports cleanly to 2.0. Additionally, 2.1.1 is currently in flight, and although this is only test code and I don't think too risky, I don't want to merge until we get the next RC out the door. So I may hold off on the actual merge here until that passes.
There was a problem hiding this comment.
ok, I get that though I think that's just tech debt. for any test related files, we really shouldn't be using anything other than PERSISTENT_ROOT so that we can at least attempt to ensure each test/service gets a clean workspace
There was a problem hiding this comment.
yeah, in general the construction of commands could be cleaned up quite a bit for extensibility -- i think we tend to forget about the importance/value of doing this when writing tests in the Kafka repo, but these classes should be reusable and extensible. this approach seems pretty error-prone, although at least it should consistently fail if the quoting gets messed up since you'll have invalid extra args on the command line.
|
Actually, after further discussion, will merge to trunk, 2.2, and 2.0 so we can get debugging information out for some failing tests. Will follow up on 2.1 after it won't get in the way of 2.1.1 release, and we'll follow up separately with 1.x versions if we find the additional info to be valuable, but hopefully tests shouldn't be that unstable on long term maintenance branches. |
|
Excellent. Thanks @ewencp! I'll keep track progress with respect to |
…heap dumps * Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests * Collect the heap dump from the predefined location as part of the result logs for each service * Change Connect service to delete the whole root directory instead of individual expected files * Tested by running the full suite of system tests Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6158 from kkonstantine/KAFKA-7834 (cherry picked from commit 83c435f) Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
…heap dumps * Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests * Collect the heap dump from the predefined location as part of the result logs for each service * Change Connect service to delete the whole root directory instead of individual expected files * Tested by running the full suite of system tests Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6158 from kkonstantine/KAFKA-7834 (cherry picked from commit 83c435f) Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
|
Now that 2.1.1 has finally landed, cherry-picking this into 2.1 as well. Seems like it has been fine for awhile on other branches anyway so probably we could have risked it even earlier. |
…heap dumps * Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests * Collect the heap dump from the predefined location as part of the result logs for each service * Change Connect service to delete the whole root directory instead of individual expected files * Tested by running the full suite of system tests Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes #6158 from kkonstantine/KAFKA-7834
…heap dumps * Enable heap dumps on OOM with -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=<file.bin> in the major services in system tests * Collect the heap dump from the predefined location as part of the result logs for each service * Change Connect service to delete the whole root directory instead of individual expected files * Tested by running the full suite of system tests Author: Konstantine Karantasis <konstantine@confluent.io> Reviewers: Ewen Cheslack-Postava <ewen@confluent.io> Closes apache#6158 from kkonstantine/KAFKA-7834
Committer Checklist (excluded from commit message)