-
Notifications
You must be signed in to change notification settings - Fork 336
test/integration: Rebase to RHEL 8.0 #993
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
I have only compile tested v4.18 and RHEL 8.0, with all of the patch(s) applied. |
|
If I understand correctly, this PR rebases the integration tests on top of upstream v4.18. That's going to be very close to RHEL-8.0 GA kernel, but not exactly the same. For example, the tracepoints-sections.patch modifies run_timer_softirq() and expects to see FWIW, I think that is the only patch that had a v4.18 vs. RHEL-8.0 discrepancy. If you can update the PR with rebase against kernel-4.18.0-80.el8.src.rpm, I can verify that they build / run on one of our internal test boxes. |
|
@joe-lawrence Thanks for the review and for help with a test run. Agree, there is the difference between the upstream v4.18 and RHEL 8.0 kernel source. When I tried, it applied with minor fuzz on a couple of test cases over RHEL 8.0 sources. Yeah, it makes sense to have rebased against RHEL 8.0 sources, instead of closet upstream. I will redo it and push it. |
b8b5a69 to
c8535ea
Compare
|
v2 changes:
|
|
Thanks for rebasing @kamalesh-babulal. Here are results from an x86_64 test run: One build failure, perhaps a result of the objtool warnings? and one test script failure, due to print formatting changes (so something like I'll try to do a similar ppc64le test run when I get a chance. |
|
And the same for ppc64le make.out.txt Build errors: Test errors: data-new (same as x86_64 above) shadow-newpid crashes the machine: I'll look into this crash later today... |
|
@joe-lawrence Thanks a lots for testing, I was trying to |
|
Hmm, shadow-newpid.patch crashes even when I sub out all the code for a nop(). Some more files: |
|
FYI, shadow-newpid.patch no longer builds with kpatch v0.7.0, specifically 4f4870d ("create-diff-object: Don't allow jump labels"): but as the crash was in the patched proc_pid_status() and not do_exit(), I suspect that jump labels may not be the real culprit. |
|
I can confirm the proc_pid_status() panic is fixed by #1007. |
|
Nice, thanks for verifying @jpoimboe! |
c8535ea to
0bd41c9
Compare
|
For context:
The rest of the patches is just the actual rebase and silencing issues that won't be solved right now. |
When these from internal depths of Red Hat upstream paths changed and now we are one level deeper in directory tree. The issue probably also exist in rhel8.0 rebase pr dynup#993. Signed-off-by: Artem Savkov <asavkov@redhat.com>
joe-lawrence
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good (with a few nits noted) @jpoimboe could you sanity check the symbol correlation changes, in particular the ("kpatch-build: Look for local static variables in child functions") commit? It reads correct to me, but you know that part of the code better than myself.
kpatch-build/create-diff-object.c
Outdated
| { | ||
| parent->nb_children++; | ||
| parent->children = realloc(parent->children, | ||
| sizeof(*parent->children) * parent->nb_children); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A nit, but realloc() could fail so we should catch that like other allocation failures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, will update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It also now needs to be freed at some point.
kpatch-build/create-diff-object.c
Outdated
|
|
||
| if (!sym->parent) | ||
| ERROR("failed to find parent function for %s", sym->name); | ||
| if (!parent) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we missing ERROR("failed to find parent function for %s", parent) here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, seems I messed my patch splitting, thanks for spotting this!
I'll update as well.
| @@ -1,3 +1,5 @@ | |||
| Disabled due to https://github.com/dynup/kpatch/issues/940 | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a follow up commit could rework these tests to avoid functions that use jump labels? I realize that's ignoring the problem, but would let us continue to test read-mostly and shadow variables in the meantime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can have a look at it, but what might end up happening is that the test becomes completely different (patching other functions or maybe files) than the similar test in fedora/rhel folders. Which might be confusing if we keep the same name.
Perhaps we can keep this one disabled and add a second test "data-read-mostly-2.patch" that would (hopefully) work on RHEL-8 with the current state of kpatch, by avoiding to modify functions with jump labels.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, creating a newly named test file would be less-confusing. Anyway, that was just an idea for a follow up PR if you wanted to explore that, not needed for this one.
| { | ||
| const unsigned char *b = buf; | ||
| @@ -2383,6 +2383,12 @@ static ssize_t n_tty_write(struct tty_struct *tty, struct file *file, | ||
| return (b - buf) ? b - buf : retval; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future work: we should probably explain this in the patch author guide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for future work: we should probably explain this in the patch author guide.
I'm working on it now, I'll post up a PR with author guide changes soon.
| #!/bin/bash | ||
|
|
||
| SCRIPTDIR="$(readlink -f $(dirname $(type -p $0)))" | ||
| ROOTDIR="$(readlink -f $SCRIPTDIR/../..)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will update.
jpoimboe
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be helpful if each commit description mentioned the patch which triggers the behavior.
Also we might want to consider adding unit tests for some of these issues.
Also it might make sense to split this up into several PRs. It's not clear to me how some of these fixes are related to the integration tests. We usually try to do one logical change per PR. So maybe the integration tests should be their own PR, and the other fixes should be in another (or maybe even multiple other) PRs.
kpatch-build/create-diff-object.c
Outdated
| * subfunctions. kpatch_detect_child_functions detects such subfunctions and | ||
| * subfunctions. Some functions can also be split into multiple *.part | ||
| * functions. | ||
| * kpatch_detect_child_functions detects such subfunctions and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure a ".part" function can be considered a "child". In the context of this code, a child (or subfunction) means that it's a ".cold" extension of the parent function, where the parent function can branch to the child subfunction.
But in the case of ".part", I believe it's an optimized clone of the original function. But otherwise the functions have no relation -- i.e. they are independent and the parent doesn't branch to the clone.
Or maybe I'm misunderstanding. Was there a patch which showed this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the initial motivation for this change was a failure on the test gcc-static-local-var-5. The failure was due to a static local variable "lock" in function audit_log_end() being moved in the patched object into audit_log_end.part.17 .
One particular thing in the patched object is that both the audit_log_end and audit_log_end.part.* symbols exists, while the original only has audit_log_end. So the original audit_log_end is twined with the patched audit_log_end and static local references are only looked in the patched section related to audit_log_end and kpatch misses the fact that a reference was moved to the .part. section.
So, maybe the "child" concideration is not correct. But there needs to be a link between .part. functions and their original symbol (when it is also present on the patched object, as most of the time I've noticed it isn't) and the static variable correlation needs to be able to follow that link when looking for relocations.
Should I create a separate list of "parts" of a symbol? Other suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm. Does the patched audit_log_end() also reference the "lock" static local variable? IIRC, "part" functions are clones of the originals. So it seems like audit_log_end() should also reference the static local, unless I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it doesn't have a reference to the lock. audit_log_end() almost only has a jump to audit_log_end.part in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah... I think I had a fundamental misunderstanding of this optimization. Looking at gcc/ipa-split.c in the GCC source confirms that. I had been thinking this was more like the constprop and isra optimizations.
So I take back my original comment! It's very similar to the .cold optimization and I think the parent/child relationship makes sense after all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a look!
So how do you suggest I proceed with this PR? Split it in two, one with the test addition and patch modifications and the other with the create-diff-object changes?
Any suggestion for which one should be base on the other?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we don't have a CI running these yet, I think I'd suggest doing the integration tests first, in one PR. That way the other fixes can refer to them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, I'll try sending that later today after rebasing on master and checking there aren't new issues!
Tests intentionaly disabled should be skipped by multiple.test Signed-off-by: Julien Thierry <jthierry@redhat.com>
Disabled patches won't trigger a build, but the combined load test will still attempt to run their associated LOADED.test script. The combined test will fail due to voluntarily disabled tests. Do not run tests scripts associated with disabled tests. Signed-off-by: Julien Thierry <jthierry@redhat.com>
Rebase the integration test cases on top of RHEL 8.0 kernel version
4.18.0-80.el8.
Suggested-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com>
[JT: adapt data-new-LOADED to new meminfo format,
use common template for multiple.test]
Signed-off-by: Julien Thierry <jthierry@redhat.com>
Jump labels are unsupported, so tests modifying functions using them are expected to fail. So disable them, for now... Signed-off-by: Julien Thierry <jthierry@redhat.com>
On ppcle64, test gcc-static-local-var-4 impacts a jump label reference which is currently unsupported. Signed-off-by: Julien Thierry <jthierry@redhat.com>
new-function test fails on ppc64le with the following message:
create-diff-object: ERROR: n_tty.o: kpatch_no_sibling_calls_ppc64le: 3445: Found an unsupported sibling call at n_tty_write()+0x20. Add __attribute__((optimize("-fno-optimize-sibling-calls"))) to n_tty_write() definition.
Add the suggested attribute, as was done for rhel-7.[5-7] versions of
the test.
Signed-off-by: Julien Thierry <jthierry@redhat.com>
0bd41c9 to
0c96fd0
Compare
|
After the last update the following errors are withstanding:
This can be fixed with the following PR: x86:
This can be fixed with the following PR: |
sm00th
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see gcc-static-local-var-5 failing on x86_64 with:
ERROR: audit.o: reference to static local variable lock.65414 in audit_log_end was removed
And macro-printk on ppc64le with:
ERROR: fib_semantics.o: object size mismatch: __msg.66237
(which I guess is caused by one of recent merges)
But I think it will be easier to deal with those in a separate PR since this one is already hard to review, but it has improved drastically from where we started. Thank you.
Both of these issues are expected (in the current state) and are fixed by the pending PRs #1054 and #1053 |
Rebase the tests to the closet upstream kernel to RHEL 8.0, as per
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/8.0_release_notes/index#overview
it is kernel v4.18. So rebase it on top of upstream v4.18.
Signed-off-by: Kamalesh Babulal kamalesh@linux.vnet.ibm.com