-
Notifications
You must be signed in to change notification settings - Fork 59
use journalctl as default kernel log reader #509
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't spend a lot of time on every line yet but this looks OK. However and as already requested in previous PRs: please submit independent bugs fixes and cleanups in other PRs. Either in PR(s) before this switch to journalctl, or in other PRs after this switch to journalctl but just not in the same PR.
This switch is a small number of lines but with a very high impact so it needs to be integrated alone so it can very easily be tested (I will test it locally with fake_kern_error), observed for a few days in CI and very easily reverted if needed. I'm running out of ways to repeat myself.
|
|
||
|
|
||
| begin_line=${1:-1} | ||
| begin_timestamp=${1:-0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already submitted this in September in PR #366 but something was wrong at the time. Is it OK now and why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because I drop the usage for line number now. And with my comments below
# confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg
We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided
The only user of this script is ourselves, so to reduce complexity a lot let's just fix the user instead of providing a fallback. Fail here when some test does not provide a valid checkpoint and then calls this script, which seems very wrong!? Let's fix all these broken tests and make them consistent before switching the checkpoint to journalctl
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we have an agreement in #468 (comment)
@marc-hb Thanks for the tips, I am also think about how to spilt the patches. Please help to check my plan:
- replace LINE with TIMESTAMP, log check with no
--prioritybut still witherr_str- quoting clean up for the
$KERNEL_LAST_TIMESTAMP- remove
err_strand use--priority, add newignore_str- clean up test case to remove duplicate kernel log check.
Now we have done half of the step 1 by changing the LINE into CHECKPOINT in #499. Now I am not sure if I can do clean up in step 4 before we change to use journalctl with use --priority, it would bring more unrelated error in our test now.
Or we can do something more simpler, merge a big PR #468 I have already made and tested and fix bugs if we find any.
Any suggestions here @marc-hb @aiChaoSONG @plbossart @mengdonglin
I am not sure what we are reviewing here. I do not see any feedback about codes or implementation. I only see how to spilt patches. For such big implementation change, I am not sure if small steps can make thing easier because I need to do some un-needed migration clean up for each small change. EG: @marc-hb The fallback here is one of those things I do not need to have in big PR. But I have to keep it here for migration.
If we do not make progress before my vacation next week. I will use my super power to merge all the change in one PR.
| #kernel_buffer_size=$(du -k /var/log/kern.log |awk '{print $1;}') | ||
| # now decide using which to catch the kernel log | ||
| #[[ $kernel_buffer_size -lt $dmesg ]] && cmd="dmesg" || cmd="tail -n +${begin_line} /var/log/kern.log" | ||
| # confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining why we still need to support a different begin_timestamp format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think we need a different begin_timestamp format. What I mean here is to keep the fallback we used to have to dmesg now to cover the small migration steps.
| #[[ $kernel_buffer_size -lt $dmesg ]] && cmd="dmesg" || cmd="tail -n +${begin_line} /var/log/kern.log" | ||
| # confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg | ||
| journalctlflag="-k -q --no-pager --utc --output=short-iso --no-hostname" | ||
| [[ $begin_timestamp =~ ^[0-9]{10} ]] && cmd="journalctl $journalctlflag --since=@$begin_timestamp" || cmd="dmesg" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why stick to dmesg? Its output looks very different. Why not just drop the --since= parameter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7cecccf to
17e77ef
Compare
add some style options for the journalctl to unify all journalctl output format Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
Directly use journalctl in the testcase to avoid kernel log missing issues. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl by default for any kernel log reader usage. Also add headter for the file. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
Add path for the sof-dump-status.py to avoid error: sof-dump-status.py: command not found Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl to get kernl logs. use timestamp as checkpoint to split logs for each test case. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
17e77ef to
cdeb363
Compare
|
updated and rebased @marc-hb @aiChaoSONG Ping for review. |
marc-hb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now I understand that there is some "dmesg fallback" for some "legacy users". Not sure what these are but please fix the legacy users first so we don't need any fallback and can keep the code simple.
| # TODO: clean up Sub-Test feature | ||
| alias | grep -q 'Sub-Test' || DMESG_LOG_START_LINE=$(sof-get-kernel-line.sh | tail -n 1 | awk '{print $1;}' ) | ||
|
|
||
| cmd="journalctl_cmd" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% sure this has to be in the same PR than the rest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I named this PR as use journalctl as default kernel log reader, so I think put all change there is OK.
I split them into different commit. If you want single commit in one PR I am OK with that.
|
|
||
|
|
||
| begin_line=${1:-1} | ||
| begin_timestamp=${1:-0} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided
The only user of this script is ourselves, so to reduce complexity a lot let's just fix the user instead of providing a fallback. Fail here when some test does not provide a valid checkpoint and then calls this script, which seems very wrong!? Let's fix all these broken tests and make them consistent before switching the checkpoint to journalctl
Use timestamp as checkpoint for each test case.
use journalctl to read kernel logs in sof-kernel-log-check.sh
use journalctl in other kernel read scripts.