-
Notifications
You must be signed in to change notification settings - Fork 59
[RFC] use journalctl for kernel log #468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
b690742 to
6787f71
Compare
6787f71 to
3763fbc
Compare
Directly use journalctl in the testcase to avoid kernel log missing issues. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl by default for any kernel log reader usage. Also add header for the file. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
Add path for the sof-dump-status.py to avoid error: sof-dump-status.py: command not found Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl to get kernl logs. use timestamp to split logs for each test case. replace all test case to use the timestamp Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
6f7c349 to
8d567e1
Compare
Add common error logs from systemd, initramfs, atkbd, i801_smbus, ata3, genirq, DMAR, dw_dmac, proc_thermal, SDWm, touch pad Those error level logs did not break functionality of the DUTs. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
…iteration remove the redundant kernel log check for all case. enable kernel log check for each iteration. also do some clean up for this script Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
…teration remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
…tion remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
…eration remove the redundant kernel log check for all case. enable kernel log check for each iteration. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
8d567e1 to
91e4116
Compare
try to flush and sync journalctl to avoid empty logs. Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
| # sof-kernel-log-check script parameter number is 0/Non-Number will force check from dmesg | ||
| sof-kernel-log-check.sh 0 || die "Catch error in dmesg" | ||
| # check kernel log for each iteration to catch issues | ||
| sof-kernel-log-check.sh $KERNEL_LAST_TIMESTAMP || die "Catch error in kernel log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Caught error instead of Catch error
|
|
||
| sof-kernel-log-check.sh 0 || die "Catched error in dmesg" | ||
| # check kernel log for each iteration to catch issues | ||
| sof-kernel-log-check.sh $KERNEL_LAST_TIMESTAMP || die "Catch error in kernel log" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same comment here. lets fix it everywhere
marc-hb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is fixing an extremely critical feature so this must be broken down in smaller pull requests and merged progressively = "continuous" integration. This is especially important considering the regular stream of other, unrelated CI changes and issues will not stop and will still need to be investigated and addressed at the same time. I mean we cannot stop everyone else (including you!) from making other CI fixes and improvements while performing this critical change. We also need the ability to quickly revert the last PR if anything goes wrong - and from past experience a number of things will go wrong.
-
As long as you use the
-kflags you do not need to modify theignore_strat the same time. The last thing we want is to change the implementation and the ignore_str at the same time. So let's forget about any ignore_str changes for now. -
The --priority flag must also be postponed.
-
The first PR to submit is the PR that renames
func_lib_setup_kernel_last_linetofunc_lib_setup_kernel_last_timestampand KERNEL_LAST_LINE to KERNEL_LAST_TIMESTAMP without making any real code change. That will first get a lot of changed lines out of the way with no risk. The name will not match the old implementation for the transition period but we don't care. -
The second PR to submit is quoting all the instances of $KERNEL_LAST_TIMESTAMP and then wait a day or two to make sure everything still works.
-
Then we can start flipping the switch with high impact but small number of lines PR.
Most importantly: what sort of (fake) errors did you test this with? Testing a change like this requires at least two throw-away PRs:
- one throw-away PR with strategically placed fake errors + old code
- a second throw-away PR with the exact same fake errors + the new code
The test results for both must be exactly the same.
| for i in $(seq 1 $loop_cnt) | ||
| do | ||
| # cleanup dmesg buffer for each iteration | ||
| sudo dmesg -c > /dev/null |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still required?
|
@marc-hb Thanks for the tips, I am also think about how to spilt the patches. Please help to check my plan:
some discussion I want to have here:
Keep this PR for testing and I will start with task 1. |
|
close with latest part 1 PR #480 |
Much finer-grained, much better! For such a "core" change like this we really need "Totally Continuous" Integration: small, well tested changes, quick reviews, and then at least a couple days careful observation between merged PRs. See my request in #480.
We can talk about changing levels and filtering last, after everything else is in place. Then we should involve many more people. For now the priority is to switch to journalctl while preserving everything else exactly the same - and then find and fix everything that broke with just that before even tweaking the filtering. |
Still now after fixing all clock and timezone issues?
These options make absolutely no difference except if you use them a split second before of system crash or sudden power loss. We are not really dealing with crashes anyway so don't use them, they're just confusing.
BSW has been experiencing a lot of storage issues (see internal issue 539 + links), could that be the cause?
Please no, that sounds brittle and complex. Let's understand what the real issue is first and/or temporarily disable BSW in the mean time. |
Try to use journalctl to read kernel log from system.
Use UNIX style timestamp to split kernel logs from test case to test case.
Print same format like dmesg in journalctl kernel log.
Update all test case used to relay on demsg logs