Skip to content

Conversation

@xiulipan
Copy link
Contributor

Use timestamp as checkpoint for each test case.
use journalctl to read kernel logs in sof-kernel-log-check.sh
use journalctl in other kernel read scripts.

@xiulipan xiulipan requested a review from a team as a code owner November 11, 2020 06:48
Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't spend a lot of time on every line yet but this looks OK. However and as already requested in previous PRs: please submit independent bugs fixes and cleanups in other PRs. Either in PR(s) before this switch to journalctl, or in other PRs after this switch to journalctl but just not in the same PR.

This switch is a small number of lines but with a very high impact so it needs to be integrated alone so it can very easily be tested (I will test it locally with fake_kern_error), observed for a few days in CI and very easily reverted if needed. I'm running out of ways to repeat myself.



begin_line=${1:-1}
begin_timestamp=${1:-0}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I already submitted this in September in PR #366 but something was wrong at the time. Is it OK now and why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I drop the usage for line number now. And with my comments below
# confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg

We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided

The only user of this script is ourselves, so to reduce complexity a lot let's just fix the user instead of providing a fallback. Fail here when some test does not provide a valid checkpoint and then calls this script, which seems very wrong!? Let's fix all these broken tests and make them consistent before switching the checkpoint to journalctl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we have an agreement in #468 (comment)

@marc-hb Thanks for the tips, I am also think about how to spilt the patches. Please help to check my plan:

  1. replace LINE with TIMESTAMP, log check with no --priority but still with err_str
  2. quoting clean up for the $KERNEL_LAST_TIMESTAMP
  3. remove err_str and use --priority, add new ignore_str
  4. clean up test case to remove duplicate kernel log check.

Now we have done half of the step 1 by changing the LINE into CHECKPOINT in #499. Now I am not sure if I can do clean up in step 4 before we change to use journalctl with use --priority, it would bring more unrelated error in our test now.

Or we can do something more simpler, merge a big PR #468 I have already made and tested and fix bugs if we find any.

Any suggestions here @marc-hb @aiChaoSONG @plbossart @mengdonglin
I am not sure what we are reviewing here. I do not see any feedback about codes or implementation. I only see how to spilt patches. For such big implementation change, I am not sure if small steps can make thing easier because I need to do some un-needed migration clean up for each small change. EG: @marc-hb The fallback here is one of those things I do not need to have in big PR. But I have to keep it here for migration.

If we do not make progress before my vacation next week. I will use my super power to merge all the change in one PR.

#kernel_buffer_size=$(du -k /var/log/kern.log |awk '{print $1;}')
# now decide using which to catch the kernel log
#[[ $kernel_buffer_size -lt $dmesg ]] && cmd="dmesg" || cmd="tail -n +${begin_line} /var/log/kern.log"
# confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment explaining why we still need to support a different begin_timestamp format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think we need a different begin_timestamp format. What I mean here is to keep the fallback we used to have to dmesg now to cover the small migration steps.

#[[ $kernel_buffer_size -lt $dmesg ]] && cmd="dmesg" || cmd="tail -n +${begin_line} /var/log/kern.log"
# confirm begin_timestamp is in UNIX timestamp format, otherwise use dmesg
journalctlflag="-k -q --no-pager --utc --output=short-iso --no-hostname"
[[ $begin_timestamp =~ ^[0-9]{10} ]] && cmd="journalctl $journalctlflag --since=@$begin_timestamp" || cmd="dmesg"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why stick to dmesg? Its output looks very different. Why not just drop the --since= parameter?

Copy link
Contributor Author

@xiulipan xiulipan Nov 18, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marc-hb We need to keep dmesg as some user still use dmesg way.
I had following patches to drop the dmesg, see #468
commits like 8db6f8c

add some style options for the journalctl to unify all journalctl
output format

Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
Directly use journalctl in the testcase to avoid kernel log missing
issues.

Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl by default for any kernel log reader usage.
Also add headter for the file.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
Add path for the sof-dump-status.py to avoid error:
sof-dump-status.py: command not found

Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
use journalctl to get kernl logs.
use timestamp as checkpoint to split logs for each test case.

Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Pan Xiuli <xiuli.pan@linux.intel.com>
@xiulipan
Copy link
Contributor Author

updated and rebased @marc-hb @aiChaoSONG Ping for review.

Copy link
Collaborator

@marc-hb marc-hb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I understand that there is some "dmesg fallback" for some "legacy users". Not sure what these are but please fix the legacy users first so we don't need any fallback and can keep the code simple.

# TODO: clean up Sub-Test feature
alias | grep -q 'Sub-Test' || DMESG_LOG_START_LINE=$(sof-get-kernel-line.sh | tail -n 1 | awk '{print $1;}' )

cmd="journalctl_cmd"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure this has to be in the same PR than the rest.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I named this PR as use journalctl as default kernel log reader, so I think put all change there is OK.
I split them into different commit. If you want single commit in one PR I am OK with that.



begin_line=${1:-1}
begin_timestamp=${1:-0}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will always fallback to use demsg if no checkpoint is provided or wrong format checkpoint is provided

The only user of this script is ourselves, so to reduce complexity a lot let's just fix the user instead of providing a fallback. Fail here when some test does not provide a valid checkpoint and then calls this script, which seems very wrong!? Let's fix all these broken tests and make them consistent before switching the checkpoint to journalctl

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants