
marc-hb (Collaborator) commented Oct 15, 2021

2 commits:

  • Move all the code to functions. Zero functional change.
  • Don't inspect kernel boot logs on ADL.

You really want to review the commits separately.

Rationale in thesofproject#740

Signed-off-by: Marc Herbert <marc.herbert@intel.com>

Having platforms still under development in PR testing is already a
stretch, let's not push it too far.

The kernel boot logs should still be available.

This will avoid recurring issues like thesofproject#767:

- *ERROR* bcs'0 reset request timed out: {request: 00000001, RESET_CTL:
  https://sof-ci.01.org/sofpr/PR4777/build10384/devicetest/?model=ADLP_RVP_SDW&testcase=verify-kernel-boot-log

- [    3.262181] kernel: cpufreq: cpufreq_online: Failed to initialize policy for cpu: 0 (-19)
  seen in daily tests,

Others in 9ba1215, etc.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
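The effect of the second commit (skip the check on ADL, run it everywhere else) can be pictured with a minimal POSIX shell sketch. It is not the code from this PR: the function name `boot_log_check_verdict`, the `ADL*` model-name pattern and the RUN/SKIP output are hypothetical, chosen only to illustrate a per-platform skip.

```shell
#!/bin/sh
# Hypothetical sketch, not the actual sof-test implementation:
# decide whether the kernel boot log check runs on a given CI model.

# Prints SKIP for model names starting with ADL (platforms still under
# development), RUN for any other model name.
boot_log_check_verdict()
{
    case "$1" in
        ADL*) echo SKIP ;;
        *)    echo RUN ;;
    esac
}
```

For example, `boot_log_check_verdict ADLP_RVP_SDW` prints `SKIP`, while a non-ADL model name prints `RUN`. A skip like this only affects the verdict; the logs themselves can still be collected.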
marc-hb (Collaborator, Author) commented Oct 15, 2021

[    3.682601] kernel: cpufreq: cpufreq_online: Failed to initialize policy for cpu: 0 (-19)
[    3.682773] kernel: cpufreq: cpufreq_online: Failed to initialize policy for cpu: 1 (-19)
[    3.683370] kernel: cpufreq: cpufreq_online: Failed to initialize policy for cpu: 2 (-19)
[    3.683936] kernel: cpufreq: cpufreq_online: Failed to initialize policy for cpu: 3 (-19)

Seen at
https://sof-ci.01.org/sofpr/PR4844/build10721/devicetest/?model=ADLP_RVP_SDW&testcase=verify-kernel-boot-log
and in thesofproject/sof#4849, among others.

Also spotted: [ 3.196398] kernel: e1000e 0000:00:1f.6: The NVM Checksum Is Not Valid

marc-hb marked this pull request as ready for review October 15, 2021 04:56
marc-hb requested a review from a team as a code owner October 15, 2021 04:56
marc-hb (Collaborator, Author) commented Oct 15, 2021

In https://sof-ci.01.org/softestpr/PR786/build899/devicetest , verify-kernel-boot-log.sh was skipped on ADLP_RVP_SDW as expected and the kernel logs are still available in the "dmesg" tab.

fredoh9 (Contributor) left a comment

Good idea

keqiaozhang (Contributor) commented Oct 15, 2021

These cpufreq warnings are errors specific to the new ADLP CPU. I upgraded the CPU on one ADLP-RVP-SDW (sh-adlp-rvp-sdw-02) and one ADLP-RVP-NOCODEC (sh-adlp-rvp-nocodec-02) yesterday, and the warnings started after that.

marc-hb (Collaborator, Author) commented Oct 15, 2021

When I begged to catch all kernel errors and exclude only specific ones a couple of years ago, many people said "it's not going to work, too many random kernel errors, especially with new platforms". We had to exclude many USB and other errors[*], but it mostly worked... except, indeed, for new platforms like ADL, which keep spitting out new random errors after every kernel upgrade, BIOS upgrade or hardware upgrade; then the error goes away on the next upgrade and is replaced by new one(s). But no one ever cleans up our giant ignore_str. This is very distracting because it makes PR testing red all the time.

We can remove this SKIP around the time the products are out.

Note that even with the SKIP, the kernel logs are still collected and available.

[*] https://github.com/thesofproject/sof-test/issues?q=+label%3A%22area%3Anon-audio+Failure%22+ - 70 issues+PRs
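The "catch all kernel errors, exclude specific ones" approach described above amounts to a two-stage filter. Below is a minimal POSIX shell sketch of the idea; only the name `ignore_str` comes from this thread. The function name `unexpected_kernel_errors`, the `error|fail|timed out` match pattern and the two ignore entries (taken from log lines quoted earlier in this PR) are illustrative assumptions, not sof-test's real lists.

```shell
#!/bin/sh
# Hypothetical sketch of an allow-by-exception kernel log check.
# ignore_str: extended regex of known, tolerated error messages
# (illustrative entries only, not sof-test's actual list).
ignore_str='cpufreq_online: Failed to initialize policy|reset request timed out'

# Reads kernel log lines on stdin. Stage 1 keeps anything that looks like
# an error (case-insensitive); stage 2 drops lines matching ignore_str.
# Whatever is printed would make the check fail.
unexpected_kernel_errors()
{
    grep -iE 'error|fail|timed out' | grep -vE "$ignore_str"
}
```

With this design, every new transient error on a new platform needs another `|`-separated entry in `ignore_str`, and nothing forces stale entries to be removed once the error disappears, which is how such a list keeps growing.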

marc-hb merged commit e1b364a into thesofproject:main Oct 18, 2021
marc-hb deleted the skip-adl-boot branch October 18, 2021 16:07
marc-hb (Collaborator, Author) commented Oct 20, 2021

Let's see if we can revert this the next time we have a big kernel "backmerge"

XiaoyunWu6666 (Contributor) commented:

Are we good to revert e1b364a now, @marc-hb? The problem has gone.

marc-hb (Collaborator, Author) commented Feb 22, 2022

> The problem has gone

Maybe one error is gone, but here's the first ADL platform and daily run I looked at:

10402?model=ADLP_BRYA_SDW&testcase=verify-kernel-boot-log

Start Time: 2022-02-21 22:27:31 UTC
Kernel Branch: topic/sof-dev
Kernel Commit: 31f2b481
SOF Branch: main
SOF Commit: 98ff79ea0c4b

[    6.855933] kernel: i2c_designware i2c_designware.5: i2c_dw_handle_tx_abort: lost arbitration
[    6.857012] kernel: i2c_designware i2c_designware.5: i2c_dw_handle_tx_abort: lost arbitration
[    6.857816] kernel: i2c_designware i2c_designware.5: i2c_dw_handle_tx_abort: lost arbitration
[    6.858513] kernel: i2c_designware i2c_designware.5: i2c_dw_handle_tx_abort: lost arbitration
[    7.080407] kernel: Setting dangerous option force_probe - tainting kernel
[    7.203199] kernel: i915 0000:00:02.0: vgaarb: deactivate vga console
[    7.275530] kernel: i915 0000:00:02.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem
[    7.280462] kernel: i915 0000:00:02.0: Direct firmware load for i915/adlp_dmc_ver2_12.bin failed with error -2
[    7.280503] kernel: i915 0000:00:02.0: [drm] Failed to load DMC firmware i915/adlp_dmc_ver2_12.bin. Disabling runtime power management.
[    7.280517] kernel: i915 0000:00:02.0: [drm] DMC firmware homepage: https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[    7.698527] kernel: i915 0000:00:02.0: Direct firmware load for i915/adlp_guc_62.0.3.bin failed with error -2
[    7.698574] kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_62.0.3.bin: fetch failed with error -2
[    7.698593] kernel: i915 0000:00:02.0: [drm] GuC firmware(s) can be downloaded from https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/i915
[    7.702930] kernel: i915 0000:00:02.0: GuC initialization failed -2
[    7.702948] kernel: i915 0000:00:02.0: Enabling uc failed (-5)
[    7.702959] kernel: i915 0000:00:02.0: Failed to initialize GPU, declaring it wedged!

Still too much of a mess... What would we get from failing this test all the time?
