Open
Description
After a random amount of time, the OCC locks up and downclocks the host if the default Linux power-saving modes are enabled. When this happens, the machine slows down, fan control is completely disabled, and a full host power-off and reboot is required to restore functionality. Running opal-prd occ reset sometimes restores proper clocks without a reboot, but fan control remains offline until a full reboot.
Host dmesg:
[32809.485411] powernv-cpufreq: PMSR = 6363630040000000
[32809.520296] powernv-cpufreq: CPU Frequency could be throttled
[32844.323314] powernv-cpufreq: PMSR = 12d630000000000
[32844.356229] powernv-cpufreq: CPU Frequency could be throttled
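For anyone trying to read the PMSR values above, here is a rough sketch of a decoder. It assumes the bit layout I recall from the kernel's powernv-cpufreq driver (max Pstate in bits 32-39, Psafe-mode flag in bit 30, SPR_EM_DISABLE flag in bit 31); that layout is an assumption and should be checked against the driver source for the running kernel. The function name decode_pmsr is made up for this example.

```shell
#!/bin/sh
# decode_pmsr HEX: print a few PMSR fields from a hex value as logged by
# powernv-cpufreq (no 0x prefix). The field positions are assumptions
# taken from memory of the driver, not from a hardware specification.
decode_pmsr() {
    v=$((0x$1))
    pmax=$(( (v >> 32) & 0xff ))          # assumed: max Pstate field
    psafe=$(( (v >> 30) & 1 ))            # assumed: Psafe mode active
    em_disable=$(( (v >> 31) & 1 ))       # assumed: SPR_EM_DISABLE
    echo "pmax=$pmax psafe=$psafe spr_em_disable=$em_disable"
}
```

Under that assumed layout, `decode_pmsr 6363630040000000` reports the Psafe flag set, while `decode_pmsr 12d630000000000` reports neither flag set.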
BMC dmesg:
[390435.513335] sbefifo 00:00:00:06: SBE FFDC package len 9 words but only 6 remaining
[390435.521174] occ sbefifo1-dev0: SRAM attn returned failure status: 00fe000a
[390436.569249] sbefifo 00:00:00:06: SBE error cmd a4:04 status=00fe:000a
[390436.575975] sbefifo 00:00:00:06: SBE FFDC package len 9 words but only 6 remaining
[390436.583663] occ sbefifo1-dev0: SRAM attn returned failure status: 00fe000a
[390436.706403] sbefifo 01:01:00:06: SBE error cmd a4:04 status=00fe:000a
[390436.713044] sbefifo 01:01:00:06: SBE FFDC package len 9 words but only 6 remaining
[390436.720885] occ sbefifo2-dev0: SRAM attn returned failure status: 00fe000a
[390437.780615] sbefifo 01:01:00:06: SBE error cmd a4:04 status=00fe:000a
[390437.787314] sbefifo 01:01:00:06: SBE FFDC package len 9 words but only 6 remaining
[390437.795153] occ sbefifo2-dev0: SRAM attn returned failure status: 00fe000a
[390441.406828] sbefifo 01:01:00:06: SBE error cmd a4:04 status=00fe:000a
[390441.413412] sbefifo 01:01:00:06: SBE FFDC package len 9 words but only 6 remaining
[390441.421240] occ sbefifo2-dev0: SRAM attn returned failure status: 00fe000a
[390442.895970] sbefifo 01:01:00:06: SBE error cmd a4:04 status=00fe:000a
[390442.902534] sbefifo 01:01:00:06: SBE FFDC package len 9 words but only 6 remaining
[390442.910341] occ sbefifo2-dev0: SRAM attn returned failure status: 00fe000a
Running the following command at host boot completely prevents the issue, but it is undesirable because it disables every CPU idle state and the machines then use significantly more power at idle:
echo 1 | tee /sys/devices/system/cpu/cpu*/cpuidle/state?/disable
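The same workaround can be written as a small function, which makes it easier to revert (write 0) or to try it against a different sysfs root. This is only a sketch: the function name set_cpuidle_disable and its optional root argument are made up for illustration, and it writes to the same files as the one-liner above.

```shell
#!/bin/sh
# set_cpuidle_disable VALUE [ROOT]
# Write VALUE (1 = disable, 0 = re-enable) to every cpuidle state's
# "disable" file under ROOT (default: /sys/devices/system/cpu).
# Prints the number of files written. Hypothetical helper.
set_cpuidle_disable() {
    value=$1
    root=${2:-/sys/devices/system/cpu}
    count=0
    for f in "$root"/cpu*/cpuidle/state?/disable; do
        [ -e "$f" ] || continue           # glob matched nothing
        echo "$value" > "$f" || return 1  # needs root on a real sysfs
        count=$((count + 1))
    done
    echo "$count"
}
```

Run as root, `set_cpuidle_disable 1` applies the workaround and `set_cpuidle_disable 0` re-enables the idle states.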
Host Linux version: 4.19.0-5-powerpc64le
OCC GIT hash: 58e422d