
HBRT infinite loop on ECC error during startup #67

@ghost

Description

Start opal-prd and observe the following log; after the last line, opal-prd gets stuck at 100% CPU.

HBRT: PRDF:>>PRDF::main() Global attnType=0004
HBRT: PRDF:>>PRDF::noLock_initialize() 
HBRT: PRDF:>>PegasusConfigurator::build()
HBRT: PRDF:<<PegasusConfigurator::build()
HBRT: PRDF:<<PRDF::noLock_initialize() 
HBRT: ERRL:>>ErrlManager::ErrlManager constructor.
HBRT: ERRL:iv_hiddenErrorLogsEnable = 0x0
HBRT: ERRL:>>setupPnorInfo
HBRT: PNOR:>>RtPnor::getSectionInfo
HBRT: PNOR:>>RtPnor::readFromDevice: i_offset=0x0, i_procId=0 sec=11 size=0x20000 ecc=1
HBRT: PNOR:RtPnor::readFromDevice: removing ECC...
HBRT: PNOR:RtPnor::readFromDevice> Uncorrectable ECC error : chip=0,offset=0x0

(at which point everything stops with opal-prd chewing 100% CPU)

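This turns out to be a fairly classic race: the uncorrectable ECC error is hit while the ErrlManager is still being constructed (its constructor is reading PNOR via setupPnorInfo), so the attempt to log that error happens before the error-logging machinery has been initialized.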

Consequently, opal-prd spins a core and is right off into the weeds.
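The issue does not show the relevant HBRT code, so the following is only a minimal sketch of one common way to break this kind of init-time loop, and every name in it (errl_manager, errl_log, errl_init, read_section_with_ecc) is hypothetical rather than the real HBRT API. The idea: an error raised while the logger is still initializing is queued and flushed once initialization finishes, instead of the logger waiting on (or re-entering) the very initialization that triggered it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical logger states; not taken from HBRT. */
enum errl_state { ERRL_UNINIT, ERRL_INITIALIZING, ERRL_READY };

#define ERRL_MAX_PENDING 8

struct errl_manager {
    enum errl_state state;
    const char *pending[ERRL_MAX_PENDING]; /* errors raised before READY */
    size_t npending;
    size_t ndropped;   /* errors lost because the pending queue was full */
    size_t ncommitted; /* errors actually written out */
};

static struct errl_manager g_errl = { .state = ERRL_UNINIT };

/* Stand-in for really committing an error log. */
static void errl_write(const char *msg)
{
    printf("ERRL: %s\n", msg);
    g_errl.ncommitted++;
}

void errl_log(const char *msg)
{
    if (g_errl.state != ERRL_READY) {
        /* A naive version that waited here, e.g.
         *   while (g_errl.state != ERRL_READY) { }
         * would spin forever when called from errl_init() itself.
         * Instead, queue the message and return immediately. */
        if (g_errl.npending < ERRL_MAX_PENDING)
            g_errl.pending[g_errl.npending++] = msg;
        else
            g_errl.ndropped++;
        return;
    }
    errl_write(msg);
}

/* Hypothetical PNOR read that always hits an uncorrectable ECC error. */
static bool read_section_with_ecc(int section)
{
    (void)section;
    errl_log("Uncorrectable ECC error : chip=0,offset=0x0");
    return false;
}

bool errl_init(void)
{
    g_errl.state = ERRL_INITIALIZING;
    bool ok = read_section_with_ecc(11); /* may call errl_log() */
    g_errl.state = ERRL_READY;

    /* Flush anything logged while initialization was in progress. */
    for (size_t i = 0; i < g_errl.npending; i++)
        errl_write(g_errl.pending[i]);
    g_errl.npending = 0;
    return ok;
}
```

With this arrangement errl_init() returns false (the section read failed) instead of hanging, and the ECC error is still reported once the logger is usable.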
