Thursday, March 16, 2017

PSOD: #PF Exception 14 in world - On Cisco UCS with Intel Xeon Processor E5 v4, E7 v4 Family Processors (Broadwell)

Ooooh, the dreaded purple screen of death (PSOD) on one of our Cisco UCS B-Series blades...

#PF Exception 14 in world....


The ESXi host logs showed a BUNCH of Machine Check Exceptions (MCEs):

2017-02-25T05:21:34.950Z cpu8:24267427)MCE: 242: cpu8: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.000Z cpu16:35591)MCE: 242: cpu16: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 242: cpu17: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.000Z cpu16:35591)MCE: 242: cpu16: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 233: cpu17: bank8: status=0xcc00024000010091: (VAL=1, OVFLW=1, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x406cc600 (valid), Misc:0x3c5c27b940 (valid)
2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 242: cpu17: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
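
For anyone curious how the status=0x... value in the log maps to the flags and the "Memory Controller Read Error on Channel 1" text, here is a rough Python sketch that decodes it. It assumes the value follows the architectural IA32_MCi_STATUS layout from the Intel SDM (flag bits 63 down to 55, MCA error code in the low 16 bits, memory controller codes shaped like 0000 0000 1MMM CCCC). The function and dictionary names are my own for illustration, not anything ESXi provides, and the flag labels simply mirror the ones printed in the log.

```python
# Assumed bit positions from the architectural IA32_MCi_STATUS layout (Intel SDM).
# Labels follow the ESXi log line (the SDM calls OVFLW "OVER").
FLAG_BITS = {
    "VAL": 63, "OVFLW": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55,
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC).
MEM_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into flags and the MCA error code."""
    flags = {name: (status >> bit) & 1 for name, bit in FLAG_BITS.items()}
    mca_code = status & 0xFFFF
    result = {"flags": flags, "mca_code": mca_code}
    # Upper 9 bits of the code equal to 0000 0000 1 mark a memory controller error.
    if mca_code & 0xFF80 == 0x0080:
        result["mem_op"] = MEM_OPS.get((mca_code >> 4) & 0x7, "reserved")
        result["channel"] = mca_code & 0xF  # 0xF would mean "channel not specified"
    return result


info = decode_mci_status(0xCC00024000010091)  # status value from the log above
print(info["flags"])
print(info.get("mem_op"), info.get("channel"))  # read 1
```

With the value from my host, this gives VAL=1, OVFLW=1 and UC=0, i.e. a valid, corrected error with the overflow bit set, which lines up with the flood of "recoverable error (CE)" lines.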

Apparently, there is a known issue with Intel Xeon E5 v4, E7 v4, and D-1500 (Broadwell) processors. Symptoms include OS crashes with a signature pointing to internal parity errors, PF, DG or UD exceptions.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2146388


In my case, I was using a Cisco UCS B-Series blade with an Intel Xeon E5-2697 v4 processor. To resolve this issue, the UCS firmware must be upgraded to 3.1(2b).

Here's a link to the Cisco Bug report (Cisco Log In Required)

