Broadwell ESXi 6.0 Exception 14 PSOD and Lenovo support fail

You may already have heard about the known issue with the Intel Broadwell CPU family (Intel Xeon E5-26xx v4), where you can hit an Exception 14 PSOD if you are running older microcode.

If not, you can read about it in VMware KB 2146388.

From there you can clearly see that you need to be running CPU microcode 0x0B00001B or newer to protect yourself from it.

After reading it, I knew we could be impacted, as we have two servers with Broadwell CPUs, so I checked which microcode revision they were running.

If you don’t know how to check the CPU microcode version, just SSH into your ESXi host and run the following commands:

vsish

cat /hardware/cpu/cpuList/0

You will get the following output, which includes the microcode information:

APIC ID:0x00000000
Core:0
Package:0
Node:0
Number of microcode updates:0
Original Revision:0x0b000014
Current Revision:0x0b000014
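
If you have more than a couple of hosts, the comparison can be scripted. Below is a minimal sketch in Python that parses output like the above and compares it with the minimum revision quoted earlier. The function names are my own and purely illustrative, and it assumes that microcode revisions for the same CPU model can be compared as plain hexadecimal numbers.

```python
import re

# Minimum safe revision as quoted from the KB in this post.
MIN_SAFE_REVISION = 0x0B00001B


def parse_current_revision(vsish_output):
    """Return the 'Current Revision' value from the CPU output as an int."""
    match = re.search(r"Current Revision:\s*(0x[0-9a-fA-F]+)", vsish_output)
    if match is None:
        raise ValueError("no 'Current Revision' line found")
    return int(match.group(1), 16)


def is_microcode_safe(vsish_output, minimum=MIN_SAFE_REVISION):
    """True if the current revision is at or above the minimum.

    Assumption: revisions of the same CPU model compare numerically.
    """
    return parse_current_revision(vsish_output) >= minimum


# Sample text copied from the output shown in this post.
SAMPLE = """\
APIC ID:0x00000000
Core:0
Package:0
Node:0
Number of microcode updates:0
Original Revision:0x0b000014
Current Revision:0x0b000014
"""

if __name__ == "__main__":
    revision = parse_current_revision(SAMPLE)
    status = "OK" if is_microcode_safe(SAMPLE) else "too old"
    print(f"Current revision 0x{revision:08x}: {status}")
```

In practice you would replace the sample text with the output captured from each host.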

As you can see, mine was older, so I proactively contacted Lenovo support to get a fix before we hit a PSOD and a potential outage (I believe the correct approach would be the opposite: the vendor should be the one contacting you and providing a fix for such issues).

Their response was:

If the client is only querying the KB? please bear in mind that our “change history” does not include every fix listed. Support for these CPU’s was added to UEFI version 2.00:

Version 2.00 – BuildID: C4E122S
Problem(s) Fixed:
Enhancements:
– Support Intel Broadwell Processors

If the nodes are running 2.11 then I see no issue with installing the OS

Thanks

So you would assume you are safe. Still, I reminded them that we are running older microcode and asked them to double-check.

Guess what happened two days later, while they were (maybe) checking?

PSOD with Exception 14:

[Screenshot: Exception 14 PSOD]

If you are running Lenovo System x servers with Broadwell CPUs and an older microcode revision, even with the latest uEFI, I suggest you open a case and keep pushing them. The same goes for any other vendor.

So I uploaded all the logs, including the coredump. They are still asking me to open a VMware case as well, claiming that “Support for Broadwells was added in the previous uEFI”. Well, I don’t care about support, since I can boot the server; I care about the CPU microcode revision.

So let’s see how long it will take them to provide a fix for us.

Just a little note: you can also upgrade the Intel CPU microcode yourself, as VMware has a way to do it and Intel’s microcode is publicly available. However, it is always better to get it from your vendor to be sure you are running a supported configuration.

I didn’t want to mention Lenovo here, as the issue is with Intel CPUs. However, their ignorance caused us an outage, and people should be aware of it before considering them. This is not the first time I have had problems with them when approaching them with something new, and I may write about it later.

I’m interested to know whether other vendors have already updated their firmware or are still waiting for customers to reach out first. If you know, just leave a comment. Thanks.


Update August 30, 2016: VMware support just confirmed my suspicion and suggested upgrading the microcode. What a surprise 🙂

Update August 31, 2016: My case got escalated and Lenovo support acknowledged the issue. It is supposed to be fixed in September’s uEFI release. It took them “only” 8 days!

Update November 16, 2016: A new uEFI was released on September 28, 2016.

 

About Dusan Tekeljak

With over 12 years of experience in the Virtualization field, currently working as a Senior Consultant for Evoila, contracted to VMware PSO, helping customers with Telco Cloud Platform bundle. Previous roles include VMware Architect for Public Cloud services at Etisalat and Senior Architect for the VMware platform at the largest retail bank in Slovakia. Background in closely related technologies includes server operating systems, networking, and storage. A former member of the VMware Center of Excellence at IBM and co-author of several Redpapers. The main scope of work involves designing and optimizing the performance of business-critical virtualized solutions on vSphere, including, but not limited to, Oracle WebLogic, MSSQL, and others. Holding several industry-leading IT certifications such as VCAP-DCD, VCAP-DCA, VCAP-NV, and MCITP. Honored with #vExpert2015-2019 awards by VMware for contributions to the community. Opinions are my own!