About two months ago, our brand-new, shiny, redundant disaster recovery infrastructure had been fully installed and was up and running, with a few virtual servers already deployed to test the performance of the environment. I was happily playing in this sandbox through vCenter.
Suddenly I shouted in horror as I noticed that one of the High Availability nodes had gone down. It answered neither ping nor SSH, so I went to the server room and checked the ESX console … there was a not-so-nice KERNEL PANIC showing up in there, preceded by various failed SCSI commands.
Our SAN was okay, and so were the other ESX servers, so I decided to restart the faulty ESX …
Upon restart, I noticed that the integrated HP Smart Array P410i controller was reporting lockup error code 0xAB.
A few power cycles later, our fuzzy-logic Smart Array controller finally agreed to work again. I let the ESX server boot up and decided to investigate. After a rather thorough search through HP tech support, I finally managed to dig up the latest firmware version for the controller.
Firmware version 2.50 is documented as fixing a bug in the default firmware (1.66): "Fix for lockup error code 0xAB, seen when controller is configured in Zero Memory Mode (no cache module installed)". The patch was successfully applied to this server, then to all the other identical ESX servers in the cluster.
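Once you know that 2.50 is the fixed version, checking a list of hosts for older firmware is simple. Here is a minimal Python sketch of that check. The host names and the inventory dictionary are made up for illustration, and how you actually collect the controller firmware version from each server is left out entirely, so treat this as a rough idea rather than a finished tool.

```python
# Version that contains the fix for lockup error code 0xAB (from the post).
FIXED_VERSION = "2.50"


def parse_version(text):
    """Turn a dotted version string such as '2.50' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def hosts_needing_update(inventory, fixed=FIXED_VERSION):
    """Return the sorted names of hosts whose firmware is older than `fixed`.

    `inventory` maps a host name to its controller firmware version string.
    """
    threshold = parse_version(fixed)
    return sorted(
        host for host, firmware in inventory.items()
        if parse_version(firmware) < threshold
    )


if __name__ == "__main__":
    # Hypothetical inventory, filled in by hand for this example.
    inventory = {"esx01": "1.66", "esx02": "2.50", "esx03": "1.66"}
    print(hosts_needing_update(inventory))  # ['esx01', 'esx03']
```

Comparing the versions as tuples of integers rather than as plain strings avoids surprises when one component has more digits than the other.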
You can download the 2.50 firmware here: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?swItem=MTX-b6f5f2f6e2e54f7baf29806dfe&taskId=135&swLang=8&lang=en&cc=us&idx=2&mode=4&
Now I don’t get why HP didn’t update this firmware before selling the hardware. OK, you will tell me that HP always includes a CD with the latest updates for its hardware … Point taken … But these folks at HP forgot to include this critical firmware on their latest CD. Not very professional of them, and this could have led to serious operational losses. Our vendor assisted me in patching the machines, but I had to provide the firmware on a separate USB stick, as the very latest HP firmware CD didn’t include it .. Again.
Well, just a little rant, and hopefully some useful information for you fellow sysadmins.
Take care and I’ll be back soon with two articles: one about Paessler Monitoring, the other about manually adding redundancy to your vSwitches in ESX, in case all else fails.