My last post was a diatribe about the horrible support experience that I had with VMware on this issue. That post included the solution, but I figured I would write a more pointed and detailed explanation.
The errors we were getting when trying to upgrade one of our VMware ESX 3.5 hosts to VMware vSphere 4 were as follows:
Error in Host Update Utility:
Grub update failed
Error in vua.log:
grub> find /esx4-upgrade/vmlinuz
Error 15: File not found
info: END grub output
error: grub cannot find root hd number
After many months of working with VMware on this issue, I still did not have a good explanation of what the grub update process does or what might be causing it to fail. I got sick of repeatedly attempting the upgrade at VMware's request even though there had been no changes, or only insignificant ones, to the system. So, I started to look at the grub files more closely and compare them to those on servers that had upgraded successfully.
The first attempt I made to correct the issue was to re-install ESX 3.5 while maintaining the existing datastores. I did this because I did not have a /var/log partition. I just had a /var partition with a log folder. The reason I thought this might be the problem is that the vSphere 4.0 upgrade always creates a /var/log partition for the ESX 3.5 failover install that you can use to boot 3.5. Anyway, this did not fix the problem.
After some more research, I noticed that all of my other servers that had been successfully upgraded had the following line in the grub.conf:
kernel /vmlinuz-version ro root=/dev/sda2
The server that was failing had the following line:
kernel /vmlinuz-version ro root=/dev/sda7
Well, I noticed that sda2 on the upgraded servers was a primary partition, while sda7 on the failing server was a logical partition inside an extended partition (on an MBR disk, only partitions 1 through 4 can be primary). I hypothesized that vSphere 4 requires the system partition to be a primary partition. Once again, I re-installed 3.5 (maintaining the existing datastores), making sure that the boot and system partitions were created as primary partitions, and this time the upgrade was successful.
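If you want to check a host before attempting the upgrade, the comparison I did by eye can be sketched in a few lines of Python. This is only a rough illustration, not anything VMware provides: the function names are my own, it assumes an MBR-partitioned disk (where numbers 1 through 4 are the primary slots and 5 and up are logical), and it assumes the kernel line uses a root=/dev/... device name like the ones shown above.

```python
import re

# Assumption: the root device is written as root=/dev/<disk><number>,
# as in the grub.conf lines quoted in this post.
ROOT_RE = re.compile(r"\broot=(/dev/[a-z]+)(\d+)")


def root_partition(kernel_line):
    """Return (disk, partition_number) from a grub 'kernel' line, or None."""
    match = ROOT_RE.search(kernel_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_primary_slot(partition_number):
    """On an MBR disk, numbers 1-4 are primary slots; 5 and up are logical."""
    return 1 <= partition_number <= 4


if __name__ == "__main__":
    for line in (
        "kernel /vmlinuz-version ro root=/dev/sda2",
        "kernel /vmlinuz-version ro root=/dev/sda7",
    ):
        disk, number = root_partition(line)
        kind = "primary slot" if is_primary_slot(number) else "logical partition"
        print(f"{disk}{number}: {kind}")
```

Keep in mind that this only tells you which kind of partition the root device is. Whether a logical root partition really is what breaks the upgrade is still just my hypothesis.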
If my hypothesis is true (just because it worked for me does not totally confirm my hypothesis), I cannot believe that this is not documented in the upgrade docs and that tech support was not able to help me find a solution. Anyway, I said enough about that in my previous post.