Just wanted to share some lessons learned from the upgrade/migration of one of our vCenter Servers.
In this scenario, we used the VMware Migration Assistant to go from a Windows-based vCenter 5.5 server using an external SQL DB to the vCSA 6.5.
**Make sure you read the Important Information KB, Upgrade Best Practices KB and the Release Notes (Links valid as of July 3rd 2017)**
Here are the challenges I encountered during my upgrade in chronological order:
- Some previously approved vSphere topologies have been deprecated. If necessary, reconfigure your topology prior to upgrading to vSphere 6.5. https://kb.vmware.com/selfservice/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2147672
- Run the VMware Migration Assistant pre-checks ahead of your upgrade window so you have time to resolve any potential issues. In my case, I had to remove the Update Manager extension prior to the upgrade. The Update Manager service was set up on another Windows Server, and since UM is now part of the vCSA, I decided to rebuild it from scratch on the vCSA. The setup was similar to previous versions.
- Confirm you do not use the same name for a Virtual Distributed Switch and a Virtual Distributed Portgroup. https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2147547
- Delete as many tasks, events, and historical performance records from the vCenter DB as company policy allows. This cuts the conversion time and minimizes errors. In my environment, I had to reduce retention to 40 days to move forward with the upgrade. https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2110031
- At this point, the upgrade progressed far enough that the vCenter Server registered as version 6.5 in the PSC, then failed… When re-running the upgrade, the vpxd_firstboot.py_XXXXX_stdout.log showed the failure.
- I was unable to retry the upgrade, since the PSC believed vCenter was already at version 6.5. This required using JXplorer to delete the offending service registration before retrying the upgrade. Here's a link for resolving this issue: http://kenumemoto.blogspot.com/2017/05/vcenter-65-upgrade-problem-occurred.html
- After another failed upgrade, we could only start the vCenter service (vpxd) by re-initializing the embedded vPostgres DB, which essentially overwrites its entire contents… so the problem appeared to be in the source vCenter DB. Given the complexity of our environment, starting from scratch was not an option.
- Ultimately, our vCenter 5.5 SQL DB was FTP'd to VMware so the escalation team could perform the upgrade themselves and replicate the failure. Running the upgrade in verbose mode let them monitor each step, and they were able to successfully start the vpxd service after truncating the “vpx_field_val” table, which contained all the Custom Attribute info. The offending entries belonged to employees who were no longer with the company and pointed to VMs that no longer existed, and this was halting the upgrade... For some reason, these orphaned entries were never properly removed from the vCenter DB. No solid explanation was received; it could have been an ungraceful shutdown of the vCenter or SQL server, an iSCSI traffic issue, etc.
- On our side, our SQL DBA then truncated the "vpx_field_val" table.
- Prior to performing the upgrade the final time, I exported all the Custom Attributes to a .csv file. After the successful upgrade, I used PowerShell and PowerCLI to re-inject the info into vCenter. (There was no way I was going to manually enter all those attributes for each of the 300+ VMs.)
- To confirm the status of the upgrade, view the following log: /var/log/firstboot/firstbootStatus.json
- Add a static DNS entry for your new vCSA. Since the vCSA runs on Linux, it can no longer leverage Windows Dynamic DNS updates, and the existing entry will eventually age out.
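A few of the steps above can be sketched in code. For the tasks/events/history cleanup, the VMware KB linked in that bullet has the actual, supported SQL scripts — the sketch below only illustrates the retention-window math involved, and the table and column names (`VPX_EVENT_ARG`, `VPX_EVENT`, `VPX_TASK`, `CREATE_TIME`) are my assumptions for illustration, not a supported procedure:

```python
from datetime import date, timedelta

# Illustrative table names only -- use the SQL scripts from the VMware KB
# for the actual, supported cleanup. Child table (event args) goes first.
HISTORY_TABLES = ["VPX_EVENT_ARG", "VPX_EVENT", "VPX_TASK"]

def build_purge_statements(retention_days, today=None):
    """Return (cutoff_date, DELETE statements) for rows older than the window."""
    today = today or date.today()
    cutoff = today - timedelta(days=retention_days)
    stmts = [
        f"DELETE FROM {t} WHERE CREATE_TIME < '{cutoff.isoformat()}'"
        for t in HISTORY_TABLES
    ]
    return cutoff, stmts

# With our 40-day retention policy and a July 3rd 2017 run date:
cutoff, stmts = build_purge_statements(40, today=date(2017, 7, 3))
print(cutoff)     # 2017-05-24
print(stmts[2])   # DELETE FROM VPX_TASK WHERE CREATE_TIME < '2017-05-24'
```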
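The Custom Attribute export/re-import was done with PowerCLI, but the round-trip logic is simple enough to sketch in a language-neutral way. This assumes a hypothetical CSV layout of `VM,Attribute,Value` (my export format may differ from yours) and just shows the grouping step before feeding each VM its attributes back:

```python
import csv
import io

def load_attributes(csv_text):
    """Group a VM,Attribute,Value CSV into {vm_name: {attribute: value}}."""
    attrs = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        attrs.setdefault(row["VM"], {})[row["Attribute"]] = row["Value"]
    return attrs

# Hypothetical sample export -- VM names and attributes are made up.
sample = (
    "VM,Attribute,Value\n"
    "web01,Owner,jsmith\n"
    "web01,CostCenter,1234\n"
    "db01,Owner,adoe\n"
)
by_vm = load_attributes(sample)
print(by_vm["web01"]["Owner"])  # jsmith
```

In PowerCLI the actual write-back per VM/attribute pair is a loop over `Set-Annotation`; the dictionary above is what that loop would iterate.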
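And rather than re-reading /var/log/firstboot/firstbootStatus.json by eye on every retry, you can script the check. The schema here is a guess for illustration (a success flag plus a progress percentage) — inspect your own file before relying on any particular fields:

```python
import json

def firstboot_ok(raw):
    """Return True if the (assumed) status JSON reports a completed firstboot."""
    status = json.loads(raw)
    return bool(status.get("success")) and status.get("progress", 0) == 100

# Hypothetical file contents -- the real schema may differ.
sample = '{"progress": 100, "success": true, "info": "firstboot completed"}'
print(firstboot_ok(sample))  # True
```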