Thursday, March 30, 2017

How to patch a standalone ESXi 6.5 host

We received an email from Homeland Security regarding a severe vulnerability in ESXi that could allow a guest to execute code on an ESXi host (VMSA-2017-0006):

http://www.vmware.com/security/advisories/VMSA-2017-0006.html


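Patching is swift and easy using VMware Update Manager. However, I recently stood up a standalone vSphere Hypervisor 6.5 host for testing. It is not managed by vCenter, so Update Manager is not an option and the host has to be patched from the command line.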

For the record, ESXi patches are cumulative.

Here are the steps I took to patch this host.

1. Download applicable patches (login required).
http://www.vmware.com/patchmgr/download.portal 

Edit:  On Jan 10th 2018, I had to use the following link:

2. Upload the patches into the local datastore of the host you wish to patch.   I placed them in a folder called "patches" in the datastore:

3. Place the host in Maintenance Mode.

4. Enable ESXi Shell/SSH and log into the server.

5.  Run the following command for each patch to be installed:
esxcli software vib install -d "/vmfs/volumes/Datastore/DirectoryName/PatchName.zip" 

6. Run the reboot command.
reboot

7. Confirm the patches have been installed by running:
vmware -vl

Or, by looking at the version in the client:

8. Disable ESXi Shell/SSH.

9. Exit Maintenance Mode and confirm functionality.
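
When several bundles sit in the same datastore folder, the step 5 command has to be repeated once per file. Here is a minimal Python sketch that only builds those command strings so they can be reviewed before pasting them into the ESXi shell. The function name, the "Datastore" and "patches" names, and "PatchName.zip" are placeholders for illustration, not values from a real host.

```python
from pathlib import PurePosixPath


def build_install_commands(datastore, folder, patch_files):
    """Return one 'esxcli software vib install' command per patch bundle.

    The depot path is built as an absolute path under /vmfs/volumes,
    matching the form used in step 5.
    """
    base = PurePosixPath("/vmfs/volumes") / datastore / folder
    return [
        f'esxcli software vib install -d "{base / name}"'
        for name in patch_files
    ]


# Placeholder names for illustration only.
commands = build_install_commands("Datastore", "patches", ["PatchName.zip"])
for command in commands:
    print(command)
```

The sketch does not run anything on the host; it just saves retyping the long datastore path for each bundle.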


vCenter 6.5 upgrade: A problem occurred while - Starting VMware Postgres...

While migrating from a Windows vCenter 5.5 server to vCSA 6.5, I received the following errors while running the Migration Assistant:

A problem occurred while - Starting VMware Postgres...

and

This is an unrecoverable error, please retry install. 



This occurred during Stage 2 of the process.  The resulting vm-support.tgz file included a vcdb_import.err file which contained the following error:

NOTICE:  constraint "fk_vpx_lic_auto_keys_entity_id" of relation "vpx_lic_auto_keys" does not exist, skipping
NOTICE:  constraint "fk_vpx_lic_auto_keys_asset_id" of relation "vpx_lic_auto_keys" does not exist, skipping
NOTICE:  drop cascades to constraint fk_vpx_dbm_counter_values on table vpx_dbm_counter_value
NOTICE:  constraint "fk_vpx_dbm_counter_values" of relation "vpx_dbm_counter_value" does not exist, skipping
ERROR:  unquoted carriage return found in data
HINT:  Use quoted CSV field to represent carriage return.
CONTEXT:  COPY vpx_event_arg, line 374050

This is a known issue referenced in the following KB: 

With the help of our DBA, we purged the tasks and events tables down to the most recent 30 days of data.

The procedure to purge old data is located here: 

This resolved the issue and we were able to continue with the upgrade.

An alternative is to NOT migrate the Events and Tasks data. Unfortunately, this also excludes the performance metrics from the migration.
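
The vcdb_import.err file can be long, and most of it is harmless NOTICE lines. As a rough sketch, the Python below pulls out only the ERROR lines together with the table and line number from the CONTEXT line that follows them. It assumes the log keeps the ERROR / HINT / CONTEXT layout shown above; the function name is my own.

```python
import re

# Matches lines such as: "CONTEXT:  COPY vpx_event_arg, line 374050"
COPY_CONTEXT = re.compile(r"^CONTEXT:\s+COPY (\w+), line (\d+)")


def find_copy_failures(log_text):
    """Return (error message, table, line number) for each failed COPY."""
    failures = []
    current_error = None
    for line in log_text.splitlines():
        if line.startswith("ERROR:"):
            # Remember the message until its CONTEXT line shows up.
            current_error = line[len("ERROR:"):].strip()
            continue
        match = COPY_CONTEXT.match(line)
        if match and current_error is not None:
            failures.append(
                (current_error, match.group(1), int(match.group(2)))
            )
            current_error = None
    return failures
```

Fed the excerpt above, this points at the vpx_event_arg table, which is consistent with the events data being the part that needed purging.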

Monday, March 27, 2017

VMware VirtualCenter Server Service Hung at Starting (No such host is known)

After gracefully rebooting our vCenter server, the VMware VirtualCenter Server service was hung in the "Starting" status.  A quick look at Event Viewer showed the very common Event ID 1000:



On a Windows Server 2012 R2 box, the vCenter logs are located here:

C:\ProgramData\VMware\VMware Virtual Center\Logs\vpxd.log

The log showed the following warnings:

2017-03-24T19:42:30.743-05:00 [04304 warning 'Default'] Failed to resolve address; <resolver p:0x000000000xxxxx, 'mypscserver.mydomain.com:7444'>, e: system:11001(No such host is known)

2017-03-24T19:42:30.743-05:00 [04304 error 'HttpConnectionPool-000001'] [ConnectComplete] Connect failed to <cs p:000000000af76900, TCP:mypscserver.mydomain.com::7444>; cnx: (null), error: class Vmacore::SystemException(No such host is known)

We are currently upgrading our vSphere infrastructure to version 6.5.  One of the first steps performed was to upgrade our external 5.5 SSO server to a 6.5 PSC appliance.  It appears the PSC appliance did not register properly in DNS.  No ping replies...

After adding a static entry in DNS and allowing time for the entry to propagate, I was able to successfully start the vCenter service.

I then performed a graceful restart of the vCenter server to make absolutely sure the vCenter server starts successfully.
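
For next time, here is a small Python sketch of the same troubleshooting path: pull the host names that vpxd could not resolve out of vpxd.log, then ask the local resolver about each one. The regular expression is written against the warning line quoted above and may need adjusting for other vCenter versions; the function names are hypothetical.

```python
import re
import socket

# Written against the "Failed to resolve address" warning quoted above.
RESOLVE_FAILURE = re.compile(
    r"Failed to resolve address; <resolver [^>]*'([^':]+):(\d+)'>"
)


def unresolved_endpoints(log_text):
    """Return the unique (host, port) pairs that vpxd failed to resolve."""
    found = []
    for host, port in RESOLVE_FAILURE.findall(log_text):
        endpoint = (host, int(port))
        if endpoint not in found:
            found.append(endpoint)
    return found


def resolves(host):
    """Return True if the local resolver can map host to an address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False
```

Running resolves() against each host returned by unresolved_endpoints() before and after adding the DNS entry shows whether the record has reached the resolver the vCenter server actually uses.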

vCenter Server 6.5 Upgrade: This vCenter Server has extensions registered that cannot be upgraded to or may not work with the new vCenter Server.

AKA:
How to unregister / remove a vCenter extension.

While running the vSphere 6.5 VMware Migration Assistant, I received the following error:

This vCenter Server has extensions registered that cannot be upgraded to or may not work with the new vCenter Server. 

In my case, the extensions were no longer being used.  So, I chose to just remove the offending extensions.

1.  Create a snapshot or backup of your vCenter server.
2.  Log into the Managed Object Browser (MOB).

https://myvcenterserver.mycompany.com/mob

3. Select Content.

4. Select ExtensionManager.

5. Capture the VALUE of the extension you wish to remove (e.g., com.vmware.vcHms).

6. Select UnregisterExtension.

7. Enter the value of the extension you wish to remove.  Then, select Invoke Method.  Confirm that the result is void.

8. Confirm the extension is no longer in the extension list.  If so, delete the VM snapshot created earlier.
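
Steps 2 through 6 are mostly clicking through to the right MOB page. As a convenience, the Python sketch below builds direct links to the ExtensionManager object and its UnregisterExtension method. It assumes the MOB accepts the "moid" and "method" query parameters in the form shown, which is how the links looked in my browser's address bar style of URL but is not something this post verified against VMware documentation; the function name is my own.

```python
from urllib.parse import urlencode


def mob_url(vcenter, moid="ExtensionManager", method=None):
    """Build a Managed Object Browser URL for a managed object.

    Assumption: the MOB takes ?moid=<object>&method=<method> queries.
    """
    params = {"moid": moid}
    if method:
        params["method"] = method
    return f"https://{vcenter}/mob/?{urlencode(params)}"


# Host name taken from the example in step 2.
print(mob_url("myvcenterserver.mycompany.com"))
print(mob_url("myvcenterserver.mycompany.com", method="unregisterExtension"))
```

You still log in and invoke the method by hand; the links only skip the navigation.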


Common vCenter extensions:

Extension Name                   Service Description
com.vmware.vim.eam               vSphere ESX Agent Manager
com.vmware.vim.inventoryservice  vCenter Inventory Service
com.vmware.vim.ls                Licensing Services
com.vmware.vim.sms               VMware vCenter Storage Monitoring Service
com.vmware.vim.sps               VMware vSphere Profile-Driven Storage Service
com.vmware.vim.stats.report      Performance charts built-in extension
com.vmware.vim.stats.vsm         Service Manager
cim-ui                           vCenter Hardware Status
health-ui                        vCenter Service Status
hostdiag                         Internal extension to declare diagnostic events from VMware Host systems
VirtualCenter                    VirtualCenter dynamic events and tasks
com.vmware.orchestrator          VMware vRealize Orchestrator plugin (formerly known as VMware vCenter Orchestrator plug-in)
com.vmware.rbd                   Auto Deploy
com.vmware.syslog                VMware Syslog Collector Configuration
com.vmware.vcDr                  VMware vCenter Site Recovery Manager Extension
com.vmware.vcHms                 vSphere Replication Management (VRM)
com.vmware.vcIntegrity           VMware vSphere Update Manager Extension
com.vmware.vShieldManager        vShield Manager
vCloud Director-1                vCloud Director
com.vmware.vcops                 vRealize Operations Manager (formerly known as vCenter Operations Manager)
com.vmware.vadm                  VMware vRealize Infrastructure Navigator (formerly known as vCenter Infrastructure Navigator)
com.vmware.vdp                   vSphere Data Protection 5.1
com.vmware.vdp2                  vSphere Data Protection 5.5/5.8
com.vmware.vdp2.config           vSphere Data Protection 5.5/5.8
com.vmware.vsan.health           vSAN Health Check Plug-in
com.vmware.heartbeattasks        vCenter Server Heartbeat
com.vmware.hbwc                  vCenter Server Heartbeat
com.vmware.heartbeat             vCenter Server Heartbeat
com.neverfail.heartbeat          vCenter Server Heartbeat

vCenter Server Database: How to Change the Driver used by the System DSN

While running the VMware Migration Assistant prechecks, I was prompted with the following error:

Unsupported database driver: C:\WINDOWS\system32\msodbcsql11.dll

It appears the System DSN used to connect to our vCenter Server DB was using the ODBC Driver 11 for SQL Server.  To resolve this error, we need to change the driver used for the DSN to SQL Server Native Client 11.0.

To change the driver used by the System DSN perform the following steps:

1. Capture all the information regarding the existing DSN.  (DSN name, SQL server, SQL Server login info, Default DB etc)

2. Create a snapshot of the vCenter server and DB.

3. Install the SQL Server Native Client 11.0, if you haven't already.

4. Gracefully stop the vCenter Server Service.

5. Remove the existing System DSN:

6. Create a new Data Source using the SQL Server Native Client 11.0.

7. Input the information used for the previous DSN and test the connection.

8. Create a backup of the registry.

9. Run RegEdit and go to the following location:

10. Select the DSN Driver string and manually enter the DSN Driver Name.

11. Confirm the settings and exit RegEdit.

12. Reboot the vCenter server and confirm the vCenter Service starts up properly.

13. Test vCenter functionality and remove the snapshot.

Thursday, March 16, 2017

PSOD: #PF Exception 14 in world - On Cisco UCS with Intel Xeon Processor E5 v4, E7 v4 Family Processors (Broadwell)

Ooooh the dreaded purple screen of death on one of our Cisco UCS B-Series Blades.....

#PF Exception 14 in world....


The ESXi host logs showed a BUNCH of Machine Check Exceptions (MCEs):

2017-02-25T05:21:34.950Z cpu8:24267427)MCE: 242: cpu8: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.000Z cpu16:35591)MCE: 242: cpu16: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 242: cpu17: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.000Z cpu16:35591)MCE: 242: cpu16: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 233: cpu17: bank8: status=0xcc00024000010091: (VAL=1, OVFLW=1, UC=0, EN=0, PCC=0, S=0, AR=0), ECC=no, Addr:0x406cc600 (valid), Misc:0x3c5c27b940 (valid)
 2017-02-25T05:21:35.250Z cpu17:19971329)MCE: 242: cpu17: bank8: MCA recoverable error (CE): "Memory Controller Read Error on Channel 1."
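
The status= value in the MCE: 233 line can be decoded by hand to double-check what the host is reporting. Below is a minimal Python sketch, assuming the standard IA32_MCi_STATUS bit layout from the Intel Software Developer's Manual (VAL in bit 63 down to AR in bit 55, MCA error code in the low 16 bits) and the "000F 0000 1MMM CCCC" encoding for memory controller errors. The function names are hypothetical, and the S and AR bits only carry meaning on processors that support software error recovery.

```python
# Flag positions in IA32_MCi_STATUS (assumed from the Intel SDM layout).
STATUS_FLAGS = {
    "VAL": 63, "OVER": 62, "UC": 61, "EN": 60,
    "MISCV": 59, "ADDRV": 58, "PCC": 57, "S": 56, "AR": 55,
}

# MMM field of a memory controller error code.
MEMORY_ERROR_TYPES = {
    0b000: "generic undefined request",
    0b001: "memory read error",
    0b010: "memory write error",
    0b011: "address/command error",
    0b100: "memory scrubbing error",
}


def decode_status(status):
    """Split a 64-bit MCi_STATUS value into its flag bits and MCA code."""
    fields = {name: (status >> bit) & 1 for name, bit in STATUS_FLAGS.items()}
    fields["mca_code"] = status & 0xFFFF
    return fields


def decode_memory_controller_code(mca_code):
    """Return (error type, channel) or None if this is not a memory
    controller error. The F (filtering) bit, bit 12, is ignored."""
    if (mca_code & 0xEF80) != 0x0080:
        return None
    error_type = MEMORY_ERROR_TYPES.get((mca_code >> 4) & 0b111, "reserved")
    channel = mca_code & 0xF
    # A channel field of 1111 means "channel not specified".
    return error_type, (None if channel == 0xF else channel)


fields = decode_status(0xCC00024000010091)
print(fields)
print(decode_memory_controller_code(fields["mca_code"]))
```

For the value in the log, the flags come out as VAL=1, OVER=1, UC=0, PCC=0 with valid address and misc registers, and the error code decodes to a memory read error on channel 1, which agrees with the text ESXi printed.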

Apparently, there is a known issue with Intel Xeon Processor (Broadwell) E5 v4, E7 v4 and D-1500 processors.  Symptoms include OS crashes with a signature pointing to internal parity errors, PF, DG or UD exceptions.

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2146388


In my case, I was using a Cisco UCS B-Series blade with an Intel Xeon E5-2697 v4 processor.  To resolve this issue, the UCS firmware must be upgraded to 3.1(2b).

Here's a link to the Cisco Bug report (Cisco Log In Required)


How To Schedule the vCheck Report

First off, I've been running Alan Renouf's vCheck for YEARS and it's one of the most useful tools I use.  For those who haven't used it, here's a link:


With vSphere 6.5, I plan to run VMware Update Manager on the vCenter Server Appliance, vCSA. (YES, Finally!)  Currently, VMware Update Manager is running on an old Windows Server 2008 R2 VM that also hosts other tools, file shares and scheduled jobs.

I figure it's time to move all these tools onto a newer OS.  Unfortunately, the initial instructions I wrote up on scheduling the vCheck job no longer work with Windows Server 2012.
Here are the steps for setting up vCheck using Task Scheduler in Windows Server 2012 R2:

1.  Create a service account to be used to run the vCheck report.  At a minimum, this account must have "Log on as a batch job" rights on the server running vCheck.

2.  Grant this user the minimum permissions in vCenter to run the vCheck plug-ins you have enabled in your vCheck report.
3.  Set up vCheck to run manually with your preferred options using the vCheck service account.
4.  Launch PowerCLI using the vCheck service account and run the New-VICredentialStoreItem command to add the service account to the credential store.  When the script is run, the credentials provided here will be used to connect to the vCenter server.

Example: C:\PS>New-VICredentialStoreItem -Host 'vCenterServer' -User 'admin' -Password 'password'


Keep in mind that the credential store is only obfuscated.  Use additional security as required by your security team. 

Additional details regarding New-VICredentialStoreItem can be found here:

https://www.vmware.com/support/developer/PowerCLI/PowerCLI50/html/New-VICredentialStoreItem.html

5. Set up your scheduled task in Windows.  Set the task to run using your vCheck service account:

6. Configure the Action:


Program/Script: 
C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe

Add arguments (optional): 
"D:\vCheck-US-Daily\vCheck.ps1"

Start in (optional): 
D:\vCheck-US-Daily

7.  Run a test scheduled job. Once you have confirmed it has run successfully, schedule the permanent time for the job to run.

Friday, March 3, 2017

How To Find the OS of a Remote Computer - Linux and Windows

Although we're primarily a Windows shop, with the proliferation of virtual appliances, I've been working more with Linux (YAY!).  With this come non-Windows questions, like "What is this host on our domain?"

I've found the easiest way to find the OS running on a remote server is by using NMAP (Network Mapper).  The results may not be 100% accurate, but they get you pointed in the right direction.

I personally run NMAP from my CentOS 7 box.  To install NMAP, go to the nmap distribution directory (https://nmap.org/dist/) and note the latest rpm.



Log into your CentOS box and run the following command to download and install nmap:

sudo rpm -vhU https://nmap.org/dist/nmap-7.40-1.x86_64.rpm



To find the OS, run the following command:

sudo nmap -O servername

Here's an example of the output.  It's not perfect, but you get a general sense of the OS.  In this case, it was a test ESXi host running 6.5.
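
When scanning more than a handful of hosts, it helps to keep only the OS-related lines of the report. The Python sketch below does that for text captured from nmap -O. It assumes the report uses the "Device type:", "Running:", "OS details:" and "Aggressive OS guesses:" labels; reports where nmap is only guessing can use slightly different labels, and those lines are simply skipped here. The function name is my own.

```python
# Report labels this sketch looks for (an assumption about the nmap
# report format, not an exhaustive list).
OS_FIELDS = ("Device type", "Running", "OS details", "Aggressive OS guesses")


def parse_os_lines(nmap_output):
    """Return a dict of the OS-related fields found in nmap -O output."""
    result = {}
    for line in nmap_output.splitlines():
        # Split on the first colon only, so values may contain colons.
        key, sep, value = line.partition(":")
        if sep and key.strip() in OS_FIELDS:
            result[key.strip()] = value.strip()
    return result
```

One way to use it is to redirect the scan to a file (sudo nmap -O servername > scan.txt) and pass the file contents to parse_os_lines().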