Wednesday, May 16, 2018

VMware vCSA 6 - Failed to start File System Check on /dev/dis...

AKA - The vCSA is FRAGILE...

Errors encountered during this process:
Failed to start File System Check on /dev/dis...
Failed to start update UTMP about System RunLevel Changes.
Failed to start Network Service.

We recently had a quick "blip" on one of our storage arrays. The Windows servers either had no disruption of service or came back up without incident.

This was not the case with our vCSA and external PSC. BOTH appliances were non-functional, so this doesn't appear to be a one-off or fluke. The appliances were running version 6.5.0.15000, the March 2018 release.

Both servers showed "Detected aborted journal" and "journal has aborted" errors on the console. I started troubleshooting with the external PSC.

Upon restart, I received the following error, and the server entered Emergency Mode:

Failed to start File System Check on /dev/dis...


Log in and run the following commands to determine which device is causing the error (it was /dev/sda3 on both appliances in my case):

/bin/sh
/bin/mount
blkid


Match the UUID in the error message with the PARTUUID in the blkid output. In my case, it matched /dev/sda3.
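If blkid lists many partitions, the match can be scripted. This is a minimal POSIX sh sketch, assuming awk is available in the emergency shell. find_dev_by_partuuid is a hypothetical helper name, not a vCSA command, and you still copy the PARTUUID from the error message yourself.

```shell
#!/bin/sh
# find_dev_by_partuuid: hypothetical helper, not part of the vCSA.
# Reads `blkid` output on stdin and prints the device whose
# PARTUUID="..." field matches $1 exactly.
find_dev_by_partuuid() {
  awk -v id="$1" '
    # Literal substring match on the quoted PARTUUID field
    index($0, "PARTUUID=\"" id "\"") {
      dev = $1
      sub(/:$/, "", dev)   # blkid prints "/dev/sda3:" as the first field
      print dev
    }'
}
```

Example (substitute the PARTUUID from your own error message):
blkid | find_dev_by_partuuid <PARTUUID>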


Run the following command, which checks ext2, ext3, and ext4 file systems. The "-y" option answers "yes" to all prompts (super handy):

e2fsck -y /dev/sda3
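e2fsck reports what it did through its exit status, which is a sum of bit flags documented in the e2fsck(8) man page (fsck(8) uses the same values). A small sketch to decode it; describe_fsck_status is a hypothetical helper, not an appliance tool:

```shell
#!/bin/sh
# describe_fsck_status: hypothetical helper that decodes the exit-code
# bitmask documented in the e2fsck(8) and fsck(8) man pages.
describe_fsck_status() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "no errors"
    return 0
  fi
  # Each bit is an independent condition; the exit code is their sum.
  [ $((code & 1)) -ne 0 ] && echo "file system errors corrected"
  [ $((code & 2)) -ne 0 ] && echo "errors corrected, system should be rebooted"
  [ $((code & 4)) -ne 0 ] && echo "errors left uncorrected"
  [ $((code & 8)) -ne 0 ] && echo "operational error"
  [ $((code & 16)) -ne 0 ] && echo "usage or syntax error"
  [ $((code & 32)) -ne 0 ] && echo "canceled by user request"
  [ $((code & 128)) -ne 0 ] && echo "shared library error"
  return 0
}
```

For example, run "e2fsck -y /dev/sda3" and then "describe_fsck_status $?". A status with the 4 bit set means some errors were left uncorrected, so a restart alone may not be enough.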



After the file system check has completed, restart the appliance.

This resolved the issue with the external PSC. Cool, just repeat the process on the vCSA, right? Not so fast....

I had the following additional errors with the vCSA after running the file system check on /dev/sda3:

Failed to start update UTMP about System RunLevel Changes.
Failed to start Network Service.



Run the following command to view the systemd journal for the current boot. This pointed me to log_vg-log:

journalctl -xb
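The journal output can be long. As a rough sketch, assuming grep supports -o (GNU grep and BusyBox grep both do), this pulls out every /dev/mapper path the journal mentions. list_mapper_devices is a hypothetical helper, and it only catches plain /dev/mapper/... paths, not systemd's escaped unit names.

```shell
#!/bin/sh
# list_mapper_devices: hypothetical helper. Reads journal text on stdin
# and prints each distinct /dev/mapper/<name> path it mentions, in order
# of first appearance.
list_mapper_devices() {
  # Match at least one name character; '.' is excluded so a trailing
  # period at the end of a log sentence is not captured.
  grep -o '/dev/mapper/[A-Za-z0-9_-][A-Za-z0-9_-]*' | awk '!seen[$0]++'
}
```

Example:
journalctl -xb | list_mapper_devices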



Run a file system check against log_vg-log by running the following:

fsck -y /dev/mapper/log_vg-log



Reboot the server after the fsck has completed.  After coming back up, the vCenter services started successfully and I was able to log into the vCSA.

Tuesday, May 15, 2018

Deploy OVF Template - The following manifest file entry (line 1) is invalid: SHA256

AKA - Another reason to stop using the vSphere C# Client....

I created an OVF template from my vSphere 6.5 environment.  When trying to import the template using the vSphere C# Client, I received the following error:

The following manifest file entry (line 1) is invalid: SHA256


vSphere 6.5 started using SHA256 as the default hashing algorithm when exporting OVF templates.  Unfortunately, the vSphere Fat/C# Client only supports SHA1.
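Each line of the .mf file starts with the digest algorithm, in the form SHA256(file name)= digest, which is why line 1 is flagged. A minimal sketch to see which algorithms a manifest uses before importing; mf_algorithms is a hypothetical helper:

```shell
#!/bin/sh
# mf_algorithms: hypothetical helper. Prints each distinct digest
# algorithm named in an OVF manifest (.mf). Manifest lines look like
#   SHA256(name.ovf)= <hex digest>
mf_algorithms() {
  # Keep only the leading algorithm name before the "(", then dedupe.
  sed -n 's/^\([A-Za-z0-9][A-Za-z0-9]*\)(.*$/\1/p' "$1" | sort -u
}
```

If this prints SHA256, the C# Client will likely reject the manifest; SHA1 should import.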

There are several ways to resolve this issue:
Option 1. Use the Web or HTML5 client to import the OVF. Both support SHA256.

Option 2. Use the OVFTool to convert the Cryptographic Hash Algorithm from SHA256 to SHA1.  This free tool can be downloaded here:
https://www.vmware.com/support/developer/ovf/

Option 3. (Not recommended) If you trust the source of the OVF template, you can delete the optional .mf (manifest) file and import the VM using just the .ovf and .vmdk files. The .mf file is what contains the SHA256 hashes.
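For Option 2, OVF Tool's --shaAlgorithm option selects the manifest digest when it writes a package. The wrapper below is a sketch, assuming ovftool is on your PATH; convert_manifest_to_sha1 and the file names in the example are hypothetical, so check the OVF Tool User's Guide for your version before relying on it.

```shell
#!/bin/sh
# convert_manifest_to_sha1: hypothetical wrapper around VMware OVF Tool.
# Re-exports an OVF package with a SHA1 manifest so the C# Client can
# import it. Assumes ovftool is on PATH.
convert_manifest_to_sha1() {
  src=$1   # existing .ovf (or .ova) exported from vSphere 6.5
  dst=$2   # new output path; use a different name than $src
  if [ -z "$src" ] || [ -z "$dst" ]; then
    echo "usage: convert_manifest_to_sha1 SOURCE TARGET" >&2
    return 2
  fi
  ovftool --shaAlgorithm=SHA1 "$src" "$dst"
}
```

Example:
convert_manifest_to_sha1 MyTemplate.ovf MyTemplate-sha1.ovf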





ESXi 6 - How to Unlock Your SSH Account.

ESXi Account lockout info:
1. Accounts are locked after 10 failed attempts through SSH and the vSphere Web Services SDK.
2. The Direct Console Interface (DCUI) and ESXi shell do not support the account lockout feature.
3. The account automatically unlocks after 120 seconds by default.
4. ESXi uses the Linux Pluggable Authentication Modules (PAM) to enforce the lockout.

If you are unable to wait for the account to unlock, you can reset the account by doing the following:
1. Connect to your server's console using DRAC, iLO, UCS Manager, etc.
2. Log in as root and run the following command to unlock the account (in my case, there were 11 failed attempts):

pam_tally2 --user root --reset
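Running "pam_tally2 --user root" without --reset shows the current count without clearing it. A sketch for pulling the number out, assuming the usual "Login Failures Latest failure From" table layout; tally_failures is a hypothetical helper, written for POSIX sh so it should also work in the ESXi BusyBox shell:

```shell
#!/bin/sh
# tally_failures: hypothetical helper. Reads pam_tally2 output on stdin and
# prints the failure count for user $1. Assumes a header line followed by
# one row per user, with the count in the second column:
#   Login           Failures Latest failure     From
#   root               11    ...
tally_failures() {
  awk -v u="$1" 'NR > 1 && $1 == u { print $2 }'
}
```

Example:
pam_tally2 --user root | tally_failures root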