*Update – story behind the script*
Finally I have a few minutes to write the story behind this script.
One of our VMware View environments was experiencing performance problems. The CPUs on our VMs would constantly spike to 100% after they were powered on, and our admins relayed back to engineering that they were having density issues. We reached out to Nutanix, who recommended that we increase the cache size to be able to absorb more IOPS. To increase the cache size on Nutanix you simply need to power off the controller virtual machine (CVM) on a host, increase its RAM, and power it back on. While this is a non-disruptive process if you power the CVMs off and on one at a time, it becomes a very disruptive process if someone makes a mistake and powers off more than one CVM at a time. It is also very time-intensive, because you must check that the CVM services are completely back up before you perform the procedure on the next CVM. With 120 hosts in our environment, and averaging 10 minutes per manual CVM procedure, it looked like it was going to take about 20 hours to perform this task. For us this means 3-4 days in maintenance windows!
I figured there had to be a way to automate this and eliminate the human component so we could perform this maintenance task all in one maintenance window. After a couple of hours of fiddling with PowerCLI, figuring out which CVM service is the last one to start, and running the script in our test environment to work out the bugs, we were ready to run it in production. In our environment the average run time per CVM was about 5 minutes, but the best part is that it really saves hours of admin time. An admin only needs to babysit the script while it is running instead of performing an intensive manual process. This shows the huge benefit of Software Defined Storage. Imagine trying to update cache on a traditional SAN without any downtime… it isn’t going to happen.
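The time estimates above are simple arithmetic. As a quick check, here is a minimal PowerShell sketch that uses only the figures quoted in this post (120 hosts, about 10 minutes per CVM by hand, about 5 minutes per CVM with the script). The variable names are illustrative, and the scripted total assumes the 5-minute average holds across every host.

```powershell
# Figures taken from the post; nothing here talks to vCenter or a CVM.
$hostCount        = 120
$manualMinutes    = 10   # average per CVM when done by hand
$scriptedMinutes  = 5    # average per CVM observed with the script

$manual   = New-TimeSpan -Minutes ($hostCount * $manualMinutes)
$scripted = New-TimeSpan -Minutes ($hostCount * $scriptedMinutes)

"Manual:   {0} hours" -f $manual.TotalHours      # Manual:   20 hours
"Scripted: {0} hours" -f $scripted.TotalHours    # Scripted: 10 hours
```

The scripted run is still sequential, since only one CVM may be down at a time. The saving comes from removing the manual checks between hosts, not from running hosts in parallel.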
It later turned out that the issue in our environment was a classic VMware View admin mistake: installing updates, then shutting down immediately and recomposing the pool. The updates needed to finish installing after a reboot, so they finished installing on all of the linked clones when they powered on. Combined with refresh on logoff, which occurs multiple times per day, it was a sure way to test the max performance of our equipment!
    # Nutanix CVM RAM Add Script
    # This script leverages the SSH.NET PowerShell module from
    # http://www.powershelladmin.com/wiki/SSH_from_PowerShell_using_the_SSH.NET_library
    # Download it at: http://www.powershelladmin.com/w/images/a/a5/SSH-SessionsPSv3.zip
    Import-Module SSH-Sessions

    # Record the start time to collect metrics on how long the memory upgrade takes.
    $scriptStart = (Get-Date)

    # Connect to the vCenter Server
    Connect-VIServer vc01.test.com

    # Find all of the Nutanix CVMs that have less than 24GB RAM
    $vms = Get-VM -Name NTNX* | Where-Object MemoryGB -lt 24

    # Sort the CVMs by IP address (just to watch the CVMs be done in order).
    # A script block is needed because Guest.IPAddress[0] is a nested property.
    $vms = $vms | Sort-Object { $_.Guest.IPAddress[0] }

    # Loop through the CVMs and upgrade them one at a time
    foreach ($vm in $vms) {
        # Use the IP address of the CVM to connect to it with SSH
        $CVM = $vm.Guest.IPAddress[0]

        # Using the default user/pass
        New-SshSession -ComputerName $CVM -Username 'nutanix' -Password 'nutanix/4u'

        # Check to make sure that the CVMs in the cluster are all up.
        # If a CVM is not UP it will be in DOWN state.
        $result = Invoke-SshCommand -ComputerName $CVM -Command '/home/nutanix/cluster/bin/cluster status | grep Down'
        Remove-SshSession -RemoveAll

        # Perform memory upgrade if there are no CVMs DOWN and SSH connection was successful
        if ($result -notlike '*Down*' -and $result -notlike '*No SSH session found*') {
            Write-Host "Shutting down $CVM"
            Shutdown-VMGuest $vm -Confirm:$false

            # Wait a period of time to make sure the CVM is shut down before changing settings
            Start-Sleep -Seconds 60

            # Set CVM memory
            Write-Host "Setting $CVM Memory"
            Set-VM $vm -MemoryGB 24 -Confirm:$false

            # Power on CVM
            Write-Host "Starting $CVM"
            Start-VM $vm -Confirm:$false

            # Wait for CVM to start before checking that it is UP
            Start-Sleep -Seconds 60
            Write-Host "Checking $CVM state"

            # Check that the services are started on the CVM before performing the upgrade
            # on the next CVM. From what I could tell by watching the services start,
            # alert_manager is the last service to start.
            do {
                New-SshSession -ComputerName $CVM -Username 'nutanix' -Password 'nutanix/4u'
                $result = Invoke-SshCommand -ComputerName $CVM -Command '/home/nutanix/cluster/bin/genesis status | grep alert_manager'
                Remove-SshSession -RemoveAll

                # If the services are not started yet they will have a status of []
                if ($result -like '*alert_manager: `[`]*') { Write-Host "$CVM Down" }

                # Wait before attempting to make another SSH connection
                Start-Sleep -Seconds 5
            }
            # If the service is started there will be port numbers in the brackets.
            # Check to see if the brackets are empty. Make sure to escape the [] characters with `.
            until ($result -notlike '*`[`]*' -and $result -notlike '*No SSH session found*')

            Write-Host "$CVM Up"
            $scriptEnd = (Get-Date)

            # Calculate the elapsed time since the script started
            $totalTime = New-TimeSpan -Start $scriptStart -End $scriptEnd

            # Write a time summary
            Write-Host "Total Script Time"
            Write-Host $totalTime
        }
        else {
            # If a CVM is down at the beginning of the script, just end the script.
            # The Disconnect-VIServer call after the loop still runs after the break.
            Write-Host "Cluster not ready."
            break
        }
    }

    Disconnect-VIServer -Confirm:$false
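The wait loop in the script polls forever if a CVM never comes back, and the bracket check relies on wildcard escaping that is easy to get wrong. Here is a minimal sketch of one way to tighten both, in plain PowerShell with no PowerCLI or SSH dependency. `Test-ServiceStarted` and `Wait-Until` are hypothetical helper names, not part of the original script or of any module. The non-empty sample line `alert_manager: [4321, 4350]` is an assumed format, inferred only from the empty-bracket case described in the script comments.

```powershell
# Hypothetical helper: $true when a "genesis status" line for the given
# service shows something inside the brackets, $false for "[]" or no match.
function Test-ServiceStarted {
    param(
        [string] $StatusLine,
        [string] $Service = 'alert_manager'
    )
    # "<service>: [<one or more characters other than ']'>]"
    return $StatusLine -match ([regex]::Escape($Service) + ':\s*\[[^\]]+\]')
}

# Hypothetical helper: re-evaluates $Condition every $IntervalSeconds until it
# returns a truthy value ($true) or $TimeoutSeconds has elapsed ($false).
function Wait-Until {
    param(
        [Parameter(Mandatory)] [scriptblock] $Condition,
        [int] $TimeoutSeconds  = 600,
        [int] $IntervalSeconds = 5
    )
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    while ((Get-Date) -lt $deadline) {
        if (& $Condition) { return $true }
        Start-Sleep -Seconds $IntervalSeconds
    }
    return $false
}

# Sample status lines (the second one is an assumed format).
Test-ServiceStarted 'alert_manager: []'            # False
Test-ServiceStarted 'alert_manager: [4321, 4350]'  # True
```

In the script, the body of the `do`/`until` loop could be passed to `Wait-Until` as the condition, with `Test-ServiceStarted` applied to the `Invoke-SshCommand` result. A `$false` return would then be a signal to stop before touching the next CVM rather than waiting indefinitely.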