Use PowerCLI to Automate Disaster Recovery Failover On Nutanix

Using VMware SRM on Nutanix has a few challenges. SRM expects replication to happen at the datastore level, but by default Nutanix protection domains replicate at the VM level. It is possible to set up Nutanix replication at the datastore level, but you lose the granularity of being able to take VM-specific snapshots. SRM is also dependent on vCenter and SSO. We were having a few issues that caused us to migrate from the Windows version of vCenter to the vCenter Server Appliance, and doing so broke SRM, so it had to be set up again.

Instead of setting it up again, I figured we would get more flexibility if I could do the same thing with PowerCLI. Unfortunately, Nutanix’s PowerShell cmdlet Migrate-NTNXProtectionDomain was published before the failover part of the command was actually implemented, so after the script runs you still need to perform the additional step of logging into Prism and clicking Migrate. The script checks whether each VM is Windows or Linux. For Linux VMs, the script expects a staged file called failover, which copies a staged network interface configuration file into place.
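The Windows-or-Linux check could look roughly like the sketch below. This is not the original script. The function names, the guest ID patterns, and especially the path /root/failover are assumptions made for illustration, since the post does not say where the failover file is staged or what the Windows branch does.

```powershell
# Classify a VM by its vSphere guest ID string,
# e.g. 'windows8Server64Guest' or 'rhel7_64Guest'.
# Hypothetical helper; the pattern list is illustrative, not exhaustive.
function Get-GuestFamily {
    param([Parameter(Mandatory)][string]$GuestId)
    if ($GuestId -match '^win') { return 'Windows' }
    if ($GuestId -match 'linux|rhel|centos|ubuntu|sles|debian') { return 'Linux' }
    return 'Other'
}

# Run the staged failover file inside a Linux guest through VMware Tools.
# ASSUMPTION: '/root/failover' is a made-up default path.
function Invoke-LinuxFailover {
    param(
        [Parameter(Mandatory)]$VM,
        [Parameter(Mandatory)][pscredential]$GuestCredential,
        [string]$FailoverPath = '/root/failover'
    )
    Invoke-VMScript -VM $VM -GuestCredential $GuestCredential `
        -ScriptType Bash -ScriptText "sh $FailoverPath"
}

# Example use against a live vCenter connection (not run here):
# foreach ($vm in Get-VM -Name $vmNames) {
#     $family = Get-GuestFamily -GuestId $vm.ExtensionData.Config.GuestId
#     if ($family -eq 'Linux') {
#         Invoke-LinuxFailover -VM $vm -GuestCredential $cred
#     }
# }
```

Keeping the classification in a small function with no vCenter dependency makes it easy to check against sample guest IDs before pointing the script at production VMs.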

Change Nutanix CVM RAM with PowerCLI

*Update – story behind the script*
Finally I have a few minutes to write the story behind this script.

One of our VMware View environments was experiencing performance problems. The CPUs on our VMs would constantly spike to 100% after they were powered on. Our admins relayed back to engineering that they were having density issues. We reached out to Nutanix, who recommended that we increase the cache size to be able to absorb more IOPS. To increase the cache size on Nutanix, you simply need to power off the controller virtual machine (CVM) on a host, increase its RAM, and power it back on. While this is a nondisruptive process if you power the CVMs off and on one at a time, it becomes a very disruptive process if someone makes a mistake and powers off more than one CVM at a time. It is also very time-intensive, because you must check that the CVM services are completely back up before you perform the procedure on the next CVM. With 120 hosts in our environment and an average of 10 minutes per manual CVM procedure, the task was going to take about 20 hours (120 × 10 minutes = 1,200 minutes). For us, that means 3-4 days of maintenance windows!

I figured there had to be a way to automate this and eliminate the human component so we could perform this maintenance task in a single maintenance window. After a couple of hours of fiddling with PowerCLI, figuring out which CVM service is the last to start, and running the script in our test environment to work out the bugs, we were ready to run it in production. In our environment the average run time per CVM was about 5 minutes, but the best part is that it saves hours of admin time. An admin only needs to babysit the script while it is running instead of performing an intensive manual process. This shows the huge benefit of Software Defined Storage. Imagine trying to update cache on a traditional SAN without any downtime… it isn’t going to happen.
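The one-at-a-time procedure can be sketched as a rolling loop that refuses to move on until the current CVM reports ready. This is a minimal illustration, not the original script. The function name Invoke-RollingCvmChange, the 32 GB value, the NTNX-*-CVM name pattern, and the readiness check Test-CvmServicesUp are all assumptions, and the post does not name the service it actually polls.

```powershell
# Apply a change to each CVM in turn, waiting for a readiness check
# to pass before touching the next one. Stops on the first timeout.
function Invoke-RollingCvmChange {
    param(
        [Parameter(Mandatory)][object[]]$Cvm,
        [Parameter(Mandatory)][scriptblock]$Change,   # what to do to one CVM
        [Parameter(Mandatory)][scriptblock]$IsReady,  # returns $true when services are up
        [int]$PollSeconds = 15,
        [int]$TimeoutSeconds = 900
    )
    foreach ($vm in $Cvm) {
        & $Change $vm | Out-Null
        $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
        while (-not (& $IsReady $vm)) {
            if ((Get-Date) -gt $deadline) {
                throw "CVM '$vm' not ready after $TimeoutSeconds s; not touching the next one."
            }
            Start-Sleep -Seconds $PollSeconds
        }
    }
}

# Example use against a live vCenter connection (not run here).
# Test-CvmServicesUp is a hypothetical function you would have to write.
# $cvms = Get-VM -Name 'NTNX-*-CVM'
# Invoke-RollingCvmChange -Cvm $cvms -Change {
#     param($vm)
#     Stop-VMGuest -VM $vm -Confirm:$false
#     while ((Get-VM -Id $vm.Id).PowerState -ne 'PoweredOff') { Start-Sleep -Seconds 5 }
#     Set-VM -VM $vm -MemoryGB 32 -Confirm:$false
#     Start-VM -VM $vm
# } -IsReady { param($vm) Test-CvmServicesUp $vm }
```

Passing the change and the readiness check as script blocks keeps the ordering logic separate from the vSphere calls, so the loop can be exercised with dummy blocks in a test environment first. Throwing on timeout is deliberate: if one CVM fails to come back, the loop stops rather than powering off a second one.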

It later turned out that the issue in our environment was a classic VMware View admin mistake: installing updates, then shutting down immediately and recomposing the pool. The updates needed to finish installing after a reboot, so they finished installing on every linked clone each time it powered on. Combined with refresh on logoff, which occurs multiple times per day, it was a sure way to test the maximum performance of our equipment!