Tag Archives: VMware

Use PowerCLI to Automate Disaster Recovery Failover On Nutanix

Using VMware SRM on Nutanix has a few challenges.  SRM expects replication to happen at the datastore level, while Nutanix protection domains replicate at the VM level by default.  It is possible to set up Nutanix replication at the datastore level, but you lose the granularity of being able to take VM-specific snapshots.  SRM is also dependent on vCenter and SSO.  We were having a few issues that caused us to migrate from the Windows version of vCenter to the vCenter Server Appliance, and in doing so broke SRM, so it had to be set up again.  Well, instead of setting it up again, I figured we would get more flexibility if I could do the same thing with PowerCLI.  Unfortunately, Nutanix's PowerShell cmdlet Migrate-NTNXProtectionDomain was published before the failover part of the command was actually implemented, so after the script runs you still need to perform the additional step of logging into PRISM and clicking Migrate.  The script checks whether the VMs are Windows or Linux.  If they are Linux, the script expects a staged file called failover, which copies a staged network interface configuration file into place.
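The Linux piece of that is simpler than it sounds.  Here is an illustrative sketch of what the staged failover step can boil down to; the function name and paths are hypothetical (on a real VM the live config would be something like /etc/sysconfig/network-scripts/ifcfg-eth0, and the script would bounce the network service afterward):

```shell
# Illustrative sketch only: the staged "failover" step on a Linux VM
# boils down to copying a pre-staged DR-site interface config over the
# live one.  Both paths are parameters so the sketch stays generic.
failover_netcfg() {
  staged="$1"   # pre-staged DR-site config, e.g. ifcfg-eth0.dr
  live="$2"     # live config the OS reads
  cp "$staged" "$live" || return 1
  echo "installed $staged over $live"
  # a real script would follow up with something like: service network restart
}
```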

Change Nutanix CVM RAM with PowerCLI

*Update – story behind the script*
Finally I have a few minutes to write the story behind this script.

One of our VMware View environments was experiencing performance problems. The CPUs on our VMs would constantly spike to 100% after they were powered on. Our admins relayed back to engineering that they were having density issues. We reached out to Nutanix, who recommended that we increase the cache size to be able to absorb more IOPS. To increase the cache size on Nutanix you simply need to power off the controller virtual machine (CVM) on a host, increase its RAM, and power it back on. While this is a non-disruptive process if you power the CVMs off and on one at a time, it becomes a very disruptive process if someone makes a mistake and powers off more than one CVM at a time. It is also very time intensive because you must check that the CVM services are completely back up before you perform the procedure on the next CVM. With 120 hosts in our environment, and averaging 10 minutes per manual CVM procedure, it looked like it was going to take about 20 hours to perform this task. For us this means 3-4 days in maintenance windows!
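The back-of-the-envelope math works out like this (a throwaway calculation, not part of the script itself):

```shell
# total maintenance hours = hosts * minutes per CVM / 60
total_hours() {
  awk -v hosts="$1" -v mins="$2" 'BEGIN { printf "%.0f\n", hosts * mins / 60 }'
}

total_hours 120 10   # manual procedure: 20
total_hours 120 5    # scripted: 10
```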

I figured there has to be a way to automate this and eliminate the human component so we could perform this maintenance task all in one maintenance window. Well, after a couple of hours of fiddling with PowerCLI, figuring out which CVM service is the last one to come up, and running the script in our test environment to work out the bugs, we were ready to run it in production. In our environment the average run time per CVM was about 5 minutes, but the best part is that it really saves hours of admin time. An admin only needs to babysit the script while it is running instead of performing an intensive manual process. This shows the huge benefit of Software Defined Storage. Imagine trying to update cache on a traditional SAN without any downtime… it isn't going to happen.

It later turned out that the issue in our environment was a classic VMware View admin mistake of installing updates and then shutting down immediately and recomposing the pool. The updates needed to finish installing after reboot, so they finished installing on all of the linked clones when they powered on. Combined with refresh on logoff, which occurs multiple times per day, it was a sure way to test the max performance of our equipment!

VMware View Guy Admits that Citrix XenDesktop is Just As Good

So I’ll admit it, I knew nothing about Citrix.  Well I mean other than all the FUD VMware was spewing about how much “fun” I would have if I ever implemented it for a customer.  Citrix actually showed up in the office about 4 years ago to try to explain what was going on but all I remember is that they showed me something called Dazzle and I thought, “how the hell am I supposed to explain to my customers what a Dazzle is supposed to do?” and then went back to installing VMware View.

Really, I was just too busy running around deploying View to get a couple hours to deploy XenDesktop and do my own fact checking.  And really, that is all it takes: a couple of hours.

One of my vendors insisted that I was missing out.  They introduced me to the Federal team over at Citrix, who got me into Citrix Synergy and introduced me to Bob Mensah, Systems Engineer for Citrix.  Bob is an amazing font of Citrix knowledge!  Bob was able to walk me through the installation of XenDesktop in my lab in a couple hours while I was literally sitting at Honda waiting for my wife’s van to be serviced.

If you’ve been doing View for any significant period of time it’s not that hard to pick up.  Yeah, all the services have different names, but they have the same functionality.  Here’s a chart to help you figure it out:

Horizon View                                | Citrix XenDesktop
vCenter                                     | vCenter (but could also be XenCenter or SCVMM)
View Connection Server                      | StoreFront
View Composer                               | Machine Creation Services
View Administrator                          | Citrix Studio
Horizon Workspace                           | StoreFront
Install license key on host                 | Licensing Server
Need 3rd-party load balancer                | NetScaler included
ThinApp (packaged executables)              | XenApp (streamed applications)
Blast (run ThinApps, XenApps, or RDS apps)  | StoreFront / XenApp

Bob Mensah even pointed me toward these guides that helped me set up CAC authentication in my lab:
Citrix – Create a JITC test CAC environment for XenDesktop/XenApp
Microsoft Technet – Step by Step Guide – Single Tier PKI Hierarchy Deployment

The Citrix administrative tools are Windows only, which could be seen as a drawback, but really the vSphere Web Client and View Administrator client are written in Flash and are slow, so I think Citrix actually has the better-functioning tools here.

Using Citrix Receiver to connect to a Windows desktop feels a lot like using the View Client.  The one thing that I did notice using my CAC was that I had to use my PIN two times.  Once to authenticate to StoreFront and then another to authenticate to the Windows VM.  With View I only have to put in my PIN once to authenticate to the View Connection Server and that gets passed to the VM.  Citrix told me that this is to overcome a security issue with having the PIN cached on the connection broker, but really I have never had an IA person tell me that was an issue with View so I am curious to understand where that requirement came from.

One thing that the Citrix Receiver has going for it is that it works with the new Tactivo iPad CAC reader from Precise Biometrics.  CAC authentication for iPad is nothing new, but previously it could only be accomplished on a per-app basis with specialized apps designed to interact with some kind of Bluetooth CAC reader or dongle.  Neither was very convenient.  The Bluetooth reader meant that you needed to carry around an extra peripheral, charge it, and hope nothing interrupted your Bluetooth connection.  The dongle… was just cumbersome and silly.  The Tactivo is a sleek integrated case, shown below in the iPad mini model with a magnetic smart cover (not included).  It connects via the Lightning adapter and has a micro USB port that supports charging only.  See my photos of the unit below.  The VMware View client does not support this unit yet, and I suspect that it will actually fuel a lot of interest in Citrix until it does.


Using XenApp you can now wrap CAC authentication around any application and present it on the iPad, including presenting entire Windows desktops complete with paired bluetooth keyboard and mouse (explained below)!


The other innovative thing about the Citrix Receiver client for iPad is that they have cleverly overcome the iOS inability to pair with a Bluetooth mouse!  You can use another iOS device with the Citrix Receiver client installed on it as a touchpad!  The only silly part about this was that I had to set up the StoreFront connection on the extra device before I could pair it.  I am assuming that the iDevices communicate over either Wi-Fi or Bluetooth, so I think that having to set up the client before you can use it as a touchpad is unnecessary.  However, it works really well.  While the screen is a little small on the iPad mini, I was able to open applications and even play a movie just like I could with the Windows client.  My opinion is that it would definitely be a better experience with a full-size iPad.

The only other issue I had when I was using the Citrix Receiver client is that there are a lot of extra options in the settings (shown in the picture below) that aren't intuitive.  Here is the documentation for the client, but if you look through it you will see that the settings in the picture below are not documented.  The documentation for the View Client for iOS, by contrast, has a blurb explaining what every little feature in the client does.


In all, my initial impression of Citrix XenDesktop is that it has just as much functionality as VMware View.  I just wish as much effort had been put into the documentation as into getting the functionality ready to ship.

Nutanix and VMware vSphere Host Profiles

Host profiles seem like a great idea… Make sure that all of your hosts are configured consistently and enforce compliance. However, when it comes to actually applying a host profile the caveat is that you need to put the host in maintenance mode to apply it. This means that you have to vMotion any running VMs to another host and then enter maintenance mode… A process that could take quite a while depending on the number of VMs you have running.

On Nutanix there is the pesky issue that there is one VM that you cannot vMotion to another host… the CVM!  The CVM (Controller Virtual Machine) is the storage controller that lives on the host.  The physical disks are presented to the VM through VMDirectPath.  Since virtual machines that are tied to physical devices on the host cannot be vMotioned, the host will fail to enter maintenance mode.  It is possible to shut down a CVM on one node, then put that host into maintenance mode, apply the host profile, exit maintenance mode, power on the CVM, then SSH into the CVM to make sure it is back in the storage cluster before you rinse and repeat for all of your hosts.  However, that is a very manual process!  It would be bearable to perform on one block (four Nutanix hosts), but if you have hundreds of hosts it will take weeks and a small army of dedicated sysadmins to complete the task.
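For what it's worth, the sequence itself is easy to pin down even before automating it.  The sketch below is placeholder shell only; every echo stands in for what would really be a PowerCLI or SSH call, but it makes the one-node-at-a-time ordering explicit:

```shell
# Placeholder sketch: each echo stands in for what would really be a
# PowerCLI or SSH call.  The point is the strict one-node-at-a-time
# ordering, and that the CVM must be fully back before moving on.
apply_profile_to_host() {
  host="$1"
  echo "shut down CVM on $host"
  echo "enter maintenance mode: $host"
  echo "apply host profile: $host"
  echo "exit maintenance mode: $host"
  echo "power on CVM on $host"
  echo "wait for CVM services on $host"   # never start the next node early
}

for h in esx01 esx02 esx03 esx04; do      # hypothetical host names
  apply_profile_to_host "$h"
done
```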

It's too bad that VMware couldn't have host profiles distinguish between minor and major changes when they are applied.  For example, adding a port group would be a minor change not requiring maintenance mode, while attaching a vSwitch to a vNIC would be a major change requiring maintenance mode because of its potential to disrupt traffic for all of the VMs on that host.

Do we really need host profiles? Nutanix is trying to market the idea that infrastructure should be web-scale. I don't really like the term web-scale because I think it implies that you're trying to build some kind of internet service, but that's beside the point… What they are trying to say is that it should be easy to massively scale infrastructure. That includes not having to manually configure a bunch of settings. Putting all of the hosts in your environment into maintenance mode just to apply some settings definitely isn't scalable. There is no reason to do it!

Every change that a host profile makes can be accomplished through PowerCLI without putting your host into maintenance mode. My recommendation for Nutanix hosts is to use PowerCLI to make any changes to your hosts that you want to be consistent throughout your environment, and then maintain your PowerCLI script and apply it to new hosts that you add to your environment.

You could also make a script that checks the settings on the hosts to monitor for compliance, for example to make sure that no one has added a VLAN to just one host. If you are using vCloud in your environment, VMware includes VCM (vCenter Configuration Manager), which accomplishes the same task with the added component of generating automated compliance reports.
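As a toy illustration of that compliance idea, here is a shell sketch that diffs a host's port group list against a golden list.  The port group names are made up, and in a real script the per-host list would come from PowerCLI (for example Get-VirtualPortGroup):

```shell
# Golden list of port groups every host should have (made-up names)
golden='Management
vMotion
VDI-100'

# Report drift for one host: prints port groups missing from the host
# (no indent) or extra on the host (indented).  In a real script the
# host's list would come from PowerCLI, e.g. Get-VirtualPortGroup.
check_host() {
  host_list="$1"
  tmp_g=$(mktemp); tmp_h=$(mktemp)
  printf '%s\n' "$golden" | sort > "$tmp_g"
  printf '%s\n' "$host_list" | sort > "$tmp_h"
  comm -3 "$tmp_g" "$tmp_h"
  rm -f "$tmp_g" "$tmp_h"
}

check_host 'Management
vMotion'   # reports VDI-100 as missing
```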

Of course, I'm implying that your hosts are running VMware; Nutanix also supports running Hyper-V and KVM, where it's almost inherently implied that you are going to need scripts to maintain consistency in the environment.

Upgrade Nutanix 1350 block to ESXi 5.5

Nutanix recommends that you upgrade to vSphere 5.5 using VMware Update Manager instead of directly mounting the ISO.

Another way to upgrade instead of installing Update Manager is to just download the offline bundle and run the command:

esxcli software vib update -d "FILEPATH to OFFLINE BUNDLE"

Here are the steps that I used to upgrade my nodes from ESXi 5.0.0 to ESXi 5.5.

  1. Download ESXi 5.5 bundle from VMware.
  2. Upload the bundle to the root of my Nutanix datastore

  3. SSH to the CVM.  From the CVM we can execute a script that will run on all of the hosts:

    for i in `hostips`; do echo $i && ssh root@$i "esxcli software vib install -d /FILEPATH TO OFFLINE BUNDLE"; done

    *I missed that hostips is encapsulated with backticks and not '' single quotes, so I just logged onto each host and ran "esxcli software vib install -d /FILEPATH TO OFFLINE BUNDLE"


  4. Shut down the CVM.  We are able to shut down one CVM at a time without disrupting the state of the cluster.  Then reboot the host.


  5. Rut-roh!  My host didn’t come back into vCenter.  When I try to force it to reconnect it tells me that some virtual machines powered back on without following the cluster EVC rules.  Upgrading to ESXi 5.5 must have reset the EVC setting on that host.


    To remedy it I shut down the CVM, force the host to reconnect, then power the CVM back on.  On the next node I just put the host into maintenance mode before I reboot.
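About that backtick gotcha in step 3: backticks (or the equivalent $(...)) tell the shell to run hostips and loop over its output, while single quotes would pass the literal word hostips.  With a stand-in hostips function the difference is easy to demonstrate:

```shell
# Stand-in for the Nutanix hostips command, which prints the
# hypervisor IPs one per line (addresses here are examples)
hostips() { printf '192.0.2.1\n192.0.2.2\n'; }

# With command substitution the loop visits each IP; with single
# quotes it would visit the literal word "hostips" exactly once.
for i in $(hostips); do
  echo "would ssh to root@$i"   # the real loop runs esxcli over ssh here
done
```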

Copy files between ESXi hosts using SCP

Need a quick way to move files from one datastore to the datastore of another host that is not within the same vCenter?

In a Nutanix environment SSH is enabled on the hosts so we can use SCP to do this.  I needed to move an ISO repository from the production cluster to the TEST / DEV cluster.  Log into the source host as root, change directory to the datastore folder  (/vmfs/volumes/DATASTORE/FOLDER) and then run the following command:

scp -r * root@DESTINATION:/vmfs/volumes/DATASTORE/FOLDER

# The destination FOLDER must already exist on the destination DATASTORE.

Export Teradici PEM cert from Windows

We deployed certs on our View Connection Servers on one of our projects and needed to put our CA's root cert on the zero clients.  The zero clients expect the cert in PEM format.  It turns out that this format is just a Base-64 encoded X.509 cert that you can export from Windows.

Open the cert in Windows, either through the Certificates MMC snap-in or by double-clicking the cert file, then:

  1. Click the Details tab, then click Copy to File…
  2. Click Next.
  3. Select the radio button Base-64 encoded X.509 (.CER) and click Next.
  4. Specify the path where you want to save the file, then click Next and Finish.
  5. Using your favorite method, simply change the file extension from .cer to .pem.
  6. In the PCoIP Manager, click the Profiles tab and then click Set Properties.
  7. At the very bottom of the page you'll find the Certificates section.  Click Add New.
  8. Select the .pem cert file that you just renamed and click Add.

You will see that the cert has been successfully added, and you can push it out in your zero client profile.

Migrate VMs on Nutanix from one cluster to another without Live Migration

One of the great things about Nutanix is that you can add nodes one at a time and grow your storage cluster.  One of the bad things about Nutanix is that there really isn't a way to remove a node from a cluster (yet) without doing a cluster destroy.  A cluster destroy is basically game over for that cluster: it removes all of the nodes and puts them back in factory restore mode, leaving them looking just as they did when they arrived from the factory.

So what happens when you buy a few more blocks from Nutanix, create a new cluster, and need to migrate your production VMs from the old cluster to the new cluster?

We ran into a situation where we had our production servers running on Nutanix 3450 blocks and needed a bit more oomph, so we purchased Nutanix 3460 blocks, which support 512GB RAM per node instead of 256GB and have 10-core CPUs instead of 8-core.  We could have added these nodes to the same cluster, except that we wanted to take the old nodes and add them to our VDI cluster.  (We haven't done any performance testing on the solution of just having one cluster mixing VDI and server workloads, so we decided to play it safe and segregate the clusters.)

So how do we migrate 6TB of production VMs all in one night and maintain application consistency?  Live Migration!?  Well, we could have tried it, but upgrading to vSphere 5.5 SSO seems to have killed our vSphere webclient.  Support ticket opened… Yay VMware for not including live migration in the Windows client because it’s not like we still need that supported for SRM or Update Manager or anything because that is fully supported by the webcli…. oh.  Also I’m sure that as soon as they get everything working in the webclient that 95% of their enterprise customers are going to ditch windows because finally it will be the year of the linux deskt… oh.

Meanwhile back at the ranch, we need to get these VMs over to the new cluster.  I guess we're going to power them off and do a storage migration.  Luckily our production servers support a mission that only happens during the day, so powering them off for a few hours isn't that big of a deal.  Maybe we should test this first.  Test VM created, power off, right click Migrate, start migration and… it's moving at a whopping 33MB/s.  Hmm… at that rate, 6TB is going to take somewhere north of 50 hours to complete.  Uh, I don't think that's going to work.  VMware should really add storage migrations to the VAAI API and let the storage vendors figure out how to speed up transfers.

Still, I don't have 50+ hours of downtime to migrate these VMs.  How can I get them migrated in a reasonable time?  Nutanix DR to the rescue!

All of the gory details about how DR works are a topic for a separate blog post.  Suffice it to say that I did the following:

#Log into CVM and open firewall ports for DR
for i in `svmips`; do ssh $i "sudo iptables -t filter -A WORLDLIST -p tcp -m tcp --dport 2009 -j ACCEPT && sudo service iptables save"; done

#Create the remote site of new cluster on old cluster
remote-site create name=NEW_CLUSTER address-list="10.xxx.xxx.2" container-map="OLD_DATASTORE:NEW_DATASTORE" enable-proxy="true"

#Create the remote site of old cluster on new cluster
remote-site create name=KEN address-list="10.xxx.xxx.1" container-map="NEW_DATASTORE:OLD_DATASTORE" enable-proxy="true"

#Create the protection domain
pd create name="PRODUCTION"

#Add my production server VMs to the protection domain
pd protect name="PRODUCTION" vm-names=PROD01,PROD02,PROD03 cg-name="PRODCG"

#Migrate the production VMs
pd migrate name="PRODUCTION" remote-site="NEW_CLUSTER"

This operation does the following:
1. Creates and replicates a snapshot of the protection domain.
2. Shuts down VMs on the local site.
3. Creates and replicates another snapshot of the protection domain.
4. Unregisters all VMs and removes their associated files.
5. Marks the local site protection domain as inactive.
6. Restores all VM files from the last snapshot and registers them on the remote site.
7. Marks the remote site protection domain as active.

#Check that replication started
pd list-replication-status

You will see an output similar to below on the sending cluster:

ID 2345700
Protection Domain PRODUCTION
Replications Operation Sending
Start Time 01/11/2014 20:35:00 PST
Remote Site NEW_CLUSTER
Snapshot Id 2345688
Aborted false
Paused false
Bytes Completed 2.72 GB (2,916,382,112 bytes)
Complete Percent 91.117836

On the receiving cluster you will see:

ID 4830
Protection Domain PRODUCTION
Replications Operation Receiving
Start Time 01/11/2014 20:35:00 PST
Remote Site OLD_CLUSTER
Snapshot Id OLD_CLUSTER:2345688
Aborted false
Paused false
Bytes Completed 2.72 GB (2,916,382,112 bytes)
Complete Percent 91.117836

If you want to watch the replication status, a helpful command to know is the Linux watch command.  The command below will update the status every 1 second.

watch -n 1 ncli pd list-replication-status

Since the migration takes two snapshots you will see the replication status reach 100% and then another replication will start for the snapshot of the powered off VMs.

When it gets to 100% on the first snapshot, the VMs will be removed from the old cluster in vCenter.  After the 2nd replication completes, they will be added to the new cluster.

For our migration the transfer seemed to reach 90% fairly quickly, then took about 1-2 hrs to get from 90-100%.  Perhaps someone from Nutanix can shed some light on what is happening during that last 10% and why it takes so long.

Nutanix 1350

I have been using the Nutanix Virtual Computing Platform 3450 and 3460 appliances on some of my recent projects.  I have been wanting to do some testing to see what these appliances are capable of, I mean other than hosting 5000+ VMware View desktops, but it's not like I can just go pull one out of production and fire up IOMeter, or install Hyper-V on it, or make some What-If-BadThingsTM happen, like a hard drive accidentally getting pulled or two nodes deciding to power off at the same time.

Nutanix was kind enough to send me a Nutanix 1350 Virtual Computing Platform appliance to do exactly this.  The 1000 series is the little brother to the 3000 series.  Without having received Nutanix Official Sales Training(TM) I should clarify what the series numbers mean:

Series Number: 1 (1000 Series)
Number of Nodes: 3 (3 Nodes)
Processor Type: 5 (Dual Intel Sandy Bridge E5-2620)
SSD Drive Capacity: 0 (1 x 400GB SSD Drive)
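A toy decoder makes the digit scheme concrete (the model string is just split into its four positions):

```shell
# Toy decoder: split a four-digit Nutanix model number into the
# fields described above
decode_model() {
  m="$1"
  printf 'series=%s nodes=%s cpu=%s ssd=%s\n' \
    "${m%???}" "$(echo "$m" | cut -c2)" "$(echo "$m" | cut -c3)" "${m#???}"
}

decode_model 1350   # series=1 nodes=3 cpu=5 ssd=0
```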

Nutanix had also warned me that the appliance is rated to consume 1150W at 10-12A.  With all of the other equipment that I have in the office, my 15A circuit didn’t look like it was going to cut it.  Time for a power upgrade!

However, something seemed to be missing to complete this power upgrade… attic access!  5 days, 10 trips to Home Depot, a stud finder, 1 new reciprocating saw, and 4 holes in the wall later I had finally installed a new 20A circuit!

This is also probably where I should put the disclaimer:
I am a computer systems engineer and not a licensed electrician.  Any work performed on your own structures must be performed according to your local laws and building codes.  It is highly recommended to have any electrical work performed by a licensed electrician.

Found the back of the electrical panel!


 

Circuit breaker installed!


 

Time for unboxing!


Even though it came with rails, I don’t feel like moving everything around in my lab rack, I want to play!  I’ll just set it on top and rack it later.


So now that I have it plugged in, let's see what this thing is going to cost me to run.  Thanks to Southern California Edison and the California Public Utilities Commission, I'm in Tier 4, which costs $0.31 per kilowatt-hour.  At 1.15 kW * 24 hrs per day * 30 days per month * $0.31/kWh, I'm looking at a $256.68 increase in my bill next month.

However, I plugged in my Kill-A-Watt meter and it shows me that these 3 nodes are only consuming 367 watts.  At 0.367 kW * 24 hrs per day * 30 days per month * $0.31/kWh, it looks like I'm only going to be paying an additional $81.91.  I realize that these numbers are at idle, so I'll have to write another post once I get a load spun up.  Also, this load probably could have fit on my existing 15A circuit.  But at least I got to play Tim Taylor over the holiday break and get more power!
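If you want to check my math, the estimate is easy to reproduce (same formula for both the rated and the measured numbers):

```shell
# monthly cost = kW draw * 24 hrs/day * 30 days/month * rate in $/kWh
monthly_cost() {
  awk -v kw="$1" -v rate="$2" 'BEGIN { printf "%.2f\n", kw * 24 * 30 * rate }'
}

monthly_cost 1.15  0.31   # rated draw:    256.68
monthly_cost 0.367 0.31   # measured idle: 81.91
```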
