
What #NixVblock should have been

Nutanix is running a marketing campaign called #NixVblock.  As part of the campaign they released a video that I really can’t describe better than the way Sean Massey put it:

“VBlock is supposed to be an uninteresting, high maintenance woman who hears three voices in her head and dresses like three separate people.

“The “VBlock” character is supposed to represent the negatives of the competing VCE vBlock product.  Instead, it comes off as the negative stereotype of a crazy ex that has been cranked past 11 into offensive territory.”

While I was not personally offended by the video, it was inappropriate, and I was very disappointed.  It had the feeling of an inside joke that you tell someone who isn’t involved, and then you come off as an insensitive jerk.  You didn’t mean to be an insensitive jerk; you just wanted to let your new friend in on the joke too.  But when you turn that joke into a marketing video for your company, comparing your competitor to a crazy date, and broadcast it to the world in an official marketing campaign, that is sexist and immature.  Would VCE put out a video like that?  The immaturity of the video just makes Nutanix come out looking like the underdog that they are… nipping at the heels of VMware, Cisco and EMC.

Since Nutanix is still a startup, perhaps they still have interns running the marketing department?  I really only need to ask the marketing department one question, which should illustrate why I am upset that they chose such an immature method to attempt to communicate their product’s technical superiority over vBlock (something the video doesn’t even attempt to address).  Who is the intended audience of that video?  Is it customers who haven’t purchased Nutanix before but are also considering VCE?  Consider that some of my US Federal customers have many organizations run by women.  Is that video something that I should point them to that will make them choose Nutanix over VCE?  Is that video going to help convince them that Nutanix is actually the more mature, feature-rich product?

I have actually experienced trying to procure VCE for a project.  VCE is a separate company that resells VMware, Cisco and EMC in one package.  They market that the value add is that their support is qualified in all three products and won’t redirect you to VMware, Cisco or EMC.  But in reality this only helps tier 1 sys admins.  If you forget to check a box, VCE will help you, but if you encounter a serious bug in one of the technologies, you are going to get redirected to the source.  Also, when I tried to procure VCE it came out as SIGNIFICANTLY more expensive than just buying the components separately and putting them together myself… I guess that VCE SME has to eat too?!  Imagine that… putting in a middle man costs more money rather than less… Who would have thought it?!

Another disadvantage of VCE is that you lose the ability to compete the internal components.  For example, I lose the ability to compete VMware with Citrix, Cisco with Brocade or Arista, and EMC with NetApp, competition that lowers costs for my customers.  I also had a requirement for US-citizen, on-US-soil support, which at the time the VCE rep couldn’t confirm they had… i.e., I was going to get redirected to the component supplier when I called support anyway.  In the end, I just bought the separate VMware, Cisco, and EMC components and bolted them together myself.

Of course, that was long before Nutanix, which brings me to the title of this post.  All Nutanix really had to do was highlight the features that Nutanix has that vBlock doesn’t.  Let’s compare.

Nutanix | vBlock
Built-in VM-aware disaster recovery, integrated into the GUI, with N:many replication | Not built in.  You can buy RecoverPoint for block replication and MirrorView for file replication.  Not VM-aware unless you’re talking about vSphere Replication, but that’s not really storage-level replication.
VM-aware storage snapshots | Block- or file-level snapshots
Simple web-based GUI | Cluttered Java interface that I can only get to after altering security policies to allow a version of Java five releases old.
Storage controller on every node | 2 storage controllers
Infinitely scalable | Forklift upgrade
Shared-nothing architecture | Shared-everything architecture
Built-in compression / deduplication | Why would you compress / dedupe?  How would VCE make you buy more disks?
Shadow Clones | Nothing like Shadow Clones.
Built-in storage analytics that detail IO by disk, VM and node | Not built in.  You can buy the EMC Storage Analytics plug-in for vCOPS for $20K.
Prism Central management interface can span multiple clusters | You can argue that Unisphere can do this too, but it is still in Java and sucks.

I could sit here for an hour adding to this list, but I think I’ve made my point.

Nutanix, please don’t fire anyone for failing with that video.  We can forgive you, and you need to allow people to make mistakes, learn and grow from them, but going forward please stick to marketing your strengths.  You don’t need to put anyone down; what you are doing stands out on its own.  Take the high road and you’ll win more friends.  I also get that it may have grown out of an inside joke, and sometimes it is hard to see potential complications from the inside, but you have enough money to get an external PR agency to review future marketing campaigns.

Nutanix Compression Results

Nutanix has a feature called Post Process Compression.  It’s gone through a couple of marketing name changes, and it looks like the latest name for it is MapReduce Compression.  Basically, when data is written it can be compressed after a configurable period of time (0 to X minutes later).  When the data is accessed again it is decompressed, then recompressed after the specified time period.  The compression is designed to use otherwise unused cycles, meaning that it will not compete with the production workloads.

There are not really any other end user configurable options for the compression other than on/off and delay.

If you have a file that is constantly accessed, you will want to set a delay of at least a few minutes so it is not constantly being compressed / decompressed every time someone opens it.

Unfortunately, Nutanix does not currently have an estimation tool to determine what kind of savings you may get by enabling compression or how long the compression will take, so I decided to test this feature for myself on a test cluster, as I am looking at enabling it in production.

Compression is enabled at the container level.  You can either use the ncli command or you can enable it in the PRISM UI:

NCLI: 
container edit id=[container id] enable-compression=[true|false] compression-delay=[# minutes]

PRISM UI:
Click on Storage, Diagram, Update, Advanced Settings.

[screenshot: container Advanced Settings in the PRISM UI]

As you can see here I started out with 3.27 TB of data.  About 1 TB is VMs and 2 TB is documents, ISOs, photos and videos.

[screenshot: storage usage before compression, 3.27 TB]

It took a couple of days for it to stop churning.  It finally ended up with 12% compression.

[screenshot: storage usage after compression, 12% savings]

Below is the performance chart for the CVMs in this test cluster.  All VMs were powered on (although many were doing nothing).  You can see that 25% utilization is the normal idle and that most of the compression was performed in the first few hours.

[screenshots: CVM CPU utilization charts; roughly 25% at idle, with most compression activity in the first few hours]

Conclusion:
Overall I see no downside to enabling the compression feature.  While it didn’t save me an amazing 50%, from what I can tell there is no noticeable performance impact, so why not save all the space that I can?  With the changes coming to the Nutanix software licensing this is now a standard feature, which makes me happy as it was previously a separately licensed feature.

Nutanix and VMware vSphere Host Profiles

Host profiles seem like a great idea… make sure that all of your hosts are configured consistently and enforce compliance. However, the caveat is that a host has to be in maintenance mode before a host profile can be applied. This means that you have to vMotion any running VMs to another host and then enter maintenance mode… a process that could take quite a while depending on the number of VMs you have running.

On Nutanix there is the pesky issue that there is one VM that you cannot vMotion to another host… the CVM! The CVM (Controller Virtual Machine) is the storage controller that lives on each host. The physical disks are presented to the VM through VMDirectPath. Since virtual machines that are tied to physical devices on the host cannot be vMotioned, the host will fail to enter maintenance mode. It is possible to shut down the CVM on one node, then put that host into maintenance mode, apply the host profile, exit maintenance mode, power on the CVM, then SSH into the CVM to make sure it is back in the storage cluster before you rinse and repeat for all of your hosts. However, that is a very manual process! It would be bearable to perform on one block (four Nutanix hosts), but if you have hundreds of hosts it will take weeks and a small army of dedicated sys admins to complete the task.
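
To give a sense of the per-host ceremony, here is a rough PowerCLI sketch of that sequence (the host name, the CVM naming pattern, and the crude sleep are my placeholders, not an official procedure):

$vmhost = Get-VMHost "esxi01.example.com"
$cvm = Get-VM -Location $vmhost -Name "NTNX-*CVM*"
$cvm | Shutdown-VMGuest -Confirm:$false                   # gracefully stop the CVM
Start-Sleep -Seconds 120                                  # crude wait for it to power off
Set-VMHost -VMHost $vmhost -State Maintenance | Out-Null  # maintenance mode now succeeds
# apply the host profile here (Apply-VMHostProfile), then:
Set-VMHost -VMHost $vmhost -State Connected | Out-Null
Start-VM -VM $cvm                                         # bring the CVM back
# ...then SSH to the CVM and confirm it has rejoined the storage cluster before moving on

Multiply that by hundreds of hosts and the problem is obvious.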

It’s too bad that VMware couldn’t have host profiles distinguish between minor and major changes. For example, adding a port group would be a minor change that does not require entering maintenance mode, while attaching a vSwitch to a vNIC would be a major change that does, because of its potential to disrupt traffic for all of the VMs on that host.

Do we really need host profiles? Nutanix is trying to market the idea that infrastructure should be web-scale. I don’t really like the term web-scale because I think it implies that you’re trying to build some kind of internet service, but that’s beside the point… What they are trying to say is that it should be easy to massively scale infrastructure. That includes not having to manually configure a bunch of settings. Putting all of the hosts in your environment into maintenance mode just to apply some settings definitely isn’t scalable. There is no reason to do it!

Every change that a host profile makes can be accomplished through PowerCLI without putting your host into maintenance mode. My recommendation for Nutanix hosts is to use PowerCLI to make any changes that you want to be consistent throughout your environment, then maintain that PowerCLI script and apply it to new hosts as you add them.
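
As a minimal sketch of that idea (the NTP server, port group name, and VLAN ID below are placeholder values, not recommendations):

Connect-VIServer vcenter.example.com
foreach ($vmhost in Get-VMHost) {
    # example setting: consistent NTP configuration, no maintenance mode required
    Add-VMHostNtpServer -VMHost $vmhost -NtpServer "10.0.0.10" -ErrorAction SilentlyContinue
    # example setting: a standard port group on the existing vSwitch
    Get-VirtualSwitch -VMHost $vmhost -Name vSwitch0 |
        New-VirtualPortGroup -Name "VM-VLAN100" -VLanId 100
}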

You could also write a script that checks the settings on the hosts to monitor for compliance, for example to make sure that no one has added a VLAN to just one host. If you are using vCloud in your environment, VMware includes VCM (vCenter Configuration Manager), which accomplishes the same task with the added benefit of generating automated compliance reports.
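
A rough sketch of such a compliance check, reusing the placeholder port group from above and flagging any host where it is missing:

$expected = "VM-VLAN100"   # your standard port group
Get-VMHost | Where-Object {
    -not (Get-VirtualPortGroup -VMHost $_ -Name $expected -ErrorAction SilentlyContinue)
} | Select-Object Name     # hosts that are out of compliance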

Of course, I’m assuming that your hosts are running VMware. Nutanix also supports Hyper-V and KVM, where it’s almost inherently implied that you are going to need scripts to maintain consistency in the environment.

Nutanix CVM Autopathing Test

I have a Nutanix cluster that needs to be upgraded from 3.1.2 to 3.5.2.1 (or 3.5.3.1 if it is out by the time I get around to upgrading it). That got me to thinking about the upgrade process. When you perform a Nutanix Operating System (NOS) upgrade, it performs what Nutanix calls a “rolling upgrade”. This in effect only performs the upgrade on one CVM at a time. While the CVM is being upgraded, the storage on that node is directed to another CVM.

My first thought was, “How does that actually work?” Thanks to Zach Vaughn @z_n_v, Nutanix SE Extraordinaire, my eyes were opened.  When the cluster detects that a CVM is down, it SSHes to the hypervisor (I’m referring to ESXi) and adds a route to the external IP of another CVM in the cluster. The cluster performs this check every 30 seconds, so it is possible that your VM will be without storage for up to 30 seconds. How disastrous could this be? (I’m told that as of NOS version 3.5.3.1 this will be much faster than 30 seconds.) The following video shows what happens.
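
You can actually watch the route change from PowerCLI. A rough sketch, assuming the conventional 192.168.5.x internal CVM network and a placeholder host name; when autopathing kicks in, the gateway for the CVM route flips to another CVM’s external IP:

$vmhost = Get-VMHost "esxi-nodec.example.com"
while ($true) {
    # poll the host routing table and show any routes touching the CVM network
    Get-VMHostRoute -VMHost $vmhost |
        Where-Object { $_.Destination -like "192.168.5.*" } |
        Format-Table Destination, Gateway, PrefixLength
    Start-Sleep -Seconds 5
}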

This test was performed on a Nutanix 1350 block running NOS 3.5.2.1. The desktop is running on Node C. I start encoding a video using HandBrake, which is writing to the user’s desktop on the local disk. When I shut down the CVM on Node C, the desktop appears to hang for 20 seconds. However, it is possible that only the PCoIP server process stops responding for those 20 seconds, as when the desktop resumes you can see that it still received pings from the hypervisor.

I ran this test from a different machine and the View Client seemed to stay connected. The difference was that it was an iMac connected via Ethernet, while I recorded the video on my MacBook Pro connected via wireless. The desktop continued to receive pings, but the HandBrake process stopped while the disk was unavailable for about 20 seconds and then resumed when the route to the CVM was changed on the hypervisor. If I can get that to work again I’ll try to post another video.

Export Nutanix Configuration to CSV through Powershell and REST API

What do you do when you have over 100 Nutanix nodes scattered across multiple datacenters and need to audit the configurations, or record the configurations for documentation?

Write a PowerShell script that queries the REST API, of course!

In this instance I needed a known starting point.  I didn’t have all of the IP addresses of the CVMs, hosts, etc. in a format that I could query.  What I did have was all of the hosts in vCenter along with all of their CVMs.  So this script starts by connecting to all of the vCenters in the datacenters and getting a list of all of the CVMs and their IP addresses.  It then runs REST API queries against the CVM IPs.
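
A minimal sketch of the approach (the vCenter names and credentials are placeholders, the CVM lookup assumes the default NTNX naming convention, and the /cluster endpoint and its fields may differ on other NOS versions):

# accept the CVMs' self-signed certificates
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }
$cred = Get-Credential      # Prism credentials
$results = @()
foreach ($vc in "vcenter1.example.com", "vcenter2.example.com") {
    Connect-VIServer -Server $vc | Out-Null
    # find the CVMs by name and grab the first IP reported by VMware Tools
    $cvmIPs = Get-VM -Name "NTNX-*CVM*" | ForEach-Object { $_.Guest.IPAddress | Select-Object -First 1 }
    Disconnect-VIServer -Server $vc -Confirm:$false
    foreach ($ip in $cvmIPs) {
        $uri = "https://$($ip):9440/PrismGateway/services/rest/v1/cluster"
        $results += Invoke-RestMethod -Uri $uri -Credential $cred
    }
}
$results | Export-Csv -Path nutanix-config.csv -NoTypeInformation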


Here’s what the output looks like when opened in Excel (and scrubbed of proprietary information):

[screenshot: the CSV output opened in Excel]


Any blocks that are not configured yet, are not running a version of NOS that has the REST API, or do not have network connectivity will return System.Collections.Hashtable values, as you can see below.

[screenshot: System.Collections.Hashtable values in the CSV output]
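
Export-Csv renders any property that is not a simple scalar as its .NET type name, which is where those System.Collections.Hashtable strings come from.  If you only need scalar fields, flattening nested properties first avoids it (the property name here is illustrative):

$results | Select-Object name, version,
    @{ Name = "nodeIPs"; Expression = { $_.hypervisorAddress -join ";" } } |
    Export-Csv -Path nutanix-config.csv -NoTypeInformation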

Nutanix Block Startup / Shutdown Powershell Scripts

Anyone who has Nutanix lab blocks that need to be started / stopped frequently may appreciate these scripts.
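
The power-on half might look roughly like this (the IPMI addresses and credentials are placeholders; ipmiutil, covered further down this page, must be on the PATH):

$ipmiNodes = "192.168.1.1", "192.168.1.2", "192.168.1.3", "192.168.1.4"   # placeholder IPMI IPs
foreach ($node in $ipmiNodes) {
    ipmiutil reset -u -N $node -U ADMIN -P ADMIN   # chassis power up via IPMI
}
# once the CVMs have booted, start cluster services from any CVM:
# ssh nutanix@CVM-IP "cluster start"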

Upgrade Nutanix 1350 block to ESXi 5.5

Nutanix recommends that you upgrade to vSphere 5.5 using VMware Update Manager instead of directly mounting the ISO.

Another way to upgrade instead of installing Update Manager is to just download the offline bundle and run the command:

esxcli software vib update -d “FILEPATH to OFFLINE BUNDLE”

Here are the steps that I used to upgrade my nodes from ESXi 5.0.0 to ESXi 5.5.

  1. Download ESXi 5.5 bundle from VMware.
  2. Upload the bundle to the root of my Nutanix datastore

    [screenshot: the offline bundle uploaded to the root of the datastore]

  3. SSH to the CVM.  From the CVM we can execute a script that will run on all of the hosts:

    for i in `hostips`; do echo $i && ssh root@$i "esxcli software vib install -d /FILEPATH TO OFFLINE BUNDLE"; done

    *I originally missed that hostips is encapsulated with backticks (command substitution) and not ‘’ single quotes, so I just logged onto each host and ran “esxcli software vib install -d /FILEPATH TO OFFLINE BUNDLE” directly.

    [screenshot: esxcli vib install output]

  4. Shut down the CVM.  We are able to shut down one CVM at a time without disrupting the state of the cluster.  Then reboot the host.

     [screenshot: shutting down the CVM]

  5. Rut-roh!  My host didn’t come back into vCenter.  When I try to force it to reconnect it tells me that some virtual machines powered back on without following the cluster EVC rules.  Upgrading to ESXi 5.5 must have reset the EVC setting on that host.

    [screenshot: the EVC error when reconnecting the host]

    To remedy it I shut down the CVM, force the host to reconnect, then power the CVM back on.  On the next node I just put the host into maintenance mode before I reboot.

Upgrade to Nutanix OS 3.5.2.1 from 3.1.3

My 1350 lab block came with Nutanix OS 3.1.3.  A block refers to a 2U chassis with 4 nodes (or in my case 3, as that is the minimum number of nodes required to create a storage cluster) and Nutanix OS refers to the abstracted virtual storage controller and not the bare metal hypervisor.

Below is a node that I have removed that is sitting on top of the chassis.

[photo: a node removed from the block chassis]

The bare metal server node is currently running VMware ESXi 5.0 and the Nutanix OS runs as a virtual machine.  All of the physical disks are presented to this VM through VMDirectPath (direct passthrough).

The latest version of Nutanix OS is 3.5.2.1, so I want to run through the upgrade procedure.

  1. Log onto a Controller VM (CVM – another name for Storage Controller or Nutanix OS VM).  Run the following command to check for the extent_cache parameter.

    for i in `svmips`; do echo $i; ssh $i "grep extent_cache ~/config/stargate.gflags.zk"; done

    [screenshot: extent_cache check output]

    If a parameter match is returned (anything other than No such file or directory), the upgrade guide asks you to contact Nutanix support to have the setting removed.

  2. We need to confirm that all hosts are part of the metadata store with the following command:

    nodetool -h localhost ring

    [screenshot: nodetool ring errors]

    Hmm… Running that command seems to have returned a slew of errors.  Maybe my cluster needs to be running for this command to work?  Let’s try “cluster start” and try this again.

    [screenshot: nodetool ring output with the cluster running]

    Ok, that looks more like what I’m expecting to see!

  3. I’m skipping the steps in the guide that say to check the hypervisor IP and password since I know they’re still at factory default.  Now I need to enable automatic installation of the upgrade. 

     [screenshot: enabling automatic installation of the upgrade]

  4. Log onto each CVM and remove core, blackbox, installer and temporary files using the following commands:

    rm -rf /home/nutanix/data/backup_dir
    rm -rf /home/nutanix/data/blackbox/*
    rm -rf /home/nutanix/data/cores/*
    rm -rf /home/nutanix/data/installer/*
    rm -rf /home/nutanix/data/install
    rm -rf /home/nutanix/data/nutanix/tmp
    rm -rf /var/tmp/*

    [screenshot: cleanup commands on the CVM]

  5. The guide says to check the CVM hostname in /etc/hosts and /etc/sysconfig/network to see if there are any spaces.  If we find any we need to replace them with dashes.

    [screenshot: /etc/hosts]

    [screenshot: /etc/sysconfig/network]

    No spaces here!

  6. On each CVM, check that there are no errors with the controller boot drive with the following command:

    sudo smartctl -a /dev/sda | grep result

    [screenshot: smartctl results for the boot drive]

  7. If I had replication, I would need to stop it before powering off my CVMs.  However, since this is a brand new block, it’s highly unlikely that I have it set up.

  8. Edit the settings for the CVM and allocate 16 GB of RAM, or 24 GB of RAM if you want to enable deduplication.  In production, this requires shutting down the CVMs one at a time, changing the setting, then powering the CVM back up, waiting to confirm that it is back up and part of the cluster again, and then shutting down the next CVM to modify it.  However, since there are no production VMs running in the lab I can just stop the cluster services, shut down all of the CVMs, make the change, and then power them all back on.

    To stop cluster services on all CVMs that are part of a storage cluster log onto the CVM and use the command:

    cluster stop

    [screenshot: cluster stop output]

    We can confirm that cluster services are stopped by running the command:

    cluster status | grep state

    We should see the output: The state of the cluster: stop.

    [screenshot: cluster status showing services stopped]

    We can now use the vSphere client, vSphere Web Client, PowerCLI, or whatever floats your boat to power off the CVMs and make the RAM changes (see the PowerCLI sketch at the end of this walkthrough).

    [screenshots: powering off the CVMs and editing the RAM allocation]

  9. Power the CVMs back on, grab a tasty beverage of your choice, then check to see if all of the cluster services have started using: cluster status | grep state.  The state of the cluster should be “start”.

  10. Next we need to disable email alerts:
    ncli cluster stop-email-alerts

  11. Upload the Nutanix OS release to /home/nutanix on the CVM.  Or if you’re lazy like me just copy the link from the Nutanix support portal and use wget.

     [screenshot: downloading the release with wget]

  12. Expand the tar file:

    tar -zxvf nutanix_installer*-3.5.2.1-* (or if you’re lazy tab completion works as well)

    [screenshot: extracting the installer]

  13. Start the upgrade
    /home/nutanix/install/bin/cluster -i /home/nutanix/install upgrade

    Here we go!

    [screenshot: the upgrade kicking off]

  14. You can check the status of the upgrade with the command upgrade_status.

    [screenshot: upgrade_status output]

    You’ll know the upgrade is progressing when the CVM that you’re logged into decides to reboot.

    [screenshot: the CVM rebooting]

    8 minutes later… One down, two to go!

    [screenshot: upgrade status, one node done]

    11 minutes in…

    [screenshot: upgrade status, two nodes done]

    13 minutes later… up to date!

    [screenshot: upgrade status, all nodes up to date]

  15. Confirm that the controllers have been upgraded to 3.5 with the following command:

    for i in `svmips`; do echo $i; ssh -o StrictHostKeyChecking=no $i cat /etc/nutanix/svm-version; done

    [screenshot: svm-version output for each CVM]

  16. Remove all previous public keys:

    ncli cluster remove-all-public-keys

    [screenshot: remove-all-public-keys output]

  17. Sign in to the web console:

    [screenshot: the web console sign-in page]

    Behold the PRISM UI!

     [screenshot: the PRISM UI]
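
As mentioned in step 8, the CVM power-off and RAM change can also be scripted with PowerCLI.  A minimal sketch, assuming the default NTNX CVM naming and a lab where all CVMs can safely go down at once (stop cluster services first, as in step 8):

$cvms = Get-VM -Name "NTNX-*CVM*"
$cvms | Shutdown-VMGuest -Confirm:$false      # graceful shutdown via VMware Tools
# wait for the CVMs to power off, then:
$cvms | Set-VM -MemoryGB 16 -Confirm:$false   # 24 GB if you want deduplication
$cvms | Start-VM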

Copy files between ESXi hosts using SCP

Need a quick way to move files on one datastore to the datastore of another host that is not within the same vCenter?

In a Nutanix environment SSH is enabled on the hosts so we can use SCP to do this.  I needed to move an ISO repository from the production cluster to the TEST / DEV cluster.  Log into the source host as root, change directory to the datastore folder  (/vmfs/volumes/DATASTORE/FOLDER) and then run the following command:

scp -r * root@DESTINATION:/vmfs/volumes/DATASTORE/FOLDER

# The destination FOLDER must already exist on the destination DATASTORE.
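
If you would rather stay in PowerCLI, Copy-DatastoreItem can do the same job by staging the files through your workstation, which is slower than a direct SCP (the datastore and folder names are placeholders):

Connect-VIServer SOURCE-HOST
Copy-DatastoreItem -Item "vmstore:\ha-datacenter\DATASTORE\FOLDER\*" -Destination C:\staging\ -Recurse
Disconnect-VIServer -Confirm:$false
Connect-VIServer DESTINATION-HOST
Copy-DatastoreItem -Item C:\staging\* -Destination "vmstore:\ha-datacenter\DATASTORE\FOLDER\" -Recurse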

Use a Script to Power On/Off a Nutanix Block

Nutanix nodes are able to accept IPMI commands from the command prompt.  This requires the ipmiutil tool.

The following are examples of how to power on and off the nodes using the command line:

Power Up:
ipmiutil reset -u -N node -U username -P password
Example: ipmiutil reset -u -N 192.168.1.1 -U ADMIN -P ADMIN

Power Down:
ipmiutil reset -d -N node -U username -P password
Example: ipmiutil reset -d -N 192.168.1.1 -U ADMIN -P ADMIN