Recently I had a design session with a customer looking to set up a fairly unique DR scenario. They are a healthcare organization with two central datacenters located in the same town, 10 miles apart.
Additionally, they had hundreds of clinics and offices scattered across the state with no DR plans in place. A quick solution was to set up vSphere Replication between these external locations and the primary DC at HQ, but the customer wanted an additional layer of protection.
The proposal was to set up a vSAN stretched cluster between the primary and secondary DCs. Using nested fault tolerance, each DC would provide FTT=2 locally while maintaining FTT=1 between the DCs.
This would give the customer a primary replication target for all external DCs, protected at FTT=2. Additionally, the primary DC replicated to the secondary DC, which was also protected at FTT=2. This gave the customer resiliency both within a site and between sites.
This was only possible due to the low latency between the primary and secondary sites, so if you are considering something like this, please keep in mind the stretched vSAN cluster requirements:
Data Site to Data Site Network Latency
Data site to data site network refers to the communication between non-witness sites, in other words, sites that run virtual machines and hold virtual machine data. Latency or RTT (Round Trip Time) between sites hosting virtual machine objects should not be greater than 5msec (< 2.5msec one-way).
Finally the customer elected to host the witness node at a non-production test facility in another city. Luckily the networking and site latency requirements for this are much more lenient:
Data Site to Witness Network Latency
This refers to the communication between non-witness sites and the witness site.
In most vSAN Stretched Cluster configurations, latency or RTT (Round Trip Time) between sites hosting VM objects and the witness nodes should not be greater than 200msec (100msec one-way).
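If you want to sanity-check a design like this against those thresholds, the published limits are easy to encode. Here's a minimal Python helper; the RTT values you feed it would come from your own measurements (for example, vmkping between the hosts' vSAN vmkernel interfaces), which I'm not automating here:

```python
# The published vSAN stretched cluster RTT limits quoted above.
RTT_LIMITS_MS = {
    "data-to-data": 5.0,       # between the two data sites
    "data-to-witness": 200.0,  # between a data site and the witness
}

def meets_requirement(rtt_ms, link):
    """True if a measured round-trip time fits within the vSAN limit."""
    return rtt_ms <= RTT_LIMITS_MS[link]
```

For this customer, the 10-mile metro link between DCs measured well under 5msec, while the out-of-town witness site only had to stay under the much looser 200msec bar.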
In my next post I’ll go over some interesting behavior we discovered with vSphere Replication with regard to FTT settings and vSAN.
If you've been living under a rock, you don't know anything about WannaCry, and if that's the case you may already be in trouble.
But if you're on top of things, then you know one of the major recommendations beyond patching your systems (which you should be doing anyway) is to disable SMBv1 across your environment.
Beyond breaking many things like printers, scanners, and folder shares for legacy applications, it will also break AD authentication from the vCenter Server Appliance. It's key to note that the Windows installation of vCenter is not affected by this.
VMware has a good KB article around this which calls out the requirements for SMB1:
Luckily there is a workaround you can apply to the VCSA to enable SMBv2 for authentication.
- SSH into the VCSA
- Enable the Bash shell
- Enter the Bash shell
- Set the Smb2Enabled flag in Likewise's configuration:
- /opt/likewise/bin/lwregshell set_value '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]' Smb2Enabled 1
- Verify the value with the following command:
- /opt/likewise/bin/lwregshell list_values '[HKEY_THIS_MACHINE\Services\lwio\Parameters\Drivers\rdr]'
- Restart the lwio service so the change takes effect:
- /opt/likewise/bin/lwsm restart lwio
Once you've made these changes, your VCSA will once again be able to authenticate via AD using SMBv2.
Reading the headlines over the last few weeks, you would think the world in IT is coming to an end.
And in some ways, things are scarier now than they have been for some time. IT departments are now reaping the consequences of decisions not to patch their environments or to stay on legacy operating systems.
At the end of the day, the release of WannaCry and its new variants, as well as the new 'Hera' and 'Athena' tools which compromise ALL Windows operating systems, is a wake-up call for all of IT. We must now assume that our systems are vulnerable at all times. Assume that a 0-day vulnerability is in the wild and targeting you.
How can you protect yourself in this environment? There are several ways, some cost prohibitive and some operationally prohibitive.
- Place every device behind a dedicated firewall (physical, virtual, or some combination)
- Use PVLANs to isolate every virtual machine (doesn't help as much with physical devices)
- Create thousands of VLANs and thousands of ACLs on network devices
Creating thousands of VLANs simply does not scale if you have more than 4,000 devices, since the 802.1Q VLAN ID space tops out at 4,094. Beyond that, creating and managing the ACLs necessary for this is operationally unfeasible.
Isolating every device behind a dedicated firewall is cost prohibitive and operationally prohibitive as well.
Really, only VMware NSX is positioned today to cope with this environment. By placing a stateful firewall at every vNIC in your environment, you can massively reduce the threat scope. Moving to a zero-trust stance means only specifically allowed traffic to VMs is valid; everything else is blocked. Combine this with identity-based firewalling and you shrink even that limited scope down to specific allowed users.
I won't post additional material because there are literally thousands of posts and pages on this, but if you have not looked at NSX, you really need to. It's expensive, no doubt, but today it's really the only option in the frankly terrifying security landscape we're all now part of.
After upgrading my homelab to vSAN 6.6, I noticed that my memory usage suddenly jumped to around 2.5 times normal consumption. Wondering if I had missed something, I went back through the release notes with more focus and tracked down this tidbit:
- Object management improvements (LSOM File System) – Reduce compute overhead by using more memory. Optimize destaging by reducing cache/CPU thrashing.
There it is, in plain English: vSAN 6.6 improves performance by utilizing more memory. Now, for most folks in true production environments, a few extra GB won't be noticed, but if your lab only has 64GB of RAM, anything extra gets noticed quickly.
Based on some calculations I've run, it looks like RAM consumption roughly doubles what it was before. The original formula for calculating RAM usage can be found here from VMware:
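To budget for the upgrade, it helps to see the general shape of that per-host formula. The Python sketch below captures the structure only; every constant in it is a placeholder I made up for illustration, so pull the real values for your version from VMware's documentation:

```python
def vsan_host_memory_mb(num_disk_groups, cache_ssd_gb, capacity_disks,
                        base_mb=5500, dg_base_mb=600,
                        cache_mb_per_gb=8, cap_disk_mb=70):
    """Shape of the per-host vSAN memory formula: a fixed base cost,
    plus a per-disk-group cost that scales with cache SSD size,
    plus a per-capacity-disk cost. All constants are illustrative
    placeholders, not VMware's published values."""
    per_disk_group = dg_base_mb + cache_mb_per_gb * cache_ssd_gb
    return (base_mb
            + num_disk_groups * per_disk_group
            + capacity_disks * cap_disk_mb)

# One of my NUCs: 1 disk group, 250GB cache SSD, 1 capacity disk.
pre_66_estimate = vsan_host_memory_mb(1, 250, 1)
# The observation above: 6.6 roughly doubles the consumption.
post_66_estimate = 2 * pre_66_estimate
```

On a 16GB NUC, doubling even a few GB of vSAN overhead is immediately visible, which is exactly what I saw.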
I spoke to a few folks internally at VMware, and indeed the numbers will shortly be updated to reflect the new changes. One thing to note based on my home tests: enabling or disabling deduplication, compression, and the new encryption features has no impact on the memory changes.
Make sure to plan accordingly when thinking about upgrading to vSAN 6.6.
vCloud Director has supported the control of vCNS, and more recently NSX, for some time. The mainstay of this was the deployment and management of Edge appliances. vCNS and NSX edges have had the same basic functionality, but until very recently vCloud Director was unable to take full advantage of the newer NSX edge features. Instead, it worked with NSX in compatibility mode, deploying older-style vCNS edges or new NSX edges with limited functionality.
With vCD 8.20 the deployment of edges has not changed; you'll still get the same basic deployment options, but you will discover a few new options and additional text compared to previous Edge deployments.
Once the edge is deployed, vCD can control and manage it as before. What's new is the ability to upgrade the edge to advanced functionality. While the same edge is deployed, it now communicates with vCD using the current generation of APIs, opening up a new set of features and functions.
To upgrade the Edge to this new functionality, right-click the edge and select 'Convert to Advanced Gateway.' In my testing this has not caused any disruption to existing services, but take caution when performing this upgrade.
Managing the edge still requires you to right-click it and select 'Edge Gateway Services,' but now a new HTML5 control menu will open.
The new UI will look familiar if you’ve read my post on the new distributed firewall controls but let’s take a look at the basic edge configuration.
Clicking on the ‘Edge Settings’ will bring up the deployment settings on the edge:
One nice thing VMware has added is sanity checking on input data. Previously when working with Edges there was no validation until you attempted to submit a change, which often left you retrying a change multiple times until you determined where your error was and corrected it.
I purposely input some bad data so you can get an idea of how this looks and works.
Hopefully you see how the new edge functionality works from a basic deployment standpoint. We’ll continue with the Edge Logical Firewall in our next post.
One of the VMware technologies I simply could not wrap my mind around was vRealize Automation (formerly vCAC). I made the decision to buckle down, see if I could get it up and running in my lab, and finally tackle the suite.
I had no problem getting the vRA 7 appliance to deploy, which speaks volumes compared to the original vCAC 5 deployment I did almost two years ago (go VMware). One place I kept stumbling, however, was that I couldn't get vCloud Director to register as an endpoint and collect any data. I went over everything I could think of until I finally noticed this in the log:
Reading into the error, I noticed vRA was attempting to reach vCloud Director at https://vcd-lab.justavmwblog.com/api/api/versions. Notice the second '/api/', which was clearly our issue. Suspecting this was not a vRA issue, I took a look at the public addresses section of vCloud Director, and look what I discovered:
Notice the API URL has /api appended to the end of the FQDN. Now, I am not sure how that got there, to be honest, but after removing the /api I was left with the following:
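My guess at what was happening under the hood: vRA simply appends the API path to whatever public address vCD advertises, so a base address that already ends in /api doubles the segment. A hypothetical Python sketch (the function is mine, not vRA's actual code):

```python
def versions_url(public_api_base):
    # vRA-style behavior: append the versions path to whatever
    # base address vCloud Director advertises.
    return public_api_base.rstrip("/") + "/api/versions"

# With the bad public address the path doubles up:
versions_url("https://vcd-lab.justavmwblog.com/api")
# -> https://vcd-lab.justavmwblog.com/api/api/versions

# With /api removed from the public address it resolves correctly:
versions_url("https://vcd-lab.justavmwblog.com")
# -> https://vcd-lab.justavmwblog.com/api/versions
```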
Upon applying the settings, vRA was able to connect without issue using the following configuration with vRA:
So the moral of the story is to always check the logs and see what API endpoint vRA is trying to use.
As I've been slowly adding components to my NUC lab, it's become painfully clear that the single gigabit interface on each of my NUCs leads to problems. This is exacerbated by running hybrid vSAN as the shared storage system, which can be a bit of a bandwidth hog.
So I needed a way to limit vSAN so it would not gobble up all the bandwidth on my servers and cause downtime. I accomplished this using the network resource pools within vSphere (Network I/O Control). Specifically, I guaranteed a reservation for virtual machine, management, and vMotion traffic as well as vSAN. The key, however, was to set a limit on the bandwidth vSAN could use, preventing resyncs or vMotions triggered by maintenance mode from flooding my single interface and causing downtime.
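For anyone curious how those knobs interact, here's an illustrative Python sketch of the contention math. The share values and the vSAN limit are numbers I picked for my lab, not vSphere defaults:

```python
LINK_MBIT = 1000  # the single 1GbE uplink on each NUC

# Relative shares decide how a saturated link is divided;
# these values are my lab choices, not vSphere defaults.
shares = {"management": 20, "vm": 50, "vmotion": 30, "vsan": 100}

# A hard cap so vSAN resyncs can never take the whole link.
limits_mbit = {"vsan": 500}

def allocation_mbit(traffic_type):
    """Bandwidth a traffic class gets when the link is fully
    contended: proportional to its shares, clipped to any limit."""
    fair_share = LINK_MBIT * shares[traffic_type] / sum(shares.values())
    return min(fair_share, limits_mbit.get(traffic_type, LINK_MBIT))
```

With these numbers, vSAN's fair share under contention works out to half the link and the limit holds it there, leaving headroom for management, vMotion, and VM traffic even during a resync.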
I've been working on my homelab for the last few days and have had a hard time getting vSAN working. In the end it was a combination of misconfigurations in my lab and me trying to be 'creative.' Moral of the story: don't be creative….
So, after getting everything online, my 2-year-old thought it would be fun to press the bright blue LED power button on one of my NUCs. Thankfully I had the FTT setting at 1 on my cluster, so the other two NUCs were able to handle the failure and I never lost my VMs (yay vSAN).
Once I turned the third NUC back on, however, it kept showing as degraded no matter what I did. It turns out the way to fix this was to change the FTT policy to 0 and apply the new storage policy to all VMs in the environment.
This took a while to sync the VM data, but once complete I was able to change the FTT policy back to 1 and again apply it to all the VMs.
I had roughly 200GB of data to sync, but the esxi01 NUC now showed as 'reconfiguring' rather than 'degraded,' and I could watch all the VMs syncing via the vSAN health monitor.
Now, if I had a fourth node in the cluster this would not have happened, but alas, I can only support three at the moment, so this is a compromise I can work with.
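The host-count arithmetic behind that compromise, assuming the default RAID-1 mirroring policy, is straightforward to sketch:

```python
def min_hosts(ftt):
    """vSAN RAID-1 mirroring needs ftt + 1 data replicas plus
    ftt witness components, each on its own host."""
    return 2 * ftt + 1

def hosts_for_self_healing(ftt):
    # Rebuilding a failed host's components in place requires one
    # spare host beyond the minimum.
    return min_hosts(ftt) + 1
```

At FTT=1 the minimum is three hosts, which is exactly what I have, while self-healing after a failure wants a fourth; hence the degraded state until I cycled the policy.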
I’ve been working on my homelab for the last few months and I finally have it at a point where I am content with how it’s behaving.
I opted to use the Intel NUC i7 model for several reasons, not the least of which is the high wife-approval factor of low energy, low heat, and quiet running. My old lab was a 2U Dell R710 that sounded like an airplane taking off; needless to say, the wife did not approve.
I chose the NUC5i7RYH model and threw in a 250GB SSD along with 16GB of RAM and this SSHD.
Now, one major drawback of the NUC is the low amount of supported RAM. This has recently been addressed with the new Skylake chipsets from Intel (which now support 32GB), but I can live with 16GB. I went ahead and purchased three of the nodes along with 3x 32GB low-profile USB drives to install ESXi onto.
If you're following closely, you can see that I decided to go with a vSAN setup for my shared storage, which has worked out nicely. In case you haven't signed up yet, the VMUG Advantage program is fantastic for the licenses you get, so head on over and sign up ASAP.
I do not have a 10GbE switch at home, but I do have a dedicated 1GbE switch for my lab, and so far the performance has been just fine. I'm not building a production-level lab, so having resyncs and vMotions max out at 1Gbit is fine for me.
I've installed vSphere 6 and NSX within the environment. This has allowed me to dive into nesting and set up multiple versions of vSphere all running on the parent lab. Currently I am running vSphere 5 and vSphere 6 in nested environments, each with three nested ESXi hypervisors and its own "shared" storage. I tend to keep the nested labs powered down unless they are in active use, since memory is at a premium in my lab.
I'll post more in following posts regarding PowerCLI automation used to deploy, configure, and manage these nested labs. In the meantime, reach out if you have any questions.
It's been a while since I updated the blog, but things have been quite busy. My family and I will be welcoming our third child in two months, and I've left my previous job of over nine years to join VMware. I'll be taking on the role of NSX Technical Account Manager, something I am really looking forward to.
Now that things have somewhat calmed down I plan to get back to updating the blog on a semi-regular basis. Feel free to hit me up with any topics or ideas you’d like to see.