04.05.09
The Great vSwitch Debate – Part 3
OK…in Part 1 of this series, we introduced the concept of a vSwitch and touched on some of the options available. In Part 2, we talked about some of the security features available in the vSwitch. In this Part 3, we’re going to talk about the load balancing features that are available in the vSwitch.
In a vSwitch, load balancing policies describe the different techniques that will be used for distributing the network traffic from all the virtual machines that are connected to the vSwitch and its subordinate Port Groups across the physical NICs associated with the vSwitch. There are several options available for load balancing as shown below:
- Load Balancing Policies
- vSwitch Port Based (default)
- MAC Address Based
- IP Hash Based
- Explicit Failover Order
Something to keep in mind is that all of these load balancing policies we’re going to discuss here affect only traffic that is outbound from your ESX host. We have no control over the traffic that is being sent to us from the physical switch. Also, all of these techniques apply to connections between the vSwitch uplink ports (i.e. the physical network adapters affiliated with a virtual switch) and the physical switch. Additionally, these load balancing policies have no effect on the connections between the virtual network adapter in a virtual machine and the vSwitch. Figure 1 shows the scope of discussion for this installment of our series on vSwitches.
To help illustrate the load balancing concepts, we’re going to work with the configuration shown in Figure 2. In this graphic, we have a single vSwitch that is connected to two pNICs. We’ve also configured two Port Groups (PG_A and PG_B) on the vSwitch. For purposes of discussion, our vSwitch has eight ports configured (an impossible configuration in real life!).
We’ll use Figure 2 as the backdrop for all our discussions going forward. On an editorial note – in all my examples, I am assuming that the load balancing approach is set at the vSwitch level and is not overridden at the Port Group. If anyone has a really good example of why I would want to override the load balancing approach at the Port Group, please leave me a comment!
In all load balancing scenarios, the affiliations that are made between a vNIC and a pNIC are persistent for the life of the vNIC or until a failover event occurs (which we’ll cover a little later). What this means is that when a vNIC gets mapped to a pNIC, all outbound traffic from that vNIC will traverse the same pNIC until something (i.e. vNIC power cycle, vNIC disconnect/connect, or a detected path failure) happens to change the mapping.
Note: Each pNIC can be associated with only one vSwitch. If you want a pNIC to be affiliated with more than one network, you will need to use 802.1Q VLAN Tagging and Port Groups!
Now, on to the load balancing approaches!
vSwitch Port Based Load Balancing
The first load balancing approach I want to discuss is vSwitch Port Based (I’ll refer to this as simply “Port Based” load balancing), which is the default option. In the interest of full disclosure, let me say that this is my favorite type of load balancing. I tend to use this except in situations which can truly benefit from IP Hash.
In Port Based load balancing, each port on the vSwitch is “hard wired” to a particular pNIC. When a vNIC is initially powered on, it will be dynamically connected to the “next available” vSwitch port. See Figure 3 for the first example.
In the example shown in Figure 3 the virtual machines are powered up in order from VM1 to VM5. You’ll notice that VM3 is connected to PG_B, yet it still winds up affiliated with vSwitch port #3 and pNIC #1. This is because, as you’ll recall from Part 1, a Port Group is merely a template for a vSwitch connection rather than an actual group of ports. So, what we wind up with after this initial power-up sequence is shown in Table 1:
In Scenario 2, we’re building on the configuration presented in Scenario 1. In this second scenario, the following events have occurred since the end of Scenario 1:
- VM2 was powered off
- VM6 was powered on
- VM2 was powered back on
The result is shown in Figure 4.
Notice that vSwitch port #2 is now connected to VM6, yet it retains its association with pNIC2; whereas VM2 is now connected to vSwitch port #6, also on pNIC2. We wind up with the configuration represented in Table 2.
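The two scenarios above can be replayed with a small simulation. This is only a sketch: the class and method names are made up for illustration, the eight ports come from Figure 2, and the rule that odd-numbered ports sit on pNIC1 and even-numbered ports on pNIC2 is an assumption inferred from the examples (port #3 on pNIC1, ports #2 and #6 on pNIC2). It is not a description of how ESX implements the policy internally.

```python
class PortBasedVSwitch:
    """Toy model of vSwitch Port Based load balancing (illustrative only)."""

    def __init__(self, num_ports=8, num_pnics=2):
        self.num_pnics = num_pnics
        # port number -> VM name, or None while the port is free
        self.ports = {port: None for port in range(1, num_ports + 1)}

    def pnic_for_port(self, port):
        # Assumption: ports are "hard wired" round-robin, so with two pNICs
        # odd ports use pNIC1 and even ports use pNIC2.
        return (port - 1) % self.num_pnics + 1

    def power_on(self, vm):
        # A vNIC takes the lowest-numbered free port ("next available").
        for port in sorted(self.ports):
            if self.ports[port] is None:
                self.ports[port] = vm
                return port
        raise RuntimeError("no free vSwitch ports")

    def power_off(self, vm):
        # Powering off a vNIC frees its port for the next vNIC to power on.
        for port, owner in self.ports.items():
            if owner == vm:
                self.ports[port] = None
                return port
        raise KeyError(vm)

    def mapping(self):
        # VM name -> (vSwitch port, pNIC number)
        return {vm: (port, self.pnic_for_port(port))
                for port, vm in self.ports.items() if vm is not None}


vswitch = PortBasedVSwitch()

# Scenario 1: VM1 through VM5 power on in order.
for name in ["VM1", "VM2", "VM3", "VM4", "VM5"]:
    vswitch.power_on(name)
scenario1 = vswitch.mapping()

# Scenario 2: VM2 powers off, VM6 powers on, VM2 powers back on.
vswitch.power_off("VM2")
vswitch.power_on("VM6")
vswitch.power_on("VM2")
scenario2 = vswitch.mapping()

for name, (port, pnic) in sorted(scenario2.items()):
    print(f"{name}: vSwitch port #{port} -> pNIC{pnic}")
```

The port-to-pNIC wiring never changes in this model; only the VM occupying each port does, which is the behavior Scenario 2 describes.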
MAC Address Based Load Balancing
MAC Address Based load balancing, which I’ll call “MAC Based,” simply uses the least significant byte (LSB) of the source MAC address (the MAC address of the vNIC) modulo the number of active pNICs in the vSwitch to derive an index into the pNIC array. So, basically what this means in our scenario with two pNICs is this:
Assume the vNIC MAC address is 00:50:56:00:00:0B; the LSB is therefore 0x0B, or 11 decimal. To calculate the modulo, you divide (using integer division) the MAC LSB by the number of pNICs (thus 11 div 2) and take the remainder (1 in this case) as the modulo. The array of pNICs is zero-based, so (modulo 0) = pNIC1 and (modulo 1) = pNIC2.
If we look at a scenario where we have six VMs with sequential MAC addresses (at least the LSB is sequential), we wind up with a situation like the one shown in Figure 5.
Notice that I removed the vSwitch ports from this diagram. That’s because they really don’t come into consideration with MAC based load balancing. What we wind up with is VM to pNIC mapping as shown in Table 3. The MAC LSB column shows the least significant byte of the MAC address for the vNIC in each VM. The modulo value shows the remainder of (MAC LSB div (# pNICs)), and the pNIC column indicates to which pNIC the vNIC will be affiliated.
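For readers who prefer to see the arithmetic spelled out, here is a minimal sketch of the MAC Based calculation. The function names are my own, and the six sequential MAC addresses are hypothetical values chosen to continue from the 00:50:56:00:00:0B example; they are not the addresses from Figure 5.

```python
def mac_lsb(mac):
    """Return the least significant (last) byte of a MAC address as an int."""
    return int(mac.split(":")[-1], 16)


def mac_based_pnic(mac, num_pnics):
    """Zero-based pNIC index: MAC LSB modulo the number of active pNICs."""
    return mac_lsb(mac) % num_pnics


# Worked example from the text: LSB 0x0B is 11, and 11 mod 2 leaves 1.
example = "00:50:56:00:00:0B"
index = mac_based_pnic(example, 2)
print(f"{example}: LSB={mac_lsb(example)}, modulo={index}, pNIC{index + 1}")

# Six hypothetical vNICs whose MAC addresses differ only in a sequential LSB.
macs = [f"00:50:56:00:00:{lsb:02X}" for lsb in range(0x0B, 0x11)]
indexes = [mac_based_pnic(mac, 2) for mac in macs]
for mac, idx in zip(macs, indexes):
    print(f"{mac} -> modulo {idx} -> pNIC{idx + 1}")
```

With sequential LSBs the vNICs alternate neatly between the two pNICs. If the LSBs all happened to be even, every vNIC would land on pNIC1.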
As you can see, there is no real advantage to using this over vSwitch Port Based load balancing. In fact, you could potentially wind up with a worse distribution with MAC based load balancing. So…even though this is an option, I see no real justification for taking the extra steps to configure MAC based load balancing. This was the default load balancing approach used in ESX 2.x. I file this one in the “interesting but worthless” category.
IP Hash Based Load Balancing
IP Hash based load balancing (I’ll call it simply “IP Hash”) is the most complex load balancing algorithm available, but it also has the potential to achieve the most effective load balancing of all the algorithms. The problems with this algorithm, from my perspective, are the technical complexity and the political complexity. We’ll discuss each as we go along.
In general, IP Hash works by creating an association with a pNIC based on an IP “conversation”. What constitutes a conversation, you ask? Well, a conversation is identified by creating a hash between the source and destination IP address in an IP packet. OK, so what’s the hash? It’s a simple hash (for speed) – basically ((LSB(SrcIP) xor LSB(DestIP)) mod (# pNICs)) which all boils down to: Take an exclusive OR of the Least Significant Byte (LSB) of the source and destination IP addresses and then compute the modulo over the number of pNICs. It’s actually not that different from the calculation used in the MAC based approach.
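The hash is easy to try out for yourself. The sketch below follows the simplified formula given above (XOR of the last octets, then modulo the pNIC count); the function names and the two IP addresses are hypothetical, and the real vSwitch code may differ in detail.

```python
import ipaddress


def ip_lsb(addr):
    """Least significant byte (last octet) of an IPv4 address."""
    return int(ipaddress.IPv4Address(addr)) & 0xFF


def ip_hash_pnic(src_ip, dst_ip, num_pnics):
    """Zero-based pNIC index for one source/destination conversation."""
    return (ip_lsb(src_ip) ^ ip_lsb(dst_ip)) % num_pnics


# Hypothetical conversation: 25 xor 50 = 43, and 43 mod 2 = 1 (pNIC2).
src, dst = "10.0.0.25", "10.0.0.50"
chosen = ip_hash_pnic(src, dst, 2)
print(f"{src} -> {dst}: index {chosen}, pNIC{chosen + 1}")
```

Because XOR is symmetric, swapping source and destination gives the same index. Remember, though, that the vSwitch only decides the outbound path.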
When configuring IP Hash as your load balancing algorithm, you should make the configuration setting on the vSwitch itself and you should not override the load balancing algorithm at the Port Group level. In other words, ALL devices connected to a vSwitch configured with IP Hash load balancing must use IP Hash load balancing.
A technical requirement for using IP Hash is that your physical switch must support 802.3ad static link aggregation. Frequently, this means that you have to connect all the pNICs in the vSwitch to the same pSwitch. Some high-end switches support aggregated links across pSwitches, but many do not. Check with your switch vendor to find out. If you do have to terminate all pNICs into a single pSwitch, you have introduced a single point of failure into your architecture.
It is also important for you to know that the vSwitch does not support the use of dynamic link aggregation protocols (i.e. PAgP/LACP are not supported). Additionally, you’ll want to disable Spanning Tree protocol negotiation and enable portfast and trunkfast on the pSwitch ports.
All this brings up the political complexity associated with IP Hash – the virtualization administrator can’t make all the configuration changes alone. You have to involve the network support team, which, in many organizations, isn’t worth any possible performance improvement!
So, let’s assume that you have one VM (one single IP address) copying files between two file servers (two unique IP addresses). See Table 4:
As you can see, we now have one VM taking advantage of two pNICs. There are obvious performance advantages to this approach! But, what happens if the two file servers have IP addresses that compute out to the same hash value, as shown in Table 5?
In this example, both conversations map to the same pNIC, which kind of defeats the purpose for implementing IP Hash in the first place! What it all boils down to is this:
To derive maximum value from the IP Hash load balancing algorithm, you need to have a source with a wide variety of destinations.
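To make the two cases concrete, here is a small sketch that hashes one VM against two pairs of file servers. All of the IP addresses are hypothetical stand-ins and are not the values from Table 4 or Table 5; the function name is also just for illustration.

```python
def conversation_pnic(src_ip, dst_ip, num_pnics=2):
    """Zero-based pNIC index using the last-octet XOR hash described above."""
    src_lsb = int(src_ip.rsplit(".", 1)[1])
    dst_lsb = int(dst_ip.rsplit(".", 1)[1])
    return (src_lsb ^ dst_lsb) % num_pnics


vm = "192.168.1.10"  # hypothetical VM address

# Two file servers whose hashes differ: the VM uses both pNICs.
spread = [conversation_pnic(vm, server)
          for server in ("192.168.1.21", "192.168.1.22")]

# Two file servers whose hashes collide: both conversations share one pNIC.
collide = [conversation_pnic(vm, server)
           for server in ("192.168.1.21", "192.168.1.23")]

print("different hashes:", spread)
print("same hash:       ", collide)
```

With only two pNICs, the modulo keeps just the lowest bit of the XOR, so any two destinations whose last octets are both odd (or both even) will share a pNIC for a given source.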
Where most people want to use IP Hash is for supporting IP Storage on ESX/i (remember, that’s my notation for either ESX or ESXi). Since there is a single source IP address (the IP address of the vmkernel), you need to have multiple destination IP addresses to be able to take advantage of the load balancing features of IP Hash. In many IP Storage configurations, this is not the case. NFS is the primary culprit – it is very common to have a single NFS server sharing out multiple mount points, which all share the NFS server’s IP address. Many iSCSI environments suffer from the same problem – all the iSCSI LUNs frequently live behind the same iSCSI Target, thus a single IP address.
The lesson to this story is really quite simple:
If you want to use IP Hash to increase the effective bandwidth between your ESX/i host and your IP Storage subsystem, you must configure multiple IP addresses on your IP Storage. For NFS, this means either multiple NFS servers or a single server with multiple aliases, and for iSCSI, it means that you’ll want to configure multiple targets with a variety of IP addresses.
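When planning the IP addresses for your storage, it can help to check ahead of time which uplink each target would land on. The helper below is a planning sketch only: the function name, the vmkernel address, and the NFS alias addresses are all hypothetical, and it relies on the simplified last-octet hash described earlier.

```python
def uplinks_used(vmk_ip, target_ips, num_pnics):
    """Group storage target IPs by the zero-based pNIC index they hash to."""
    src_lsb = int(vmk_ip.rsplit(".", 1)[1])
    groups = {}
    for target in target_ips:
        dst_lsb = int(target.rsplit(".", 1)[1])
        index = (src_lsb ^ dst_lsb) % num_pnics
        groups.setdefault(index, []).append(target)
    return groups


vmkernel = "10.10.10.5"  # hypothetical vmkernel port address

# A single NFS server address: everything rides one pNIC.
single = uplinks_used(vmkernel, ["10.10.10.20"], 2)

# The same server with a second alias: traffic can use both pNICs.
aliased = uplinks_used(vmkernel, ["10.10.10.20", "10.10.10.21"], 2)

print("single address:", single)
print("with alias:    ", aliased)
```

If the result has fewer groups than you have pNICs, at least one uplink will never carry storage traffic from that vmkernel port.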
So, as you can see, the IP Hash load balancing algorithm offers the best (under the right set of circumstances) and the worst of all options. It offers the best load balancing and performance under the following circumstances:
- IP Hash load balancing configured on vSwitch with multiple uplinks
- Static 802.3ad configured on all relevant ports on the pSwitch(es)
- Multiple IP conversations between the source and destinations with varying IP addresses
If you don’t meet ALL those requirements, IP Hash gains you nothing but complexity. IP Hash gains you the worst of all options because of the following:
- Significantly increased technical complexity
- Significantly increased political complexity
- Potential introduction of a single point of failure
- No performance gains if there is a single IP conversation
The long and short of it comes down to this – use IP Hash load balancing when you understand what you’re doing and you KNOW that it will provide you concrete advantages. This is not the load balancing algorithm for the new VI administrator, nor for an administrator who is not on good terms with their network support team. My recommendation for most environments is to start with vSwitch Port Based load balancing and monitor your environment. If you see that your network throughput is causing a problem and you can satisfy the conditions I set out above, then – and only then – implement IP Hash as your load balancing algorithm.
Explicit Failover Order Load Balancing
This is the load balancing algorithm for the control freak in the crowd. With the Explicit Failover Order load balancing algorithm in effect, you are essentially not load balancing at all! Explicit failover will utilize, for all traffic, the “highest order” uplink from the list of Active pNICs that passes the “I’m alive” test. What does the “highest order” mean? Well, it’s simply the pNIC that has been up the longest!
You manage the failover order by placing pNICs into the “Active Adapters,” “Standby Adapters,” and “Unused Adapters” section of the “Failover Order” configuration for the vSwitch or Port Group. pNICs listed in the “Active Adapters” section are considered when calculating the highest order pNIC. If all of the pNICs in the Active Adapters section fail the “I’m alive” test, then the pNICs listed in the “Standby Adapters” section are evaluated. Adapters listed in the “Unused Adapters” section are never considered for use with the Explicit Failover Order load balancing approach.
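The selection logic can be sketched in a few lines. In this sketch the order of each list stands in for the “highest order” ranking, which is an assumption on my part, and the adapter names and the link-state table are hypothetical.

```python
def select_uplink(active, standby, unused, is_alive):
    """Pick the single uplink that carries all traffic, or None.

    Active adapters are tried first, in the order given; standby adapters
    are tried only when every active adapter fails the "I'm alive" test.
    Unused adapters are accepted for completeness but never considered.
    """
    for tier in (active, standby):
        for pnic in tier:
            if is_alive(pnic):
                return pnic
    return None


# Hypothetical adapter names and link states.
active = ["vmnic0", "vmnic1"]
standby = ["vmnic2"]
unused = ["vmnic3"]
link_up = {"vmnic0": True, "vmnic1": True, "vmnic2": True, "vmnic3": True}

normal = select_uplink(active, standby, unused, link_up.get)

link_up["vmnic0"] = False
one_down = select_uplink(active, standby, unused, link_up.get)

link_up["vmnic1"] = False
on_standby = select_uplink(active, standby, unused, link_up.get)

link_up["vmnic2"] = False
nothing_left = select_uplink(active, standby, unused, link_up.get)

print(normal, one_down, on_standby, nothing_left)
```

Notice that at every step exactly one pNIC carries all the traffic, which is why this policy is really failover rather than load balancing.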
This is another policy that I file in the “interesting but worthless” category.
Load Balancing and 802.1Q VLAN Tagging
It’s important to note that, for all of the load balancing options I’ve discussed, you can still use 802.1Q VLAN Tagging. The thing you have to be careful of is to ensure that all ports configured in a load balanced team have the same VLAN trunking configuration on the pSwitch. Failure to configure all the pSwitch ports correctly can result in very difficult to troubleshoot traffic isolation problems – it’s a good way to go bald in a hurry!
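A quick consistency check can save some of that hair. The sketch below assumes you have already collected, by whatever means you prefer, the set of VLANs allowed on each pSwitch port in the team; the function name, port names, and VLAN IDs are hypothetical.

```python
def vlan_mismatches(team_ports):
    """Return {port: [missing VLAN IDs]} for ports lacking a VLAN that
    another port in the same team carries."""
    all_vlans = set().union(*team_ports.values())
    return {port: sorted(all_vlans - vlans)
            for port, vlans in team_ports.items()
            if all_vlans - vlans}


# Hypothetical trunk configuration for a two-port team.
team = {
    "Gi1/0/1": {10, 20, 30},
    "Gi1/0/2": {10, 20},      # VLAN 30 was forgotten on this port
}

problems = vlan_mismatches(team)
print(problems or "all team ports carry the same VLANs")
```

An empty result means every port in the team trunks the same set of VLANs, so a vNIC sees the same networks no matter which pNIC it is mapped to.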
Load Balancing Summary
To summarize on network load balancing options…even though there are four load balancing options available for your use, I recommend that you stick with one of two:
- vSwitch Port Based Load Balancing: This is the default (and preferred) load balancing policy. With zero effort on the part of the virtualization administrator, you achieve load balancing that is – in most cases – good enough to meet the demands of the majority of virtual environments. This is where I recommend that you begin, especially if you are new to VMware technologies. Stand this configuration up in your environment and monitor to see if the network is a bottleneck. If it is, then look to IP Hash as a possible enhancement for your setup.
- IP Hash Load Balancing: This is the most complex and, possibly, the most rewarding load balancing option available. If you’re comfortable working in your virtual infrastructure, if you understand the networking technologies involved, and if you have a good working relationship with your network administrator, IP Hash can yield significant performance benefits. The problem I have with this algorithm is that I see it implemented in far too many environments where network throughput is not a problem. People seem to think that a gigabit (or even a 10Gb) Ethernet connection just doesn’t have enough guts to handle 20, 30, or more virtual machines. I beg to differ! In most cases, you’ll find that a single GbE connection is more than capable of handling the load, so why not let it? The area where I do sometimes see a need for IP Hash is with IP based storage, but even here, it is frequently not needed.
Do yourself a favor – if you don’t need to use IP Hash, and especially if your environment isn’t set up to be able to take advantage of the benefits of IP Hash, KISS it and stay with vSwitch Port Based Load Balancing. You’ll be glad you did!

Christofer Hoff said,
April 7, 2009 at 9:53 AM
Ken:
Your blog is absolutely fantastic.
/Hoff
Ken Cline said,
April 7, 2009 at 1:04 PM
Thanks, Chris…hope I can keep it that way!
The great vSwitch debate - Storage Informer said,
April 7, 2009 at 10:18 AM
[...] The Great vSwitch Debate – Part 3 [...]
Steve Chambers said,
April 8, 2009 at 6:07 PM
Great work, Ken.
Ken Cline’s Great vSwitch Debate | Arnim van Lieshout said,
April 9, 2009 at 6:19 AM
[...] The Great vSwitch Debate – Part 3 [...]
Massimo Re Ferre' said,
April 9, 2009 at 8:37 AM
Ken,
>If anyone has a really good example of why I would want to override the
>load balancing approach at the Port Group, please leave me a comment!
I am not in front of a vCenter Console so it’s off the top of my head but…. isn’t it true that if you create two PGs that override the default vSwitch policies you can configure both PGs to have “reverse” Active/Passive Load Balancing (i.e. Failover) so that PG1 uses NIC1 and keeps NIC2 in standby whereas PG2 uses NIC2 and keeps NIC1 in standby? This is useful if you want to segment traffic in NIC constrained configurations (e.g. 2 NICs supporting both VM traffic as well as VMotion / or whatever).
As I said just an idea…
Massimo.
Ken Cline said,
April 9, 2009 at 11:34 AM
Yes, Massimo, that is true, but I question if the gain is worth the pain. If you’re running GbE, chances are good you’re not bandwidth constrained to begin with, so what are you really gaining by adding the complexity? Yes, you can deterministically say you know what traffic is using which interface (until there’s a failure), but do you really care?
I suppose there are environments where you really need to separate traffic – and NIC constraints would be a motivator, but I would advocate starting with the defaults and implementing the PG overrides ONLY if there was a real problem to solve.
Massimo Re Ferre' said,
April 9, 2009 at 5:00 PM
I agree Ken.
I was just thinking about situations where it’s not worth having very many NICs but yet it would make sense to separate traffic that might have potentially high bursts (i.e. VMotion) from something that needs to be somewhat predictable (i.e. VMs).
Is this for everyone? Not at all. I agree we have enough CPU / Memory and Network bandwidth today that half would be sufficient.
Massimo.
Steven Beard said,
April 9, 2009 at 11:27 AM
Fantastic trio of articles. I have been considering IP Hash load balancing my IP storage for a while and feel you’ve saved me some pain.
Great Blog…
Ken Cline said,
April 9, 2009 at 11:36 AM
Thanks Steven!
Glad to have saved you the trouble. That’s what I’m hoping to do with this series … convince people that the simple way is usually the best way.
KLC
Cameron Moore said,
April 14, 2009 at 10:01 AM
Ken,
One concept that is still unclear to me is how load balancing works on the IP Storage vSwitch/PortGroup. Your examples deal solely with how the vNICs of various VMs are assigned to pNICs, but how does that translate to IP Storage?
For example, I’m planning on building an ESX server (my first) with 12 pNICs and using 3 pNICs for my NFS storage traffic. How are connections to the pNICs made in that scenario?
Your blog is awesome so far!! Thanks!!
Ken Cline said,
April 14, 2009 at 7:58 PM
Hi Cameron,
The vSwitch doesn’t care whether it is a VM or a vmkernel device connecting to it. The vSwitch is a layer 2 device that passes Ethernet frames, nothing more, nothing less. All of the concepts I discussed apply equally to virtual machines, VMotion, and IP storage (NFS / iSCSI).
Thanks, and glad you’re enjoying the articles.
KLC
Cameron Moore said,
April 16, 2009 at 1:54 PM
Ken,
I understand that the concepts are the same. What I don’t understand is how ESX connects IP storage processes to the pNICs. Specifically, I’m trying to understand how the load would be distributed across multiple pNICs when pointing to a single NFS host with one NFS volume. Does ESX make a separate connection for each VMDK/File? Or only one for each NFS volume?
I’ve only deployed a single-pNIC ESXi host so far, so perhaps I’m misunderstanding how the vmkernel ports are supposed to work.
Flex-10 lessons learned « Frank Denneman said,
April 26, 2009 at 10:40 AM
[...] • Ken Cline and his great vSwitch debate series; http://kensvirtualreality.wordpress.com/2009/04/05/the-great-vswitch-debate%e2%80%93part-3/ [...]
Frank Denneman » Blog Archive » Flex-10 lessons learned said,
May 1, 2009 at 12:04 PM
[...] • Ken Cline and his great vSwitch debate series; http://kensvirtualreality.wordpress.com/2009/04/05/the-great-vswitch-debate%e2%80%93part-3/ [...]
Les secrets du vSwitch - Hypervisor.fr said,
June 8, 2009 at 7:56 PM
[...] Part 3 is particularly interesting technically because Ken details the different load-balancing methods. You learn, for example, how the ports of a vSwitch are used depending on the startup sequence of the VMs: [...]
Abby said,
June 9, 2009 at 5:52 PM
There are only five VMs in figure 3. This text is a little confusing:
“In the example shown in Figure 3 the virtual machines are powered up in order from VM1 to VM6.”
For IP Hash Based Load Balancing, must you be running IPv4 — what happens if you have IPv6 traffic or some other protocol?
Thanks for this series of articles.
Ken Cline said,
June 9, 2009 at 10:27 PM
Thanks for catching my counting mistake – it’s fixed.
If you’re using a protocol other than IPv4, then whatever values happen to be at the “standard IPv4 address” offset within the datagram are used as if they were IP addresses.
Good question! Thanks for visiting…
KLC
Clement said,
July 24, 2009 at 9:55 AM
Hi,
Something is confusing me. You say in chapter 6 that the links between the vNIC on the vSwitch and the pNIC are static. What I do not understand, however, is how those relationships are defined. Is it an even/odd relationship? But then it is not applied in figure 5 of the sixth chapter…
The whole point is that I’m trying to understand why vNIC 6 is mapped to the first pNIC and not the second one…
Should I have commented on chapter 6? Well, the bottom line is about the vNIC/pNIC relationship in the port based load balancing scenario, right?
Thanks a lot, this bunch of articles was very useful for me as an introduction to vSwitch. Being a total newcomer to the ESX universe, I can state that you really made your point clearly.
Clement.
Ken Cline said,
August 4, 2009 at 3:19 PM
The mapping between vNIC and vSwitch(Port Group) is static for the current vNIC invocation (i.e. from the time the vNIC is powered on until it is powered off). Once a vNIC is powered off, it leaves an “available” vSwitch port, so the next time a vNIC is powered on, it is assigned to the first available vSwitch port.
I wouldn’t worry too much about it – it’s one of those academic exercises that, in the end, doesn’t really make much difference. In most cases, your host will be up for a long time and VMs will be powered on and off many, many times which will cycle the vNICs across the vSwitch ports so many times that it almost becomes random.
Thanks for stopping by – and sorry it took so long for me to reply!
KLC
Ami said,
August 30, 2009 at 4:06 PM
Hi Ken,
a) Chapeau on the lesson..bravo!
b) You’ve mentioned in fig. 1 that no policy is applied “here” (the vmkernel is there)….but then in the end you recommend ‘I do sometimes see a need for IP Hash is with IP based storage’..what have I missed..?!
c) Where on Earth have you got all that XOR, LSB… algorithm information from? (It’s almost as if you’ve written them yourself) A M A Z I N G!
NFS and IP-HASH loadbalancing « Frank Denneman said,
November 13, 2009 at 10:42 AM
[...] It will not do perfect load-balancing out of the box. Remembering Ken Cline’s (@clinek) excellent article about vSwitches I knew you must dive in to the algorithm used by IP-Hash load balancing and pick [...]
deercutter said,
November 20, 2009 at 8:53 AM
Ken – We will be using IP Hash Load Balancing on our two pNIC Etherchannel, which will be carrying our “service console”, “vmotion”, and “fault tolerance logging” ports. Would you agree this is the preferred way?
Ken Cline said,
November 30, 2009 at 7:19 PM
Without knowing more about the details of your environment, I would probably stick to the default vSwitch Port ID based load balancing algorithm. I would even consider creating multiple port groups and having the SC & VMotion share one port group with pNIC1 as primary and pNIC2 as standby and have FT on the other PG with the primary/standby reversed.
FT & VMotion are both (potentially) heavy bandwidth users and I would like to keep them separated as much as possible. IP Hash will not give me the guaranteed separation I can get with the approach mentioned above.
Something to consider…
KLC
Ryan Gallier said,
December 2, 2009 at 5:58 PM
I’m still confused about the pSwitch configuration. If I am using the default vSwitch Port Based Load Balancing, do I still configure an 802.3ad (static) port-group on my pSwitch? Why or why not? Keep in mind I’m running this on Cisco 3750 switches which can do cross switch 802.3ad.
Ken Cline said,
December 2, 2009 at 11:10 PM
No, you do not configure 802.3ad on your pSwitch if you are using the default vSwitch Port Based LB. You don’t need to make any special configuration changes on your pSwitch – that’s one of the great things about the default policy! When you configure .3ad, your pSwitch is going to expect all ports in the port group to act as one single port. This configuration used to work back in the ESX 2.x days (and it may still work now…), but it is not recommended and is not supported (by either Cisco or VMware).
HTH,
KLC
Ryan Gallier said,
December 3, 2009 at 9:57 AM
How does the switch then deal with the mac-address going down different physical ports during the load balancing?
Mike Williams said,
December 10, 2009 at 6:12 PM
I think your point about needing 802.3ad (etherchanneling) the pNICs is a good one when using IP Hashing. An important implication in this is that this virtually eliminates the possibility to use the IP Hashing when running ESX on blade servers, as *no* blade server switching technology that I’m aware of would allow an etherchannel across 2 switch modules. If you were using passthrough modules to allow each blade access to external switches directly, then you could connect those to some upstream stacked 3750s, 6500s running VSS or Nexus 7000s (and soon 5000s) using VPCs.
Just like the hashing on the network side using etherchannels, it’s important (as you said) to choose the hashing that’s best for what that box does. If it’s a server that dumps files to one other box, then IP hashing isn’t buying anything. If it’s an application server that handles thousands of connections from a wide distribution of client IP addresses, IP hashing is probably the best option.
I’m surprised ESX hasn’t taken a tip from Cisco and added another level which is source/dest IP *AND* TCP port hashing. This allows even 2 machines that talk a lot between them to have multiple conversations balanced properly over multiple links.
Ken Cline said,
April 16, 2009 at 2:36 PM
Hi Cameron,
For a single NFS server presenting a single share (or any number of shares on the same IP address), all traffic will traverse a single pNIC, regardless of load balancing options. The only way to have your traffic spread across multiple pNICs is to use IP Hash load balancing with multiple src/dest IP address pairs.
“Connections,” in the TCP/IP sense, don’t matter for load balancing. It’s all about “Conversations” with IP Hash. A conversation is comprised of a source IP address and a destination IP address. It doesn’t matter how many connections are present in the conversation, all traffic will use a single pNIC.
Hope this helps!
KLC
Ken Cline said,
December 3, 2009 at 11:21 PM
Hi Ryan,
The MAC address will not “float” between the various ports in the port group. Once a vNIC has established an affiliation with a pNIC, ALL outbound traffic for that vNIC will traverse the same pNIC (the only exception is IP Hash, where the affiliation is tied to the IP conversation rather than the vNIC). The only time that the vNIC’s traffic will change pNICs is in the event of a path failure – in which case the vSwitch will send a gratuitous ARP packet to the pSwitch to notify it of the change in interface.
Thanks!
KLC