03.29.09

The Great vSwitch Debate – Part 1

Posted in LinkedIn, Virtualization at 6:27 PM by Ken Cline

There are many articles out there discussing “best practices” for configuring virtual switches (vSwitches) in a VMware Infrastructure 3 (VI3) environment. Well, here’s the first in a series of articles presenting vSwitch recommendations that conform to the rules of “Virtualization According to Ken”.

For purposes of clarity, unless otherwise specified, all discussion herein applies to both VMware ESX Server (ESX) and VMware ESXi Server (ESXi). When I want to make it clear that I’m referencing both, I’ll use the construct ESX/i.

First, let’s start out by defining exactly what a VMware vSwitch is:

A VMware Virtual Switch (vSwitch) is a software construct that runs on a VMware ESX/i host. The vSwitch runs under the control of the hypervisor (a.k.a. the vmkernel) and is responsible for providing all network-based communications with and within an ESX/i host system. Networking components connect to vSwitches via Port Groups, which I’ll discuss a little further down in this post. For now, let’s just know that they exist, but for simplicity’s sake, I’m going to exclude them from the first several graphics.

vSwitches provide several features that I’ll discuss in greater detail in Part 2 of this series. This initial post will cover the basics of vSwitch communications and we’ll go deeper over time. For now, simply be aware that a vSwitch has the following configuration areas:

  • Security
    • Promiscuous mode enable/disable
    • MAC Address Change enable/disable
    • Forged Transmit enable/disable
  • Traffic Shaping Configuration (outbound traffic only)
    • Average Bandwidth
    • Peak Bandwidth
    • Burst Size
  • NIC Teaming Policies
    • Active
    • Standby
    • Unused
  • Load Balancing Policies
    • vSwitch Port Based
    • MAC Address Based
    • IP Hash Based
    • Explicit Failover Order
  • Network Path Failure Detection
    • Link State
    • Beacon Probing
  • Cisco Discovery Protocol (CDP)

Additionally, there are a couple of really neat differences between physical switches (pSwitches) and vSwitches that will be covered later, as well:

  • vSwitches do not learn MAC addresses from snooping network traffic
  • vSwitches do not participate in Spanning Tree negotiations
    • Impossible to create a spanning tree loop within a vSwitch

Detailed discussion of each of these features will be presented in a later blog article – for now, I just want to list them so I don’t get lots of comments saying “but you forgot to talk about all these features!” 

The types of communications supported by vSwitches include:

  • Management traffic to/from the ESX Service Console
  • vmkernel network traffic
    • VMotion network traffic between hosts
    • IP Storage (iSCSI and NFS) traffic
    • ESXi management traffic
  • Communications between VMs and network connected devices

These communication paths are depicted in Figure 1.

Figure 1. vSwitch Functional Overview

A single ESX host can have multiple vSwitches (up to 127 vSwitches per host). Figure 2 shows a fairly common configuration with three vSwitches. In this case, the vSwitches are used to functionally separate network traffic.

Figure 2. Multiple vSwitches

It is important to note that there is no mechanism for direct vSwitch to vSwitch communications. Devices connected to different vSwitches must traverse an external (to the vSwitch) device to communicate with each other (see Figure 3).

Figure 3. No Direct Connection between vSwitches

That seems pretty simple…but wait, there’s more!

A vSwitch is a network Layer Two device. Layer Two is the Data Link Layer of the seven-layer OSI network model (http://en.wikipedia.org/wiki/Data_Link_Layer). As a Layer Two device, a vSwitch is responsible for delivering frames to adjacent devices and cannot perform any routing functions. Thus, if a network packet needs to be delivered to a network segment that is not directly connected to the vSwitch, a device external to the vSwitch (physical or virtual; see Figure 3) must perform the necessary routing functions.

In a significant architectural change from your typical physical environment, in the virtual environment, the vSwitch is responsible for providing fault tolerance features to your systems. In most physical environments, network fault tolerance is achieved through the use of two physical NICs in a server. These two NICs are “bonded” through the use of special purpose software that allows either link aggregation (combining the bandwidth of the two NICs into a single logical interface), link failover (if one NIC or path fails, traffic is automatically re-routed over the alternate NIC), or both.

Figure 4. Fault Tolerance in a Physical Environment

In a virtual environment, the responsibility for fault tolerance is removed from the “originating system” (a.k.a. the virtual machine) and instead falls to the vSwitch. As shown in Figure 5, the same four workloads (represented as VMs) each have the same level of fault tolerance (two physically separate paths) as in the physical example shown in Figure 4.

Figure 5. Fault Tolerance in a Virtual Environment

In the event of a path failure, the vSwitch will automatically reroute traffic over a surviving outbound link. The cabling and pSwitch port utilization advantages of moving the redundancy to the vSwitch are clear when you look at Figure 6, below.

Figure 6. Fault Tolerance – a Comparison between Physical and Virtual

As you can see, the number of physical switch ports required to provide the same effective level of fault tolerance is significantly reduced. In this simple example of four workloads, we went from eight pSwitch ports to two – and that doesn’t count the out-of-band management port or any other types of network connection (e.g. IP storage, backup, database, etc.).

In addition to the switch port real estate that we’re conserving, we have also significantly simplified the overall architecture of our solution – always a good thing!
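If you want to play with the numbers yourself, here’s a quick back-of-the-napkin sketch in Python. The function names are just something I made up for illustration, and it assumes two NICs per physical server and two uplinks on the vSwitch, as in the figures above.

```python
def pswitch_ports_physical(workloads: int, nics_per_server: int = 2) -> int:
    """Each physical server needs its own redundant set of pSwitch ports."""
    return workloads * nics_per_server


def pswitch_ports_virtual(uplinks_per_vswitch: int = 2) -> int:
    """All VMs on the host share the vSwitch uplinks, whatever the VM count."""
    return uplinks_per_vswitch


workloads = 4
physical = pswitch_ports_physical(workloads)
virtual = pswitch_ports_virtual()
print(f"physical: {physical} ports, virtual: {virtual} ports")
```

Bump `workloads` up and the physical number grows with it while the virtual number stays put, which is the whole point.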

…but wait, there’s more!

A vSwitch enables the use of IEEE 802.1Q VLAN Tagging (http://en.wikipedia.org/wiki/802.1Q). If you want to download the complete IEEE 802.1Q specification it’s available here: http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf.

802.1Q enables the division of a single physical network into multiple logical “virtual LANs” or VLANs. There are three ways to implement 802.1Q in a VMware Infrastructure environment:

  • Virtual Guest Tagging (VGT) Mode – network packets are passed through the vSwitch to the guest OS with the 802.1Q VLAN tags intact. It is the guest OS’s responsibility to add and remove the tags. To enable VGT mode, specify 4095 as the VLAN ID on the Port Group definition.
  • External Switch Tagging (EST) Mode – this is the method that is used most commonly on physical networks. VLAN tags are applied and removed at the physical switch port, so network traffic that reaches the server does not have VLAN tags associated with it.
  • Virtual Switch Tagging (VST) Mode – this is the method most commonly used in VMware Infrastructure environments. In VST mode, VLAN tags are processed by the vSwitch and the guest OS receives and transmits untagged traffic. This is the method we’re going to focus on here.

When using Virtual Switch Tagging (VST) Mode, the vSwitch enables the definition of VLANs through the use of Port Groups, which we’ll discuss further in a few paragraphs. Each VLAN can be viewed as a separate Layer Two network; as such, a Layer Three device is required to route traffic between VLANs. Hang on a minute … didn’t we say that a vSwitch couldn’t perform any routing? Yep, and that’s why it’s necessary to use a Layer Three device outside the vSwitch to route between VLANs, even when those VLANs live on the same vSwitch.

Clear as mud, right? Well, let’s see if a couple of pictures can help clear things up a bit.

Once again, let’s start out with an example from the physical world. Figure 7 shows a fairly typical environment in a physical datacenter. A single server needs to connect to four separate networks, so four NICs (eight if you want redundancy!) are installed to support the connectivity requirements. In most cases, VLANs are configured at the physical switch port, so it is possible to isolate a particular switch port (or set of ports) to a given VLAN. This technology has worked very well in datacenters – allowing for the logical isolation of traffic without having to build out totally separate networks for each function.

Figure 7. Typical VLAN Configuration

Back in the days when 100Mbps networks were the norm, this made a lot of sense. You could have dedicated bandwidth for each function up into the switch. In many cases, you would have Gigabit Ethernet (GbE) on the backbone connecting your switches, so the aggregate of all those 100Mbps client-side connections wouldn’t saturate your network. With the advent of affordable GbE to the server, there is now a glut of bandwidth that is (frequently) underutilized.

To take advantage of all that bandwidth, the logical segmentation afforded by 802.1Q VLAN trunking was deployed on the server (edge) side of the switch in addition to being used for inter-switch links (ISLs). This change significantly reduces the cost of network segmentation. Rather than having to purchase servers that support four, eight, or more NICs; switches that have enough ports to connect to all those NICs; and running all those cables to connect servers to switches – we can now use the logical segmentation capabilities of VLANs to combine all those different types of network traffic onto a single interface (Figure 8).

Figure 8. Introducing a VLAN Trunk

Obviously you would not want to rely on a single pNIC in a production environment – where’s your fault tolerance?

Now that we’ve set the stage, let’s look at how this is implemented in a vSwitch.

Take a second and look back at Figure 7 where we had a single physical server connecting to four different networks. Compare that with Figure 9 where we have the same effective configuration. Notice that the VM has only three network interfaces (vNICs) rather than the four that were present in the physical system. This is due to the fact that IP Storage is accessed via the vmkernel rather than directly from the guest OS. While it is possible to use IP Storage from within the guest, access via the vmkernel is the more common approach.

Figure 9. Single VM, Multiple Networks

Look at the cable count. We still have four cables connecting the physical ESX host to the pSwitch; however, had we scaled the physical solution to four servers, we would have multiplied the network infrastructure impact by four (for a total of 16 NICs, cables, and pSwitch ports). Look at Figure 10 to see what happens in our virtual environment – quadruple the number of systems connecting into the network without incurring a single additional dollar’s worth of network infrastructure cost.

Figure 10. Multiple VMs, Multiple Networks

We’ll decompose this a little further, but first, I promised earlier that I would discuss Port Groups in more detail…well, the time has come!

If you order your vSwitch in the next 15 minutes we’ll throw in Port Groups at no additional charge!

A Port Group is a construct that lives on top of a vSwitch. I kind of don’t like the term “Port Group” because it implies that there are a specific number of ports in the port group, and there aren’t. A Port Group serves as a template for configuration characteristics for a vNIC connection to a vSwitch. Information that can be configured at the Port Group level pretty much mirrors the configuration parameters available on a vSwitch and includes:

  • Security
    • Promiscuous mode enable/disable
    • MAC Address Change enable/disable
    • Forged Transmit enable/disable
  • Traffic Shaping Configuration (outbound traffic only)
    • Average Bandwidth
    • Peak Bandwidth
    • Burst Size
  • NIC Teaming Policies

Items that are configured at the Port Group level override the corresponding parameter on the associated vSwitch. As mentioned above, I’ll cover these configuration items in more detail in a later post.
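One way to picture the override behavior is as a simple dictionary merge. The sketch below is only an illustration: the setting names and values are hypothetical placeholders of mine, not actual ESX configuration keys, and `None` is my own convention for “not set at the Port Group level”.

```python
def effective_policy(vswitch: dict, port_group: dict) -> dict:
    """Start from the vSwitch settings, then let Port Group settings win.

    A Port Group value of None (or a missing key) means "inherit".
    """
    merged = dict(vswitch)
    merged.update({k: v for k, v in port_group.items() if v is not None})
    return merged


# Hypothetical settings, for illustration only.
vswitch_settings = {"promiscuous_mode": False, "forged_transmits": True}
port_group_settings = {"promiscuous_mode": True, "forged_transmits": None}

print(effective_policy(vswitch_settings, port_group_settings))
```

Here the Port Group turns promiscuous mode on for its own connections and leaves everything else inherited from the vSwitch.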

The Port Group feature that I want to discuss now is VLAN tagging. Before we can delve into the details, I need to provide an overview of what VLAN tagging is, and why you would want to use it in the first place. I discussed VLANs briefly up above; now I want to dig a little deeper.

VLAN tagging is a feature of the IEEE Standard for Local and Metropolitan Area Networks: Virtual Bridged Local Area Networks (802.1Q). I linked to this standard earlier, if you want to download the full document. To quote from the standard:

VLANs facilitate easy administration of logical groups of stations that can communicate as if they were on the same LAN.

So why do you care about that? Well, as an example, let’s say you have some “Project X” servers in your datacenter and the client user community is located in a building on the other side of your campus. The two locations are well connected from a network perspective, so wouldn’t it be great if the clients and the servers could live on the same subnet and broadcast domain? That would certainly simplify your networking setup, wouldn’t it? Prior to the advent of VLANs, the only way to accomplish this type of scenario was to stand up a totally separate network infrastructure to join the two locations – a costly and complex solution, to say the least!

With VLANs, you can configure specific ports on a switch to belong to an identified VLAN. All the ports on that switch that belong to the same VLAN are in the same broadcast domain. Ports that are configured for different VLANs are (probably) not. Figure 11 shows a single physical switch (in yellow) that has been logically partitioned into two “virtual switches”.

Figure 11. Single Physical Switch, Multiple Logical LANs

In many respects, this is quite similar to how VMware partitions a physical machine into multiple virtual machines; we’re just dealing with networks rather than servers. That’s all well and good, but how does that help reduce port count? Well, let’s extend the example to two switches (see Figure 12):

Figure 12. Two Switches, Two Logical LANs

What we’ve done here is to create a “VLAN Trunk” between the two switches. A VLAN Trunk can carry the traffic from multiple VLANs and keep everything separated. In the example we’ve trunked only two VLANs between the switches, but you could aggregate tens, hundreds, or even thousands of VLANs in one trunk.

In each of the prior examples (Figure 11 and Figure 12), all of the 802.1Q “stuff” is happening within the switch(es). The devices (client systems and servers) don’t know or care that their network traffic is getting packaged into a VLAN before it reaches its final destination. In the following paragraphs we’ll explore how all this applies to a VMware environment.

As I mentioned above, there are three types of VLAN support within ESX:

  • Virtual Guest Tagging (VGT) Mode
  • External Switch Tagging (EST) Mode
  • Virtual Switch Tagging (VST) Mode

Figure 13 shows the differences between the three modes. The graphic is presented from the perspective of traffic entering the ESX host (the yellow rectangle). The ENTER and EXIT signs designate the points at which inbound IP packets have 802.1Q VLAN tags applied and removed, respectively (this is analogous to “entering” and “exiting” the VLAN trunk). For outbound traffic, simply reverse the positions of the ENTER and EXIT signs.

Figure 13. VLAN Tagging Modes

I’m going to pretty much ignore VGT Mode since it’s not used very often and you can figure it out from my discussion of VST. EST is the technique I discussed around Figure 11 and Figure 12 – the ESX host and the VMs thereon don’t know or care that there are VLANs involved. Where I want to focus the discussion is on Virtual Switch Tagging (VST) Mode.

Enabling VST is really quite simple: all you have to do is put a valid VLAN number in the VLAN ID field of the Port Group definition. The valid range of VST Mode VLAN numbers is the integers 1 through 4094. A VLAN ID of zero is used for EST mode and a value of 4095 is used for VGT mode. Once you’ve configured the Port Group for a particular VLAN, a vNIC (the NIC within a VM) or vmkNIC (vmkernel NIC) that connects to that Port Group will see ONLY traffic on that VLAN. It is important to mention that, if you enable VST on your vSwitch, you MUST have a physical switch that is capable of processing the VLAN tags that will be added to your network traffic.
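The VLAN ID rules above boil down to a tiny lookup. Here’s a minimal Python sketch of it; the function name `tagging_mode` is mine and is purely illustrative, not something you’ll find in ESX.

```python
def tagging_mode(vlan_id: int) -> str:
    """Map a Port Group VLAN ID to the tagging mode it selects."""
    if vlan_id == 0:
        return "EST"  # no tagging on the host; the pSwitch handles VLANs
    if 1 <= vlan_id <= 4094:
        return "VST"  # the vSwitch adds and removes the tag
    if vlan_id == 4095:
        return "VGT"  # tags are passed through to the guest OS
    raise ValueError(f"VLAN ID {vlan_id} is outside the range 0-4095")


for vlan_id in (0, 100, 4095):
    print(vlan_id, tagging_mode(vlan_id))
```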

Figure 14. Traffic Isolation by Port Group

As shown in Figure 14, a VLAN Trunk is connected to the vSwitch. The trunk is carrying two different VLANs, VLAN 100 and VLAN 200. The vSwitch will deliver the appropriately tagged packets to the corresponding Port Group which will strip the VLAN tag and pass the untagged packet on to the connected vNIC.
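To make the tag handling in Figure 14 a little more concrete, here’s a minimal Python sketch of tagging a frame, then stripping the tag and picking a Port Group by VLAN ID. The 4-byte tag layout (a 0x8100 tag protocol identifier followed by a 3-bit priority, a 1-bit drop-eligible flag, and a 12-bit VLAN ID) comes from 802.1Q. Everything else, including the function names and the `PORT_GROUPS` table, is made up for illustration and is not how the vmkernel actually implements it.

```python
import struct
from typing import Optional, Tuple

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag

# Hypothetical Port Group table for the trunk in Figure 14.
PORT_GROUPS = {100: "PG_VLAN100", 200: "PG_VLAN200"}


def add_tag(frame: bytes, vlan_id: int, priority: int = 0) -> bytes:
    """Insert a 4-byte 802.1Q tag after the destination and source MACs."""
    if not 1 <= vlan_id <= 4094:
        raise ValueError("VLAN ID must be 1-4094")
    if not 0 <= priority <= 7:
        raise ValueError("priority must be 0-7")
    tci = (priority << 13) | vlan_id  # drop-eligible bit left at 0
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def strip_tag(frame: bytes) -> Tuple[Optional[int], bytes]:
    """Return (vlan_id, untagged_frame); vlan_id is None if no tag is present."""
    if len(frame) < 16:
        return None, frame
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None, frame
    return tci & 0x0FFF, frame[:12] + frame[16:]


def deliver(frame: bytes) -> Tuple[Optional[str], bytes]:
    """Pick the Port Group for a tagged frame and hand back the untagged frame."""
    vlan_id, untagged = strip_tag(frame)
    return PORT_GROUPS.get(vlan_id), untagged


# Zeroed MAC addresses, IPv4 EtherType, dummy payload.
untagged = bytes(12) + b"\x08\x00" + b"hello"
tagged = add_tag(untagged, vlan_id=200)
print(deliver(tagged)[0])
```

A frame tagged for VLAN 200 lands on the VLAN 200 Port Group, and what comes out the other side is byte-for-byte the original untagged frame, which is why the guest OS never knows a VLAN was involved.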

Something else that is important: remember we said that a vSwitch is a Layer Two device? That means that it cannot route traffic between different VLANs. So, referring back to Figure 14, virtual machines within VLAN 100 can communicate among themselves without having to exit the vSwitch to hit the physical network. The same is true for VMs within VLAN 200 – they can communicate among themselves. If, however, one of the VMs on VLAN 100 needs to communicate with one of the VMs on VLAN 200 (or vice-versa), the packets would have to exit the vSwitch via the uplink port (pNIC) to have some external Layer Three device route between the VLANs. This is shown in Figure 15.

Figure 15. Port Group Communications Paths
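The path rules from this post can be summarized in a few lines of Python. Treat this as a sketch under my own assumptions: the `Attachment` class, the vSwitch names, and the return strings are all hypothetical, and it only models the two rules discussed here (no direct vSwitch to vSwitch link, and no routing between VLANs inside a vSwitch).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attachment:
    """Where a vNIC plugs in: which vSwitch, and which Port Group VLAN."""
    vswitch: str
    vlan_id: int


def traffic_path(a: Attachment, b: Attachment) -> str:
    """Describe how traffic between two attachments has to travel."""
    if a.vswitch != b.vswitch:
        # There is no direct vSwitch-to-vSwitch connection.
        return "external device between vSwitches"
    if a.vlan_id == b.vlan_id:
        # Same vSwitch, same VLAN: never touches the physical network.
        return "internal to vSwitch"
    # Same vSwitch, different VLANs: out the uplink to be routed.
    return "external Layer Three device"


vm1 = Attachment("vSwitch0", 100)
vm2 = Attachment("vSwitch0", 100)
vm3 = Attachment("vSwitch0", 200)
vm4 = Attachment("vSwitch1", 100)

print(traffic_path(vm1, vm2))
print(traffic_path(vm1, vm3))
print(traffic_path(vm1, vm4))
```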

OK. That’s going to do it for Part 1 of this series of articles on vSwitch configurations. Next time, we’ll look at some of the options that I glossed over in this post – and then, once you have a good understanding of what a vSwitch is, we’ll actually get into the best practice configurations for hosts with various numbers of pNICs – of course, in compliance with the rules of “Virtualization According to Ken”.

Forward to The Great vSwitch Debate – Part 2

References:

VMware ESX Server 3 802.1Q VLAN Solutions: http://www.vmware.com/pdf/esx3_vlan_wp.pdf

Data Link Layer of the seven-layer OSI network model: http://en.wikipedia.org/wiki/Data_Link_Layer

IEEE 802.1Q VLAN Tagging: http://en.wikipedia.org/wiki/802.1Q

IEEE Standard for Local and metropolitan area networks Virtual Bridged Local Area Networks (802.1Q): http://standards.ieee.org/getieee802/download/802.1Q-2005.pdf

26 Comments »

  2. Pascal Rocheteau said,

    Hi Ken,
    Great article, the explanation are very clear and the figures very well represented: a picture tells more than 1.000 words indeed.

    Maybe just a thought for the people that are not familiar with VLAN tagging: the VLAN tag will be provided by the network admin; it is not a number you invent.

    Will definitely read more of your articles.

    Sincerely

    Pascal

  5. athlon_crazy said,

    A very great and clear article I ever met! Though I’m not from network area, but this article simply give me good, clear explanation & easy to understand for newbie like me regarding vSwitch.

    Thanks Ken,

    • Ken Cline said,

      Thanks for the kind words – glad you found it useful. Make sure to check out the remaining articles in the series.

      KLC

  6. Fantastic article which gives a very in-depth yet understandable overview of the virtual networking concept.

    It would be very interesting to see a follow-up on this with a security focus where the different security aspects are addressed (e.g. virtual security gateway’s connected to the vSwitch, how to avoid un-managed VMs in the virtual network, potential security weaknesses in the vSwitch etc)

    Thanks for a great article Ken!

    • Ken Cline said,

      Thanks, Andreas. Good suggestions for follow-up articles! You just may get your wish :)

  7. Alex P said,

    Great post, clear and just the info needed. I am a Windows admin new to virtualization and your way of explaining things just makes sense, i guess I will be a regular visitor.

    Thanks for the info

  8. Kris said,

    Hi Ken,

    A tip: Should be cool to put the 6 parts in a PDF-file. Many people will downlaod this great articles!

    Cheers

    Kris

    • Ken Cline said,

      Hey Kris,

      Great suggestion! Maybe once I’m all the way done, I’ll roll a .pdf. I’ve still got at least one more installment to go before I put a seal on this thing!

      KLC

  9. Kris said,

    Thanx Ken, I’m looking forward!

    Kris

  10. Allen said,

    Ken, I didn’t see any contact info so I figured I’d post this comment.

    Great write-up (just read the six existing posts all at once). On this first part there’s a minor typo you might want to fix in case it confuses anyone. Plus, I think these posts are going to make great references for many VMware admins. Anyway, the part with the typo reads:

    “The cabling and pSwitch port utilization advantages of moving the redundancy to the vSwitch are clear when you look at Figure 5, below.”

    The figure below was actually Figure 6.

    - Allen

    • Ken Cline said,

      Hi Allen,

      Thanks for catching my mistake! I’ve been surprised how few I let slip through … considering these things are self-published with no editor ;)

      KLC

  11. Albert Widjaja said,

    Thanks Ken,
    this is the best article that I could found in the web for free and very clear explanation + nice Visio stencils too.

    Cheers,
    AWT

    • Ken Cline said,

      Thanks Albert! Glad you found the articles useful.

      KLC

  12. vminstructor said,

    Thanks for this series of posts!
    I have been referencing these articles in class. They are a terrific way to breakdown the concepts and the illustrations are gold.

    Dennis

    • Ken Cline said,

      Thanks Dennis,

      Glad that they’re proving useful!

      KLC

  15. Liam Grant said,

    Hi Ken,

    In figure 13 should the middle option not be PG_EST (as opposed to PG_EGT) or am I getting confused?

    Thanks, this guide is really helping me understand networks more than my basic grasp!

    Liam Grant

    • Ken Cline said,

      Hi Liam,

      Well…since there’s no such thing as EGT, I’m going to hazard a guess that you’re right! Now, let’s see how long it takes me to get around to fixing it :)

      Thanks!
      KLC

  16. kornemuz said,

    Great article, thx ;)
