The Great vSwitch Debate – Part 5

So far, we’ve been through four posts on vSwitches. If you’ve not read them, I recommend that you go back and do so now (or you can read this post and then go back – there are not many dependencies).

Now, in Part 5, I’m going to identify the various “networks” that you interact with in a VMware environment and also provide my recommendation for a configuration with only two pNICs. On with the show!

The Various Networks

In the VMware architecture, there are nominally five IP-based networks. I’ll cover each of these below, but in summary, they are listed below and shown in Figure 1:

  • VMware Management Network. This is the network that connects vCenter Server to the ESX Service Console. Since ESXi doesn’t have a Service Console, the ESXi Management Network is terminated at the vmkernel.
  • VMotion Network. This network interconnects the various ESX/i (reminder, ESX/i is my shorthand notation for ESX and/or ESXi) hosts within a VMware cluster and enables VMotion among those nodes.
  • NFS Network. The Network File System (NFS) Network is an IP Storage network that provides the interconnect between ESX/i hosts and one or more NFS servers that provide storage for virtual machines and ancillary files (e.g. .iso images).
  • iSCSI Network. The Internet Protocol Small Computer Systems Interface (iSCSI) Network is an IP Storage network that provides the interconnect between ESX/i hosts and one or more iSCSI targets that provide storage for virtual machines and ancillary files (e.g. .iso images).
  • Virtual Machine Network(s). One or more networks that allow virtual machines to access and provide services.
Figure 1. The Various Networks

You may have additional networks in your environment, and I’ll mention some of them as I get through this article. One of these that I’ll mention right now is the out-of-band (OOB) management network. While not required to make your VMware environment work, this network is present in many, if not most, enterprise VMware environments.

The out-of-band management network provides connectivity to your out-of-band management interface (e.g. HP iLO, Dell DRAC, IBM Director, Sun LOM). Users who have access to this network essentially have the keys to the kingdom. With proper authentication (or by hacking a username/password), they have direct console access, the ability to connect and disconnect virtual I/O devices (CD-ROM, floppy), the ability to power the host on or off, and pretty much anything else you could do if you were sitting in front of the system. This network needs to be protected at all costs! It should never be placed into a DMZ network. Another way to view the OOB network is as the door to your datacenter. You wouldn’t leave your datacenter door propped open to a back alley, so don’t leave your OOB network exposed to unnecessary risks, either!

VMware Management Network

Much like the Out-of-Band Network, the VMware Management Network is critical to the security of your virtual infrastructure. This network provides the management interface to the vmkernel – either through the service console (for ESX) or directly (with ESXi). This is the network where vCenter Server (a.k.a. VirtualCenter) lives, as well as the path for ssh, web access, and third-party tool access (see Figure 2).

Figure 2. VMware Management Network

Again, in much the same manner as your OOB network, the Management Network can be viewed as a door into your datacenter. It needs to be protected very carefully, should never be exposed in a DMZ, and will frequently live behind a firewall with minimal ports exposed.
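
For reference, here’s a minimal sketch of how the management interface is defined from the classic ESX service console. The vSwitch name, port group name, and address below are illustrative assumptions, not prescriptions for your environment:

    esxcfg-vswitch -A "Service Console" vSwitch0                               # port group for the management interface
    esxcfg-vswif -a vswif0 -p "Service Console" -i 10.0.0.11 -n 255.255.255.0  # service console interface (example address)
    esxcfg-vswif -l                                                            # verify

On ESXi, where there is no service console, the equivalent is a vmkernel interface (esxcfg-vmknic) on a Management Network port group.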

VMotion Network

The VMotion Network is a special-purpose network with only one use – the live migration of a running virtual machine from one physical ESX/i host to another with no interruption of service to clients of the VM. The VMotion interface is a vmkernel interface that is flagged as being used for VMotion. Figure 3 shows the VMotion network.

Figure 3. VMotion Network

All VMotion Network interfaces should be within the same IP broadcast domain to ensure that the hosts can find each other. No function other than VMotion needs to be able to access the VMotion network – in fact, for a two-node cluster, you can use a direct cable with no intervening switch to support VMotion.
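
If you’re building this from the service console, the configuration is just a dedicated vSwitch with a vmkernel port. Here’s a minimal sketch, assuming vmnic1 is an unused pNIC and that you’ve set aside 10.0.1.0/24 for VMotion (the names and addresses are my examples, not requirements):

    esxcfg-vswitch -a vSwitch1                              # create a vSwitch dedicated to VMotion
    esxcfg-vswitch -L vmnic1 vSwitch1                       # attach the dedicated pNIC as an uplink
    esxcfg-vswitch -A VMotion vSwitch1                      # add a port group for the vmkernel interface
    esxcfg-vmknic -a -i 10.0.1.11 -n 255.255.255.0 VMotion  # vmkernel interface on the VMotion subnet (example address)
    esxcfg-vswitch -l                                       # verify the layout

The “Enabled for VMotion” flag itself is set through the VI Client (Configuration > Networking > port group properties); on classic ESX, vmware-vim-cmd hostsvc/vmotion/vnic_set can also flag the vmknic, though I’d treat that as a convenience rather than the documented path.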

It is important to know that the data sent across the VMotion network (on TCP port 8000) is not encrypted in any manner. This means that anyone who can connect their PC to the VMotion network will be able to listen in and intercept that data, which contains whatever happens to be in the virtual machine’s vRAM at the time of the VMotion. The information could include usernames, passwords, credit card numbers, you name it…it could be there.

With that knowledge, it should come as no surprise that I’m recommending that you protect the VMotion Network very carefully!

NFS Network

The NFS Network is a vmkernel network that supports access to Network File System v3 (NFSv3) shares over the Transmission Control Protocol (TCP). NFS is a file sharing protocol, much like Server Message Block [SMB] and the Common Internet File System [CIFS], the common Windows file sharing protocols (see http://en.wikipedia.org/wiki/Cifs). NFS was originally developed by Sun Microsystems in 1984 for use with Unix systems (see http://en.wikipedia.org/wiki/Network_File_System_(protocol)). Figure 4 shows a typical NFS network configuration.

Figure 4. NFS Network

Since NFS can be used to store virtual machines and/or utility files such as .iso images, it is very common for the NFS server to have multiple connections to the network. Depending on the NFS server or appliance that you are using, there are a variety of ways that these connections can be aggregated to improve the performance of your NFS network. Regardless of how the NFS server is connected to the network, the ESX host is bound by the rules of the vSwitch that’s used to support NFS traffic, as discussed in Part 3.
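
Once a vmkernel port exists on the NFS subnet (the same esxcfg-vswitch/esxcfg-vmknic pattern shown earlier), attaching an NFS datastore from the service console is a one-liner. A sketch, where the server address, export path, and datastore label are all illustrative:

    esxcfg-nas -a -o 10.0.2.50 -s /vol/vmstore nfs01   # mount export /vol/vmstore from 10.0.2.50 as datastore "nfs01"
    esxcfg-nas -l                                      # list the configured NFS datastores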

From a security perspective, the NFS protocol is not encrypted or otherwise secured on the wire. This means that anyone who has access to the NFS network has the ability to intercept data that represents the on-disk information stored in virtual machine files. Obviously, this is a significant risk that needs to be mitigated with appropriate configuration and management actions.

iSCSI Network

The iSCSI Network is quite similar to the NFS Network. The primary difference between NFS and iSCSI is that iSCSI is a “block oriented” protocol, whereas NFS is a “file oriented” protocol. What does that mean? Basically, it means that with NFS, it is the NFS server that is responsible for managing the filesystem and individual blocks of data on the disk. The ESX server doesn’t care if the disks are formatted with NTFS, ZFS, ext3, or Ken’s File System – it never sees the structure on the disk. It is the NFS server’s responsibility to read from, write to, and manage access to all information on the disk.

Conversely, with an iSCSI Target (that’s what a single instance of an iSCSI server process is called), the ESX host is intimately knowledgeable about the on-disk structure (unless you happen to be using a Raw Device Mapping (RDM)), because ESX communicates with the iSCSI Target using standard SCSI commands, exposing the actual on-disk blocks of data to the ESX host. In many, if not most, cases, the logical unit number (LUN) exposed by the iSCSI Target will be formatted by the ESX host as a VMware File System (VMFS) volume. Figure 5 shows a typical iSCSI configuration.

Figure 5. iSCSI Network

As with NFS, it is not uncommon to see an iSCSI Target configured with multiple uplink connections into the network. And just like NFS, iSCSI is bound by the rules of the vSwitch for load balancing and failover.
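
Bringing up the ESX software initiator follows the same pattern. A hedged sketch for classic ESX – the discovery address is an example, the software HBA’s name (vmhba32 here) varies by version, and a vmkernel port on the iSCSI subnet must already exist:

    esxcfg-firewall -e swISCSIClient       # open the service console firewall for the software initiator
    esxcfg-swiscsi -e                      # enable the software iSCSI initiator
    vmkiscsi-tool -D -a 10.0.3.50 vmhba32  # add a SendTargets discovery address (example address)
    esxcfg-rescan vmhba32                  # rescan the software HBA to discover LUNs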

I hate to be redundant, but this is worth stating: From a security perspective, the iSCSI protocol is not encrypted or otherwise secured on the wire. This means that anyone who has access to the iSCSI network has the ability to intercept data that represents the on-disk information stored in virtual machine files. Obviously, this is a significant risk that needs to be mitigated with appropriate configuration and management actions.

Virtual Machine Network(s)

Here’s where things get interesting! When you start talking about virtual machines, you’re talking about all the servers that live in your datacenter. These servers provide and consume services of all types – from other virtual machines, from physical servers in the datacenter, and from resources of all types on the Internet and other external networks. Obviously, this means that there may be the need to connect to more than one network to support all of these different communications paths. Figure 6 provides a view of some of the possible connectivity that needs to be supported by the VM Network(s).

Figure 6. Virtual Machine Networks

Notice that I’ve included an NFS Server and an iSCSI server in this diagram. You might ask “Why?” Well, it’s simple. The guest operating system inside a VM can directly mount an NFS volume and it can also use an iSCSI Initiator to connect to an iSCSI Target.

Some of these networks may need to be separated from each other. Depending on the level of sensitivity of each network, you may be able to use a single network divided only by IP subnets; you may need to use VLANs to provide a logical separation among the networks; or you may need to use totally separate vSwitches with dedicated pNICs to provide the required separation. I can’t help you with these decisions…that’s between you, your network team, your application team, your management team, and your security officer (good luck!).
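
Where VLANs are the chosen separation mechanism, each VM network simply becomes a tagged port group on the vSwitch. A sketch, with made-up port group names and VLAN IDs:

    esxcfg-vswitch -A "VM_Production" vSwitch2         # port group for the production VM network
    esxcfg-vswitch -v 100 -p "VM_Production" vSwitch2  # tag it with VLAN 100
    esxcfg-vswitch -A "VM_DMZ" vSwitch2                # port group for the DMZ VM network
    esxcfg-vswitch -v 200 -p "VM_DMZ" vSwitch2         # tag it with VLAN 200

Remember that this only works if the physical switch ports behind vSwitch2 are configured as 802.1Q trunks carrying those VLANs.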

Also notice that, for the first time in my initial six diagrams (other than the ESX Service Console), the network connections into the ESX/i hosts are not through a vmkernel port. In this case, the connection is via a vSwitch that is configured for Virtual Machine connectivity. That’s really just a technicality, a configuration setting. In reality, all vSwitches are owned and managed by the vmkernel.

Best Practices

There are some general guidelines that I like to use when designing a network architecture. Based on the discussion above, you can see that there are quite a few networks that need to / should be protected or isolated. I recognize that not everyone shares the same views on security and data protection – and that’s perfectly fine (as long as you understand the consequences!). I’ll try to accommodate as many positions as I can…on with the show!

There are two primary considerations when deciding how to carve up your networks: security and performance. The third consideration is manageability, which we’ll talk about in a moment. The key to picking the optimal configuration, based on the number of pNICs you have in your hosts, is to understand the level of risk associated with mixing the various networks. I’ve created a couple of tables that show my personal assessment of the risk associated with each pairing. Table 1 shows the security implications, while Table 2 shows the performance ramifications.

Table 1. Security Impact of Mixing Networks

My rationale for assigning these levels of security risk is as follows:

  • The Management Network should be isolated as much as possible. When it is not possible to give it a dedicated vSwitch and associated pNICs (the best-case scenario), it is typically a “Medium Risk” to mix it with either VMotion or one of the IP Storage networks. The reason I chose Medium Risk is that the personnel granted access to the Management Network are typically the most trusted in your organization. While it is never a good idea to allow users of any level of trust to access an unsecured storage or VMotion network, if constraints force you to do it, do it with the administrative users on your Management Network segment.
  • For the VMotion Network, there is “Low Risk” associated with mixing in the IP Storage networks. The reason for this is that there shouldn’t be any users on any of these three networks (VMotion, NFS, iSCSI), so there’s not much chance of someone intercepting data on the wire.
  • The VMotion logic above applies to the NFS and iSCSI networks as well.
  • The Virtual Machine Network(s) are always considered “High Risk”. This is because you have unknown/untrusted users connecting via potentially uncontrolled systems. There may be cases where you have specific VM Networks that are not high risk (for example, you may run vCenter Server as a VM; that particular VM would live on the Administrative Network, which is a Medium Risk network).

Your assumptions about the level of risk associated with each network may be different from what I’ve suggested here. That’s fine – simply substitute your values into the calculations and you’ll be set.

The second consideration I want to discuss is performance. As with security, there are performance impacts to combining the various networks onto the same set of pNICs. Table 2 shows the matrix of performance impacts from mixing the various networks.

Table 2. Performance Impact of Mixing Networks

If you evaluate each network individually, you can develop an understanding of the traffic patterns that exist for each.

  • Management Network: There is typically not a lot of traffic on this network. It is used for management functions such as vCenter Server operations (host configuration and management, virtual machine configuration and management, performance monitoring, etc.) and access by third-party applications (configuration management, resource monitoring, etc.). These are typically low-impact applications. The exceptions to the low-impact “rule” are the deployment of templates and the use of service console-based backup utilities. Each of these functions has a significant impact on management network utilization.
  • VMotion Network: This network sits idle except during a virtual machine migration; however, when a VMotion migration is taking place, you want to have as much bandwidth as possible available to enable the migration to complete as quickly as possible. If you do not use VMotion, you don’t need to worry about this network.
  • NFS & iSCSI Networks: These are your IP Storage networks. Their utilization fluctuates wildly depending on what is happening in your virtual environment. During steady-state operations, there is typically a “moderate” level of activity; however, when virtual machines are being powered on, resumed, or backed up across these networks, there is significant activity.
  • VM Network(s): This is a total crap shoot. In some environments, these networks sit almost idle, while in others, they are hit very hard. You will have to judge for yourself how significant your workloads are – although I will say that it is the exception, rather than the rule, for the traffic on these networks to be “significant”.

Next, I’ve taken the two considerations and combined them. The result is a matrix that shows the overall “risk” of combining the various networks (see Table 3).

Table 3. Overall Impact of Mixing Networks

Again, these are not set-in-stone, has-to-be-this-way recommendations, but rather a tool to be used to help you make your decisions.

Oh, I nearly forgot – I promised to talk some about manageability. Basically, I recommend that each of these networks be separated into individual port groups, even if you’re not using VLAN tagging. If possible, create separate vSwitches for each of the major networks (Management, VMotion, IP Storage). For the VM Network(s), I recommend at least one vSwitch (depending on separation requirements) with a separate port group for each different network. Figure 7 shows two possible configurations.

Figure 7. vSwitch Options

Notice that, even though the number of vSwitches differs between the two configurations, the port groups are the same. This logical separation makes it simple to manage your environment – you can even have hosts with different vSwitch configurations, yet the same port group configuration, that support VMotion among them. It also makes it easy to scale the environment.
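
A quick sanity check is worthwhile here: VMotion expects the destination host to have port groups with the same names as the source, so list the layout on each host and compare. For example:

    esxcfg-vswitch -l   # shows each vSwitch with its uplinks and port groups; the names should match across hosts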

OK…that’s it for this time. Next time, in Part 6, I’ll talk about my recommendations for when you have differing numbers of pNICs in your hosts.


About Ken Cline

vExpert 2009

23 responses to “The Great vSwitch Debate – Part 5”

  1. Neal Roche says :

    Hi Ken
    Thanks for your posts on vSwitches. If you want to test network performance with different combinations of VMs, hardware, and vSwitches, Ixia has published a test plan on our IxChariot Blog.
    http://www.ixchariot.com/blog/

    Thanks
    Neal Roche

  2. Henrik Huhtinen says :

    I’d say mixing IP storage networks and the VMotion network is a “high risk”. You may nearly stop all storage access during a VMotion if you mix them.

    • Ken Cline says :

      Henrik,

      From a performance perspective, I agree 100% (which is why those combinations receive a “High” rating in the Performance table). The security risk is relatively “Low”, since all of those networks “should” be protected and have no users accessing them. When you combine those two ratings, you get a “Medium” overall risk. As I said, these are simply suggestions/guidelines to be adapted for each environment.

      Thanks for the feedback (finally a little debate 🙂 )

      KLC

  3. Joel Snyder says :

    “All VMotion Network interfaces should be within the same IP broadcast domain to ensure that the hosts can find each other.”
    I don’t believe that this is true anymore. Where VI3 is coordinating and TCP is routing, there should not be any reason to have this restriction.

    • Ken Cline says :

      You’re correct – VMotion traffic can be routed; however, I recommend that it not be. You want VMotions to occur as quickly as possible. Inserting a router in the path adds at least one hop, which introduces latency. It is best to keep everything together in the same broadcast domain to minimize hops and latency.

      Thanks for the comment!

      • q says :

        Don’t forget that you can only have 1 default gateway for the vmkernel. So if you have 3 vmkernel interfaces for vmotion, NFS, and iSCSI, routed vmkernel traffic will transit the interface that is local to the default gateway. In other words, if your default gateway is on your NFS subnet, and you have ESX hosts with VMotion interfaces on different subnets, that VMotion traffic will transit the NFS network.

        You can circumvent this behavior by manually adding permanent routes to each ESX host, but this does become rather more difficult to set up, maintain, and troubleshoot.
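
        On ESX 4.x, those permanent routes look something like the following (a sketch – the subnets and gateway are examples only):

            esxcfg-route                          # show the current vmkernel default gateway
            esxcfg-route -a 10.0.1.0/24 10.0.9.1  # add a static vmkernel route for a remote VMotion subnet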

  4. Aaron says :

    Do vSwitches fail?
    I am trying to figure out how many vSwitches I will need for my network and have some serious concerns about how to handle my 8 NICs. My current plan is to use 4 vSwitches: Management, VMotion, storage (iSCSI), and VM Client Networks. If I assign 2 NICs to each of the categories and have a vSwitch fail, then I will completely lose one of my networks, rendering the boxes useless.
    Have you seen anything happen to a vSwitch before?

    • Ken Cline says :

      Hi Aaron,

      I have never seen nor heard of an instance of a failed vSwitch. I wouldn’t worry about the stability of the vSwitch – the bigger question is do you trust your NICs? If you were to create a single vSwitch with all 8 of your NICs affiliated to it, you could define primary & failover NICs for each port group, but you do that at the expense of physical separation between your different network types. It’s a tradeoff, and you have to judge for yourself which is the lesser evil. Most of the customers I deal with are most comfortable with accepting the risk (it’s unlikely that you will have both NICs associated with a vSwitch fail) and go the “more secure” route of separate vSwitches for each type of traffic.

      KLC

  5. vps says :

    You are very well informed. In a few places I will cite this as the source. I wish you continued success.

  6. Mario says :

    Hi!

    I’m planning to secure our VMware environment and need help – I didn’t really find any satisfying answers :-/

    My problem is about ESXi 4 management, vmkernel, and ILO networks. How should I separate them? I’m planning to make them one security zone – that is, not separating them by a firewall – but use three VLANs. Traffic between these networks would be possible, but access to them would be restricted by a firewall. VMs will get their own network infrastructure, and storage isn’t connected via Ethernet.

    I think this is a good tradeoff between security and cost but would like to have a second opinion. I seldom find information about ILO in this context :-/

    You’re talking about “mixing networks” (which I won’t), my problem is more about “mixing security zones”…

    cu

    Mario

    • Ken Cline says :

      When thinking of the iLO (or any out-of-band management interface), think of it as providing physical access to the console of your server. In the case of VMware, in many ways it’s more like physical access to your datacenter, since your physical server will house many VMs. Without knowing details about your environment, I can’t really tell you whether your solution is viable or not. In cases where I don’t know how sensitive the information is, what type of regulatory guidelines you have to comply with, etc., my recommendation is always to provide physical separation of the different security zones.

      However, since you DO know the answers to all those questions, you’re much better qualified to make the determination as to whether it is “secure enough” or not. I will say that, for a “low” to “medium-low” security context, your solution is quite viable. Depending on the level of trust you have with your staff, this might even be good enough for a “medium” security context. Most “high” contexts would likely balk at this, but then you’re getting to the area where security is more important than cost and ease of operation…it’s all about trade-offs.

      One critical thing, from a security perspective, is to ensure that your vmkernel & iLO networks are NEVER exposed to a DMZ or other “public facing” network. It may make your life “easier” – but it’s inviting the bad guys into your world…never a good thing!

      Best of luck, and thanks for stopping by!
      KLC

      • Mario says :

        Hi!

        Thank you for your answer 🙂

        I want to use the ILO network for non-ESXi machines, too. (Don’t freak out, but up until now no one bothered about securing ILO interfaces; I’m the first one.) The firewall will restrict access on an IP basis. That is, I won’t be allowed to access the ILO interfaces of, say, the Windows machines, and the Windows administrators won’t be allowed access to the ILO interfaces of my VMware hosts. They *will* get access to VMware management because I’m hosting Windows VMs for them. (They have the right to open a console, start and stop their machines, and mount CDs remotely – and that’s it; all other rights stay with me and my colleagues.)

        There are several DMZs (we don’t mix DMZs on ESX), which means the “full” solution (three physically separated and firewalled networks per DMZ) will result in quite some firewall interfaces and switches…

        So if I won’t be able to get three firewalled security zones per DMZ, what would be the second best solution? There are three options:

        VMware management + ILO | firewall | VMotion
        VMware management + VMotion | firewall | ILO
        VMotion + ILO | firewall | VMware management

        The third one looks a bit odd, doesn’t it? Anyway, due to the benefits, we will host more and more systems on VMware; but I’m not really prepared for all the security questions that are turning up, and neither are our security, firewall, and network guys prepared for virtualized environments. Do you know of any source describing network designs for VMware in widely accepted terms of security requirements? Like, my original proposal would be EAL2, or your recommendation (physical separation of the different security zones) would be EAL4?

        cu

        Mario

      • Ken Cline says :

        Ha! Believe me, I’ve seen it all when it comes to people not being security conscious. Your environment is not all that strange. In fact, you may want to consider using authentication rather than firewalls to segment your systems. Configuring different iLOs with different AD security groups being authorized is much easier than managing firewall rules based on IP addresses (which doesn’t protect against a Windows admin using your workstation to access an ESX host…)

        If it were me, I think I would opt for your first option (VMware management + ILO | firewall | VMotion) and use iLO authentication to control who has access to which iLO. The advantage here is that you are isolating the VMotion network. Remember that VMotion traffic is unencrypted and could potentially hold a treasure trove of “intelligence data”. Keeping it separated is always a good idea. You’ve also got a requirement for all users to access both the VMware Management and the iLO networks (and they’re both “management” types of traffic), so why complicate your world by separating them with a firewall? I would put them on separate VLANs, but that’s not really necessary.

        As for a good source of security information – probably your best bet is Edward Haletky’s “VMware vSphere and Virtual Infrastructure Security | Securing the Virtual Environment” (Edward, you owe me!) I did a tech review of the book for him and there’s lots of good info in there. He doesn’t classify things by EAL level, but there’s enough security jargon to keep most SSOs happy.

        KLC

      • Mario says :

        Hi!

        I ordered “VMware vSphere and Virtual Infrastructure Security | Securing the Virtual Environment” a couple of weeks ago and it’s really a good book. But the “separate the security zones physically” approach is quite expensive, given that I want/have to separate the management networks of different DMZs physically, too. I’m still trying to find an acceptable tradeoff between security and costs – your comments have been very helpful, thank you very much 🙂

        Granting access to ILO interfaces at the firewall based on AD groups might be… hard to achieve. One of our firewall guys told me we could “as well stop using firewalls then”. Personally, I think security based on IP addresses is a bad idea and prefer something based on strong authentication – like certificates and private keys on smart cards.

        ILO will of course be configured to grant access only to authorized users. That is ILO itself will deny, for example, the windows guys access to machines owned by the linux guys.

        One issue occurred to me yesterday: ILO doesn’t have to be as available as VMware management, since there’s only one ILO interface. It’s probably cheaper to build a less redundant ILO network (lots of interfaces) and a highly redundant VMware network. Separating VMware management and VMotion by VLANs with a firewall between them is probably “secure enough”, especially with switches using separate spanning tree tables per VLAN.

        I think I’ll ponder the problem a bit longer. Thank you very much for your comments 🙂

        cu

        Mario

        PS
        Maybe somone working on network security and VMware stumbles across this conversation and finds the following helpful: http://www.spirit.com/Network/net0103.html

        PPS
        http://i.imgur.com/mUcOm.jpg 🙂

      • Ken Cline says :

        Hmm…do you really need to isolate the management networks for the various DMZs from one another? Does each DMZ have a different set of administrators? In some organizations, that level of isolation is required, but in others, the management networks for the DMZs are in the same administrative domain, and thus don’t need to be separated from each other. Obviously, it’s a different story for the data networks for the DMZs.

        As for your less-redundant iLO network – very true. Also keep in mind that current versions of the iLO support only 100Mbps interfaces. I’m sure that at some point, that will get bumped to 1Gbps, but unless you use virtual media, the higher speed won’t really get you much. Many organizations have high density 100Mbps switches just lying around begging to come out of retirement 🙂

        Thanks for the discussion – I’ve enjoyed it! (finally some debate to the “great debate”)
        KLC

      • Mario says :

        Well, an attacker managing to break out from a VM might access the management/VMotion network of another DMZ – so separating the management networks of different DMZs will almost certainly be a conditio sine qua non. It doesn’t matter what I think about this; people will simply demand separated networks.

        However, I think I won’t have to separate ILO networks. Anyone gaining access to management networks by accessing ILO “from the inside” is probably unstoppable, anyway.

        /Mario

      • Ken Cline says :

        All I’ve got to say is “wow”! Your management/VMotion networks should not be exposed in the DMZ – ever. I don’t think you have to worry about someone breaking out of a VM…I believe they are more secure than physical servers from that perspective.

        In case you haven’t seen it, here’s a good read: http://www.vmware.com/pdf/vi3_security_architecture_wp.pdf

  7. Conrad Kimball says :

    Why is Table 2 (performance impact of mixing networks) not symmetric? Table 1 (security impact of mixing networks) is symmetric, and I don’t see why Table 2 isn’t also symmetric – mixing VMotion with NFS should be the same as mixing NFS with VMotion…

    This asymmetry of Table 2 then carries over into asymmetry of Table 3.

    • Ken Cline says :

      Hmm…well, I guess you proofread better than do I! There should be symmetry in the tables (obviously). When I get the initiative, I’ll fix it. Thanks for taking the time to read this. I hope you got something useful out of it.

      KLC

  8. Bob says :

    Brilliant series, Ken! Could I just ask, what are your thoughts on isolating FT traffic in vSphere 4.1 – is a separate vSwitch/dedicated pNICs the way to go? If one had to mix FT traffic, due to the number of available pNICs, what would you recommend it be combined with: Management, VMotion, IP Storage, or VM traffic?
