As I consult with companies and organizations ready to deploy a cloud-managed MX WAN infrastructure, I’m constantly tasked with helping them understand the different connectivity models available and the appropriate deployment methodologies. With WAN connectivity options evolving faster than ever, it’s important to know what options are available and, more importantly, how to map business requirements to the end design.
This perennial topic is too often poorly planned, which leads to more difficult adjustments later in the deployment cycle. My intention here is to break down the details of each architecture so you understand the unique advantages of each deployment methodology and which might best serve and scale your business.
MX Hub and Spoke Roles
MX appliances can securely merge independent private and public circuits into a globally routable WAN fabric using hub, spoke, and mesh connectivity models. AutoVPN is the technology that powers it all, so that’s where we’ll start.
When AutoVPN is enabled on an MX security appliance network, an administrator must select whether the device should be a hub or spoke node. The differences are important.
MX hubs will automatically build VPN tunnels to all other MX hubs as well as dependent MX spokes (where specific hubs have been selected) in the organization. This is the default setting and as such all MXs will by default attempt to peer with all other MXs left in hub mode. This can lead to scaling challenges as we’ll discuss below.
To configure this, go to Dashboard > Select the network > Security appliance > Configure > Site-to-Site VPN.
MX spokes only build VPN tunnels to MX hubs. Additionally, they only tunnel to hubs specifically configured on the network’s Site-to-Site VPN configuration page in Dashboard.
No hub, no AutoVPN.
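The peering rules above can be sketched in a few lines of Python. This is a simplified model for reasoning about a design, not Meraki code: the `Site` class, the `tunnel_peers` function, and the site names are all hypothetical, and the model counts one logical tunnel per pair of sites.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass
class Site:
    name: str
    role: str  # "hub" or "spoke"
    # Spokes only: the hubs selected on the spoke's Site-to-Site VPN page.
    hubs: List[str] = field(default_factory=list)


def tunnel_peers(sites: List[Site]) -> Set[FrozenSet[str]]:
    """Return the set of site pairs that would build a tunnel."""
    hub_names = [s.name for s in sites if s.role == "hub"]
    tunnels: Set[FrozenSet[str]] = set()

    # Rule 1: every hub peers with every other hub.
    for i, a in enumerate(hub_names):
        for b in hub_names[i + 1:]:
            tunnels.add(frozenset((a, b)))

    # Rule 2: a spoke peers only with the hubs it explicitly selected.
    for s in sites:
        if s.role == "spoke":
            for h in s.hubs:
                if h in hub_names:
                    tunnels.add(frozenset((s.name, h)))
    return tunnels


example_sites = [
    Site("DC1", "hub"),
    Site("DC2", "hub"),
    Site("London", "spoke", hubs=["DC1", "DC2"]),
    Site("Paris", "spoke", hubs=["DC1"]),
    Site("Orphan", "spoke"),  # no hub selected, so no tunnels at all
]
example_tunnels = tunnel_peers(example_sites)
```

In this example the two hubs peer with each other, London peers with both hubs, Paris peers with DC1 only, and the spoke with no hub selected never joins the fabric.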
Multiple hubs can be added and prioritized in descending order. A common use case is to add a primary data center MX as the top hub, followed by a secondary or DR data center for failover of any shared subnets.
Another common use for defining hub priority is based on geography. If the spoke node is in London, for example, its primary hub may be in-country, with an out-of-country (or out-of-continent) MX hub listed as the secondary. This method allows each spoke to connect to its preferred hubs, which can be highly distributed.
Finally, there isn’t a hard limit on the number of hubs that can be added to a spoke although most production spokes use 1-3.
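To make hub priority concrete, here is a minimal sketch of how a spoke might choose between hubs that advertise the same subnet. It assumes a simple rule (the first reachable hub in priority order that advertises the destination wins) and ignores details such as longest-prefix matching. The `Hub` type, `pick_hub` function, and addresses are hypothetical.

```python
from ipaddress import ip_address, ip_network
from typing import List, NamedTuple, Optional


class Hub(NamedTuple):
    name: str
    subnets: List[str]  # subnets this hub advertises into the VPN
    reachable: bool     # whether the tunnel to this hub is currently up


def pick_hub(dest_ip: str, hubs_by_priority: List[Hub]) -> Optional[str]:
    """Return the highest-priority reachable hub that advertises dest_ip."""
    addr = ip_address(dest_ip)
    for hub in hubs_by_priority:  # list order is the priority order
        if hub.reachable and any(addr in ip_network(s) for s in hub.subnets):
            return hub.name
    return None  # no configured hub can reach this destination


# Primary and DR data centers share 10.0.0.0/16.
normal = [
    Hub("DC-Primary", ["10.0.0.0/16"], reachable=True),
    Hub("DC-DR", ["10.0.0.0/16", "10.9.0.0/16"], reachable=True),
]
# Same priority list, but the primary data center is down.
failed_over = [normal[0]._replace(reachable=False), normal[1]]
```

With both hubs up, traffic to the shared subnet goes to the primary. When the primary is unreachable, the same lookup falls through to the DR hub.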
OK, now that we’ve got the hub and spoke definitions out of the way, let’s jump into some of the different ways we can put the pieces together.
Hub and Spoke WAN Architecture
The most common MX WAN deployment model is a classic hub and spoke (H+S) design. In a H+S model, high-volume sites and data centers are selected as WAN hubs and all other sites serve as spokes.
In organizations where most of the applications are hosted out of centralized data centers, a H+S architecture can be a natural complement. I often see primary and DR data centers configured as hubs, with remote offices, branches, and manufacturing locations configured as spokes with direct connectivity to both hubs for redundancy.
The sites selected as hubs need not be limited to data centers, however. Companies that host services out of their corporate campus locations might prefer to promote the onsite MX to a hub role for direct connectivity to all other sites.
The data forwarding in a hub and spoke topology is very simple. AutoVPN will first populate the global VPN route table of every participating MX via the cloud VPN registry. Every MX will automatically know network reachability details of every other MX in the security domain. Think of this as a cloud-orchestrated WAN route table that’s synced across every device.
Hub-to-hub and hub-to-spoke traffic is sent directly. Spoke-to-spoke traffic is sent through a connected hub (often the primary data center).
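The forwarding behavior can be modeled as a small path function. This is a sketch under stated assumptions rather than a description of the actual MX forwarding logic: the `vpn_path` function is hypothetical, any site missing from `spoke_hubs` is treated as a hub, and the relay choices (the first shared hub for spoke-to-spoke traffic, or the spoke's top-priority hub when a hub has no direct tunnel to it) are my own simplifications.

```python
from typing import Dict, List, Optional


def vpn_path(src: str, dst: str,
             spoke_hubs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return the list of sites a packet crosses from src to dst.

    spoke_hubs maps each spoke to its hubs in priority order.
    Any site that is not a key in spoke_hubs is treated as a hub.
    """
    src_hubs = spoke_hubs.get(src)  # None means src is a hub
    dst_hubs = spoke_hubs.get(dst)  # None means dst is a hub

    if src_hubs is None and dst_hubs is None:
        return [src, dst]  # hub-to-hub: direct tunnel

    if src_hubs is None:  # hub-to-spoke
        if not dst_hubs:
            return None  # the spoke selected no hubs, so it is unreachable
        return [src, dst] if src in dst_hubs else [src, dst_hubs[0], dst]

    if dst_hubs is None:  # spoke-to-hub
        if not src_hubs:
            return None
        return [src, dst] if dst in src_hubs else [src, src_hubs[0], dst]

    # spoke-to-spoke: relay through a hub both spokes are connected to.
    if not src_hubs or not dst_hubs:
        return None
    for hub in src_hubs:
        if hub in dst_hubs:
            return [src, hub, dst]
    # No shared hub: cross the hub-to-hub tunnel between their top hubs.
    return [src, src_hubs[0], dst_hubs[0], dst]


example_spokes = {"London": ["DC1", "DC2"], "Paris": ["DC2"]}
```

Here London reaches DC1 directly, while London-to-Paris traffic relays through DC2, the only hub the two spokes share.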
H+S At Scale
While simple, a hub and spoke design brings some strengths that shouldn’t be overlooked. For example, a properly sized hardware design affords massive scale.
With large data center appliances in place, the number of supported spoke nodes easily scales into the thousands. Big DC boxes mean that the spoke nodes can be relatively small appliances since they only need to build IPSec tunnels to a handful of global data centers.
A centralized hub design also enables rapid deployment of each additional remote MX. Since the AutoVPN peer configuration is consistent across all sites, this model fits nicely into a template context – further facilitating a high-velocity rollout.
Pros:
- Highly scalable
- Lowest capital cost
- DC-centric service model
- Consistent spoke configuration
- Template support
Cons:
- No spoke-to-spoke tunneling
- Limited design flexibility
Full Mesh WAN Architecture
Full mesh – the unicorn of WAN engineers everywhere. If we can just spin up persistent tunnels between every single WAN node, why wouldn’t we?
Before we answer that, let’s first discuss what a mesh design is. In full mesh MX architectures, every node builds a persistent IPSec tunnel to every other MX. This is the default behavior when AutoVPN is enabled, as every MX defaults to hub mode, much like the diagram below.
The clear advantage to this design is that site-to-site latency is reduced and data centers no longer serve as route proxies to remote spokes. All good things.
That said, keep in mind the tradeoffs involved. As the number of MX nodes in the WAN fabric increases, so does the number of IPSec security associations required per device.
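A quick back-of-the-envelope calculation shows how the two models diverge. The helper functions below are hypothetical and count one logical tunnel per pair of peers. Real tunnel and security association counts also depend on factors such as the number of WAN uplinks per site, so treat the numbers as relative rather than as sizing figures.

```python
from typing import Tuple


def mesh_tunnels(n_sites: int) -> Tuple[int, int]:
    """Full mesh: (peers per device, total tunnels in the fabric)."""
    per_device = n_sites - 1
    total = n_sites * (n_sites - 1) // 2
    return per_device, total


def hub_spoke_tunnels(n_hubs: int, n_spokes: int,
                      hubs_per_spoke: int) -> Tuple[int, int, int]:
    """Hub and spoke: (peers per spoke, worst-case peers per hub, total)."""
    per_spoke = hubs_per_spoke
    # Worst case for a hub: every spoke selects it, plus the hub-to-hub mesh.
    per_hub_max = (n_hubs - 1) + n_spokes
    total = n_hubs * (n_hubs - 1) // 2 + n_spokes * hubs_per_spoke
    return per_spoke, per_hub_max, total


# 500 sites as a full mesh vs. 2 data center hubs and 498 dual-homed spokes.
print(mesh_tunnels(500))               # (499, 124750)
print(hub_spoke_tunnels(2, 498, 2))    # (2, 499, 997)
```

In the mesh case every one of the 500 appliances must hold 499 peers. In the hub and spoke case only the two data center appliances carry that load, and each spoke holds just two tunnels.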
You can see that small or mid-size deployments can run in mesh mode with modest hardware requirements, but the design becomes more expensive as the MX tunnel requirements grow. Check out the Cisco MX Sizing Guide for the latest specs on supported tunnel counts and aggregate VPN throughput numbers.
Another consideration is real traffic forwarding patterns between sites. In most modern networks, over 90% of all application data coming from WAN-connected sites is destined to data center server infrastructure – not other remote branches. Outside of the occasional branch-to-branch call, it’s fairly rare to see communication between small sites in modern networks.
If very little traffic travels between remote nodes, is it worth all of the persistent IPSec overhead and silicon required to drive a mesh architecture? Perhaps, but consider all the inputs involved and the long-term scaling requirements of the organization.
VoIP and video services are the most common reason companies like to run to mesh designs – after all there is an occasional call between BranchA and BranchB. However, advancements in real-time voice quality detection in SD-WAN solutions have largely negated the need and allowed businesses to pivot to more scalable multilink H+S designs while maintaining voice quality.
Pros:
- Lowest serial latency design
- Aligned with distributed service model
Cons:
- Increased hardware sizing requirements
- SD-WAN scaling challenges
- Misaligned with modern DC service model
Hybrid WAN Architecture
If H+S optimizes for scale and mesh offers the low-latency per hop behavior for branch-to-branch communication, then a hybrid approach marries the advantages of each model into a custom architecture that meets the specific WAN requirements of an organization.
The foundation of a hybrid SD-WAN design starts with positioning data centers as primary WAN hubs. Other critical, data-centric locations like a corporate headquarters might also serve as a primary hub.
Once the primary hubs are defined, next it’s important to consider efficient traffic patterns for remote sites and whether or not larger/regional offices should be promoted to secondary hubs. These sites are then configured as hubs and tunnel to both the DCs as well as smaller sites within their region to serve as micro hubs for local spokes. This reduces spoke-to-spoke latency in the region and also offers local access to services at large regional offices.
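As a rough illustration, the sketch below builds per-spoke hub lists for a hybrid design and counts the peers each site ends up with. All names are hypothetical, and placing the in-region hub ahead of the data centers in each spoke's priority list is an assumption for the example, not a recommendation from the text.

```python
from collections import Counter
from typing import Dict, List


def hybrid_hub_lists(spoke_region: Dict[str, str],
                     regional_hub: Dict[str, str],
                     dc_hubs: List[str]) -> Dict[str, List[str]]:
    """Map each spoke to its hubs: in-region hub first (if any), then DCs."""
    plan = {}
    for spoke, region in spoke_region.items():
        local = regional_hub.get(region)
        plan[spoke] = ([local] if local else []) + list(dc_hubs)
    return plan


def peers_per_site(plan: Dict[str, List[str]],
                   all_hubs: List[str]) -> Counter:
    """Count tunnel peers per site: hub-to-hub mesh plus spoke-to-hub."""
    counts: Counter = Counter()
    for hub in all_hubs:
        counts[hub] += len(all_hubs) - 1  # hubs peer with all other hubs
    for spoke, hubs in plan.items():
        counts[spoke] += len(hubs)
        for hub in hubs:
            counts[hub] += 1
    return counts


dcs = ["DC1", "DC2"]
regional = {"EMEA": "London-HQ"}  # one secondary hub, serving EMEA
spokes = {"Paris": "EMEA", "Berlin": "EMEA", "Austin": "AMER"}

plan = hybrid_hub_lists(spokes, regional, dcs)
counts = peers_per_site(plan, dcs + list(regional.values()))
```

Paris and Berlin get London-HQ as a nearby hub in addition to the two data centers, while Austin, which has no regional hub, connects to the data centers only. The regional hub carries fewer peers than the data centers because it serves only its own region.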
A hybrid architecture also maintains the scaling benefits of a pure H+S design while adding mesh local-access advantages.
- A hybrid design keeps the number of total hub sites low – reducing hardware and supported tunnel count requirements
- Remote spoke nodes that benefit from in-region resources have direct access
- Remote spoke nodes that require low-latency routing to local call centers or neighbor branches do not require long-distance DC rerouting of packets.
Pros:
- Highly scalable
- Flexible hierarchical topology
- DC-centric service model
- Low latency access
- Customizable connectivity
Cons:
- Additional design planning
- Increased complexity
Which architecture is right for a specific design depends on a number of factors specific to an organization. Considerations like hardware tunnel capacity, total tunnel count, distributed vs. centralized enterprise services, the value of operational simplicity, voice and video call paths, number of WAN uplinks per site, and future growth all should be included.
If an organization is under 100 locations and is only using a single WAN uplink per site, a mesh model may be a sensible choice. All sites would be direct peers with the lowest latency path to any destination.
Larger organizations or those moving to an SD-WAN architecture should strongly consider either a pure hub and spoke topology or a hybrid H+S leveraging secondary hubs for high-traffic nodes. This enables data center MX concentrators to provide a highly resilient and scalable WAN core capable of supporting thousands of remote MX peers in parallel. Layer on MX’s cloud-managed SD-WAN application load balancing / dynamic traffic steering capabilities and you’ve just built a truly next-generation WAN architecture.
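The rule of thumb above can be written down as a tiny helper. This is only a starting point for a design conversation: the `suggest_topology` function and its `regional_services` flag are hypothetical, and a real decision should weigh all of the factors listed earlier, including hardware tunnel capacity and future growth.

```python
def suggest_topology(site_count: int, uplinks_per_site: int,
                     regional_services: bool) -> str:
    """Suggest a starting topology from the rule of thumb in the text."""
    # Under 100 sites with a single WAN uplink each: mesh may be sensible.
    if site_count < 100 and uplinks_per_site == 1:
        return "full mesh"
    # Larger or multi-uplink (SD-WAN) designs: promote high-traffic or
    # regional sites to secondary hubs when they host local services.
    if regional_services:
        return "hybrid"
    return "hub and spoke"


print(suggest_topology(40, 1, regional_services=False))
print(suggest_topology(800, 2, regional_services=True))
```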