Interested in active/active Meraki MX data center headend design? The purpose of this writeup will be to show you just that – how you can build a highly-scalable, multi-box MX VPN headend architecture capable of supporting the most demanding SD-WAN scale and performance scenarios.
This model is one that’s been repeated many times over and is the preferred headend scaling approach for maximizing both performance and redundancy. The reality is it’s a very simple approach while remaining flexible enough to be tailored to a variety of organizational and geographic requirements.
Requirements that drive a parallel headend design
First, let’s be clear where this design is not required. If you need hardware-level headend redundancy in a single data center then Warm Spare solves that.
Need SD-WAN headend termination in multiple data centers, each serving unique data center subnets and services? No problem. That’s natively well supported by using a single, appropriately sized appliance (or warm spare pair) in each.
What we’re referring to here by active/active are two or more SD-WAN data center headends that are providing connectivity to the same subnets and services. It’s adding parallel hardware to terminate the same backend routes and networks for improved capacity.
This could include multiple MX appliance headend units in the same data center, each actively serving the same application traffic to remote appliances. It could also mean two or more MX appliance headend units in multiple data centers that all advertise reachability to the same backend networks.
If your SD-WAN design includes more than one headend advertising common prefixes (regardless of location), the active/active label applies.
Active/active or parallel headend support is only necessary when you have incoming remote tunnel counts or general performance requirements that exceed the largest single MX hardware platform available (here’s the datasheet for reference).
Sizing headends is messy
Let me just pause and make a brief but important point. Determining the number of branches or remote WAN nodes a headend Meraki MX appliance will support is not as easy as it might appear when you exceed 1,000 remote sites. Unique scaling-related performance challenges at the data center headend can occur depending on the deployment conditions. While there is a generic process (and official sizing guide) that can be followed to select the appropriate branch appliances, headends supporting 1,000+ spokes take more consideration.
Why? Because the number of spokes they’re concentrating, the SD-WAN traffic volume (packets per second), and the type of traffic make a significant impact on the processing load introduced to the data center appliances.
For example, a very large headend appliance may run at 90% CPU while supporting 1,300 branches, each filling the pipe with a constant barrage of tiny voice and video packets.
That same headend appliance hardware may only run at 45% CPU load when supporting 2,500 SD-WAN branch sites – each sending minimal HTTPS application workloads to the data center.
Bottom line – the type and amount of hub/spoke bidirectional IP traffic has a major impact on the data center hub performance and hardware requirements. For very large SD-WAN deployments, this may mean multiple, actively load balanced headend units are required in each data center.
Active/active MX headend design
The approach used to distribute SD-WAN branch load across MX hub appliances in the data center differs between a single-DC design and a multi-DC design.
- When multiple hubs are required in a single data center, each MX appliance simply acts as the primary hub for a subset of the remote spoke nodes
- Each hub would also play a backup hub role for others in the case any given hub unit fails
Example 1: single data center, two active/active hubs
Let’s break this down into a couple examples to help illustrate how this works in different deployments. Here’s a quick view of what a two MX, active/active hub model would look like in a single data center.
In this example, the number of spoke tunnels (let’s say 6,000) requires two MX hubs to support the SD-WAN branch deployment at full capacity.
The spokes SD-WAN hub preference would be set such that:
- Half the spokes (branches 1–3,000) would use HubA as primary and HubB as backup
- The remaining half (branches 3,001–6,000) would use HubB as primary and HubA as backup
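The split above is simple enough to express in a few lines. The sketch below assumes placeholder hub names and generated branch identifiers (nothing here comes from a real dashboard):

```python
# Partition spokes evenly across two hubs, assigning each group a
# primary/backup hub order. "HubA"/"HubB" and the branch names are
# placeholders for illustration.

def assign_hub_preferences(spokes, hub_a, hub_b):
    """Return {spoke: [primary, backup]} with the first half of the
    spokes preferring hub_a and the second half preferring hub_b."""
    midpoint = len(spokes) // 2
    prefs = {}
    for i, spoke in enumerate(spokes):
        if i < midpoint:
            prefs[spoke] = [hub_a, hub_b]   # Group A: HubA primary
        else:
            prefs[spoke] = [hub_b, hub_a]   # Group B: HubB primary
    return prefs

spokes = [f"branch-{n}" for n in range(1, 6001)]
prefs = assign_hub_preferences(spokes, "HubA", "HubB")
```

Any even split works; the point is that each spoke carries an ordered hub list, and the order flips between the two groups.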
The Meraki dashboard SD-WAN hub settings for a branch in Group A would look like this.
The order of the hubs selected in the dashboard above matters as the hub preference is in descending order, top down.
If HubA and HubB both advertised reachability to matching local data center subnets then the branch appliance would build and maintain active tunnels to both – but only send data to HubA (until it becomes unreachable).
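The same ordered hub preference can also be applied programmatically. The Meraki Dashboard API exposes a site-to-site VPN endpoint (`PUT /networks/{networkId}/appliance/vpn/siteToSiteVpn`) whose `hubs` list is evaluated in order; the sketch below only builds the request body, and the hub network IDs are placeholders:

```python
# Hedged sketch: construct the site-to-site VPN body for a spoke
# network. The first hub in the list is the primary, matching the
# top-down preference order shown in the dashboard UI.

def build_spoke_vpn_payload(ordered_hub_ids):
    """Build a spoke-mode VPN payload; hub list order = preference order."""
    return {
        "mode": "spoke",
        "hubs": [
            {"hubId": hub_id, "useDefaultRoute": False}
            for hub_id in ordered_hub_ids
        ],
    }

# Group A branch: HubA primary, HubB backup (placeholder network IDs)
payload = build_spoke_vpn_payload(["N_hubA_id", "N_hubB_id"])
```

A Group B branch would simply pass the same two IDs in reversed order.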
Here’s a look at a Meraki dashboard for a branch in Group B. You can see the hub priority is reversed.
Splitting the hub termination load between two MX appliances enables a performant, highly available solution. However, if either hub were to fail for any reason, all SD-WAN spoke nodes would be running on a single, potentially oversubscribed hub. Hub performance may be affected in such a situation. If degraded performance during a hub failure isn’t an acceptable risk to the organization, a Warm Spare MX appliance can be added to each active hub to maintain performance levels in the event of any single hub failure.
If requirements drive a multi data center headend hub architecture, we simply replicate the single DC design across two or more DCs. We’ll unpack this scenario for our second example.
Example 2: dual data center, four active/active hubs
In this example, we’ll spread the Meraki SD-WAN hub load across four active data center concentrators – expanding the model we used in the single DC design.
Let’s assume this deployment requires two data centers to provide geographic diversity and four total hubs, each actively terminating SD-WAN tunnel traffic destined to its respective data center. Four could be scaled up to eight, sixteen, etc. Same exact process.
To prevent any individual hub appliance failure from forcing a data center failover, a standby (Warm Spare) MX is added to each hub for physical redundancy. SD-WAN tunnels will be terminated on the shared virtual IP to minimize overlay disruption in the event of a spare failover/failback condition.
Load sharing the spokes
Now we need to divide the spoke nodes into four different groups (let’s use A, B, C, and D) to identify and load balance the production tunnels across all available hubs and DCs.
All spoke MXs in branch group A will be configured to use HubA as their primary hub and HubC as their secondary hub.
All spoke MXs in branch group B will be configured to use HubB as their primary hub and HubD as their secondary hub (and so on as illustrated in the diagram above). This logical SD-WAN hub assignment allows any individual hub appliance or WAN transport to fail while maintaining end-to-end reachability and performance.
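The pattern here is that each group's secondary hub is the counterpart hub in the other data center, so any hub (or whole-DC) failure fails traffic over to the opposite site. A small sketch of that mapping, using the placeholder hub names from the example:

```python
# Illustrative group-to-hub mapping for the dual-DC, four-hub example.
# Hub names are placeholders matching the diagram.

DC1_HUBS = ["HubA", "HubB"]
DC2_HUBS = ["HubC", "HubD"]

def build_group_hub_map(dc1_hubs, dc2_hubs):
    """Pair each hub with the same-position hub in the other data
    center: (primary, secondary) per branch group."""
    groups = {}
    labels = iter("ABCDEFGH")
    for primary, secondary in zip(dc1_hubs, dc2_hubs):
        groups[next(labels)] = (primary, secondary)   # DC1-homed groups
    for primary, secondary in zip(dc2_hubs, dc1_hubs):
        groups[next(labels)] = (primary, secondary)   # DC2-homed groups
    return groups

hub_map = build_group_hub_map(DC1_HUBS, DC2_HUBS)
# hub_map == {"A": ("HubA", "HubC"), "B": ("HubB", "HubD"),
#             "C": ("HubC", "HubA"), "D": ("HubD", "HubB")}
```

The same function scales the model to eight or sixteen hubs by extending the per-DC lists.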
This can be made even more resilient in deployments with multiple physical WAN uplinks or those using integrated cellular backup connectivity.
Simplifying with MX SD-WAN templates
I’ve shared two examples of how we can intelligently construct a highly scaled data center SD-WAN hub architecture to support ever larger deployments. The model is simple, but if you have to carve hundreds or thousands of branch spoke sites into uniform groups – each bound to one or more hubs – it quickly becomes operationally taxing. Anyone who’s had to manually maintain an even spread across hubs as branches are added and removed over time knows just how painful this can become.
The good news is there’s a UI tool to automatically bind specific Meraki MX spoke nodes to the appropriate hubs at scale: appliance templates!
Meraki MX appliance templates allow a common configuration set to be defined (including VLANs, security policies, SD-WAN options, and more). You can think of them as virtual, master appliance policies that real deployments are bound to, automatically inheriting the template’s policy set (including SD-WAN hub preferences).
If you need a quick primer on Meraki MX templates, this is a good place to start. Let’s use the dual data center example above but this time build a template for each branch appliance group.
We start by creating the Group A template and binding the Branch A appliance network to it.
Now we can specify the desired hub connectivity for all appliances added to this template.
Any deployment now added to the Group A template will automatically be provisioned with the correct hub connectivity. The remaining three templates for Groups B, C, and D would also be created and their hub priorities set up appropriately.
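Keeping the spread even as new branches come online is the part worth automating. One simple policy is to always bind a new branch network to whichever group template currently has the fewest members; the sketch below models that logic only (the actual bind would be the Dashboard API's `POST /networks/{networkId}/bind` call, and the template names here are placeholders):

```python
# Hedged sketch: assign each newly onboarded branch to the least-
# populated template group so the hub load stays balanced over time.

def assign_to_template(template_counts):
    """Pick the template with the fewest bound networks, record the
    new binding, and return the chosen template. Ties resolve to the
    first template in dict order."""
    target = min(template_counts, key=template_counts.get)
    template_counts[target] += 1
    return target

counts = {"Group A": 0, "Group B": 0, "Group C": 0, "Group D": 0}
assignments = [assign_to_template(counts) for _ in range(6)]
# assignments == ["Group A", "Group B", "Group C", "Group D",
#                 "Group A", "Group B"]
```

Because removals also update the counts, the same policy keeps the distribution even as sites are decommissioned.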
The distribution of networks bound to each template group can be easily viewed at any time under the Organization > Configuration templates page.
Wrapping it up
Active/active SD-WAN headends offer a growth path for exceptionally large deployments, using parallel hardware for performance and capacity.
As we mentioned at the top, this isn’t always required if redundancy is your goal. Warm Spare or single headend data center units may solve those needs more cleanly.
If you do require the most performant and scalable data center SD-WAN solution, an active/active approach is likely your best choice. It’s an uncomplicated method that just requires some extra hardware and a little planning.