MX Design: Sizing Meraki MX Appliances

Firewall and WAN refresh projects have many components, but one of the first steps after vendor selection is hardware sizing. Even on POC/POV projects, rightsizing the hardware up front provides a truer day-two experience and means less rework once the project is complete.

My goal is to walk you through a consultative process for properly sizing Meraki MX appliance models for any deployment. The process outlined below applies equally to small multisite MX deployments and to super-scale deployments of 10,000+ sites. I’ve been involved in the planning, pilot, and deployment phases of dozens of such projects and wanted to share my method.

Before we jump in, it’s important to remember that the Meraki MX security appliance is a pretty incredible box. This isn’t your traditional router from 1993. MX is actually a stack of services capable of VPN Automation, SD-WAN application routing, QoS, L3-7 firewalling, anti-malware engines, content filtering, client VPN, DPI, and much more. Knowing what role and mix of services are relevant to your deployment will be helpful in the process outlined below. In terms of roles, MX can be used in two different modes – NAT mode and passthrough/VPN concentrator. We’ll explore where each is used and how that affects the sizing decision.

Lastly, the official Meraki MX Sizing Guide is refreshingly easy to use, but there are real-world nuances depending on the posture and WAN topology that often aren’t obvious. This should serve as a complement to the good work the MX teams have already provided.

Step 1: Hardware Form Factor

This may seem like an odd place to start, but if rack-mountable appliances are a requirement for specific sites (as is often the case in data centers), then desktop form factor models like the MX64/65 will not be suitable. MX84 models and above are all 19-inch rack-mountable and include appropriate mounting hardware.

MX64/65(W) models can be used in rack infrastructure with the help of a shelf or one of the many third-party options on the market. MX64/65(W) models also include a Kensington slot – allowing them to be secured to a permanent structure for theft mitigation.

If sizing for a small site, integrated switch ports, PoE, and wireless might be important considerations. The MX64W and MX65W platforms have integrated Wi-Fi antennas and offer an all-in-one option for deployments that would benefit from a single device. The MX64 models include 3 LAN switch ports while the MX65 models include 10 LAN switch ports (two PoE).

Step 2: Throughput Requirements

Each MX has two dedicated uplink ports that can be connected to private or public networks. When sizing, make a note of the bandwidth available across both uplinks.

Campus/Branch Internet Gateway

If the MX is acting as an internet firewall, this would simply be the max up/down bandwidth provided by your service provider. If you are terminating two handoffs and expect to use them in an active/active model, combine the total bandwidth for both.

Branch MPLS Gateway

Similar to internet termination, total the max bandwidth provided by your service provider. If using a mix of WAN providers (e.g., MPLS + internet or MPLS + LTE), combine the total bandwidth for both.

Data Center VPN Concentrator

Determine the total upstream WAN bandwidth available to the data center at any given time. When used in concentrator mode, the MX is positioned behind MPLS and internet routers.

  • If the MX concentrator is terminating internet VPNs only, use the data center internet bandwidth as the metric.
  • If the MX concentrator is aggregating MPLS or private links as part of an SD-WAN overlay, then use the data center MPLS bandwidth available as the metric.
  • If the MX concentrator is building an overlay WAN architecture over both internet and MPLS, then the total of both services available at the data center should be applied.
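The three concentrator cases above boil down to summing the bandwidth of whichever WAN services carry the overlay. Here is a minimal Python sketch of that arithmetic. The function name, the service labels, and the bandwidth figures are all made up for illustration; they are not from any Meraki tool.

```python
def sizing_bandwidth_mbps(services, overlay_over):
    """Sum the bandwidth (Mbps) of the WAN services the MX overlay runs over.

    services:     dict of service name -> provisioned bandwidth in Mbps
    overlay_over: names of the services that carry VPN/overlay traffic
    """
    missing = set(overlay_over) - set(services)
    if missing:
        raise ValueError(f"unknown services: {sorted(missing)}")
    return sum(services[name] for name in overlay_over)


# Hypothetical data center with 1 Gbps internet and 500 Mbps MPLS.
dc = {"internet": 1000, "mpls": 500}

print(sizing_bandwidth_mbps(dc, ["internet"]))          # internet VPNs only
print(sizing_bandwidth_mbps(dc, ["mpls"]))              # MPLS overlay only
print(sizing_bandwidth_mbps(dc, ["internet", "mpls"]))  # overlay over both
```

The same helper covers the branch cases: pass both uplinks when they run active/active or when mixing providers such as MPLS + LTE.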

Step 3: Client Count

Client count is only significant for NAT-mode deployments where the MX is acting as an edge gateway. Data center VPN concentrator deployments should not include this metric.

Client count is a soft metric used to estimate the average number of simultaneous firewall connections and NAT translations that would need to be processed during peak use. The number of clients should represent the maximum number of simultaneous network devices (anything with an IP address behind the MX) at any given time.

Maintaining active connection state and rewriting IP addresses introduces processing and memory load. Pushing packets securely, as fast as possible, is a core function of the MX. Failing to consider client count could lead to under-resourced hardware and ultimately lower throughput.

Client count is an important consideration, but it isn’t a hard limit and isn’t enforced in hardware. The MX appliance will serve all clients, regardless of the actual number. Just stay within the recommended client count for the model and you’ll be safe.

If the appliance is going to be deployed in VPN concentrator mode, the total client count metric isn’t relevant to the sizing exercise since client IP rewrite processes inherent to NAT mode are disabled.

Step 4: Total Tunnels

If the MX will not be terminating any VPN sessions, then skip this step.

Site-to-site and client VPN are core features of the MX platform, popularized largely by AutoVPN’s success. There are three different VPN types supported:

  • AutoVPN (MX-to-MX)
  • Non-Meraki IPSec VPN
  • Client VPN (L2TP/IPSec)

Regardless of the type of VPN technology used, each tunnel between the local MX and a remote peer requires an IPSec security association (SA) to be maintained. Maintaining the active tunnel sessions consumes additional system resources for every additional SA. The more site-to-site or client VPNs on a single MX, the more SAs to manage. This makes calculating the total concurrent tunnel count a critical component of the MX sizing process.

It’s relatively easy to calculate the total number of tunnels consumed by client VPN sessions. Just estimate the max number of clients that could connect into the MX at any given time.

Non-Meraki IPSec VPN tunnel count is also straightforward. Add the total number of remote, non-Meraki peers required for the deployment. This is usually a small number.

Meraki AutoVPN tunnel count is highly dependent on the WAN topology in use and the numbers can quickly grow very large in complex enterprise architectures.
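Since every tunnel consumes an SA regardless of VPN type, the number to size against is simply the sum of the three tunnel types. A small Python sketch of that tally follows. The function name is my own, and the AutoVPN figure is an input you work out from your topology using the sections that follow.

```python
def total_concurrent_tunnels(client_vpn_peak, non_meraki_peers, autovpn_tunnels):
    """Add up concurrent tunnels (SAs) across all three VPN types.

    client_vpn_peak:  max client VPN users connected at any given time
    non_meraki_peers: number of remote non-Meraki IPSec peers
    autovpn_tunnels:  AutoVPN tunnel count derived from the WAN topology
    """
    counts = (client_vpn_peak, non_meraki_peers, autovpn_tunnels)
    if any(c < 0 for c in counts):
        raise ValueError("tunnel counts cannot be negative")
    return sum(counts)


# A firewall with 350 client VPN users and no site-to-site VPN.
print(total_concurrent_tunnels(350, 0, 0))
```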

Hub and Spoke AutoVPN Deployments

The most common MX WAN deployment is a simple hub and spoke design. It’s highly scalable and sizing is simple.

Hub MXs are often positioned in the data center or campus core and connect to all other MXs. Spoke MXs only connect to hubs. These are usually branches, manufacturing sites, warehouses, retail stores, sales offices, etc.

[Figure: Meraki MX hub and spoke diagram]

Hub AutoVPN Tunnel Count

Each hub MX will form a tunnel to all remote MX hubs and spoke nodes using every available uplink path.

When using a single uplink at each location, the hub would establish 1 tunnel per spoke. In the example below, hub DC1 builds 2 AutoVPN tunnels (1 uplink x 2 peers = 2 SAs).

[Figure: Meraki MX single hub with multiple spokes]

When using multiple uplinks on the hub or spoke, the hub establishes 1 tunnel between every pair of local and remote WAN interfaces. The scenarios below show how to calculate the number of hub tunnels, first with a single hub uplink (1 internet or MPLS handoff) and then with two.

Hub Sizing: Example A

  • 1 hub with 1 uplink
  • 2 spokes each with 2 uplinks
  • Hub builds 4 total tunnels
  • (1 local uplink) x (2 peers x 2 uplinks each) = 1 x (2 x 2) = 4 SAs

[Figure: Meraki MX single-uplink hub with multi-uplink spokes]

Hub Sizing: Example B

  • 1 hub with 2 uplinks
  • 2 spokes each with 2 uplinks
  • Hub builds 8 total tunnels
  • (2 local uplinks) x (2 peers x 2 uplinks each) = 2 x (2 x 2) = 8 SAs

[Figure: Meraki MX multi-uplink hub with multi-uplink spokes]
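The hub math in Examples A and B generalizes to peers with differing uplink counts: multiply the hub's uplinks by the total number of remote uplinks across all peers. Here is a minimal Python sketch of that calculation. The function name is my own and this is just the arithmetic from the examples, not a Meraki tool.

```python
def hub_tunnel_count(hub_uplinks, peer_uplinks):
    """Tunnels (SAs) a hub builds: one per local/remote uplink pair.

    hub_uplinks:  number of active uplinks on the hub (1 or 2)
    peer_uplinks: list with the uplink count of each remote peer
    """
    if hub_uplinks < 1 or any(u < 1 for u in peer_uplinks):
        raise ValueError("every MX needs at least one uplink")
    return hub_uplinks * sum(peer_uplinks)


print(hub_tunnel_count(1, [1, 1]))  # single uplinks everywhere: 2
print(hub_tunnel_count(1, [2, 2]))  # Example A: 4
print(hub_tunnel_count(2, [2, 2]))  # Example B: 8
```

Passing a list rather than a single spoke count keeps the sketch usable for mixed fleets where only some spokes have a second uplink.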

Spoke AutoVPN Tunnel Count

Each spoke MX will form a tunnel to all remote MX hubs using every available local and remote uplink path.

In its simplest form, assuming a spoke is using 1 uplink and peers with 1 hub (also with 1 uplink), only 1 tunnel is built.

In example A above, each spoke has two uplinks to a single remote uplink so the spoke would create 2 tunnels. Example B would result in 4 tunnels terminating on each spoke (2 local x 2 remote = 4 total available paths).

If multiple hubs are deployed as shown in Example C below, the same math applies. Calculating the total tunnels for the spoke is as follows:

Spoke Sizing: Example C

  • 1 spoke with 2 uplinks
  • Spoke > hub DC1
      • 2 local spoke uplinks
      • 1 remote hub uplink
      • (2 local uplinks) x (1 peer x 1 uplink) = 2 x (1 x 1) = 2 SAs
  • Spoke > hub DC2
      • 2 local spoke uplinks
      • 2 remote hub uplinks
      • (2 local uplinks) x (1 peer x 2 uplinks) = 2 x (1 x 2) = 4 SAs
  • Spoke > DC1 + DC2 = 2 + 4 = 6 total SAs

[Figure: Meraki MX spoke sizing example]
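The spoke-side math from Example C can be sketched the same way: for each hub, multiply the spoke's local uplinks by that hub's uplinks, then add the results. The function name below is my own and is purely illustrative.

```python
def spoke_tunnel_count(spoke_uplinks, hub_uplinks):
    """Tunnels (SAs) a spoke builds across all of its hubs.

    spoke_uplinks: number of active uplinks on the spoke
    hub_uplinks:   list with the uplink count of each hub the spoke peers with
    """
    if spoke_uplinks < 1 or any(u < 1 for u in hub_uplinks):
        raise ValueError("every MX needs at least one uplink")
    return sum(spoke_uplinks * remote for remote in hub_uplinks)


print(spoke_tunnel_count(2, [1]))     # spoke in Example A: 2
print(spoke_tunnel_count(2, [2]))     # spoke in Example B: 4
print(spoke_tunnel_count(2, [1, 2]))  # Example C (DC1 + DC2): 6
```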

Mesh and Hybrid AutoVPN Deployments

An alternative WAN topology model is full mesh, where every MX node is a hub and builds one or more tunnels to every other MX in the WAN fabric. This offers low latency and direct access to every other location, but comes with a tunnel count tax. Full mesh can work well in deployments under 100 sites, but doesn’t scale well as the number of nodes grows – especially when building for SD-WAN with multiple uplinks per site.
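To get a feel for the tunnel count tax, here is a rough Python sketch that applies the same per-pair math used for hubs to a full mesh. It assumes every site has the same number of uplinks and that a tunnel forms between every local/remote uplink pair. Both are simplifications of mine, so treat the output as a ballpark only.

```python
def full_mesh_tunnels_per_node(sites, uplinks_per_site):
    """Approximate tunnels (SAs) each MX maintains in a full mesh.

    Assumes identical uplink counts at every site (illustrative only).
    """
    if sites < 1 or uplinks_per_site < 1:
        raise ValueError("sites and uplinks_per_site must be at least 1")
    remote_sites = sites - 1
    return remote_sites * uplinks_per_site * uplinks_per_site


for sites in (50, 100, 500):
    single = full_mesh_tunnels_per_node(sites, 1)
    dual = full_mesh_tunnels_per_node(sites, 2)
    print(f"{sites} sites: {single} tunnels (1 uplink), {dual} tunnels (2 uplinks)")
```

Under these assumptions, a second uplink at every site quadruples the per-node tunnel count, and that count applies to every MX in the mesh rather than just the data center hubs.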

Hybrid deployments blend the scaling benefits of hub and spoke with the connectivity flexibility of mesh. This often comes in the form of regional or secondary hubs that host services only a subset of spokes need a direct tunnel to. This model requires proper inter-hub route and scale planning and is the most complex.

Both mesh and hybrid architectures are covered in much more depth (including tunnel count calculations) in the Cisco MX SD-WAN Connectivity Models writeup.

Step 5: Inspection Services

If the MX appliance is going to be positioned to have direct internet exposure or used for inline threat inspection, it’s likely that some or all of the Advanced Security features will be enabled. Doing so does add processing overhead to the unit. Inspection features with the largest performance impact are content filtering, AMP (anti-malware), and IDS/IPS.

We’re talking about a security appliance after all, so it’s expected that customers run these threat engines. The real-world throughput hit incurred on current hardware platforms is nominal as can be seen in the Advanced Security Throughput row in the metric table below.

Use the Advanced Security features but size accordingly.

Step 6: Putting It All Together

Based on metrics collected from steps 1-5, reference the sizing table to determine the minimum MX hardware model that would meet all of the physical and technical requirements.
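The lookup itself is a "smallest model that clears every bar" search. The Python sketch below shows the idea. The model names and limits in it are made-up placeholders, NOT real MX specifications, so substitute the values from the official sizing table before using anything like this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelLimits:
    name: str
    rack_mountable: bool
    max_throughput_mbps: int
    max_clients: int
    max_tunnels: int


# PLACEHOLDER data, ordered smallest to largest. Not real MX figures.
PLACEHOLDER_TABLE = [
    ModelLimits("small", False, 200, 50, 50),
    ModelLimits("medium", True, 500, 200, 100),
    ModelLimits("large", True, 1000, 1000, 500),
]


def smallest_fit(table, *, throughput_mbps, clients=0, tunnels=0, need_rack=False):
    """Return the first (smallest) model meeting every requirement, or None."""
    for model in table:
        if need_rack and not model.rack_mountable:
            continue
        if (model.max_throughput_mbps >= throughput_mbps
                and model.max_clients >= clients
                and model.max_tunnels >= tunnels):
            return model.name
    return None


# 100 Mbps, 200 clients, 4 tunnels: "small" fails on client count alone.
print(smallest_fit(PLACEHOLDER_TABLE, throughput_mbps=100, clients=200, tunnels=4))
```

When Advanced Security features will be enabled, compare against the table's advanced security throughput figure rather than the plain firewall throughput. For concentrator deployments, leave `clients` at 0 since that metric doesn't apply.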

MX Sizing Metrics

1 The official MX datasheet shows MX450 with a max concurrent tunnels value of 5,000. While this is possible in a lab environment with limited spoke VPN traffic, real-world deployments should stay under 1,500 tunnels. This recommendation comes from experience with dozens of super-scale MX deployments and is largely driven by the number of packets per second (pps) processed by the MX450 concentrator. Voice and video applications often send small packets at high volume, which has a dramatic effect on the total headend pps when scaled up.
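To see why small packets matter, it helps to convert throughput into packets per second. The Python sketch below does that conversion. The packet sizes are illustrative guesses of mine rather than measured values, and the math ignores framing and tunnel overhead.

```python
def packets_per_second(throughput_mbps, avg_packet_bytes):
    """Approximate pps needed to carry a given throughput.

    Ignores layer 2 framing and VPN encapsulation overhead.
    """
    if avg_packet_bytes <= 0:
        raise ValueError("avg_packet_bytes must be positive")
    bits_per_second = throughput_mbps * 1_000_000
    return bits_per_second / (avg_packet_bytes * 8)


# The same 1 Gbps of traffic at two average packet sizes (hypothetical):
large = packets_per_second(1000, 1400)  # bulk data transfers
small = packets_per_second(1000, 200)   # voice-like small packets
print(f"1400-byte packets: {large:,.0f} pps")
print(f"200-byte packets:  {small:,.0f} pps ({small / large:.0f}x more)")
```

The bandwidth is identical in both cases, but the small-packet profile asks the concentrator to process several times as many packets.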

2 The vMX100 provisioning documentation requires an m4.large EC2 instance to be used as the underlying virtual server powering the vMX. As such, the performance characteristics are largely dependent on the m4.large EC2 instance’s capabilities. Experience deploying and operationalizing vMX100s as primary data center VPN hubs has shown that the 250 max concurrent tunnel count metric is conservative. Production vMX100 deployments exist in the wild that successfully support 1,000+ spoke nodes using a specific route override feature that reduces route overhead on all nodes by consolidating the AutoVPN route list advertised to each spoke node. If this is critical to your deployment, talk to your local Cisco Meraki Systems Engineer about enabling the route summary feature.

MX Sizing Example: DC Campus Firewall

Requirements:
  • Rack-mountable
  • Redundant power
  • 750 Mbps throughput
  • AD-based FW rules and content filtering
  • IDS/IPS/malware inspection
  • 350 Client VPN users, no site-to-site VPN
  • 900 campus users
Sizing Criteria:

MX Sizing Example: DC VPN Concentrator

Requirements:
  • Rack-mountable
  • Redundant power
  • 2 Gbps VPN throughput
  • 1,000 remote VPN sites
Sizing Criteria:

MX Sizing Example: SD-WAN Branch

Requirements:
  • 100 Mbps stateful throughput
  • 75 Mbps VPN throughput
  • Content filtering
  • IDS/IPS/malware inspection
  • 2 remote VPN peers (data center hubs)
  • 200 branch users
Sizing Criteria:

The branch MX sizing example above shows that it only takes a single critical metric (max clients in this case) to push the hardware platform up a level. An MX64 or MX65 would meet all of the other technical needs of the deployment, but the 200 branch users exceed the client count recommended for those models. In situations like these, the right decision is always to round up.

Summary

Cisco Meraki MX security appliances are deployed for a variety of use cases and can serve a number of different roles from security inspection and traffic segmentation gateway to more advanced SD-WAN VPN aggregation engines. Understanding the MX posture, features, throughput requirements, and environment size all contribute to a proper sizing process.

The sizing steps outlined should provide a useful framework for just about any WAN or security architecture. Consider all the relevant features for the deployment, stay within the recommended max metrics, and round up if needed.