Adventures in IWAN – Part 2 Intelligent Path Control

Intelligent Path Control

Broadly speaking Intelligent Path Control looks at monitoring traffic live across multiple transport links and making a next-hop decision on the fly to make sure the traffic you define in policy always chooses the best path automagically.  Application aware routing really.

If your expensive backup-links are underutilized, or you want to take advantage of multiple Wan transports and you want to do any kind of load-sharing then traditionally you would be using all your complex BGP tricks at the routing level, and if you wanted to do any kind of application monitoring to connect the two, then there would be some manual Netflow checking, SLA probes etc.  The entire  SD-WAN market was spawned, in part, to solve this problem.

Intelligent Path Control looks to remedy this by automatically routing traffic per class based on real-time performance of links to the most optimal transport on the fly.  This is useful, as routing protocols in the main are blissfully unaware of brownouts, soft errors or intermittent short lived flapping etc.

In a nutshell –  Intelligent Path Control is intelligent routing based on performance.

This “Performance Routing (PfR)” is what enables Intelligent Path Control, so here we can end the circuitous marketing and talk about PfR from this point in.

Under the covers PfR consists of Cisco’s SAF (Service Advertisement Framework), Monitoring Mechanisms, NBARv2 and Netflow v9.  It influences forwarding decisions in PfRv3 not by altering the routing table in the main, but through dynamic route maps or Policy Based Routing to change the next hop.  It can also enable some prefix control, injecting of BGP or static routes.

Before PfR, in a Cisco environment you would rely on manually using a bunch of scripts, static routes, PBR etc. to get anything like intelligent path control, and this would be a long way from dynamic or automatic. Then you would somehow be trying to stitch this together with some monitoring maybe using Netflow, or your IPSLA probes.  All very manual, all very labour intensive.

As with DMVPN, PfR is made up of a number of components, these are below and I will cover each one in turn to get an understanding of how this solution all fits together.

  • Performance Routing
  • PfR IWAN components – Controllers and Border Routers
  • Monitoring and Performance routing – NBAR2, Netflow
  • Service Advertisement Framework (SAF)
  • Paths
  • Prefixes
  • Smart Probes (optimistation, Zero SLA)
  • Thresholds
  • Steps to set up PfR – Traffic Classes and Channels
  • Routing
  • Transit site Preference
  • Backup Master controller
  • Prefixes
  • Routing
  • Controlled and uncontrolled traffic
  • Path of Last Resort.
  • VRFs

Performance Routing (PfR)

PFR (performance routing) initially came out of OER (Optimised Edge Routing) and version 1 appeared in the year  2000 with prefix route optimisation.

PFRv2 with application path optimisation then came in 2007.

PFRv3 is the latest version with new functionality, evolving a well-established and long standing Cisco IOS feature for over 10 years.

Essentially Pfr monitors network performance and makes a routing decision based on policies for application performance.  It load-balances traffic based on linked utilization to use all available wan links to the best performance per application.

PfR is really a passive engine for monitoring and gives you a superset of Netflow monitoring with around 40+ measurements.

  1. First you start with defining your polices – There are two ways here, either by DSCP or by Application.  If you use application you enable NBARv2.  There are some very handy defaults here.
  2. Then you learn your traffic.
  3. Once the traffic is learned the next step is to measure the traffic flow and network performance and report this to a Master Controller
  4. Finally you choose your path.  Performance Routing, via the Master controller, will change the next hop based on what you have learned about the traffic and how you have set-up your policies for each traffic class and link.

PfR can automatically discover the branch once connectivity is established.  The CPE/Branch (BR) is discovered and is automatically configured – auto starts.  The whole idea was to make the PfR configuration as simple and light-touch as possible.

PfR IWAN components

A device (Cisco ISR/ASR router, virtual or physical) can play one of 4 roles in PfRv3.  It is important to outline these, as the HQ Border Routers are used simultaneously as DMVPN hubs and PfR BRs.  By separating out the functionality you know which design you are talking about –  the transport overlay, or performance routing.

Let’s now look at the components involved in PfR.

Master Controller

The Master Controller(MC) is the decision maker and the Border Router(BR) is the forwarding path (roughly analogous to Controllers in other vendors’ SD-WAN architectures, this one just happens to be a router, physical or virtual).

You need a Master Controller at every site, so inevitably there is some confusion when it comes to the Hub Master Controller.  I look at the Hub Master Controller as the place where all the policy is configured and then distributed to the rest of the Master Controllers in your network as they join your IWAN domain.  As the Hub MC is looking after the IWAN domain this will normally be a separate device at the hub (physical or virtual) for scale.  An IWAN domain is basically all your routing devices that participate in the Path Control.

If you look at a typical IWAN design you see a Hub Master Controller at the central site (and a second maybe on a redundant central site or DC).  Cisco should probably have called this an IWAN domain controller or something.  Instead they call these Hub Master Controllers (HUB MCs),  central sites are IWAN POPs (Points of Presence), and each central site has a unique POP-id.

In Cisco IWAN domain you must have a Hub site, you may have a Transit Site and of course you have your Branch sites.

The Master Controller functionality doesn’t do any packet forwarding or inspection, but simply applies policy, verifies and reports.  The MC can also be standalone or combined with a Border Router (BR).  You have to have a Master controller per site (branch, hub etc.) so policy can be applied and verified.  If you only had one router at the branch with 2 transports then the border router would be a combined Master Controller and Border Router MC/BR.

Master Controller as a term comes up in several places then, but all you have to do is differentiate between the functionality of the one at the central site, and the ones everywhere else and it all becomes simpler to understand.

So a ultimately there is central place for configuration, policies, timers etc. (your Hub MC), but a completely distributed MC control plane.  It is important to know that even if you lose your HUB MC, you still have the local MCs optimising and controlling your traffic. A distributed control plane.

Let’s briefly go through each of the components involved in PfR in turn, and then look at the monitoring:

  • Hub Master Controller:  This is where all the polices are configured, and is the Master Controller at the hub site, Data Center or HQ.  Also for this central site it makes any optimisation decisions for traffic on the Border Routers (BRs).
  • Hub Border Router:  At each central site you need Border Routers to terminate WAN interfaces and PfR needs to be enabled on these.  The BR needs to know the address of its local Master Controller (the Hub Master Controller in this case), and you can have several hub BRs and indeed interfaces per BR.  As far as PfR is concerned you also need a path name for external interfaces – I will come onto Paths shortly.
  • Branch Master Controller:  I mentioned you need an MC on each site to make the optimisation decisions, but in this case there is no policy configured as with the MC at the Hub.  Instead it receives the policy from the Hub Master Controller. Obviously it therefore needs the IP address of the hub MC.  At a branch the MC can be on the same device as the border router – an MC/BR.
  • Branch Border Router:  The branch Border Router (BR) terminates the Wan interface and you need to enable PfR on this.  It also needs to know where its MC is (if it is on a separate box).  One enabled for PfR the Wan interface is detected automatically.  As noted earlier the branch Border Router can also house the Master Controller for he site.

All Master controllers  peer with the hub MC (or IWAN Domain Controller) for their configuration and policies.

Branches will always be DMVPN spokes.

Every site runs PfR to get path control configuration and policies from the IWAN domain controller through the IWAN Peering Service

A Cisco diagram probably does a better job than a scribbling of mine to get the components across visually.

IWAN domain

One other confusing term in the architecture is that of Transit (this is the problem when you use terms with multiple meanings, even if it accurately describes a functionality.)  I understand a transit site as a redundant HUB Data Centre with a Redundant HUB MC (IWAN domain controller).  So exactly the same as the HUB site, the only difference is you do not define the policies here, these are copied from the HUB MC.  The Transit site also gets a POP-ID

Remember each central site gets a POP-id.

Technically traffic can transit a branch site to get to another branch site if you get the routing and advertisements wrong, but this can get pretty messy and is best avoided.

As with most network architectures, a solid predictable routing design where you know your expected routed paths is the key to a stable and robust IWAN deployment.

Monitoring and Performance Routing

Unified Monitoring (Performance Monitor) and application visibility includes bandwidth, performance, correlation to QoS queues, etc. and is responsible for performance collection and traffic statistics.

As I mentioned at the start PfRv3 is an evolution of Optimised Edge Routing (OER) which was prefix route optimisation with traditional Netflow for passive monitoring and IPSLA probes for active monitoring.  This moved to PFRv2 to add application routing based on real-time performance metrics and then PFRv3 which adds a bunch of stuff including smart probes, NBARv2, one touch provisioning, Netflowv9, Service Advertisement Framwork, VRF awareness etc.

For application recognition (dynamic or manual) IWAN and PfR use NBAR2.

NBAR2 – Network Based Application Recognition, is way of inspecting streams of packets up to layer 7 to identify applications,  it provides stateful deep packet inspection on the network device to identify applications, attributes and groupings.

Cisco defines this as a Cisco cross platform protocol classification mechanism.  It can support and identify 1500+ applications and sub-classifications, and Cisco adds new signatures and updates through monthly released protocol packs.  It can identify Ipv4, Ipv6, ipv6 transition mechanisms, a bunch of application like TOR, Skype, MS-Lync, Facetime, Youtube, Office 365 etc.  then you can configure policy based on this.

(ios commands)

Ip NBAR protocol discovery

IP NBAR custom transport

You match the protocol from NBAR when setting up your QOS policy for IWAN and then your TCA levels (Threshold Crossing Alerts).  TCAs are pretty much how they sound, you cross predefined thresholds around jitter, loss, and delay then create an alert that the controller can then act upon for a path.

Netflow – (developed by Cisco) essentially collects IP traffic information and monitors network traffic.  There is Netflow V5 and V9.  Netflow v5 is a fixed packet format and has fields like src and dst Ip, number of packets in the flow, src and dst ports, number of L3 packets in the flow, protocol, Type of Service, TCP flags etc.   PfR also takes advantage of Netflow v9 which adds a cornucopia of extra information to customise and report on, and you can define what you want to report on as well by creating your own custom flexible flow record.

For more information see Netflow V5 export format

For more information on Netflow v9 see Netflow v9 Format

IWAN Monitoringall the PfR stats are exported using Netflow, so it is important to have a monitoring platform that supports Netflow V9 to get the most out of your monitoring for PfR for visibility.

Service Advertisement Framework (SAF)

PFRv3  has a concept of a peering domain or Enterprise domain for service coordination and exchange at the wan edge.  SAF is the underlying technology here.

SAF creates the framework for passing service messages between routers, with SAF forwarders and clients – basically a service advertisement protocol.  There is some considerable detail under the covers, but in IWAN it has been made very simple to configure.  (If you have history here you will remember SAF headers, database, client APIs, Client and Forwarding protocols, transitive forwarders, services etc.  Fortunately improvements mean the exact nature of the underlying mechanisms are improved and hidden now).

In essence the advertisement of the SAF service uses EIGRP as the transport layer for the advertisement, and completely separate to the IP routing protocol you are using to actually forward packets . It also uses link-local multicast for neighbour discovery.

Since you are using EIGRP as the engine for service advertisments then this also comes with split-horizon and DUAL (Diffusing Update Algorithm) to prevent service advertisement loops.

SAF relies on the underlying network and DMVPN in order to know about and communicate with its peers, as through the tunnel they are effectively one hop away. There can be confusion here again, as it easy to see EIGRP and assume this is related to underlying network connectivity, but for SAF, provided there is IP connectivity to the domain Border routers  (i.e. through the DMVPN tunnel) peers can communicate and pass service advertisements between each other as they have established SAF topology awareness and neighbours (peers) through the overlay EIGRP control engine.

Also SAF is efficient in that it only sends out incremental updates when advertisement changes occur and does not periodically broadcast/flood service advertisements.

In Iwan you have the concept of a Path

A Path name in IWAN identifies a transport

These Paths are identified with a Path-ID  You manually define the path name and path-id on the Hub and Transit BRs,  they then start sending discovery probes to the branches and these probes contain information about the path:  namely the Path Name, Path-ID and POP-ID.

The Path-ID is unique per site.

Paths in IWAN also have three labels.  Preferred path, Fallback Path and the next Fallback Path, and under each of these labels are three actual paths

For example you might say for this DSCP or this application (e.g EF) your preferred path is MPLS,  for something else the preferred path is Inet1 etc.

Each transport is a Path, and understanding the concept of a Path and Path-ID certainly makes troubleshooting easier when you are looking at traffic that has changed paths, when, and the reason why.

Path of Last Resort

One final bit of confusion in IWAN is that at the Hub you need a BR per transport.  At the branch/CPE you can have 2 transports per router, and if you want to do 3 transports, then you must have a second router for the 3rd transport (There are rumours this will be up to 5 transports per router in a future release, but we wait and see). There is also this thing called Path of Last resort, which some see as, “great, 3 transport are really supported per router. ” Turns out, No!

Basically if all other paths are unreachable, then we can fallback to the path of last resort.  This is not the same as the monitoring and control you get on the other paths.

PfR will not probe as usual (smartprobes – instead of sending 20pps, will be reduced to 1 packet every 10 seconds – so really just a keepalive).  Also path unreachable will be extended to 60 seconds.  So really this is to be used if you have a 4G/LTE connection as a last resort or backup path for your traffic if all else fails.  You add this config on the central site.

SummarySteps of setting policy for PfR

PfR – First you define a Class (a but like a class-map if you are familiar with QOS policy config in Cisco IOS), then this has a match on DSCP or Applicaton,  then you have your transport preference (Preferred, Fallback, Next Fallback), then your performance threshold based on loss, latency or jitter to decide which is the preferred path.

PfR actually work on the basis of a traffic class which is not an individual flow, but an aggregation of flows.

The traffic class is based on Destination Prefix, DSCP and Application name. (obviously not app name if just DSCP is used).  For each traffic class PfR will look at the individual next hop.

Performance metrics are collected per channel:

Per channel means:

  • per DSCP
  • per Source and Destination site
  • per Interface

A Channel is essentially a path between 2 sites per DSCP, or a path between 2 sites per next hop.

A Traffic Class will be mapped to a channel.

Channels are added containing the unique combination of DSCP value received, site id and exit.

PfR controlled and uncontrolled traffic

There is a concept of PfR controlled and uncontrolled traffic – and if some new traffic is seen for the spoke, then for 30 secs the normal routing table controls the traffic destination.  When this comes under the control of PfR then it abides by Threshold Control and is directed accordingly.

There is also an unreachable timer in PfR determined by PfR probes do detect the reachability of a channel.  This is seen as down once traffic is not seen either for 1 second, or if there is no traffic on the link and smart probes are not seen for 1 second.  These are the defaults I believe, but there is a recommendation to set the timer to 4 secs.  I assume this will become the new default at some point.

So for failover, blackout will be the 4 seconds above, for brownout then this is 30secs by default, but again can be reduced down to 4 or 6 seconds.

Performance Monitoring (PerfMon)

Unified Monitoring under PfR is enabled through Performance Monitor (PerfMon) which has been around a while and you might be familiar with it from Voice roll-outs.  It is responsible for performance collection and traffic statistics.

Application visibility includes bandwidth, performance,  and correlation to QoS queues

When it come to IWAN domain policies and domain monitoring there are 3 performance monitors to be aware of in PfR

  • Performance Monitor 1: to learn site prefixes (applied on external interfaces on egress)
  • Performance Monitor 2: to monitor bandwidth on egress (applied on external interfaces on egress)
  • Performance Monitor 3: to monitor performance on ingress (applied on external interfaces on ingress)

IWAN uses these performance monitors to get a view of traffic in flight (traffic flowing through the interfaces) to look for performance violations and to make path decisions based on this.  Border Routers (BRs) collect data from their Performance Monitor cache, along with smart probe results (described below), aggregate the info and then direct traffic down the optimal forwarding path as dictated by the Master Controller.

Monitoring and optimisation – Smart probes

When there is no user traffic, (e.g. a backup path) then probes are sent to get the monitoring. These are called Smart Probes

Smart Probes are used to help with the discovery but also for measurement when there is no user traffic. These probes are generated from the dataplane.  Smart Probe traffic is RTP and measured by Unified Monitoring just like other data traffic

Smart probes add a small overhead to bandwidth on a link, but this is not performance impacting in general and can be tuned.

The Probes (RTP packets) are sent over added Channels to the sites discovered via the prefix database. Without actual traffic, BR sends 10 probes spaced 20ms apart in the first 500ms and another similar 10 probes in the next 500ms, thus achieving 20pps for channels without traffic. With actual traffic, a much lower frequency is observed over the channel. Probes sent every 1/3 of [Monitor Interval], I.e. every 10 sec by default.

That is 20pps per channel or per DSCP.  

Zero SLA is another feature that is often missed and should be mentioned. If you are concerned about a very low bandwidth link and that you would be sending smart probes per channel or DSCP, then you can configure Zero SLA so only DSCP 0 uses smart probes on secondary paths. All the other channels  now do not get smart probes, only DSCP 0.  If you have a 4G or low bandwidth link and are worried about overhead this is definitely an option to have in the back pocket.

Smart probes are of three types:

  • Active Channel Probe: Active channel probe is sent out to measure network delay if no probe is sent out for past 10 seconds interval.
  • Unreachable Probe: Unreachable probe is used to detect channel reachability when there is no traffic send out.
  • Burst Probe: Burst probes are used to calculate delay, loss, jitter on a channel that is not carrying active user traffic.

For low-bandwidth links (e.g a DSL or 4G.LTE) it is possible to tune this further to have even less overhead – for example the below command:

smart-probes burst quick number-of-packets packets every seconds seconds


The whole point of defining thresholds is to look for a passing of a threshold or a performance violation – if we see this then an alert is sent to the source Master controller (a Threshold Crossing Alert or TCA) from the Border Router.  It is at this point that PfR controls the route and changes the next hop for an alternative path as per the policy configuration i.e. re-routed to a secondary path.  It is not PBR (policy based routing) as you might already be familiar with, but is similar in that the remote site device knows what to do with this traffic class and routes it accordingly based on policy.  The local Master Controller makes this decision.

All the paths are continuously monitored so we have awareness across the transports.

Routing and PfR

In part one we went through some of the choices around routing in DMVPN.  Well there are additional considerations with PFR.

One of the reasons EIGRP and BGP are preferred for IWAN is that alternate paths are available in the BGP and EIGRP topology table, and as PfR looks into these tables to make decisions and change next hops based on policy, they are well suited.

Scale.  The first thing pfr does is look at the next hop of all paths.  It looks in the BGP or EIGRP table.  If you show your routing table you have one next hop per path, but because PfR looks at both the routing table and the topology table, it see the next hops for both paths.

With EIGRP you can adjust the delay to prefer MPLS, so this combined with the EIGRP stub feature means you can control routing loops.

With BGP you would have the hubs configured as the route reflector for BGP, and to prefer MPLS you can simply set a high local pref for MPLS.  If you have say OSPF on the branch then you redistribute the BGP into OSPF, and set a route tag on the spokes to identify routes redistributed from BGP.

As ever there are many ways to configure BGP, but the validated designs guide you to one relatively simple way.

If you looked at using OSPF for example, well PfR does not look into the OSPF database, therefore relies on the RIB (Routing Information Base), so in order to support multiple paths for decision making you would need to run ECMP (Equal Cost Multi Path) – far from ideal.


When a site or a branch is part of PfR it advertises its prefix to the HUB-MC, and then this forward this to all the MCs in the domain.  

This can be confusing because obviously BGP or EIGRP send prefixes, but PfR also sends prefixes.  One of the performance monitors will collect the source prefix and mask and advertise this to all Master Controllers.  It uses the domain peering with the HUB-MC and then this will reflect this prefix out to all the other MCs in the domain.

Ultimately you end up with a table with a mapping of site-id to site prefix and how this was learned i.e learned through IWAN peering (SAF service advertisement framework), configured, or shared.

It is important that attention is paid to your routing (of course, it is always important that you pay attention to the routing) because in advertising the prefixes, PfR looks in the topology table based on the IP address and mask to dig out the prefix.

There are two Prefix concepts to be aware of 1) Enterprise Prefix List and 2) Site-Prefix

Enterprise Prefix list  is a summary of all your subnets, or all your prefixes in your IWAN domain.  This is defined on the HUB-MC for the domain.

A prefix that is not in this prefix-list is seen as an Internet prefix and load-balanced over the DMVPN tunnels.  This is important, as If there is no Site-id for example, (the site is not enabled for PfR), then you don’t necessarily want traffic to be load-balanced, such as Voice for example.  So it is important to make sure you have a complete Enterprise Prefix List.  Once included in the Enterprise prefix list, PfR will know that traffic is heading to a site where PfR is not enabled and will subsequently know not to load balance it.

Site-Prefix – the site prefix is dynamic in PfR, so that on a site Perfmon will collect the traffic outbound, look at the IP address and mask, and then advertise that prefix up to the hub through PfR.  On the hub and transit site however you want to manually advertise the Site-Prefix that is advertised.

Prefixes specific to sites are advertised along with site-id. The site-prefix to site-id mapping is used in monitoring and optimization.

It is important that the right Site-Prefix is advertised by the right Site-id

Transit site preference

When you have multiple transit sites (or multiple DCs) with the same set of prefixes advertised from the central sites, you can prefer a specific transit site for a given prefix – the decision is made on routing metrics and advertised masks, and this preference takes priority over path preference.  The feature is called transit site affinity and is on by default (you can turn this off with no transit-site-affinity).

Traffic Class timers – if no traffic is detected for 5 minutes then the dynamic tunnel is torn down.

BACKUP Master Controller

BACKUP Master Controller – you can have a backup Master Controller but it should be noted that today there is no way to provide a stateful replication to the Backup Master Controller – the two are not synched.  The way to do this is to configure the same PfR config on both, and the same loopback address on the backup controller but instead use a /31 mask so that should the primary go away the BRs will detect the failure and reconnect to the Backup Master Controller – so stateless redundancy.

The backup MC will then re-learn the traffic.

In the meantime Branch MCs keep the same config and continue to optimise traffic as you would expect – Anycast Ip.   We follow the routing table and do not drop packets, which is why you set the MPLS prefer.

On the branch you need a direct connection between the BRs – on the HUB you just need IP connectivity.

Finally VRFs

VRF-Lite is used with all the same config ideas but per VRF.  Your overlay tunnel is per VRF (DMVPN per VRF) and your overlay routing is also per VRF  (VRFs are not multiplexed in one DMVPN tunnel!).   Under PfR, I mentioned that SAF (Service Advertisement Framework) was part of the magic behind PfR, well the SAF peering for advertisements is also per VRF, as is MC and BR config, and also policies are per VRF.

Monitoring – all the PfR stats are exported using netflow, so it is important to have a monitoring platform that supports Netflow V9 to get the most out of your monitoring for PfR.


Too much to learn

Ok, so that was a lot to take in I agree.  But hopefully by breaking down the component parts a little, next time you look at IWAN you will at least have a place to start ,and understand what is actually going on when you select the drop downs in an IWAN GUI.

When you first look at IWAN you have terminology flying at you at an alarming rate. Much of it sounds familiar(ish), and it is easy to leap to a feeling of general understanding, until you realise that you are talking at cross purposes when it comes to EIGRP, or you are not sure exactly what Transit is, or the meaning of a Path.  Hopefully the above provides some context for deployments.

What I would say is that once you understand the components, deployment is surprisingly light touch and easy through your choice of IWAN app and gui.  In fact it is not too bad without really understanding it all,  but it is always best to understand what you have just done.   If you look at other SD-WAN vendors (and I will cover some of the broader protocol choices in another post), the GUIs have abstracted much of the underlying workings.  This makes it all seem “oh so simple”, and to be honest it should be like this.  But as long as you understand that abstractions have been made and that there is no magic, you will quickly get a good feel for the various technologies.   You will understand the protocols and design choices, and be able to identify the innovations that have been sprinkled along the way.

Finally you have a number of options when it comes to monitoring and orchestration with IWAN.  All take the pain away from setup and all work towards enhanced visibility. The fact you have some products marketing IWAN deployments in 10 minutes shows how the mechanisms can be streamlined through abstraction and automation.  In brief, your main options are below.

  • Orchestration – Cisco Prime, Anuta networks, Glue networks, Cisco VMS, Cisco APIC-EM, Cisco Network Service Orchestrator
  • Visualisation/monitoring – Cisco Apic-EM, Living Objects, Liveaction, Cisco Prime/VMS.

Hopefully by now you have enough of a feel for the technology to jump into the validated designs for IWAN productively, and deploy a whizzy tool with growing confidence. You never know, IWAN might be less painful than you might have feared, despite first impressions.


Adventures in IWAN – Part 1 – Transport Independence

For a number of reasons, and part of wider interactions with SD-WAN, I have been having a few adventures with Cisco’s current SD-WAN offering  – Intelligent WAN or IWAN (2.1).  Whether IWAN is an SD-WAN as you understand it from other vendors is a topic for another day, but I thought it might be useful to cover a few things I have come across.

This is intended to be a multi-part post with the first 2 parts covering the first 2 pillars (hopefully throw in a few config examples after part 2), and the third eventually covering pillars 3 and 4.

As at least 80% of getting IWAN up and running is in the first 2 pillars, I  am going to focus on these primarily.

One thing to note from dealing with Cisco IWAN so far, is a lot of the underlying mechanisms are exposed.  This has helped me, personally, to improve my understanding of all SD-WAN vendors and how their solutions fit together (e.g.  what is the overlay and how does it actually work? What affects the control plane?  What is being provided in the Data Plane?  Overlay routing for dynamic point-to-multipoint with encryption?  How exactly are you doing encryption?  What protocols are you using?  How are you managing key distribution and re-keying?  How is traffic diverted to the device or inline? How is performance monitoring working?  App optimisation?  Is it flow based?  How are you looking into the flow? Application identification – what method? Controller for traffic control?  Real time? Orchestration?  etc.)  Basically, what is the magic?

Are we ready?  OK. Strap in, here we go.

The Building blocks of IWAN

Have a quick look at the Cisco picture below and you can see the 4 pillars of IWAN


Each pillar of IWAN has underlying technology building blocks and those technologies also have foundational components.  Hopefully I will provide some clarity on the the building blocks to layer on top of each other to help produce a shiny polished IWAN solution.

The first 2 pillars of IWAN are – 1) Transport Independence and 2) Intelligent Path Control.

Part 1

 Transport Independence

The fundamental technology underpinning transport independence in IWAN is DMVPN (Dynamic Multipoint VPN) as the transport overlay technology, and this also has component parts.

So what is DMVPN fundamentally?

It is a combination of 4 things:

  1. Multipoint GRE tunnels
  2. NHRP (Next Hop Routing Protocol) – basically creates a mapping database of the spoke’s GRE tunnel interfaces to real (or public) addresses.  Think of this like tunnel overlay IPs ARPing for the “real” underlay IP addresses.
  3. IPSEC tunnel protection – creates and applies encryption policies dynamically
  4. Routing – Essentially the dynamic advertisement of branch networks via routing protocols, e.g BGP, EIGRP, OSPF, RIP, ODR.

Let’s cover each one in turn, then you will have your tunnel overlay or secure transport independence sorted.

DMVPN – your overlay transport technology

Multipoint GRE tunnels

If you are familiar with GRE you will be familiar that you create a tunnel with an extra GRE header  between two endpoints.  You create a tunnel interface (virtual interface) with an address, and tie this to a real source and destination address on actual interfaces that terminate the tunnel.

A couple of pre-canned Cisco diagrams do the trick here for the sake of illustration:

GRE tunnel


Multipoint GRE  broadens this idea by allowing a tunnel to have “multiple” destinations and you can terminate the tunnels on a single interface.  Handy for Hub-and-Spoke, and Spoke-to-Spoke I think you will agree.

So Multipoint GRE is your tunnel overlay SD-WAN transport in the Cisco world.  Well that was simple, so onward to the less straightforward.

Next Hop Resolution Protocol – NHRP

The next building block of DMVPN is NHRP, and this provides a way of dynamically mapping all those multi-point GRE tunnel interfaces you just created with their associated real addresses or underlay transport network.

NHRP has actually been around a while in different forms and originates from an extension of the ATM ARP routing mechanism which dates back to 1998/1999 as a technology.

Think of NHRP (Next Hop Resolution Protocol) as like ARP but for the underlying real IP addresses.  So you have a physical interface on your wan router with an address, and you have a GRE tunnel address on that same router.  One is your IP underlay and one your IP tunnel overlay.  You now need a way to map your IP underlay network to your IP tunnel overlay network, and NHRP does this job.

By way of visualization, I particularly like the below diagram from Cisco which shows very clearly which are your overlay addresses, which are your tunnel addresses, and which are your real addresses or NBMA addresses.  As a distinction it might help to think of GRE as your transport overlay technology (each multipoint GRE tunnel maps to a WAN transport), and your overlay network as the network addresses you wish to send over this tunnel, so a network overlay.


A spoke router will register with a Next Hop Server (NHS) as it comes up,  (you will give the spoke a NHS address to register with, and incidentally a multicast address for broadcast over the tunnel if the underlying network does not support IP multicast – useful for routing protocols).  Once registered, the NHRP database will maintain a mapping of Real addresses to Tunnel Addresses.  Once registered, if a spoke needs to dynamically discover the logical tunnel IP to physical NBMA IP mapping for another Next-Hop-Client (spoke) within the same NBMA network, then it will do an NHRP resolution request to find this.  This discovery means you do not have to go via the Hub every time for Spoke to Spoke communication – so the Dynamic part of DMVPN really. You can create dynamic GRE tunnels from a spoke (and ultimately encrypted tunnels) on the fly by querying NHRP, find the real NBMA address of another spoke and, voila, you have the peer information to set up your tunnels direct.

Nb. There are some interesting CEF details with NHRP between DMVPN Phase 1, 2 and 3 but that is follow on reading I would say.  Allowing a layer 2 resolution protocol to ultimately control your layer 3 direction and interactions is maybe controversial for the purist, and I will doubtless attempt to cover this when looking at some other SD-WAN techniques in other posts.

In short all spokes register their NBMA addresses with a Next hop Server (hub typically), and when a spoke needs to send a packet via a next hop (spoke) on the mGRE cloud or transport overlay, it asks the NHS (via a resolution request) “can I please have real/NBMA address of this next hop?”, the NHS replies with the NBMA address of the other spoke, and from this point the spokes can speak directly.


IPSEC tunnel protection 

IPSEC is the suite of protocols that enable the end to end encryption over the network in IWAN.  We are using IKEv2 and IPSEC.  Remember you can get DMVPN working as on overlay transport without encryption; this is optional (but good practice for security). Technically you just need your routing, multipoint GRE tunnel overlay network, and NHRP,  then you can add encryption once network connectivity is sorted.  I have found this is a good way to build the solution in blocks to make troubleshooting easier.

It is a little involved to go into here, but essentially IPSec Phase 1 identifies who you want to form an encrypted tunnel with and securely authenticates the peer (and sets some parameters for Phase 2), and then Phase 2 agrees on what to use to actually encrypt the traffic.  The fundamental problem is that when you have to create a lot of point to point IPSEC tunnels, you need some way to tell the devices what the address of the peer is so it can create an encrypted tunnel.  Each would then be an individual configuration for every peer to peer connection, managing keepalives (Dead Peer Decpection), and failover etc.   If you want on-demand dynamic spoke-to-spoke encryption, then IPSEC needs some work.  There are a number of ways to solve this, but DMVPN  phase 3 (Multipoint GRE and NHRP)  has been used for some time and is the method of choice today in IWAN.

With DMVPN it is always worth covering headers and how they are used in the real world should you choose to use IPSEC.  This way you can visualise the overlay network.

Typically you use transport mode with DMVPN so what does this mean and why use this with DMVPN?

Header confusion

There are Encryption Headers and GRE headers, do not confuse or conflate the two.

IPSEC uses 2 distinct protocols to either encrypt or authenticate your Layer 3 payload. These are ESP header (Encapsulating Security Payload) and AH   (Authentication Header) and both add headers to your packet.  They both also run in one of two modes, tunnel or transport.  These modes either use the original IP header (transport), or add a new IP header (tunnel) in order to traverse the network.  This is outlined clearly in the diagram below.


The next level of header confusion comes with GRE – which also adds an IP header.

Your original packet might look something like:

IP hdr 1   |   TCP hdr  |    Data

GRE Encapsulation:

IP hdr 2   |    GRE hdr  |   IP hdr 1   |    TCP hdr  |   Data

GRE over IPsec Transport Mode (with ESP):

IP hdr 2   |   ESP hdr |    GRE hdr  |    IP hdr 1   |   TCP hdr   |   Data

GRE over IPsec Tunnel Mode (with ESP):

IP hdr 3   |   ESP hdr   |   IP hdr 2   |   GRE hdr   |   IP hdr 1 |   TCP hdr   |   Data

Transport mode only encrypts the data payload and uses the original IP header – whereas tunnel mode will encrypt the whole IP packet (header + payload) and use a new IP header.

In DMVPN both the GRE peer and IPsec peer addresses are the same, so typically transport mode saves on header addition which is essentially repeating identical information (20 bytes saved right there).

Typically you use ESP with Transport mode for DMVPN

Now you should have a reasonable view of the Encryption overlay and the GRE overlay and the headers that are added end to end.


Routing comes up in two areas of IWAN, one in the transport independence piece, and again in the best path selection with PfR, but it is important not to confuse the two.  For example, PFR uses EIGRP for the Service Advertisement Framework (SAF), but for the transport piece you could use the same or a different routing protocol e.g. BGP, EIGRP or OSPF.  When EIGRP is used for your underlay and overlay routing as well (which is highly likely) conversations can get confusing.

You have a router at the customer edge, trying to get to another router at another edge. In between you have a Service Provider network.  Typically in order to get traffic to where you want to go you need to interact with the Service Provider’s BGP network, whether that is BGP advertised default routes, statics, redistribution, whatever is most suitable for you and your SP.

Now with IWAN you are adding a tunnel overlay, and this overlay network needs to be advertised into your current Enterprise network so that traffic that needs to get to another one of your sites knows which next-hop to use.  That the next-hop will now be a tunnel , i.e. you need to use a tunnel to get there.  Remember NHRP is used to do the mappings here to actually get the tunnel traffic across to the real address of the remote site to terminate the tunnel.  So where previously you may have used dynamic or static routing or default route in BGP to say,  “if you want to get to an address that lives across the WAN use the following next hop (WAN interface)”,  well with an overlay you are telling traffic to use your tunnel interface as your next hop.  To advertise these tunnel overlay routes into your network you can either use statics or a routing protocol of choice like BGP or EIGRP.  Of course if your routing protocols are covering both the real WAN interface and your tunnel interface networks, you need to take care that the correct route gets installed into the forwarding table, and that you are learning the information from a consistent place so your routing protocols don’t get confused and bounce the tunnel up and down (the recursive routing problem described a little further down).

As mentioned, the other use of a routing control plane in IWAN is for PfR (Performance Routing) , where the EIGRP engine is used for the Service Advertisement Framework and creates its own neighbours and domains accordingly.

Of course this is logically separate from the underlay and actual traffic forwarding and relies on the overlay network to get connectivity across the WAN between members of the SAF domain for sending SAF information to each other.  That is, the tunnels provide connectivity for SAF peers.

So what does all this mean?  Well it means you can very easily have 3 routing protocol names flying around in conversation confusing everyone on a whiteboard – BGP for underlay,  EIGRP for overlay,  EIGRP for PfR (or any mixture e.g. OSPF, BGP, EIGRP for routing and EIGRP for PfR).  The one constant here is the EIGRP engine is always the mechanism for PfR SAF peering.  However if you separate the PfR / SAF process in your mind as a monitoring technology that just happens to use an EIGRP process to set up its domains (nothing to do with network connectivity) – then the rest is really just routing as normal with care taken over your DMVPN.

DMPVN which routing protocol?

If you have ever configured DMVPN you will know that there are limitations or caveats with each routing protocol.  Let’s have a brief look at these at a basic level, and for simplicity, only with DMVPN phase 3 .

OSPF – You can use OSPF of course, but you need to be a little careful with network types. Point to Point? won’t work because you are using a multipoint GRE tunnel interface. Broadcast?  Well this will work but you need to make sure the spokes will never be elected as the DR or BDR  (IP ospf priority 0 on the tunnel interfaces of the spokes should do the trick here).  Non-Broadcast? – yes this will work as with broadcast but you need to statically configure your neighbours.  Point to Multipoint? – works well with phase 3, and you don’t have to worry about DR/BDR election.  With DMVPN phase 2 it is important to note that Point to Multipoint does not work so well, as this changes the next hop so all traffic goes through the hub router, so not ideal for dynamic spoke to spoke. In phase 2 you have the same issue with OSPF point to multipoint non-broadcast with the addition of having to statically define your neighbours.

What are the issues with OSPF? – well a couple that spring to mind are that in DMVPN you use the same subnet, and therefore all OSPF routers would be in the same area. Summarisation is only available on Area Border Routers (ABRs) and Autonomous System Border Routers (ASBRs), therefore the hub router would need to be an ABR for summarisation.  Also as OSPF is link-state, any change in the area will result in an SPF calculation across the area i.e. all the routers will run an SPF calculation on link change. Misconfiguration of the DR/BDR will break connectivity and traffic engineering has its issues with a link-state protocol.

So OSPF is doable, using NSSA (Not So Stubby Areas) on the spoke and careful config, but for larger scale DMVPN people drift towards BGP/EIGRP.

EIGRP – Is not link state, does not have an area concept and you don’t have to think of the topology tweaks you need to do with OSPF above.  One thing to note in DMVPN phase 2 is that you don’t wan’t the hub setting itself as the next hop for routes, but you can configure around this with EIGRP.  Of course you need to disable split-horizon so routing advertisements are allowed out of the same interface (mGRE tunnel int).  Good advice for scale is to turn the spokes into EIGRP stubs and also to watch for the number of adjacencies the hub has, as hellos can become an issue (you can play with timers here too).  Also EIGRP can summarise and manipulate metrics at any point.

EIGRP is well-suited to DMVPN at scale.


BGP also works for DMVPN – we know it scales (the Internet), and the default timers are less onerous that other protocols.   The choice, as ever, is IBGP vs EBGP.  Whereas with IBGP you might require route reflectors at scale and an IGP to carry next hops, EBGP might need several AS numbers, or you could disable loop prevention.

With DMVPN eBGP, the next-hop is changed on updates outbound, so all good there.  Next question is whether to use the same AS for every site, or unique ASs.  This can limit you to 1024 spokes as the 16-bit AS number allows only for 1024 spokes, but good to prevent loops. With a 32 bit number the private AS number is solved, but there is a deal of configuration at the hub with unique AS numbers.

Say you run the same AS at all sites, well in this case the receiving router sees its own AS number in the AS path of a received BGP packet and it assumes the packet came from its own AS, has reached the same place it came from, so drops the packet.  To get round this you can use as-override, but this can produce loops in the control plane.

iBGP then, back to the next hop modification issue – so with phase 1 and 3 you can use “neighbor next-hop-self-all” for reflected routes on a route reflector.  iBGP with this becomes probably the preferred option when it comes to BGP with this.

iBGP is well-suited to DMVPN at scale.

From the above EIGRP or BGP tend to be the preferred choices for DMVPN.

Now the assumption often with IWAN, is that BGP and EIGRP are chosen entirely because of the above typical reasons.

However in addition to the good reasons above, remember with IWAN you want some method of quick failover to an alternate or best path based on monitoring.  With BGP and EIGRP you have topology tables and feasible successors with alternate routes ready and waiting to go on failure to populate the Routing and Forwarding table and facilitate quick change of preferred paths.

Another very good reason for the use of EIGRP and BGP with IWAN.

So there you have it, a brief tour of the 4 building blocks of DMVPN

Finally, of course, no current discussion of DMVPN would be complete without a brief excursion into Front-Door VRFs and recursive routing.

Front Door VRFs.

These are a very useful technique in IWAN as they simplify paths and configuration a good deal.  What is VRF (Virtual Routing and Forwarding)?  Basically it allows multiple instances of a routing table to exist on a router and work simultaneously.  This is useful as it allows network paths to be segmented without using multiple devices.  Effectively in an IWAN design you put your wan interfaces into a separate VRF (front-door you see) and this avoids some recursive routing problems you may be familiar with using GRE (more on that later).

Recursive routing with GRE

If you are familiar with configuring DMVPN you may be aware that you can get yourself into a pickle when it comes to routing, and in particular, recursive routing.  So if you are using a routing protocol for your overlay and another for your underlay, there could be a conflict here.  For example, if you learn your route both inside and outside of the tunnel for the same prefix, well the router gets a little confused.

If you have ever seen “Tunnel temporarily disabled due to recursive routing” then you know what I am talking about.  The first time you bump into this it can lead to furrowed brows and prolonged head scratching until the light-bulb fires.

So here is the crux of this issue:

If, for example, you have two routers with NBMA (Wan) interfaces addressed at one end with  and at the other, well these are on different networks so you use a routing protocol to get across any intermediate hops to the other end.  Say we use OSPF for this.  E.g Router_A ( – Router_B(ospf) – Router_C(ospf) – Router_B(  These are also your tunnel end-points remember.

Now say you want to use EIGRP to advertise your tunnel network, and you make the easy mistake of having an overlapping network i.e. your GRE tunnel interface addresses are and at each end.  So you may set EIGRP for network (which also happens to cover the NBMA or real addresses).

Ok, so the problem here is that you now have the 10 network being advertised for the NMBA addresses in OSPF and then, when the tunnel comes up, you also have the 10 network being advertised through EIGRP over the tunnel.  So as soon as the EIGRP neighbour comes up over the tunnel, the tunnel goes down and with it the EIGRP neighbour – and rinse and repeat.  The problem of course is the NBMA (or wan interface) is now being advertised over the tunnel network using EIGRP.

Given the way the tunnel gets set up, which is to rely on OSPF (to find the actual NBMA tunnel endpoint), then this is simply not going to work

In short, the EIGRP neighbour comes up and you are saying the way to get to the real address (or tunnel endpoint) is over the tunnel, while simultaneously overriding the way the tunnel actually gets connectivity to that real address (tunnel endpoint) to set itself up as a tunnel (over OSPF).  The only way the EIGRP neighbour could come up in the first place is that OSPF had already provided the underlay routing to set up a tunnel.  All clear?  Yeah, I know, this can make you rub your forehead the first time you come across it.

The way to get round this usually is to be very careful with your subnets and routing to avoid the recursive.

But there is another way to avoid this – enter (or enter through) Front Door-VRF.

The principle here is that you have a separate routing table for the physical WAN interface (the front-door), and the tunnel or overlay network – so a VRF for each.  Or most simply, a separate VRF for the WAN interface and everything behind this is in the global routing table if you so wish.  As we are not learning the routes for the tunnel and NBMA through the same routing table, bingo you have solved your recursive routing problem.

There is still some magic needed, as there must be a way to tell the tunnel you are creating to use the Wan interface as a tunnel endpoint.  Create your WAN interfaces in their own VRF, then create your tunnel interfaces with these addresses as the source and destination tunnel endpoints, and finally just stitch these together with a VRF command under the tunnel interface (the stitching is the internal pixie dust).  Your network and routing over the tunnel are now separated from your transit network underneath.


Shut the front door – that is much simpler for DMVPN 

Part 2 Intelligent Path Control