A few basic dB wireless tips

If you have done a lot of wireless, the material below is bread and butter and appears in many textbooks in various guises, but for those who need a quick summary, or anyone looking for a quick “in” to further reading and practice, it might clear a few things up.

First Logarithms

Why? (trust me here, this is brief and worthwhile).  Well, logarithms are useful for representing the ratio between two values (i.e. values that can be measured), and we use ratios all the time in wireless. This is because it is so much easier than using real numbers with many decimal places when looking at signal strength, gain and power levels.  So what are they?

Logarithms are actually fairly easy at a basic level.

For example –   how many 2s do I need to multiply to get 8?

2 x 2 x 2 = 8

So I had to multiply three 2s to get 8, so the Log is 3 – and the way you write this is below:

log2(8) = 3

The base is the little 2, so log base 2 of 8 = 3.  We are essentially using three numbers here:  The number we are multiplying (2 in this case), how many times it is multiplied (3 in this case), and the number we want to end up with (8 in this example).

Ok, one more just so we are clear.

Work out the following:      log6(1296) = ?

so 6 x 6 x 6 x 6 = 1296 – we need four 6s multiplied together to get 1296, so the answer is:

log6(1296) = 4

Incidentally this also tells you the exponent – so 6 to the power 4 = 1296.
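You can sanity-check these with any scientific calculator, or with a couple of lines of Python (a quick sketch):

```python
import math

# log base 2 of 8: how many 2s multiply together to make 8?
print(math.log2(8))              # 3.0

# log base 6 of 1296, and the matching exponent form
print(round(math.log(1296, 6)))  # 4
print(6 ** 4)                    # 1296
```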

All well and good and a refreshing trip down memory lane but what about wireless?

Well log base 10 is used a lot in wireless, particularly when it comes to dB.

What is dB (decibel)? Well, it is a ratio, and what is good at representing ratios? Logarithms of course.  The decibel is actually a unit of measure that came out of Bell Telephone Labs, where it was used to quantify the attenuation of an audio frequency signal over a mile of telephone cable.

The decibel (dB) therefore is a good way to express the ratio of two power levels.  A couple of equations are coming up, nothing too difficult, but hold on for the result as that is where the trick comes in to impress your friends (most people have friends who are impressed by RF conversations right?).

When you express a power ratio in decibels, it is 10 times the base-10 logarithm of the ratio.  What on earth does that mean?

We are trying to find a ratio, so for 1 Bel (B)

RatioB = log10(P1 / P0)

To take this a step further 1 Bel is 10 decibels, so that is where you get your 10 x from. Therefore for dB:

RatiodB = 10 x log10(P1 / P0)

So this gives you a handy way to work out a ratio between two real power values, and in the wireless world you would typically use this when looking at Gain (how well an antenna converts input power into radio waves headed in a specified direction.)

Let’s have a look where this is useful.

GdB = 10 log10(P2 / P1)

P2 is the power level.

P1 is the referenced power level.

GdB is the power ratio or gain in dB.

So for the gain in dB for a system with input power of 10W and output power of 20W then

GdB = 10 log10(Poutput/Pinput) = 10 log10(20W/10W) = 3.01dB

Now remember that figure 3.01dB

The first of two decibel values an RF engineer has etched into their head is 3dB because, as you have just seen, this is a ratio of 2 (yes, we know it is actually 3.01dB, but that is close enough for RF design) – 20/10 is 2 – voila.

So this is great to work out power ratios in your head.  If you know the power level has doubled then you have a 3dB gain; if the signal level is four times higher at the output than the input you have a 6dB gain.  Equally, if you have -3dB then the ratio is 1/2 or a half.

The second figure to have hardwired into your brain is 10dB.  Remember dB is a ratio, and 10dB is handily a ratio of 10 🙂  Equally -10dB is 1/10 or 1 tenth.

For example, if the signal level at the output is 10 times higher than the input then you have a 10 times ratio (i.e. 10W input and 100W at output, a 10x gain) which is 10dB.

Even more usefully you can now combine the two.  Say you have an amplifier with a gain ratio of 20 ( 20 times or 10 x 2), then the gain value is 10 + 3 which is 13dB.  (3dB is a 2 x ratio).
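If you want to check these mental shortcuts, the dB formula above drops straight into a few lines of Python (a quick sketch):

```python
import math

def ratio_to_db(ratio):
    """A power ratio expressed in decibels: 10 x log10(ratio)."""
    return 10 * math.log10(ratio)

print(round(ratio_to_db(2), 2))   # 3.01 – doubling the power is ~3dB
print(ratio_to_db(10))            # 10.0 – a 10x ratio is 10dB
print(round(ratio_to_db(20), 2))  # 13.01 – 10 x 2, so 10dB + 3dB
```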

Got it?

So now say I want to calculate the power ratio of a given value….

P2 is equal to the reference power P1 times 10 raised by the gain in GdB divided by 10.

P2 = P1 × 10^(GdB / 10)

P2 is the power level.

P1 is the referenced power level.

GdB is the power ratio or gain in dB.
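As a quick sketch, the inverse calculation looks like this in Python (the wattages are just for illustration):

```python
def db_to_power(p1_watts, gain_db):
    """Recover the output power P2 from a reference power P1 and a gain in dB."""
    return p1_watts * 10 ** (gain_db / 10)

print(db_to_power(10, 3.01))  # ~20W – a 3dB gain doubles the power
print(db_to_power(10, 10))    # 100.0 – a 10dB gain is a 10x ratio
```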

[Image: log table]

dBm

So we have established that dB is a ratio, but what about dBm?  Well this is also a ratio but a ratio to a real value, i.e. the ratio of a power level relative to 1mW (or 1 thousandth of a Watt – 0.001W).  dBm therefore is a way to express absolute power.  10 dBm then is 10mW, or 10 x 1 mW.   20 dBm is 100 mW.  Remember the factor of 10 earlier?  So 20dBm references 1mW x 10 x 10 = 100mW.
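Since the reference is fixed at 1mW, converting between dBm and mW is a one-liner each way (a quick Python sketch):

```python
import math

def dbm_to_mw(dbm):
    """dBm is power relative to 1mW, so P(mW) = 10^(dBm / 10)."""
    return 10 ** (dbm / 10)

def mw_to_dbm(mw):
    return 10 * math.log10(mw)

print(dbm_to_mw(10))          # 10.0 mW
print(dbm_to_mw(20))          # 100.0 mW
print(round(mw_to_dbm(100)))  # 20 dBm
```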

Finally, you need to be careful when expressing the difference between two power levels: 20dBm – 10dBm = 10dB, not 10dBm, because here we are once again expressing a ratio between two values, as decibels dictate.

So why use ratios and logarithms at all?  Well, look at the table below of typical 802.11 power levels and signal strength.  At -90dBm you are talking 0.000000001 mW. Calculations based on a ratio or logarithm are much easier to compare than saying “the signal strength is 0.0000something as opposed to 0.00000something, so the power has decreased by? and the negative gain is?”.  In essence it gives meaning to the low power levels involved in wireless communication and makes working out real-world designs a whole lot easier.

[Table: typical 802.11 power levels – dBm to mW]

So there you have it – a few handy basic wireless tips.

A couple more tips to keep up your sleeve: 5dB is roughly a ratio of 3, and -5dB is roughly 1/3 or a third.  Not exact of course (the true ratio is about 3.16), but close enough to work things out.

dBi

dBi is another antenna measure that is handy to keep in mind: it is gain referenced to an isotropic antenna, a theoretical antenna that radiates power equally in all directions (a perfect sphere).  Of course real antennas like this do not exist, but it is a useful reference point when looking at antenna gain relative to that theoretical perfect antenna.

[Image: Isotropic radiation pattern in free space]

One of the most common antennas you will come across is the dipole antenna – you may have heard it called a rubber duck antenna.  This has a doughnut-shaped (toroidal) radiation pattern.  It is handy to know that a dipole has a gain of 2.15dB over an isotropic antenna, so if you had a different type of antenna with a gain of 5dBi, you now know it is 2.85dB of gain higher than a common dipole – 2.15dB + 2.85dB (sounds a bit like we are bird-watching now).

[Image: Dipole antenna radiation pattern]

And what is 5dB again?  Well, it is a ratio – roughly a ratio of 3 – so that 5dBi antenna has around 3 times the power gain of an isotropic antenna. See, I told you knowing the 5dB ratio was handy.

So remember, 3dB, 10dB, (maybe 5dB) and 2.15dBi and you can work things out like a wireless design bod in no time at all:-)

Hopefully these very basic wireless tips will send you on your way into the magical world of RF with a little less confusion when dB is thrown around like confetti.

EBITDA, Cashflow and Service Providers

It might seem odd to see a piece about cash flow in a technical blog, but looking at EBITDA and cash flow got me thinking about whether the new wave of service provision (NFV, SDN, SDO, SD-WAN etc.) has any impact on the traditional ways of reporting for Service Providers currently based on high up-front costs for future service and revenue.

For many years EBITDA has been a preferred reporting indicator of Telecommunications Providers, and in turn, Service Providers.  Execs knew the score – grow revenue, sort out your EBITDA, and get good Enterprise Value Multiples.  Enterprise Value is how much it would cost to buy a company’s business free of its debts and liabilities.  Enterprise Value Multiple is your Enterprise Value (EV) divided by your EBITDA earnings number.

But with automatic rapid growth for telecoms providers on hold for now, even companies hitting market expectations are not always seeing the Enterprise Value they would like in the market – the EV multiples are simply not there.

Based on that, I thought it might be interesting to look at what EBITDA is, why the preference exists, and whether changing cycles of investment with SDO/SDN/NFV/Agile change anything at all.

——————

First up – What is EBITDA?

[Image: EBITDA cartoon]

EBITDA – Earnings Before Interest, Taxes, Depreciation and Amortisation.

The way to calculate this is:

EBITDA = Income before taxes + Interest Expense + Depreciation + Amortisation

So basically you add the deductions (Interest Expense, Taxes, Depreciation and Amortisation) to your net income.
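With some hypothetical figures (in $ millions – the numbers below are just for illustration), the calculation is simply:

```python
def ebitda(net_income, interest_expense, taxes, depreciation, amortisation):
    """Add the four deductions back onto net income."""
    return net_income + interest_expense + taxes + depreciation + amortisation

print(ebitda(net_income=13, interest_expense=2, taxes=4,
             depreciation=2, amortisation=1))  # 22
```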

Let’s break this down a little.  Before the 1980s, EBIT (Earnings Before Interest and Tax) was a key indicator of a company’s ability to service its debt.  With the advent of Leveraged Buyouts (LBOs), EBITDA came to be used as a tool to measure a company’s cash. An LBO is essentially a way of acquiring another company with a large amount of borrowed money.  The assets of the acquiring company, as well as the assets of the company being acquired, are used as collateral for the loans.  This enables acquisitions without committing lots of capital.

Some conflate operating income (or operating cash flow) with EBIT, but these are not the same: EBIT makes adjustments for items not accounted for in operating income and is thus non-GAAP (outside Generally Accepted Accounting Principles).  Briefly, operating income is gross income minus operating expenses, depreciation and amortisation and, as with EBIT, does not include taxes and interest expenses.

————–

Let’s build this up from Cash Flow and Net Income.

Net income and Operating Cash Flow may seem very similar but they are not really the same.  Operating cash flow, or raw cash flow in general might need converting through accrual accounting to get a more realistic measure of income, namely net income. Accrual accounting?  Very briefly this recognises when an expense is incurred not when that expense is paid (think of buying something one month but then having 60 days before you need to pay it off, well the expense was incurred the moment you bought it really).  As a result businesses can list credit and cash sales in the same reporting period that sales were made.

Cash flow or Operating Cash Flow (OCF) = Revenue minus Operating Expenses

Ideally you want your revenue to be higher than your operating expenses (i.e. positive operating cash flow) or you might not be in business for too long.  Certainly an important measure.

Net income = Gross Revenue minus Expenses.

Net income reflects the profit that is left after accounting for ALL expenditures, every pound/dollar the company earns minus every pound/dollar the company spends.

As I mentioned Cash Flow and Net Income are not the same, with Cash Flow showing not only how much a company earned and how much it spent, but when the cash actually changed hands.  This difference is significant and an illustrative example is below:

Say you sign $20million worth of contracts in a year, complete work on $10million of them, and collect $8million of that in cash within the year.  If you have also paid out $6million for equipment, your raw cash flow would be $2million.  Net income, however, might look significantly different.

Say you have provided economic value to your customers (revenue) of $15million across the contracts (you have completed $10million of them and are 50% of the way through the remaining $10million, so another $5million of value delivered).  The $6million of equipment is estimated to last 3 years, so the expense for the year is $6million divided over 3 years, i.e. $2million.  Your net income for the year would therefore be $15million minus $2million in expenses: $13million.

$13 million is a very different number than the raw cash flow value of $2million, and perhaps a better indication of the operating income of the company.
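The example above can be laid out as a short calculation (figures in $ millions, straight-line depreciation assumed, as in the text):

```python
# Cash view: collected $8m, paid out $6m for equipment
raw_cash_flow = 8 - 6
print(raw_cash_flow)  # 2

# Accrual view: $10m of completed contracts plus half of the remaining $10m,
# with the $6m of equipment depreciated over its 3-year life
revenue_recognised = 10 + 0.5 * 10
depreciation_expense = 6 / 3
net_income = revenue_recognised - depreciation_expense
print(net_income)  # 13.0
```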

Earnings and cash therefore are not the same thing, but that is not to say operating cash flow is unimportant.  Companies must still pay interest and taxes, and that cash has to come from somewhere, so using it to look at Free Cash Flow (cash flow after all capital expenditure), and at working capital to see whether a company can service its short-term debts, is useful.  A negative working capital might suggest a company will struggle with its short-term debts.

Working Capital = Current Assets – Current Liabilities

Current Assets  – assets that can be converted to cash in less than a year.

Current liabilities  – liabilities that must be paid this year.
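As a sketch (hypothetical balance-sheet figures, in $ millions):

```python
def working_capital(current_assets, current_liabilities):
    """Negative working capital may signal trouble servicing short-term debts."""
    return current_assets - current_liabilities

print(working_capital(current_assets=12, current_liabilities=15))  # -3
```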

——————-

Now that we have had a quick look at why Net Income might give a better read of a company’s operating performance than raw cash flow, is there a way to calculate it without all the financing decisions and conditions peculiar to each company? i.e. can we provide a better method of comparison?

Well EBITDA adds back (or deducts from the calculation the expenses associated with) interest, taxes, depreciation and amortization.  This therefore removes the effects of financing and accounting decisions.

The idea being that by ignoring expenses like interest, taxes, depreciation and amortization you strip away the costs that aren’t directly related to the core operations of a company.  The proposition is that what you are left with (EBITDA), is a purer measure of a company’s ability to make money.

Let’s take a brief look at what these costs are to reach EBITDA:

Interest expense – the interest payable on any borrowings such as bonds, loans, convertible debt or lines of credit (interest rate × outstanding principal amount of debt). You sometimes hear the outstanding amount referred to as just the principal.

Taxes – Generally refers to income taxes, i.e. those levied by a state or country.  Business taxes (property, payroll taxes etc.) are considered operating expenses and are therefore not factored into EBITDA.

Depreciation – In this sense, the reduction in the value of an asset over time.  It applies to tangible assets – both fixed assets (land, buildings, machinery) and current assets (inventory) – basically things that can be physically damaged.  Depreciation is a way of assigning a cost to a tangible asset over its useful life, or of expressing how much of an asset’s value has been used up over time.

You also have something called intangible assets – things you can’t really touch, like copyrights, patents, brand recognition etc. (as opposed to equipment, machinery, stocks, bonds and cash for example).  Goodwill is also part of this – a solid customer base, reputation, employee relations, patents and proprietary technology – an amount you are willing to pay over the book value of the company (the value of the assets shareholders would theoretically receive if the company was liquidated).

Amortisation – Deals with these intangible assets.  It covers the paying off of debt, such as a loan or mortgage, with a fixed regular repayment schedule over a time period, and also the spreading out of the capital expense of an intangible asset over the asset’s useful life, i.e. a fixed period.  Say you spend $10million on a patent with a useful life of 10 years: over those 10 years you can spread the cost as a $1million-a-year amortisation expense.
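The patent example works out as follows (a minimal sketch of straight-line amortisation):

```python
def straight_line_amortisation(cost, useful_life_years):
    """Spread an intangible asset's cost evenly over its useful life."""
    return cost / useful_life_years

# A $10m patent with a 10-year useful life: $1m a year
print(straight_line_amortisation(10_000_000, 10))  # 1000000.0
```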

———-

So EBITDA is not really a good measure of cash flow, but can be a reasonable measure of a company’s profitability or net income.  However it does leave out the cash required to fund working capital and the replacement of old equipment, which can be significant.

This is particularly useful for new companies or those trying to attack a new large market where depreciation and amortisation spreads out the expenses of large capital investments, which can be considerable.

Conversely EBITDA has a bit of a bad rep as it has been used by a bunch of dangerously leveraged companies in boom times, but that is not to say it is all bad by any means.

There are many pros and cons to EBITDA, but most are out of scope of what I am trying to convey: telecoms providers tend to like it as a reporting measure because they have large initial infrastructure outlay and earn their return on investment over time through telecoms or service provision.  Yet they are not seeing the Enterprise Value multiples they would like, despite a focus on market expectations around EBITDA (which doesn’t penalise a company in the market for having to invest in longer-term infrastructure, providing earnings are going in the right direction).

————-

As with all accounting it is useful to look under the covers when you get chance to get a better read.  In Service Providers or Telecommunications companies where there is significant expense in building out the network, buying frequency, and installing physical towers or radios, it makes a deal of sense to judge them more fairly over a longer term as repeated income will come some time after the initial investment.

In the boom years however, rapid growth was a sure fire way of getting great EV multiples, but with growth in the industry now at GDP levels and below, fixed lines in decline, mobile revenues tailing off and there being no business case in the transport of bits, EBITDA focus does not automatically equate to investor or customer value.

In essence it gets harder and harder to hide behind growth.  Knee jerk reactions from execs can zone in on cash and EBITDA while they under-invest in the very things that provide economic value to customers (your revenue as a provider) i.e. better products and services to facilitate ROI, and lead to better capital returns.  Essentially the focus needs to be on creating value and not just controlling costs.

—————————

So my question is, with SDN, NFV, Software Defined Orchestration, Agile product development etc. can you facilitate some of the above?  Are the providers who are simultaneously adopting flexible automation and orchestration to roll out new services faster (services chained to customers’ business logic), with greater visibility and with better quality, likely to be the ones who get the EV multiples they hope for?

[Image: dinosaur cartoon]

“What is that huge fiery technology shift I see?  Should we prepare?” .   “Don’t worry it’s miles away….stop thinking, keep cutting costs and we’ll be fine!”

As customers start to play providers off against each other with shorter term contracts to drive down cost and improve service (as with SD-WAN), does flexibility to respond with new services and efficient operations become more important than high sunk costs, long contracts, eventual gain, and a relentless focus on internal cost cutting with traditional technology to justify business as usual?

Is this all doubly worrying with the reduction of some SPs to mere local cloud access providers (just give me a pipe to the nearest cloud provider thanks)?  Is it coincidence that the big cloud providers are the ones physically laying fat cables to connect their services nowadays?

Reducing operating costs and quick iterative roll-outs with new techniques and technology certainly seem to appeal to the largest and more innovative SPs.  Using the latest methods to hold vendors to account and change traditional network operating practices (e.g. vCPE), while simultaneously recognising they can provide new services to customers with more flexibility, seems primed for technology shifts like 5G.  Will all of this lead to a renewed focus on ROI rather than EBITDA growth or Capex/Sales? At the moment EBITDA growth and Capex/Sales don’t appear to be moving the EV multiple needle.

Are those who are not adopting new ways destined for extinction?

Of course EBITDA isn’t going to be ditched by Telecoms providers anytime soon, but maybe a softening of the relentless internal focus on one reporting measure will only lead to better value for customers, and in return better results for those providers focusing on customer value.  You never know, internal business cases might actually start to incorporate vision.

Architecture Reflections

 


If you follow IT news and commentary around architecture at the moment, it might seem that Enterprise Architecture is in a bit of a quandary. On the one hand there is broad agreement that to deliver IT advantages in line with business needs, at speed, a sound architectural base is needed, along with efficient processes to drive that architecture.

On the other hand there are articles talking about Enterprise Architecture being broken, or about the business irrelevance of architectural frameworks as we know them today; a view of self-referential documentation which only the architecture community ever really reads, while others “haven’t got time for this, there is a business to run and improve”.

As reliance on IT increases (it certainly isn’t going away), and while new models of IT consumption and service delivery are morphed all the time, I believe architecture will become more important, not less. The problem, as with so many things, is one of communication. The challenge is to communicate relevance to business stakeholders and show how we are improving or introducing new and better business relevant services, and solving business problems through architecture. There is a need to create and communicate the base architecture to achieve this.

Although the aim of this post is not to compare and contrast the different frameworks and methodologies, a brief mention of the options does highlight some of the problems we face.

We have a LOT of choice around governance, frameworks, services and methodologies (COBIT, Zachman, TOGAF, ITIL, PRINCE2, PMBOK, CMMI, MSP, FEA, DoDAF, ISO 20000, ISO 27000, Agile, Six Sigma/DMAIC, and Lean to name but a few). Of course you will notice that, while not all of those mentioned are directly comparable (they overlap, and also address slightly different concerns), the landscape is far from simple.

As an aside, while it is not quite accurate and too simplistic to view COBIT as the “why”, ITIL as the “how”, or with other frameworks the “how” and “what”, it can serve as a useful starting point in questioning what you can get out of each, and mapping the overlapping functions in getting them to work together.

To further illustrate the complexity let’s briefly go back to 1987 and the Zachman framework with its 4 domains and numerous sub-domains as an initial point of reference.

1) Process Domain – Business Context Engines, Planning Engines, Visualisation Engine, Business tools

2) Information/Knowledge Domain – Business Data, Business Profiles, Business Models, Data Models

3) Infrastructure Domain – Computers, Operating Systems, Display Devices, Networks

4) Organisation Domain – People, Roles, Organisational Structures, Alliances

Domains have been added in other frameworks and, as you can see, this isn’t getting any simpler even if the constructs are useful. Even a single domain can have mind-boggling degrees of complexity.

If I take a component of the Infrastructure domain with which I am familiar (Network), there is a vast array of technology to architect around, each area with its own functional and deep technical specialists. From Software Defined Networking, control plane emulation and policy creation, to layer 3 identity separation (LISP), OTV for layer 2 over layer 3 tunneling, FabricPath and TRILL for layer 2 routing (MAC in MAC, MAC in UDP – no more spanning tree), and VXLAN (MAC in UDP) for extension of layer 2 segments over shared physical infrastructure, to name just a few recent headlining technologies. And this is just one small part of one sub-domain.

You will, of course, have spotted an error in the complexity I have just outlined. A good base architecture will not have to architect around each new technology, but identify solutions to fit seamlessly into the architecture as they solve a business problem, enable a service, or support business functions. This is why architecture is there in the first place.

So we ask ourselves what business problem are we solving, what service are we enabling, or function we are supporting? For example with SDN are you gaining greater flexibility? more open platform support? better visibility? better policy control? more customisation? lower costs? better security? reduced risk? and does this let you roll out new services more quickly and robustly to serve the business? Or with some of the other technologies, are you able to move workloads faster regardless of location with less operational overhead and cost, or spin up services more quickly, reliably, cheaper? How does it aid mobility? Public / private cloud? Security? Once you ask, and indeed have your own answers to such questions, the technology seems to slot naturally into a base architecture.

Given the complexities, how do we get everyone on the same page?

We could just throw around nebulous buzz phrases like “business outcomes” and hope everyone nods in agreement, but a more practical method might yield better results.

This is not to say that some of this is not covered in the Architecture Vision or Business Architecture phases of TOGAF, for example, but it is all too easy to slip into the technicalities of the process, or to find yourself explaining the entire process just to get everyone on board with this piece, let alone contributing meaningfully.  This can often be a challenge.

One practical suggestion is briefly outlined below.

As all of the above frameworks, methodologies, processes etc. were (to a greater or lesser degree) born out of the Deming cycle (Plan, Do, Check, Act), it does allow common ground to be established and serve as a foundation of getting all stakeholders on the same page. We can use this to simplify communication and create a common understanding.

The aim is to get business stakeholders involved as early in the process as possible, to build understanding, and to avoid the redundancy and time wasted on erroneous requirements.

If you allow the value stream to “pull” the process and architect with this in mind, it can really help in making architectures business relevant. By this I mean viewing the process from the perspective of customer value, business value, and demand, then working backwards to eliminate waste and improve service quality. As obvious as this sounds, it is rarely done effectively.

This brings us to the option of a process/tool that can literally get everyone on the same page: the Lean A3 process/tool.

With the A3 report everything is discussed and agreed upon in one A3 sized document, which is highly visible, has agreed reference data, and follows a simple common sense process. As this process revolves loosely around the initial Deming cycle it has instant familiarity with architects, developers, designers, manufacturers and business process professionals across the board. The idea here is to get everyone to agree on the problem being solved, the service offered, or the function supported. This in turn enables a more seamless flow into the base architecture.

Although the above might indeed sound like “common sense” (and increasingly there is a reliance on this quality in architects), by formalising and standardising this common sense in one place – with common agreed data, in a concise format, with stakeholder contribution and understanding – we provide a solid base for the detailed architecture to really achieve what it sets out to do. It also makes it easier for anyone new to initially understand the architecture and contribute meaningfully without wading through reams of framework documentation for weeks on end. As they say, “put talented people into average processes and you often get poor results, but put average people into great processes and you often get excellent results”.

Like anything, the A3 process/tool takes practice, (it is not something you write individually and present), but the idea of having a one-page reference that everyone has contributed to and agreed upon, can be a very powerful way of getting different functions to work together and most importantly understand why things are happening the way they are. Does it have to be the A3 process/tool? Of course not, but it does seem to be a useful reference or starting point.

Another advantage is that the components of the A3 process can quite easily be mapped to individual architecture phases in other frameworks such as TOGAF.

IT organisations will be increasingly measured by their alignment with the business; by speed and flexibility, productivity and growth, with security and risk mitigation embedded at every level. Combined with this, the idea of service managers running an IT service as a function of the business and measured as such, will be a powerful one. For me, this only puts greater emphasis on making sure everyone is referring to the same thing to avoid costly misunderstanding.

Through process/tools such as A3, allowing architectures to be pulled from the value stream, making things as simple and visible as possible, and having stakeholders embedded in the process as early as possible, maybe we can cut through some of the communication issues commonly associated with architecture relevance of late.

 

Some examples of the A3 process/tools can be found below:

Explanation of an A3 example – pdf

A3 example templates can be found here

Think before you leap

What Do You Know About Why You Are Doing This A3?

Is the A3 a tool, process or both?

—–

There are several formal definitions of terminology within the various frameworks. I try, where possible, to ground myself in standard English definitions of the terminology firstly to remind myself of what I am trying to achieve, and secondly to gain common understanding. Some of these basic dictionary definitions are included below:

TOGAF defines architecture as

  1. “A formal description of a system, or a detailed plan of the system at a component level to guide its implementation
  2. The structure of components, their inter-relationships, and the principles and guidelines governing their design and evolution over time”

Dictionary definitions

Architecture – the complex or carefully designed structure of something

Architect – “a person responsible for inventing or realizing a particular idea or project”, from the Greek arkhitekton, meaning “chief builder”

Framework – a skeletal structure designed to support or enclose something – a frame or structure composed of parts fitted together, or a set of assumptions, concepts, values, and practices that constitutes a way of viewing reality

 

Adventures in IWAN – Part 2 Intelligent Path Control

Intelligent Path Control

Broadly speaking, Intelligent Path Control monitors live traffic across multiple transport links and makes next-hop decisions on the fly, so that the traffic you define in policy always takes the best path automagically.  Application-aware routing, really.

If your expensive backup links are underutilised, or you want to take advantage of multiple WAN transports and do any kind of load-sharing, then traditionally you would be using all your complex BGP tricks at the routing level, and if you wanted any kind of application monitoring to connect the two, there would be some manual Netflow checking, SLA probes etc.  The entire SD-WAN market was spawned, in part, to solve this problem.

Intelligent Path Control looks to remedy this by automatically routing traffic per class based on real-time performance of links to the most optimal transport on the fly.  This is useful, as routing protocols in the main are blissfully unaware of brownouts, soft errors or intermittent short lived flapping etc.

In a nutshell –  Intelligent Path Control is intelligent routing based on performance.

This “Performance Routing (PfR)” is what enables Intelligent Path Control, so here we can end the circuitous marketing and talk about PfR from this point on.

Under the covers PfR consists of Cisco’s SAF (Service Advertisement Framework), Monitoring Mechanisms, NBARv2 and Netflow v9.  It influences forwarding decisions in PfRv3 not by altering the routing table in the main, but through dynamic route maps or Policy Based Routing to change the next hop.  It can also enable some prefix control, injecting of BGP or static routes.

Before PfR, in a Cisco environment you would rely on manually using a bunch of scripts, static routes, PBR etc. to get anything like intelligent path control, and this would be a long way from dynamic or automatic. Then you would somehow be trying to stitch this together with some monitoring maybe using Netflow, or your IPSLA probes.  All very manual, all very labour intensive.

As with DMVPN, PfR is made up of a number of components, these are below and I will cover each one in turn to get an understanding of how this solution all fits together.

  • Performance Routing
  • PfR IWAN components – Controllers and Border Routers
  • Monitoring and Performance routing – NBAR2, Netflow
  • Service Advertisement Framework (SAF)
  • Paths
  • Prefixes
  • Smart Probes (optimisation, Zero SLA)
  • Thresholds
  • Steps to set up PfR – Traffic Classes and Channels
  • Routing
  • Transit site Preference
  • Backup Master controller
  • Controlled and uncontrolled traffic
  • Path of Last Resort.
  • VRFs

Performance Routing (PfR)

PfR (Performance Routing) grew out of OER (Optimised Edge Routing); version 1 appeared in 2000 with prefix route optimisation.

PfRv2 added application path optimisation in 2007.

PfRv3 is the latest version, adding new functionality to a well-established Cisco IOS feature that has been evolving for over 10 years.

Essentially PfR monitors network performance and makes routing decisions based on policies for application performance.  It load-balances traffic based on link utilisation to get the best performance per application out of all available WAN links.

PfR is really a passive engine for monitoring and gives you a superset of Netflow monitoring with around 40+ measurements.

  1. First you start with defining your policies – there are two ways here, either by DSCP or by Application.  If you match on application you enable NBAR2.  There are some very handy defaults here.
  2. Then you learn your traffic.
  3. Once the traffic is learned the next step is to measure the traffic flow and network performance and report this to a Master Controller
  4. Finally you choose your path.  Performance Routing, via the Master controller, will change the next hop based on what you have learned about the traffic and how you have set-up your policies for each traffic class and link.
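To give a flavour of what those steps look like in configuration, here is an illustrative PfRv3 domain policy sketch for the Hub MC (class names, DSCP values and path names are invented for the example – check the IWAN validated design for the exact syntax in your release):

```
domain iwan
 vrf default
  master hub
   source-interface Loopback0
   ! match on DSCP (or application via NBAR2) and set path preferences
   class VOICE sequence 10
    match dscp ef policy voice
    path-preference MPLS fallback INET
   class CRITICAL sequence 20
    match dscp af31 policy low-latency-data
    path-preference MPLS fallback INET
```

Everything in the class blocks is policy: what to match, which path is preferred, and what to fall back to when thresholds are violated.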

PfR can automatically discover the branch once connectivity is established.  The CPE/branch border router (BR) is discovered and automatically configured – it auto-starts.  The whole idea was to make the PfR configuration as simple and light-touch as possible.


PfR IWAN components

A device (Cisco ISR/ASR router, virtual or physical) can play one of 4 roles in PfRv3.  It is important to outline these, as the HQ Border Routers are used simultaneously as DMVPN hubs and PfR BRs.  By separating out the functionality you know which design you are talking about –  the transport overlay, or performance routing.

Let’s now look at the components involved in PfR.

Master Controller

The Master Controller(MC) is the decision maker and the Border Router(BR) is the forwarding path (roughly analogous to Controllers in other vendors’ SD-WAN architectures, this one just happens to be a router, physical or virtual).

You need a Master Controller at every site, so inevitably there is some confusion when it comes to the Hub Master Controller.  I look at the Hub Master Controller as the place where all the policy is configured and then distributed to the rest of the Master Controllers in your network as they join your IWAN domain.  As the Hub MC is looking after the IWAN domain this will normally be a separate device at the hub (physical or virtual) for scale.  An IWAN domain is basically all your routing devices that participate in the Path Control.

If you look at a typical IWAN design you see a Hub Master Controller at the central site (and a second maybe on a redundant central site or DC).  Cisco should probably have called this an IWAN domain controller or something.  Instead they call these Hub Master Controllers (HUB MCs),  central sites are IWAN POPs (Points of Presence), and each central site has a unique POP-id.

In Cisco IWAN domain you must have a Hub site, you may have a Transit Site and of course you have your Branch sites.

The Master Controller functionality doesn’t do any packet forwarding or inspection; it simply applies policy, verifies and reports.  The MC can be standalone or combined with a Border Router (BR).  You have to have a Master Controller per site (branch, hub etc.) so policy can be applied and verified.  If you only had one router at the branch with 2 transports, then that router would be a combined Master Controller and Border Router (MC/BR).

Master Controller as a term comes up in several places then, but all you have to do is differentiate between the functionality of the one at the central site, and the ones everywhere else and it all becomes simpler to understand.

So ultimately there is a central place for configuration, policies, timers etc. (your Hub MC), but a completely distributed MC control plane.  It is important to know that even if you lose your Hub MC, you still have the local MCs optimising and controlling your traffic.  A distributed control plane.

Let’s briefly go through each of the components involved in PfR in turn, and then look at the monitoring:

  • Hub Master Controller:  This is where all the policies are configured, and it is the Master Controller at the hub site, Data Center or HQ.  For this central site it also makes the optimisation decisions for traffic on the Border Routers (BRs).
  • Hub Border Router:  At each central site you need Border Routers to terminate the WAN interfaces, and PfR needs to be enabled on these.  The BR needs to know the address of its local Master Controller (the Hub Master Controller in this case), and you can have several hub BRs, and indeed several interfaces per BR.  As far as PfR is concerned you also need a path name for external interfaces – I will come onto Paths shortly.
  • Branch Master Controller:  I mentioned you need an MC on each site to make the optimisation decisions, but in this case there is no policy configured as with the MC at the Hub.  Instead it receives the policy from the Hub Master Controller. Obviously it therefore needs the IP address of the hub MC.  At a branch the MC can be on the same device as the border router – an MC/BR.
  • Branch Border Router:  The branch Border Router (BR) terminates the WAN interface and you need to enable PfR on it.  It also needs to know where its MC is (if it is on a separate box).  Once enabled for PfR, the WAN interface is detected automatically.  As noted earlier, the branch Border Router can also house the Master Controller for the site.
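To make those roles concrete, a minimal PfRv3 sketch of a hub BR and a combined branch MC/BR might look something like the below (loopback and MC addresses are invented for illustration):

```
! Hub Border Router – points at its local (hub) MC and names its path
domain iwan
 vrf default
  border
   source-interface Loopback0
   master 10.6.32.251
!
interface Tunnel100
 domain iwan path MPLS path-id 1

! Branch router acting as a combined MC/BR – the border points at the
! local MC, and the branch MC peers with the Hub MC for its policies
domain iwan
 vrf default
  border
   source-interface Loopback0
   master local
  master branch
   source-interface Loopback0
   hub 10.6.32.251
```

Note how little branch configuration there is – no policy at all, just the address of the Hub MC to peer with.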

All Master Controllers peer with the hub MC (or IWAN domain controller) for their configuration and policies.

Branches will always be DMVPN spokes.

Every site runs PfR to get path control configuration and policies from the IWAN domain controller through the IWAN Peering Service.

A Cisco diagram probably does a better job than a scribbling of mine to get the components across visually.

IWAN domain

 

One other confusing term in the architecture is Transit (this is the problem when you use terms with multiple meanings, even if the term accurately describes a functionality).  I understand a transit site as a redundant hub Data Centre with a redundant Hub MC (IWAN domain controller).  So it is exactly the same as the hub site, the only difference being that you do not define the policies here – these are copied from the Hub MC.  The Transit site also gets a POP-ID.

Remember each central site gets a POP-id.

Technically traffic can transit a branch site to get to another branch site if you get the routing and advertisements wrong, but this can get pretty messy and is best avoided.

As with most network architectures, a solid predictable routing design where you know your expected routed paths is the key to a stable and robust IWAN deployment.


Monitoring and Performance Routing

Unified Monitoring (Performance Monitor) and application visibility includes bandwidth, performance, correlation to QoS queues, etc. and is responsible for performance collection and traffic statistics.

As I mentioned at the start, PfRv3 is an evolution of Optimised Edge Routing (OER), which was prefix route optimisation with traditional Netflow for passive monitoring and IP SLA probes for active monitoring.  This moved to PfRv2, which added application routing based on real-time performance metrics, and then PfRv3, which adds a bunch of stuff including smart probes, NBAR2, one-touch provisioning, Netflow v9, the Service Advertisement Framework, VRF awareness etc.

For application recognition (dynamic or manual) IWAN and PfR use NBAR2.

NBAR2 – Network Based Application Recognition – is a way of inspecting streams of packets up to layer 7 to identify applications.  It provides stateful deep packet inspection on the network device to identify applications, attributes and groupings.

Cisco defines this as a cross-platform protocol classification mechanism.  It can identify 1500+ applications and sub-classifications, and Cisco adds new signatures and updates through monthly protocol pack releases.  It can identify IPv4, IPv6, IPv6 transition mechanisms, and applications like TOR, Skype, MS-Lync, FaceTime, YouTube, Office 365 etc., and you can then configure policy based on this.

The relevant IOS commands (the first is applied per interface; the second takes further arguments depending on what you are matching):

ip nbar protocol-discovery

ip nbar custom <name> transport …

You match the protocol from NBAR when setting up your QoS policy for IWAN, and then your TCA levels (Threshold Crossing Alerts).  TCAs are pretty much what they sound like: when predefined thresholds around jitter, loss and delay are crossed, an alert is created that the controller can then act upon for a path.

Netflow (developed by Cisco) essentially collects IP traffic information and monitors network traffic.  There are Netflow v5 and v9.  Netflow v5 is a fixed packet format with fields like source and destination IP, source and destination ports, number of L3 packets in the flow, protocol, Type of Service, TCP flags etc.  PfR takes advantage of Netflow v9, which adds a cornucopia of extra information to customise and report on – you can define what you want to report on by creating your own custom flexible flow record.

For more information see Netflow V5 export format

For more information on Netflow v9 see Netflow v9 Format

IWAN Monitoring – all the PfR stats are exported using Netflow, so it is important to have a monitoring platform that supports Netflow v9 to get the most out of your PfR visibility.

 

Service Advertisement Framework (SAF)

PfRv3 has a concept of a peering domain or enterprise domain for service coordination and exchange at the WAN edge.  SAF is the underlying technology here.

SAF creates the framework for passing service messages between routers, with SAF forwarders and clients – basically a service advertisement protocol.  There is considerable detail under the covers, but in IWAN it has been made very simple to configure.  (If you have history here you will remember SAF headers, databases, client APIs, client and forwarding protocols, transitive forwarders, services etc.  Fortunately the underlying mechanisms are now improved and hidden.)

In essence the advertisement of the SAF service uses EIGRP as the transport layer, completely separate from the IP routing protocol you are using to actually forward packets.  It also uses link-local multicast for neighbour discovery.

Since you are using EIGRP as the engine for service advertisements, this also comes with split-horizon and DUAL (Diffusing Update Algorithm) to prevent service advertisement loops.

SAF relies on the underlying network and DMVPN in order to know about and communicate with its peers, as through the tunnel they are effectively one hop away.  There can be confusion here again, as it is easy to see EIGRP and assume it relates to underlying network connectivity.  But for SAF, provided there is IP connectivity to the domain Border Routers (i.e. through the DMVPN tunnel), peers can communicate and pass service advertisements between each other, because they have established SAF topology awareness and neighbours (peers) through the overlay EIGRP control engine.

Also SAF is efficient in that it only sends out incremental updates when advertisement changes occur and does not periodically broadcast/flood service advertisements.

In IWAN you have the concept of a Path.

A Path name in IWAN identifies a transport.

These Paths are identified with a Path-ID.  You manually define the path name and path-id on the Hub and Transit BRs; they then start sending discovery probes to the branches, and these probes contain information about the path: namely the Path Name, Path-ID and POP-ID.

The Path-ID is unique per site.

Paths in IWAN also have three labels – Preferred Path, Fallback Path and Next Fallback Path – and under each of these labels sit the actual paths.

For example, you might say for this DSCP or this application (e.g. EF) your preferred path is MPLS; for something else the preferred path is Inet1, etc.

Each transport is a Path, and understanding the concept of a Path and Path-ID certainly makes troubleshooting easier when you are looking at traffic that has changed paths, when, and the reason why.

Path of Last Resort

One final bit of confusion in IWAN is that at the hub you need a BR per transport.  At the branch/CPE you can have 2 transports per router, and if you want 3 transports you must have a second router for the 3rd transport (there are rumours this will rise to 5 transports per router in a future release, but we will wait and see).  There is also this thing called Path of Last Resort, which some read as “great, 3 transports really are supported per router”.  Turns out, no!

Basically if all other paths are unreachable, then we can fallback to the path of last resort.  This is not the same as the monitoring and control you get on the other paths.

PfR will not probe as usual: instead of sending 20 pps, smart probes are reduced to 1 packet every 10 seconds – so really just a keepalive.  The path-unreachable timer is also extended to 60 seconds.  So really this is for something like a 4G/LTE connection as a last-resort or backup path for your traffic if all else fails.  You add this config on the central site.
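For illustration, marking a path as last resort is done on the hub BR tunnel interface that carries that transport – something like the below (tunnel number, path name and path-id invented; verify against your release):

```
interface Tunnel300
 domain iwan path LTE path-id 3 path-last-resort
```

The key point is that this is a per-path attribute of the transport, not a third fully monitored path.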

 

Summary – steps for setting PfR policy

PfR – first you define a Class (a bit like a class-map, if you are familiar with QoS policy config in Cisco IOS), then this has a match on DSCP or Application, then you have your transport preference (Preferred, Fallback, Next Fallback), then your performance thresholds based on loss, latency or jitter to decide which is the preferred path.

PfR actually works on the basis of a traffic class, which is not an individual flow but an aggregation of flows.

The traffic class is based on Destination Prefix, DSCP and Application name. (obviously not app name if just DSCP is used).  For each traffic class PfR will look at the individual next hop.

Performance metrics are collected per channel, where per channel means:

  • per DSCP
  • per Source and Destination site
  • per Interface

A Channel is essentially a path between 2 sites per DSCP, or a path between 2 sites per next hop.

A Traffic Class will be mapped to a channel.

Channels are added containing the unique combination of DSCP value received, site id and exit.

 

PfR controlled and uncontrolled traffic

There is a concept of PfR controlled and uncontrolled traffic – and if some new traffic is seen for the spoke, then for 30 secs the normal routing table controls the traffic destination.  When this comes under the control of PfR then it abides by Threshold Control and is directed accordingly.

There is also an unreachable timer in PfR, driven by PfR probes, to detect the reachability of a channel.  A channel is seen as down once traffic is not seen for 1 second, or if there is no traffic on the link and smart probes are not seen for 1 second.  These are the defaults I believe, but there is a recommendation to set the timer to 4 secs.  I assume this will become the new default at some point.

So for failover, blackout detection will be the 4 seconds above; for brownout it is 30 secs by default, but again this can be reduced down to 4 or 6 seconds.

 

Performance Monitoring (PerfMon)

Unified Monitoring under PfR is enabled through Performance Monitor (PerfMon) which has been around a while and you might be familiar with it from Voice roll-outs.  It is responsible for performance collection and traffic statistics.

Application visibility includes bandwidth, performance, and correlation to QoS queues.

When it comes to IWAN domain policies and domain monitoring, there are 3 performance monitors to be aware of in PfR:

  • Performance Monitor 1: to learn site prefixes (applied on external interfaces on egress)
  • Performance Monitor 2: to monitor bandwidth on egress (applied on external interfaces on egress)
  • Performance Monitor 3: to monitor performance on ingress (applied on external interfaces on ingress)

IWAN uses these performance monitors to get a view of traffic in flight (traffic flowing through the interfaces) to look for performance violations and to make path decisions based on this.  Border Routers (BRs) collect data from their Performance Monitor cache, along with smart probe results (described below), aggregate the info and then direct traffic down the optimal forwarding path as dictated by the Master Controller.

 

Monitoring and optimisation – Smart probes

When there is no user traffic (e.g. on a backup path), probes are sent to keep the monitoring going.  These are called Smart Probes.

Smart Probes help with discovery, but are also used for measurement when there is no user traffic.  These probes are generated from the dataplane.  Smart Probe traffic is RTP, and is measured by Unified Monitoring just like other data traffic.

Smart probes add a small overhead to bandwidth on a link, but this is not performance impacting in general and can be tuned.

The probes (RTP packets) are sent over the added Channels to the sites discovered via the prefix database.  Without actual traffic, the BR sends 10 probes spaced 20 ms apart in the first 500 ms and another similar 10 probes in the next 500 ms, achieving 20 pps for channels without traffic.  With actual traffic, a much lower frequency is observed over the channel: probes are sent every 1/3 of the monitor interval, i.e. every 10 seconds by default.

That is 20 pps per channel, or per DSCP.

Zero SLA is another feature that is often missed and should be mentioned.  If you are concerned that on a very low bandwidth link you would be sending smart probes per channel (per DSCP), then you can configure Zero SLA so that only DSCP 0 uses smart probes on secondary paths.  All the other channels then get no smart probes, only DSCP 0.  If you have a 4G or low bandwidth link and are worried about overhead, this is definitely an option to have in the back pocket.
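Again purely as an illustration (tunnel number, path name and path-id invented), Zero SLA is enabled per path on the hub BR tunnel interface:

```
interface Tunnel200
 domain iwan path INET path-id 2 zero-sla
```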

Smart probes are of three types:

  • Active Channel Probe: sent out to measure network delay if no probe has been sent in the past 10-second interval.
  • Unreachable Probe: used to detect channel reachability when there is no traffic being sent.
  • Burst Probe: used to calculate delay, loss and jitter on a channel that is not carrying active user traffic.

For low-bandwidth links (e.g. DSL or 4G/LTE) it is possible to tune this further for even less overhead – for example with the below command:

smart-probes burst quick <number-of-packets> packets every <seconds> seconds

Thresholds

The whole point of defining thresholds is to look for a passing of a threshold or a performance violation – if we see this then an alert is sent to the source Master controller (a Threshold Crossing Alert or TCA) from the Border Router.  It is at this point that PfR controls the route and changes the next hop for an alternative path as per the policy configuration i.e. re-routed to a secondary path.  It is not PBR (policy based routing) as you might already be familiar with, but is similar in that the remote site device knows what to do with this traffic class and routes it accordingly based on policy.  The local Master Controller makes this decision.

All the paths are continuously monitored so we have awareness across the transports.

Routing and PfR

In part one we went through some of the choices around routing in DMVPN.  Well, there are additional considerations with PfR.

One of the reasons EIGRP and BGP are preferred for IWAN is that alternate paths are available in the BGP and EIGRP topology table, and as PfR looks into these tables to make decisions and change next hops based on policy, they are well suited.

Scale.  The first thing PfR does is look at the next hop of all paths.  It looks in the BGP or EIGRP table.  If you show your routing table you have one next hop per path, but because PfR looks at both the routing table and the topology table, it sees the next hops for both paths.

With EIGRP you can adjust the delay to prefer MPLS, and this combined with the EIGRP stub feature means you can prevent routing loops.

With BGP you would have the hubs configured as route reflectors, and to prefer MPLS you can simply set a higher local preference for routes learned over MPLS.  If you have, say, OSPF on the branch, then you redistribute the BGP into OSPF and set a route tag on the spokes to identify routes redistributed from BGP.
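A hedged sketch of the BGP side of this (AS number, neighbour address and tag value are invented – the validated designs give the full recipe):

```
! prefer routes learned over the MPLS tunnel neighbour
route-map PREFER-MPLS permit 10
 set local-preference 800
!
router bgp 65100
 neighbor 10.100.0.1 route-map PREFER-MPLS in
!
! on a spoke redistributing into OSPF: tag the BGP-sourced routes
route-map BGP-TO-OSPF permit 10
 set tag 65100
!
router ospf 1
 redistribute bgp 65100 subnets route-map BGP-TO-OSPF
```

The tag gives you a simple way to spot (and filter) redistributed routes later, which keeps the routing predictable.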

As ever there are many ways to configure BGP, but the validated designs guide you to one relatively simple way.

If you looked at using OSPF, for example – well, PfR does not look into the OSPF database, and therefore relies on the RIB (Routing Information Base).  So in order to support multiple paths for decision making you would need to run ECMP (Equal Cost Multi Path) – far from ideal.

 

PREFIXES

When a site or branch is part of PfR it advertises its prefixes to the Hub MC, which then forwards them to all the MCs in the domain.

This can be confusing because obviously BGP or EIGRP send prefixes, but PfR also sends prefixes.  One of the performance monitors will collect the source prefix and mask and advertise this to all Master Controllers.  It uses the domain peering with the Hub MC, which then reflects the prefix out to all the other MCs in the domain.

Ultimately you end up with a table mapping site-id to site prefix and how this was learned, i.e. learned through IWAN peering (SAF, the Service Advertisement Framework), configured, or shared.

It is important that attention is paid to your routing (of course, it is always important that you pay attention to the routing) because in advertising the prefixes, PfR looks in the topology table based on the IP address and mask to dig out the prefix.

There are two prefix concepts to be aware of: 1) the Enterprise Prefix List and 2) the Site-Prefix.

The Enterprise Prefix List is a summary of all your subnets, or all the prefixes in your IWAN domain.  This is defined on the Hub MC for the domain.

A prefix that is not in this prefix-list is seen as an Internet prefix and load-balanced over the DMVPN tunnels.  This is important: if there is no site-id, for example (the site is not enabled for PfR), then you don’t necessarily want traffic such as Voice to be load-balanced.  So it is important to make sure you have a complete Enterprise Prefix List.  Once a prefix is included in the Enterprise Prefix List, PfR will know that traffic is heading to a site where PfR is not enabled, and will know not to load-balance it.

Site-Prefix – the site prefix is dynamic in PfR: at a site, PerfMon collects outbound traffic, looks at the IP address and mask, and then advertises that prefix up to the hub through PfR.  On the hub and transit sites, however, you want to configure the advertised Site-Prefix manually.

Prefixes specific to sites are advertised along with site-id. The site-prefix to site-id mapping is used in monitoring and optimization.

It is important that the right Site-Prefix is advertised by the right Site-id
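The two concepts map onto Hub MC configuration roughly like this (prefix-list names and address ranges are invented for illustration):

```
ip prefix-list ENTERPRISE-PFX seq 10 permit 10.0.0.0/8
ip prefix-list DC1-PFX seq 10 permit 10.6.0.0/16
!
domain iwan
 vrf default
  master hub
   source-interface Loopback0
   enterprise-prefix prefix-list ENTERPRISE-PFX
   site-prefixes prefix-list DC1-PFX
```

The enterprise-prefix list defines the whole domain; the site-prefixes list is the static Site-Prefix for this central site.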

 

Transit site preference

When you have multiple transit sites (or multiple DCs) with the same set of prefixes advertised from the central sites, you can prefer a specific transit site for a given prefix – the decision is made on routing metrics and advertised masks, and this preference takes priority over path preference.  The feature is called transit site affinity and is on by default (you can turn this off with no transit-site-affinity).

Traffic Class timers – if no traffic is detected for 5 minutes then the dynamic tunnel is torn down.

BACKUP Master Controller

Backup Master Controller – you can have a backup Master Controller, but it should be noted that today there is no way to provide stateful replication to the backup: the two are not synched.  The way to do this is to configure the same PfR config on both, with the same loopback address on the backup controller but a /31 mask, so that should the primary go away, the BRs will detect the failure and reconnect to the backup Master Controller – stateless redundancy.

The backup MC will then re-learn the traffic.

In the meantime branch MCs keep the same config and continue to optimise traffic as you would expect – anycast IP.  We follow the routing table and do not drop packets, which is why you set the MPLS preference.

On the branch you need a direct connection between the BRs – on the HUB you just need IP connectivity.

Finally VRFs

VRF-Lite is used, with all the same config ideas but per VRF.  Your overlay tunnel is per VRF (DMVPN per VRF) and your overlay routing is also per VRF (VRFs are not multiplexed into one DMVPN tunnel!).  I mentioned that SAF (Service Advertisement Framework) is part of the magic behind PfR – well, the SAF peering for advertisements is also per VRF, as are the MC and BR config and the policies.


Summary

Too much to learn

 

OK, so that was a lot to take in, I agree.  But hopefully by breaking down the component parts a little, next time you look at IWAN you will at least have a place to start, and understand what is actually going on when you select the drop-downs in an IWAN GUI.

When you first look at IWAN you have terminology flying at you at an alarming rate. Much of it sounds familiar(ish), and it is easy to leap to a feeling of general understanding, until you realise that you are talking at cross purposes when it comes to EIGRP, or you are not sure exactly what Transit is, or the meaning of a Path.  Hopefully the above provides some context for deployments.

What I would say is that once you understand the components, deployment is surprisingly light-touch and easy through your choice of IWAN app and GUI.  In fact it is not too bad without really understanding it all, but it is always best to understand what you have just done.  If you look at other SD-WAN vendors (and I will cover some of the broader protocol choices in another post), the GUIs have abstracted much of the underlying workings.  This makes it all seem “oh so simple”, and to be honest it should be like this.  But as long as you understand that abstractions have been made and that there is no magic, you will quickly get a good feel for the various technologies.  You will understand the protocols and design choices, and be able to identify the innovations that have been sprinkled along the way.

Finally you have a number of options when it comes to monitoring and orchestration with IWAN.  All take the pain away from setup and all work towards enhanced visibility. The fact you have some products marketing IWAN deployments in 10 minutes shows how the mechanisms can be streamlined through abstraction and automation.  In brief, your main options are below.

  • Orchestration – Cisco Prime, Anuta networks, Glue networks, Cisco VMS, Cisco APIC-EM, Cisco Network Service Orchestrator
  • Visualisation/monitoring – Cisco Apic-EM, Living Objects, Liveaction, Cisco Prime/VMS.

 

Hopefully by now you have enough of a feel for the technology to jump into the validated designs for IWAN productively, and deploy a whizzy tool with growing confidence. You never know, IWAN might be less painful than you might have feared, despite first impressions.

Adventures in IWAN – Part 1 – Transport Independence

For a number of reasons, and part of wider interactions with SD-WAN, I have been having a few adventures with Cisco’s current SD-WAN offering  – Intelligent WAN or IWAN (2.1).  Whether IWAN is an SD-WAN as you understand it from other vendors is a topic for another day, but I thought it might be useful to cover a few things I have come across.

This is intended to be a multi-part post, with the first 2 parts covering the first 2 pillars (hopefully throwing in a few config examples after part 2), and the third eventually covering pillars 3 and 4.

As at least 80% of getting IWAN up and running is in the first 2 pillars, I  am going to focus on these primarily.

One thing to note from dealing with Cisco IWAN so far, is a lot of the underlying mechanisms are exposed.  This has helped me, personally, to improve my understanding of all SD-WAN vendors and how their solutions fit together (e.g.  what is the overlay and how does it actually work? What affects the control plane?  What is being provided in the Data Plane?  Overlay routing for dynamic point-to-multipoint with encryption?  How exactly are you doing encryption?  What protocols are you using?  How are you managing key distribution and re-keying?  How is traffic diverted to the device or inline? How is performance monitoring working?  App optimisation?  Is it flow based?  How are you looking into the flow? Application identification – what method? Controller for traffic control?  Real time? Orchestration?  etc.)  Basically, what is the magic?

Are we ready?  OK. Strap in, here we go.

The Building blocks of IWAN

Have a quick look at the Cisco picture below and you can see the 4 pillars of IWAN

 

iwan-technical-overview-with-management

Each pillar of IWAN has underlying technology building blocks, and those technologies also have foundational components.  Hopefully I will provide some clarity on the building blocks that layer on top of each other to produce a shiny polished IWAN solution.

The first 2 pillars of IWAN are – 1) Transport Independence and 2) Intelligent Path Control.

Part 1

 Transport Independence

The fundamental technology underpinning transport independence in IWAN is DMVPN (Dynamic Multipoint VPN) as the transport overlay technology, and this also has component parts.

So what is DMVPN fundamentally?

It is a combination of 4 things:

  1. Multipoint GRE tunnels
  2. NHRP (Next Hop Resolution Protocol) – basically creates a mapping database of the spokes’ GRE tunnel interfaces to real (or public) addresses.  Think of this like tunnel overlay IPs ARPing for the “real” underlay IP addresses.
  3. IPSEC tunnel protection – creates and applies encryption policies dynamically
  4. Routing – essentially the dynamic advertisement of branch networks via routing protocols, e.g. BGP, EIGRP, OSPF, RIP, ODR.

Let’s cover each one in turn, then you will have your tunnel overlay or secure transport independence sorted.

DMVPN – your overlay transport technology

Multipoint GRE tunnels

If you are familiar with GRE, you will know that it creates a tunnel between two endpoints by adding an extra GRE header.  You create a tunnel interface (a virtual interface) with an address, and tie this to a real source and destination address on the actual interfaces that terminate the tunnel.

A couple of pre-canned Cisco diagrams do the trick here for the sake of illustration:

GRE tunnel

tunnel

Multipoint GRE  broadens this idea by allowing a tunnel to have “multiple” destinations and you can terminate the tunnels on a single interface.  Handy for Hub-and-Spoke, and Spoke-to-Spoke I think you will agree.

So Multipoint GRE is your tunnel overlay SD-WAN transport in the Cisco world.  Well that was simple, so onward to the less straightforward.
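By way of illustration, here is a minimal IOS-style sketch of a point-to-point GRE tunnel versus its multipoint equivalent (illustrative interface names and documentation addresses, not a tested config):

```
! Classic point-to-point GRE - one fixed destination per tunnel
interface Tunnel0
 ip address 10.0.0.1 255.255.255.252
 tunnel source GigabitEthernet0/0
 tunnel destination 203.0.113.2
!
! Multipoint GRE - no fixed destination; many peers terminate
! on this single tunnel interface
interface Tunnel0
 ip address 10.0.0.1 255.255.255.0
 tunnel source GigabitEthernet0/0
 tunnel mode gre multipoint
```

Note the multipoint version has no tunnel destination at all – that is exactly the gap NHRP fills, as covered next.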

Next Hop Resolution Protocol – NHRP

The next building block of DMVPN is NHRP, and this provides a way of dynamically mapping all those multi-point GRE tunnel interfaces you just created with their associated real addresses or underlay transport network.

NHRP has actually been around a while in different forms and originates from an extension of the ATM ARP routing mechanism which dates back to 1998/1999 as a technology.

Think of NHRP (Next Hop Resolution Protocol) as like ARP but for the underlying real IP addresses.  So you have a physical interface on your wan router with an address, and you have a GRE tunnel address on that same router.  One is your IP underlay and one your IP tunnel overlay.  You now need a way to map your IP underlay network to your IP tunnel overlay network, and NHRP does this job.

By way of visualization, I particularly like the below diagram from Cisco which shows very clearly which are your overlay addresses, which are your tunnel addresses, and which are your real addresses or NBMA addresses.  As a distinction it might help to think of GRE as your transport overlay technology (each multipoint GRE tunnel maps to a WAN transport), and your overlay network as the network addresses you wish to send over this tunnel, so a network overlay.

dmvpn

A spoke router will register with a Next Hop Server (NHS) as it comes up (you will give the spoke an NHS address to register with, and incidentally a multicast mapping for broadcast over the tunnel if the underlying network does not support IP multicast – useful for routing protocols).  Once registered, the NHRP database maintains a mapping of real addresses to tunnel addresses.  If a spoke then needs to dynamically discover the logical tunnel IP to physical NBMA IP mapping for another Next Hop Client (spoke) within the same NBMA network, it sends an NHRP resolution request to find it.  This discovery means you do not have to go via the hub every time for spoke-to-spoke communication – the Dynamic part of DMVPN, really.  You can create dynamic GRE tunnels from a spoke (and ultimately encrypted tunnels) on the fly by querying NHRP: find the real NBMA address of another spoke and, voila, you have the peer information to set up your tunnels direct.

NB. There are some interesting CEF details with NHRP between DMVPN phases 1, 2 and 3, but that is follow-on reading I would say.  Allowing a layer 2-style resolution protocol to ultimately control your layer 3 direction and interactions is maybe controversial for the purist, and I will doubtless attempt to cover this when looking at some other SD-WAN techniques in other posts.

In short, all spokes register their NBMA addresses with a Next Hop Server (typically the hub), and when a spoke needs to send a packet via a next hop (spoke) on the mGRE cloud or transport overlay, it asks the NHS (via a resolution request) “can I please have the real/NBMA address of this next hop?”.  The NHS replies with the NBMA address of the other spoke, and from that point the spokes can speak directly.
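As a sketch of the registration and mapping just described, the hub and spoke NHRP settings might look something like the below (illustrative addresses and network-id, phase 3 style, not a complete or tested config):

```
! Hub (NHS) - learns spoke mappings dynamically as they register;
! "redirect" enables phase 3 spoke-to-spoke shortcuts
interface Tunnel0
 ip nhrp network-id 1
 ip nhrp map multicast dynamic
 ip nhrp redirect
!
! Spoke (NHC) - statically points at the hub's tunnel and NBMA addresses
interface Tunnel0
 ip nhrp network-id 1
 ip nhrp nhs 10.0.0.1
 ip nhrp map 10.0.0.1 203.0.113.1
 ip nhrp map multicast 203.0.113.1
 ip nhrp shortcut
```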

Encryption 

IPSEC tunnel protection 

IPSEC is the suite of protocols that enables the end-to-end encryption over the network in IWAN.  We are using IKEv2 and IPSEC.  Remember you can get DMVPN working as an overlay transport without encryption; it is optional (but good practice for security). Technically you just need your routing, multipoint GRE tunnel overlay network, and NHRP, then you can add encryption once network connectivity is sorted.  I have found this is a good way to build the solution in blocks to make troubleshooting easier.

It is a little involved to go into here, but essentially IKE Phase 1 identifies who you want to form an encrypted tunnel with and securely authenticates the peer (and sets some parameters for Phase 2), and then Phase 2 agrees on what to use to actually encrypt the traffic.  The fundamental problem is that when you have to create a lot of point-to-point IPSEC tunnels, you need some way to tell each device the address of its peer so it can create an encrypted tunnel.  Each would then be an individual configuration for every peer-to-peer connection, managing keepalives (Dead Peer Detection), failover, etc.   If you want on-demand dynamic spoke-to-spoke encryption, then IPSEC needs some work.  There are a number of ways to solve this, but DMVPN phase 3 (multipoint GRE and NHRP) has been used for some time and is the method of choice today in IWAN.
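For illustration, tunnel protection along these lines might look like the sketch below (IKEv2 with pre-shared keys for brevity, illustrative names and policy choices, not a hardened or tested config):

```
! IKEv2 authentication material (pre-shared key for simplicity)
crypto ikev2 keyring DMVPN-KEYRING
 peer ANY
  address 0.0.0.0 0.0.0.0
  pre-shared-key MySecretKey
!
crypto ikev2 profile DMVPN-IKEV2
 match identity remote address 0.0.0.0
 authentication remote pre-share
 authentication local pre-share
 keyring local DMVPN-KEYRING
!
! Phase 2: how traffic is actually encrypted - note transport mode,
! discussed in the header section below
crypto ipsec transform-set DMVPN-TS esp-aes 256 esp-sha-hmac
 mode transport
!
crypto ipsec profile DMVPN-PROF
 set transform-set DMVPN-TS
 set ikev2-profile DMVPN-IKEV2
!
! Applied dynamically to every tunnel built on this mGRE interface
interface Tunnel0
 tunnel protection ipsec profile DMVPN-PROF
```

The nice part is the single tunnel protection line: encryption policy is applied dynamically to each spoke-to-spoke or spoke-to-hub tunnel as NHRP discovers peers, rather than per-peer crypto maps.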

With DMVPN it is always worth covering headers and how they are used in the real world should you choose to use IPSEC.  This way you can visualise the overlay network.

Typically you use transport mode with DMVPN – so what does this mean, and why use it with DMVPN?

Header confusion

There are Encryption Headers and GRE headers, do not confuse or conflate the two.

IPSEC uses 2 distinct protocols to either encrypt or authenticate your Layer 3 payload: ESP (Encapsulating Security Payload) and AH (Authentication Header), both of which add headers to your packet.  They both also run in one of two modes, tunnel or transport.  These modes either reuse the original IP header (transport) or add a new IP header (tunnel) in order to traverse the network.  This is outlined clearly in the diagram below.

Headers

The next level of header confusion comes with GRE – which also adds an IP header.

Your original packet might look something like:

IP hdr 1   |   TCP hdr  |    Data

GRE Encapsulation:

IP hdr 2   |    GRE hdr  |   IP hdr 1   |    TCP hdr  |   Data

GRE over IPsec Transport Mode (with ESP):

IP hdr 2   |   ESP hdr |    GRE hdr  |    IP hdr 1   |   TCP hdr   |   Data

GRE over IPsec Tunnel Mode (with ESP):

IP hdr 3   |   ESP hdr   |   IP hdr 2   |   GRE hdr   |   IP hdr 1 |   TCP hdr   |   Data

Transport mode only encrypts the data payload and uses the original IP header – whereas tunnel mode will encrypt the whole IP packet (header + payload) and use a new IP header.

In DMVPN both the GRE peer and IPsec peer addresses are the same, so transport mode typically saves the addition of a header that essentially repeats identical information (20 bytes saved right there).

Typically you use ESP with Transport mode for DMVPN
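The per-packet saving can be sanity-checked with a few lines of arithmetic.  A minimal Python sketch of the header stacks above (IPv4 without options; the variable-size ESP header/trailer is ignored since it is identical in both modes):

```python
# Fixed header sizes in bytes
IP_HDR = 20   # IPv4 header, no options
GRE_HDR = 4   # basic GRE header, no optional fields

plain_gre          = IP_HDR + GRE_HDR      # IP hdr 2 + GRE hdr
esp_transport_mode = IP_HDR + GRE_HDR      # ESP transport reuses IP hdr 2
esp_tunnel_mode    = 2 * IP_HDR + GRE_HDR  # ESP tunnel adds IP hdr 3

saving = esp_tunnel_mode - esp_transport_mode
print(saving)  # -> 20 bytes per packet saved by transport mode
```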

Now you should have a reasonable view of the Encryption overlay and the GRE overlay and the headers that are added end to end.

 

Routing

Routing comes up in two areas of IWAN: once in the transport independence piece, and again in the best path selection with PfR – and it is important not to confuse the two.  For example, PfR uses EIGRP for the Service Advertisement Framework (SAF), but for the transport piece you could use the same or a different routing protocol, e.g. BGP, EIGRP or OSPF.  When EIGRP is used for your underlay and overlay routing as well (which is highly likely), conversations can get confusing.

You have a router at the customer edge, trying to get to another router at another edge. In between you have a Service Provider network.  Typically in order to get traffic to where you want to go you need to interact with the Service Provider’s BGP network, whether that is BGP advertised default routes, statics, redistribution, whatever is most suitable for you and your SP.

Now with IWAN you are adding a tunnel overlay, and this overlay network needs to be advertised into your current Enterprise network so that traffic destined for another one of your sites knows which next hop to use.  The next hop will now be a tunnel interface, i.e. you need to use a tunnel to get there.  Remember NHRP does the mappings here, to actually get the tunnel traffic across to the real address of the remote site that terminates the tunnel.  So where previously you may have used dynamic or static routing, or a default route in BGP, to say “if you want to get to an address that lives across the WAN use the following next hop (WAN interface)”, with an overlay you are telling traffic to use your tunnel interface as its next hop.  To advertise these tunnel overlay routes into your network you can either use statics or a routing protocol of choice like BGP or EIGRP.  Of course, if your routing protocols are covering both the real WAN interface networks and your tunnel interface networks, you need to take care that the correct route gets installed into the forwarding table, and that you are learning the information from a consistent place, so your routing protocols don’t get confused and bounce the tunnel up and down (the recursive routing problem described a little further down).

As mentioned, the other use of a routing control plane in IWAN is for PfR (Performance Routing), where the EIGRP engine is used for the Service Advertisement Framework and creates its own neighbours and domains accordingly.

Of course this is logically separate from the underlay and actual traffic forwarding and relies on the overlay network to get connectivity across the WAN between members of the SAF domain for sending SAF information to each other.  That is, the tunnels provide connectivity for SAF peers.

So what does all this mean?  Well it means you can very easily have 3 routing protocol names flying around in conversation confusing everyone on a whiteboard – BGP for underlay,  EIGRP for overlay,  EIGRP for PfR (or any mixture e.g. OSPF, BGP, EIGRP for routing and EIGRP for PfR).  The one constant here is the EIGRP engine is always the mechanism for PfR SAF peering.  However if you separate the PfR / SAF process in your mind as a monitoring technology that just happens to use an EIGRP process to set up its domains (nothing to do with network connectivity) – then the rest is really just routing as normal with care taken over your DMVPN.

 

DMVPN – which routing protocol?

If you have ever configured DMVPN you will know that there are limitations or caveats with each routing protocol.  Let’s have a brief look at these at a basic level and, for simplicity, mostly with DMVPN phase 3.

OSPF – you can use OSPF of course, but you need to be a little careful with network types. Point-to-point?  Won’t work, because you are using a multipoint GRE tunnel interface. Broadcast?  This will work, but you need to make sure the spokes will never be elected as the DR or BDR (ip ospf priority 0 on the tunnel interfaces of the spokes should do the trick here).  Non-broadcast?  Yes, this will work as with broadcast, but you need to statically configure your neighbours.  Point-to-multipoint?  Works well with phase 3, and you don’t have to worry about DR/BDR election.  With DMVPN phase 2 it is important to note that point-to-multipoint does not work so well, as it changes the next hop so all traffic goes through the hub router – not ideal for dynamic spoke-to-spoke. In phase 2 you have the same issue with OSPF point-to-multipoint non-broadcast, with the addition of having to statically define your neighbours.
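As an illustrative sketch of the spoke-side tunnel settings just mentioned (illustrative interface and process numbers, not a complete config):

```
! Preferred with phase 3: point-to-multipoint, no DR/BDR to worry about
interface Tunnel0
 ip ospf network point-to-multipoint
!
! Alternatively, with the broadcast network type, make sure a spoke
! can never win DR/BDR election
interface Tunnel0
 ip ospf network broadcast
 ip ospf priority 0
```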

What are the issues with OSPF?  Well, a couple spring to mind.  In DMVPN you use the same subnet, and therefore all OSPF routers would be in the same area. Summarisation is only available on Area Border Routers (ABRs) and Autonomous System Border Routers (ASBRs), so the hub router would need to be an ABR for summarisation.  Also, as OSPF is link-state, any change in the area will result in an SPF calculation across the area, i.e. all the routers will run an SPF calculation on a link change. Misconfiguration of the DR/BDR will break connectivity, and traffic engineering has its issues with a link-state protocol.

So OSPF is doable, using NSSA (Not So Stubby Areas) on the spoke and careful config, but for larger scale DMVPN people drift towards BGP/EIGRP.

EIGRP – is not link-state, does not have an area concept, and you don’t have to think about the topology tweaks you need with OSPF above.  One thing to note in DMVPN phase 2 is that you don’t want the hub setting itself as the next hop for routes, but you can configure around this with EIGRP.  Of course you need to disable split horizon so routing advertisements are allowed back out of the same interface (the mGRE tunnel interface).  Good advice for scale is to turn the spokes into EIGRP stubs, and also to watch the number of adjacencies the hub has, as hellos can become an issue (you can play with timers here too).  Also, EIGRP can summarise and manipulate metrics at any point.

EIGRP is well-suited to DMVPN at scale.
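The split-horizon and stub tweaks above might look something like this sketch (illustrative AS number and networks, not a complete config):

```
! Hub - allow advertisements learnt on the mGRE interface to be
! re-advertised back out of it
interface Tunnel0
 no ip split-horizon eigrp 100
!
! Spoke - advertise as a stub so the hub never queries it for
! routes it cannot have, and limit what it advertises
router eigrp 100
 network 10.0.0.0 0.0.0.255
 eigrp stub connected summary
```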

BGP

BGP also works for DMVPN – we know it scales (the Internet), and the default timers are less onerous than those of other protocols.   The choice, as ever, is iBGP vs eBGP.  Whereas with iBGP you might require route reflectors at scale and an IGP to carry next hops, eBGP might need several AS numbers, or you could disable loop prevention.

With DMVPN eBGP, the next hop is changed on outbound updates, so all good there.  The next question is whether to use the same AS for every site, or unique ASs.  Unique ASs are good for loop prevention, but the 16-bit private AS range (64512–65534) gives you only around a thousand numbers, which can limit you to roughly that many spokes.  With 32-bit AS numbers the private AS shortage is solved, but there is a fair deal of configuration at the hub with unique AS numbers.

Say you run the same AS at all sites.  In this case the receiving router sees its own AS number in the AS path of a received BGP update, assumes the update has looped back to its own AS, and drops it.  To get round this you can use as-override, but this can produce loops in the control plane.

iBGP then – back to the next hop modification issue.  With phases 1 and 3 you can use “neighbor x.x.x.x next-hop-self all” for reflected routes on a route reflector.  With this, iBGP probably becomes the preferred option when it comes to BGP.

iBGP is well-suited to DMVPN at scale.
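A sketch of the hub-as-route-reflector arrangement discussed above (illustrative addresses and AS number, one spoke shown, not a complete config):

```
! Hub: reflect spoke routes between spokes, and rewrite the next hop
! on reflected routes so spokes resolve it over the tunnel
router bgp 65000
 neighbor 10.0.0.11 remote-as 65000
 neighbor 10.0.0.11 route-reflector-client
 neighbor 10.0.0.11 next-hop-self all
```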

From the above EIGRP or BGP tend to be the preferred choices for DMVPN.

Now, the assumption often with IWAN is that BGP and EIGRP are chosen entirely because of the typical reasons above.

However, in addition to the good reasons above, remember that with IWAN you want some method of quick failover to an alternate or best path based on monitoring.  EIGRP has a topology table with feasible successors, and BGP holds alternate paths in its table – alternate routes ready and waiting, on failure, to populate the routing and forwarding tables and facilitate a quick change of preferred paths.

Another very good reason for the use of EIGRP and BGP with IWAN.

So there you have it, a brief tour of the 4 building blocks of DMVPN.

Finally, of course, no current discussion of DMVPN would be complete without a brief excursion into Front-Door VRFs and recursive routing.

Front Door VRFs.

These are a very useful technique in IWAN as they simplify paths and configuration a good deal.  What is a VRF (Virtual Routing and Forwarding instance)?  Basically it allows multiple instances of a routing table to exist on a router and work simultaneously.  This is useful as it allows network paths to be segmented without using multiple devices.  Effectively, in an IWAN design you put your WAN interfaces into a separate VRF (the front door, you see), and this avoids some recursive routing problems you may be familiar with from GRE (more on that later).

Recursive routing with GRE

If you are familiar with configuring DMVPN you may be aware that you can get yourself into a pickle when it comes to routing and, in particular, recursive routing.  If you are using one routing protocol for your overlay and another for your underlay, there could be a conflict.  For example, if you learn a route to the same prefix both inside and outside of the tunnel, well, the router gets a little confused.

If you have ever seen “Tunnel temporarily disabled due to recursive routing” then you know what I am talking about.  The first time you bump into this it can lead to furrowed brows and prolonged head scratching until the light-bulb fires.

So here is the crux of this issue:

If, for example, you have two routers with NBMA (WAN) interfaces addressed 10.10.1.10/16 at one end and 10.12.1.12/16 at the other, these are on different networks, so you use a routing protocol to get across any intermediate hops to the other end.  Say we use OSPF for this, e.g. Router_A (10.10.1.10/16) – Router_B (OSPF) – Router_C (OSPF) – Router_D (10.12.1.12/16).  These are also your tunnel endpoints, remember.

Now say you want to use EIGRP to advertise your tunnel network, and you make the easy mistake of having an overlapping network, i.e. your GRE tunnel interface addresses are 10.2.1.10 and 10.2.1.12 at each end.  So you may set EIGRP for network 10.0.0.0 (which also happens to cover the NBMA or real addresses).

Ok, so the problem here is that you now have the 10 network being advertised for the NBMA addresses in OSPF and then, when the tunnel comes up, you also have the 10 network being advertised through EIGRP over the tunnel.  So as soon as the EIGRP neighbour comes up over the tunnel, the tunnel goes down and with it the EIGRP neighbour – rinse and repeat.  The problem, of course, is that the NBMA (or WAN interface) network is now being advertised over the tunnel using EIGRP.

Given that the tunnel relies on OSPF to find the actual NBMA tunnel endpoint, this is simply not going to work.

In short, the EIGRP neighbour comes up and you are saying the way to get to the real address (or tunnel endpoint) is over the tunnel, while simultaneously overriding the way the tunnel actually gets connectivity to that real address (tunnel endpoint) to set itself up as a tunnel (over OSPF).  The only way the EIGRP neighbour could come up in the first place is that OSPF had already provided the underlay routing to set up a tunnel.  All clear?  Yeah, I know, this can make you rub your forehead the first time you come across it.

The way to get round this usually is to be very careful with your subnets and routing to avoid the recursive.

But there is another way to avoid this – enter (or enter through) Front Door-VRF.

The principle here is that you have a separate routing table for the physical WAN interface (the front-door), and the tunnel or overlay network – so a VRF for each.  Or most simply, a separate VRF for the WAN interface and everything behind this is in the global routing table if you so wish.  As we are not learning the routes for the tunnel and NBMA through the same routing table, bingo you have solved your recursive routing problem.

There is still some magic needed, as there must be a way to tell the tunnel you are creating to use the WAN interface as a tunnel endpoint.  Create your WAN interfaces in their own VRF, then create your tunnel interfaces with these addresses as the source and destination tunnel endpoints, and finally stitch these together with a VRF command under the tunnel interface (the stitching is the internal pixie dust).  Your network and routing over the tunnel are now separated from your transit network underneath.
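A sketch of the stitching described above (illustrative VRF name and documentation addresses, not a complete config):

```
! The front-door VRF holds the WAN/transport routing table
vrf definition FVRF
 address-family ipv4
 exit-address-family
!
! The physical WAN interface lives in the FVRF routing table
interface GigabitEthernet0/0
 vrf forwarding FVRF
 ip address 203.0.113.10 255.255.255.0
!
interface Tunnel0
 ip address 10.0.0.10 255.255.255.0
 tunnel source GigabitEthernet0/0
 ! the pixie dust: resolve the tunnel endpoints in FVRF,
 ! while the tunnel itself stays in the global routing table
 tunnel vrf FVRF
```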

F-VRF

Shut the front door – that is much simpler for DMVPN 

Part 2 Intelligent Path Control

 

 

It’s all about the Bayes.

bayestheoem

The defensive side of Security technology is an interesting place to be at the moment, with a vast number of products and techniques trying to defend against an ever-changing attack landscape.

Where there is uncertainty, people want to be assured, and reduce the likelihood that they will get breached.  Is the information gathered real and actionable?  Have we been breached or not?  What is the probability?

Bayes Theorem is fashionable across a number of fields today, and the idea of ‘machine learning’ to solve a security problem seems compelling.

Bayes was an “amateur” mathematician and Church Minister in the 18th Century, so no knowledge of computers, but he set out to solve a fundamental problem and this is where lasting ideas come from.

So why Bayes?

If you have read Daniel Kahneman’s book “Thinking Fast and Slow” (highly entertaining read), you will be aware that humans are not always great at instinctive decisions based on statistics.  Or if you add context it sometimes overrides the facts, when it really shouldn’t.

Consider a drug being brought to market that definitely cures a disease 99.9% of the time. (I like it, where can I get it?)  I know what 99.9% means, that means pretty much a sure thing?  Hold on, how often does it fail and what are the consequences when it does?  Well, in a 40,000-seater stadium this fails for 40 people.  What if, when it fails, it kills the person 50% of the time?  So 20 people in that stadium would die.  Clearly not acceptable; the drug is shelved.

Extending this to the base-rate fallacy.  Suppose a disease affects around 1 in 100 people (the base rate), and I have a test for it that is 99% accurate – that is, 99 out of 100 people who have the disease will test positive, and 99 out of 100 who don’t will test negative.  What are the odds I have the disease if I test positive?  Turns out the answer is around 50%.  Take the test again and test positive, and the odds go up to 99%.  Take a 90% accurate test, and even after the 2nd positive the odds are still not at 50%.  As an exercise, once you have been through the examples below and fully understand them, I encourage you to take these numbers and plug them into the Bayes equation.
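If you would rather let a computer do the plugging-in, here is a minimal Python sketch of those repeated-test numbers, assuming a disease base rate of about 1 in 100 (the prevalence the 50% figure implies):

```python
def posterior(prior, sensitivity, false_pos_rate):
    """Bayes: P(disease | positive) = P(+|D) * P(D) / P(+)."""
    p_pos = sensitivity * prior + false_pos_rate * (1 - prior)
    return sensitivity * prior / p_pos

# 99% accurate test, 1% prevalence
p1 = posterior(0.01, 0.99, 0.01)  # first positive: ~0.50
p2 = posterior(p1, 0.99, 0.01)    # second positive (prior is now p1): ~0.99

# 90% accurate test, same prevalence
q1 = posterior(0.01, 0.90, 0.10)  # first positive: ~0.08
q2 = posterior(q1, 0.90, 0.10)    # second positive: ~0.45, still under 50%
```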

Another example of this flaky judgement is a conjunction fallacy.

The classic example is below:

“Linda is 31 years old, single, outspoken, and very bright. She majored in philosophy. As a student, she was deeply concerned with issues of discrimination and social justice, and also participated in anti-nuclear demonstrations.

Which is more probable?

  1. Linda is a bank teller.
  2. Linda is a bank teller and is active in the feminist movement.”

The idea is that some may think the second is more likely, when the “and” makes it less likely given the base rates.  How many 31 year old females are Bank Tellers?  How many 31 year old females are active in a feminist movement?  Does the “AND” make sense?

The additional information often confuses our instinctive noodle.

The whole idea with Bayes is we can add numbers to things that seem subjective or confusing, use any additional info, and get a more accurate read.

 

Bayes states that…

The prior odds times the likelihood ratio equals the posterior odds.

bayes-rule

The formula for Bayes is not difficult, the hard part is what to plug into the formula and how.  You need to decide on your tests and events.  There is a test for a condition, and there is the event that someone actually has that condition.

You are saying “what is the probability of the event given the following test was positive?”

Also what is the probability of the  positive test being accurate?  (False positives – and you can use false negatives depending how you frame the test and the event.)

—-

You can see how this can get a little confusing but let’s have a go anyway  with Bayes below.

The point about Bayes is that you have some Data and you make a claim about the data, or a hypothesis.

So you have a Hypothesis and you want to know what the probability of that hypothesis is given the Data.

The notation P(A|B) can be summarised as the probability of A assuming B to be true.

So P(A|B), where A is the hypothesis, given B (which is the Data).  You can read the “|” sign as the word “given” if you like.  Altogether this is called the Posterior.

So this is equal(=) to

P(A) the Probability of the hypothesis  – we call this the Prior

Multiplied by

P(B|A) –  the probability of the Data given a particular hypothesis, call this the Likelihood

Take all this and divide by

P(B)  the probability of the data itself.

Got that?  Good.  Let’s plug in some numbers with an example

We will do this in steps and then the equation.

 

We have a condition, let’s call it “Geekiness”.  What if we had a test to try to identify Geekiness? (aside from writing blog posts about Bayes)

We are testing 100 students for this condition.

We know Geekiness affects 20% of the students tested

The test for Geekiness involves watching a Star Trek film trailer and seeing if the pupils dilate excessively.

Among the students with Geekiness, 90% of the pupils dilate when tested.

But among those without Geekiness, 30% also dilate when seeing the trailer.

hmmm..

So what is the probability that a student who tests positive actually has “Geekiness”?

Or, as a hypothesis – what is the probability that students who get excited at a Star Trek trailer have “Geekiness”, given a positive test result?

Step 1: Find the probability of a true positive on the test. That is people who actually have Geekiness (20%) multiplied by true positive results (90%) = 0.18 (or 18 of the 100 students)

Step 2: Find the probability of a false positive on the test. That equals people who don’t have the Geekiness (80%) multiplied by false positive results (30%) = 0.24  (or 24 people out of the 100)

Step 3: Figure out the probability of getting a positive result on the test. That equals the chance of a true positive (Step 1) plus a false positive (Step 2) = 0.18 + 0.24 = 0.42

Step 4: Finally find the probability of actually having Geekiness given a positive result. Divide the chance of having a real, positive result (Step 1) by the chance of getting any kind of positive result (Step 3) = 0.18/0.42 = 0.43, or 43%.  So considerably less than the 90% that we started with.  With that additional info, a test that starts as 90% accurate for those with the condition is less than 50% accurate when you take into account everyone (the base rate).

Surprising for some but we get a real figure, and this is the power.

Let’s now plug the same info as above into the Bayes equation.

—————-

A Posterior, a Prior and a Likelihood walk into a bar….

P(A|B) is the probability the student has “Geekiness” given a positive test result.  (Posterior)

P(A) = Probability of having  Geekiness = 20%  (Prior)

P(B|A) = Chance of a positive test result given student actually has Geekiness = 90%. (Likelihood)

P(B) = Chance of a positive test in the overall student population of 100, which is  42%

Now we have all of the information we need to put into the equation:

P(A|B) = P(B|A) * P(A) / P(B)

so

P(A|B) = 0.9 * 0.2 / 0.42 = 0.43 (43%)

or

P(A|B) = (90% * 20%) / 42% = 43%

Another way to express this:

Prior odds * Relative likelihood = Posterior Odds
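The four steps and the equation above can be checked in a couple of lines of Python:

```python
p_geek = 0.20            # P(A), the prior: 20% of students have Geekiness
p_pos_given_geek = 0.90  # P(B|A), the likelihood: true positive rate
p_pos_given_not = 0.30   # false positive rate among the non-Geeky

# P(B): any positive = true positives + false positives (Step 3)
p_pos = p_pos_given_geek * p_geek + p_pos_given_not * (1 - p_geek)

# Bayes: P(A|B) = P(B|A) * P(A) / P(B) (Step 4)
posterior = p_pos_given_geek * p_geek / p_pos
print(round(posterior, 2))  # -> 0.43
```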

bayesproof

So there you have it.  Try some examples yourself, and be patient, don’t expect to be a whizz in 5 minutes.

There is more to Bayes than I have covered, but hopefully you should get a feel for how taking into account data, sample size, and accuracy can affect your probability.  We need to be rigorous in questioning the data.   A number of Security start-ups are using these techniques to better predict and detect anomalies or breaches, and although it doesn’t promise to be a panacea, I am excited to see where all this leads over the next few years.

and remember..

BGP? In the Enterprise? Part 1

BGP in the Enterprise via some overlay, underlay, VXLAN and leaf-spine discussion

You have a routing protocol.   It works for large scale inter-domain routing (between Autonomous Systems), and internally it traditionally requires an n-squared full mesh of peerings.

It doesn’t have a link metric, it has no real performance metrics, and it converges slowly (the Internet never really converges anymore) – but you want to use it in the Enterprise?

Bad idea?  Well maybe in a traditional Enterprise network, but with leaf-spine architecture for Data Centres could this be a goer?  A good few people think so and are implementing today.  Let’s have a look at some of this.

BGP is effectively built from two pieces, the protocol itself and the TCP transport which is used to carry protocol messages between peers.

Some acronyms follow, but hang on, hopefully this will become clear as you continue to read down.

If you are familiar with leaf-spine architectures, there is a lot of noise around using BGP within the fabric – either eBGP or iBGP (an AS per leaf/spine, or everything in the same AS) – as the DC routing protocol of choice.

There is also a lot of discussion around Multi-Protocol BGP (MP-BGP) being used with VXLAN for DC interconnect (Ethernet VPN or EVPN), which effectively loads the NLRI portion of BGP with MAC addresses, with an RD (Route Distinguisher) to ensure uniqueness, ultimately to extend Layer 2 segments across Data Centres over Layer 3.

https://tools.ietf.org/html/rfc2796

VXLAN is an L2 extension technology over a shared Layer 3 underlay infrastructure.

As briefly as possible (this will be covered in more detail throughout), VXLAN encapsulates the MAC address in a VXLAN header, forming a tunnel (tunnels need endpoints, which is why you have VTEPs – VXLAN Tunnel End Points).  From there, traffic between end hosts in the same VNI (the virtual network you are in) needs to be tunneled through the L3 underlay network, which means that VTEP devices for a given VNI need to be able to tunnel MAC-addressed traffic to other end hosts in that VNI over Layer 3.  You can now also use MP-BGP to augment this and get some extra advantages when connecting Data Centres over L3.

If the above all sounds like gobbledy-gook, hang in there.  Let’s go back a few steps and try to make things a little clearer below.

So how did we get here?  We are going to climb a number of steps to get to where we want to go.

First of all back to Layer 2 (think MAC address identifiers and broadcast domains).  This is always seen as easy stuff from an application point of view.  Why?  Because you can keep your IP addresses the same, move wherever you like and the network underneath will just sort itself out.  However, a Layer 2 domain (or VLAN) is of course a broadcast domain. Connect this together in the wrong way and you can create loops.  Loops mean broadcast storms and network meltdowns – this is bad!  This is why you have spanning tree to block these loops.  So in short, Layer 2 doesn’t scale, and you need to take a lot of care to avoid loops, or paths to network Armageddon.  But on the surface, from an application view, it does seem easy.

So why do people want to spread their VLANs or L2 domains everywhere?  Like extending L2 across Data Centres?  Well, it means as an app developer or server guy or gal, you don’t have to change your IP address whenever you move.  If you want to move a Virtual Server to another platform for business and app continuity, you can just Vmotion it.  You don’t have to change a thing.  The traffic will sort itself out and find the new location.

Vmotion certainly became a compelling driver here.   From a network point of view, there are a bunch of traffic tromboning caveats, and lots to consider, but from a server view you don’t have to touch anything and all seems golden.

This comes to the classic tension between servers, apps and networks.  “Just give me a Vlan for my servers everywhere.  I just want to spec and forget”.  As an app developer, I may have an awareness of IP addresses, but I really don’t want to be changing these all the time just because I move stuff around, or I move my server/VM.  I want to develop my app to work in a world of unlimited resources, memory, compute, infrastructure and bandwidth. I am oversimplifying to illustrate a point of course.

So why do I have to change things like my IP address when I move?  Well, if I move across a Layer 3 boundary then I hit one of the characteristics of IP addresses.  An IP address bundles together two things: 1) identity and 2) location.  So if I move a server and keep the same IP address, my identity has stayed the same, but my physical location has changed.  I now need to play with the network to make sure that traffic for my app gets to the new location.  BUT the location info baked into my address has stayed THE SAME!

Imagine moving house, your name never changes, but you also keep the same home address as well.  The postman is going to get pretty confused as to where, physically, to  deliver your mail.  You will need to put in mechanisms to say, “hey Postman, we are not here anymore we are in another town.. oh and with exactly the same address.  What?  You don’t cover that district?  Ok, you need to tell the post-office to redirect it to the new place, and yes, it has the same address…. I don’t care how you do it, you are a clever postman, I am sure you will work something out…”

At Layer 2, however, you just have an identity (the MAC address), which is mapped to the IP address on the host, so there is no location baked into the identifier; the network just sorts itself out through flood-and-learn semantics.

Ok, so how can you move wherever you like,  and how does the network just sort itself out at Layer 2?

Time for some basics…

Very simply, switches are a bunch of ports connected together at Layer 2.  Frames arrive on the switch port with a source MAC address and destination MAC address.

In a normal switch infrastructure, if you know the IP address of the device you want to get to at Layer 3, then you ARP for the L2 address or MAC address that is associated with that IP address.

I am an end host or PC and I want to get to another device somewhere, and I know the IP address of the device I want to reach.  Well, I send out a broadcast ARP message saying "any of you guys know what MAC address is associated with this IP address?"  The switch receives this on the port the host is attached to and, because the frame has a source MAC address, it knows that this identifier/MAC address is attached to this particular switch port.  It then makes a note and stores it in its MAC-address table, e.g. MAC A is associated with Port 1, etc.   You may also see this referred to as CAM (Content Addressable Memory), which is essentially the structure in which the information is stored in memory, i.e. where fixed-length addresses are stored for fast lookup – the MAC address is a fixed 48-bit (6-byte) address.

If everything is on the same broadcast domain or VLAN, then everything is dandy.  The target end host or PC will receive the query (because it will receive the broadcast ARP packet flooded by the switch), and say, "yep that's me, here is my actual MAC address", and reply to the source MAC address.  As this reply goes through the switch, the switch says, "cool, I now know which port that specific MAC address is on, let's log that in my MAC-address table".  If a packet is subsequently addressed to that destination MAC, well, now the switch knows which port it is on.  Additionally, if the switch receives traffic for a MAC address it has not seen before (unknown unicast), it floods it out of all ports to see if it gets a reply, then makes a note as the reply passes through.
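The flood-and-learn behaviour above can be sketched in a few lines of code.  This is an illustrative toy, not a real switch: the port numbers and shortened MAC strings are made up for the example.

```python
# Toy flood-and-learn switch: learn source MAC -> ingress port,
# forward known unicast out one port, flood everything else.

class LearningSwitch:
    def __init__(self, ports):
        self.ports = set(ports)
        self.mac_table = {}  # MAC address -> port (the "CAM table")

    def receive(self, in_port, src_mac, dst_mac):
        # Learn: the source MAC is reachable via the ingress port.
        self.mac_table[src_mac] = in_port

        if dst_mac == "ff:ff:ff:ff:ff:ff" or dst_mac not in self.mac_table:
            # Broadcast or unknown unicast: flood out all ports except ingress.
            return sorted(self.ports - {in_port})
        # Known unicast: forward out the single learned port.
        return [self.mac_table[dst_mac]]

sw = LearningSwitch(ports=[1, 2, 3])
print(sw.receive(1, "aa:aa", "ff:ff:ff:ff:ff:ff"))  # ARP broadcast, flooded: [2, 3]
print(sw.receive(2, "bb:bb", "aa:aa"))              # ARP reply, known unicast: [1]
print(sw.mac_table)
```

Note how the reply teaches the switch where "bb:bb" lives as a side effect of forwarding it – that is the whole trick.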

A confusion I see (often enough to note) is whether a switch broadcasts.  In short, switches flood frames; they do not broadcast.  If the switch receives a broadcast frame (like an ARP destination broadcast frame), it typically floods this frame out all ports except the receiving port.  If you think of it as: end-points send broadcast frames, and switches flood frames depending on whether the frame is broadcast, unknown unicast, or multicast (BUM), then you are good to go in general.

If you ever go on a switch and do the equivalent of show mac-address-table, it will show you the VLAN, MAC address and associated port.

switch-op

Ok so why this long-winded explanation of the basics?

I mentioned that at Layer 2, a VLAN is a broadcast domain.   You can reduce the size of the broadcast domain by limiting the number of hosts in the VLAN and putting other hosts in other VLANs, so if you want to communicate between the VLANs you now need to go across a router or Layer 3 boundary.  Remember, routers or Layer 3 boundaries can also be seen as broadcast firewalls.

Say I am a server guy or gal.  I like the flexibility of Layer 2 and hate changing IP addresses. (some might remember  LAM – Local Area Mobility at this point :-))  Also I don’t want my Virtual Machines to really be that aware of the above gubbins, so wouldn’t it be nice if I could get to any Virtual Machine anywhere in my Datacentre, or even across Data Centres as if it was in the same L2 network?  I know, I will ask my network team to just extend L2 Vlans everywhere! Dead easy.  Hmmm. that’s odd?  The network folk seem to be going very red in the face and screaming spanning tree to the heavens.  Some of them are even starting to cry… Not sure what I said?  Seems simple enough to me?

Enter VXLAN, and I can now do the above without ever talking to these over-emotional jitter-bugs.  I can tunnel everything over their network and they don’t have to worry.

I should start by saying that VXLAN is Layer 3 encapsulation of layer 2 traffic (MAC in UDP-IP).  It allows you to tunnel your Layer 2 network over a Layer 3 network.  So now the network team can have their separate, tidy, layer 3 network between Datacentres, or even within the Datacentre, and get rid of spanning tree wherever they like.  They don’t have to extend the actual Layer 2, and we can now have our virtual Layer 2 segment over the top of the IP network – an overlay!

The VXLAN encapsulation adds 50-54 bytes of headers per frame (so watch that MTU), and the VXLAN header itself carries a 24-bit VNI (VXLAN Network Identifier), enabling 16 million+ segments as opposed to 4096 with VLANs.  How many Enterprises have you seen exceed 4096 VLANs in their current DC environment? Ok, but at serious scale, for cloud, fair enough.
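The segment-count difference is just the jump from a 12-bit to a 24-bit identifier field:

```python
# 12-bit VLAN ID field vs 24-bit VXLAN VNI field.
vlan_ids = 2 ** 12   # 4096 possible VLAN IDs (a couple are reserved in practice)
vnis = 2 ** 24       # 16,777,216 possible VNIs

print(vlan_ids)  # 4096
print(vnis)      # 16777216
```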


VXLAN ENCAPSULATION

vxlan-header

Physical addresses with VXLAN:

Outer Dst MAC addr (MAC of the tunnel endpoint VTEP)

Outer Src MAC addr (MAC of the tunnel source VTEP)

Outer IP Dst addr (IP of the tunnel endpoint VTEP)

Outer IP Src addr (IP of the tunnel source VTEP)
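The outer headers listed above are where that 50-54 byte overhead comes from.  The arithmetic below assumes an IPv4 underlay with no IP options; the 54-byte figure is simply the 50-byte case plus an optional 802.1Q tag on the outer frame.

```python
# Per-frame VXLAN encapsulation overhead (IPv4 underlay), in bytes.
outer_ethernet = 14  # outer dst MAC + src MAC + EtherType
outer_dot1q    = 4   # optional 802.1Q tag on the outer frame
outer_ipv4     = 20  # outer src/dst IPs = source and destination VTEPs
outer_udp      = 8   # UDP header, well-known VXLAN destination port 4789
vxlan_header   = 8   # flags + 24-bit VNI + reserved fields

overhead = outer_ethernet + outer_ipv4 + outer_udp + vxlan_header
print(overhead)                # 50
print(overhead + outer_dot1q)  # 54
```

This is why you see recommendations to raise the underlay MTU (e.g. jumbo frames) rather than rely on fragmentation.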

————————————————————————-

OK, we now have a small issue. How on earth do you know about the MAC addresses in the same Layer 2 segment or VLAN that are now on another IP subnet?  This is where we use VXLAN (Virtual Extensible LAN), effectively extending the concept of a VLAN across a Layer 3 network through tunneling.

Remember, fundamentally you just want to get a packet from A to B.  Always keep this in mind, because no matter what kind of abstraction or fancy acronyms you use, in a packet-switched network you will always be getting a frame or packet from A to B.  If you can walk this path you are good to go.  (Incidentally, when this path seems overly convoluted, hitting lots of different way-points on its travels, it is a sure sign that efficiency is being traded for some other functionality or abstraction.)

vtep

A VTEP is a VXLAN Tunnel End-Point.   I mentioned earlier that the MAC address is tunneled in UDP-IP to get across the network. The VTEP provides this association, and has a VTEP IP address associated with itself (the source IP address).  It also knows which VTEP IP address it needs to send to so that the frame breaks out of the tunnel where the destination MAC physically lives.

We are getting to some of the meat now when it comes to the underlying infrastructure.

We still need to find out where all these MAC addresses live, which VTEP they are associated with, so I can forward the frame.

So say I want to ping (ICMP echo) Virtual Machine 2 (VM2) from VM1.

Back to that packet walk.  A frame arrives and hits the Vswitch destined for VM2.

If the destination MAC address is local and in the Vswitch MAC-address table, then it simply forwards the frame out of the local port to that host.  If the destination MAC is not local, it needs to know which VTEP to forward it to (package it up in a UDP-IP packet, send it over Layer 3 to the remote VTEP where the MAC resides, and pop it out for normal Ethernet forwarding of the frame at the other end).

The VTEP maintains a MAC address table similar to a standard Ethernet switch. However, instead of just associating an address to an interface, the VTEP additionally associates a Virtual Machine (VM) MAC address to a remote VTEP IP address.
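That forwarding decision can be sketched as a lookup table.  Again this is a toy for illustration: the MAC names, port name and VTEP IP are all invented.

```python
# Toy VTEP forwarding table: destination MAC -> either a local port
# or the remote VTEP IP to encapsulate towards.

LOCAL = "local"

vtep_table = {
    "vm1-mac": (LOCAL, "port-1"),       # VM1 lives on this VTEP
    "vm2-mac": ("remote", "10.0.0.2"),  # VM2 sits behind the VTEP at 10.0.0.2
}

def forward(dst_mac, vni=5000):
    kind, where = vtep_table[dst_mac]
    if kind == LOCAL:
        # Plain old Ethernet forwarding out a local port.
        return f"deliver frame out {where}"
    # Remote: wrap the frame MAC-in-UDP/IP and route it to the remote VTEP,
    # which decapsulates and forwards it as a normal frame at the far end.
    return f"encap in VXLAN (VNI {vni}), outer dst IP {where}"

print(forward("vm1-mac"))
print(forward("vm2-mac"))
```

The open question, of course, is how that table gets populated – which is exactly the debate below.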

As VM1, if I don’t know VM2’s MAC address then I send out an ARP,  “does anyone in my network segment at Layer 2 have the MAC for the following IP address? Can you please respond with the associated MAC address?”  I like to think switches remember their manners and say “please”.

Here is where the debate begins!

The Vswitch either knows where all the MAC addresses live in the system, which VTEP – IP address they are associated with etc. or it needs to go on a journey of discovery.

The debate revolves around how you populate these tables with MAC addresses to VTEP mappings across your infrastructure.

There are a few options here.  You can manually pre-populate the MAC tables and associations of switches and VTEPs, because you, with your all-seeing eye, know where everything is on all the switches (manual provisioning).

OR you can query a controller (in an SDN world, something like OpenDaylight, or an NSX controller for VXLAN), which will populate the MAC addresses statically or dynamically.

Alternatively, on the control-plane discovery side, the way this was initially done with VXLAN was using multicast.  Want to know where the MAC addresses are?  Get the VTEPs to join an IP multicast group and we will share what we know locally with the other VTEPs in the group.
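Mechanically, the multicast approach maps each VNI to an underlay multicast group, and BUM traffic for that VNI is sent to the group so every member VTEP gets a copy.  The sketch below illustrates that replication; the group addresses and VTEP IPs are invented for the example.

```python
# Toy model of the original VXLAN flood-and-learn control plane:
# each VNI maps to an IP multicast group, and BUM traffic is sent to
# the group so every VTEP that joined it receives a copy.

vni_to_group = {5000: "239.1.1.100", 5001: "239.1.1.101"}

# Which VTEPs have joined which group (i.e. host a VM in that VNI).
group_members = {
    "239.1.1.100": {"10.0.0.1", "10.0.0.2", "10.0.0.3"},
    "239.1.1.101": {"10.0.0.2"},
}

def flood_bum(vni, source_vtep):
    group = vni_to_group[vni]
    # The underlay replicates one packet to every group member except the sender.
    return sorted(group_members[group] - {source_vtep})

print(flood_bum(5000, "10.0.0.1"))  # ['10.0.0.2', '10.0.0.3']
```

Simple enough on a whiteboard – the objection is to running PIM and friends across your core just to make this work.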

At this point most people said, "errrr, what?  Ok… no… no, I don't think I am turning on multicast across my core infrastructure for that."

IP multicast, with all its complexity, security problems, bugs, scarce skill sets, patchy vendor support, etc., is not something you enable on a whim across your core infrastructure to solve a trivial problem, in my view.  Finance houses, who arguably need it and have spent countless man-years getting it to work properly for them, often don't go for reliable multicast, as it adds latency, but instead go Live-Live, either with redundancy on a separate physical network path or redundancy on the server side.

If you don't know what I am talking about and want to get started, then there are worse places than the link below:

http://www.cisco.com/c/en/us/td/docs/solutions/Verticals/Financial_Services/md-arch-ext.html#wp859801

There is a view that if you just understand the complexity of multicast well enough (usually as a result of having been forced to spend far too much time wrestling with it for CCIE, others must share the pain ;-)), and you ignore the proportionally vast number of multicast bugs from vendors over the years, and know how to work around all of them, then it is just fine!   I don’t feel I need to go further here, anyone can write a 100 pager on the pros and (many) cons of multicast, but opinion is just that.

My point is that for VXLAN?  Erm no thanks!

Ok, so say I am not sold on using an SDN controller to populate this information in my infrastructure (maybe I have decided to use controllers for other reasons: service functions, flow control, orchestration, whatever), and I really don't want to use multicast. Have I got any other control-plane options?

Can I still use a network control plane to get MAC addresses shared around in a leaf-spine architecture or DC interconnect?  Oh and by the way, can I keep the number of MAC addresses flying around to a minimum please?

Remember, with all of the above, if you are not using multicast groups to which all the VTEPs belong, you somehow need the VTEPs to know about the other VTEPs and their associations.

A VTEP needs to know which VTEP to send traffic to.

So what do we need to do now?   Man, you are just making this simple thing way complicated again eh?  Whether you think this makes everything simpler or more complicated, welcome to the world of abstraction.

Let’s look at physical and then logical underlays….  (PART 2)