Denial of Service – for once a technical term that is pretty much exactly what it says on the tin. Essentially the service you are trying to provide (be that internet access, a corporate application, connectivity), is denied. I unplug your TV and you can no longer watch TV, denial of service, simple.
In a volumetric Denial Of Service, where a large “volume” of traffic is created to fill up your connectivity pipe, typically you need some way of creating the traffic and sending it towards your victim. This could be a single machine pumping out traffic, but this is rather hard to scale nowadays for the amount of traffic needed to clog some pretty beefy pipes.
Distributed Denial of Service, again, means what it says. Instead of using one machine, you use a lot of distributed machines to all generate traffic simultaneously and point it at your victim (or many victims) to deny a service.
What is the problem?
Denial of Service and Distributed Denial of Service have been around for some time, but with the advent of IOT (Internet of Things) and the proliferation of cheap devices with basic distros and firmware, the scale of DDOS has exploded over the last year or so. The major headlines making people take notice were around the Dyn DNS attacks – which involved the Mirai botnet (itself a descendant of Bashlite), affecting the services of Twitter, GitHub, Dyn DNS, Netflix, Reddit etc. Variants thereafter affected many Deutsche Telekom and TalkTalk routers, to name a few.
This, in turn, signaled the dawn of Terabit per second DDOS attacks with tens of millions of IP addresses.
Mirai and Bashlite
The Mirai botnet scans IP addresses for vulnerable IOT devices with default usernames and passwords, reports them to a control server, and from there each device can become a bot to be used for DDOS. It also removes competing malware from memory and blocks remote admin ports. The source code for Mirai was dropped into the open, so it effectively became open source.
Bashlite came out of a flaw in the Bash shell (ShellShock) and took advantage of devices running BusyBox. BusyBox is software that provides Unix tools in a single binary – originally written for the Debian distro – so instead of each app having its own binary, you call the single BusyBox binary under different names for the various applets. Essentially this allows code to be shared between multiple applications without requiring separate libraries, reducing overhead – useful for the memory-constrained, POSIX-style portable operating system environments typical of IOT.
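The multi-call binary idea is easy to picture. Below is a rough Python sketch of BusyBox-style dispatch on the invoked name – the applets and their behaviour are invented for illustration, and real BusyBox does this in C with symlinks pointing at one binary:

```python
import os

# Two toy "applets" standing in for real utilities (invented for this sketch).
def applet_echo(args):
    return " ".join(args)

def applet_wc(args):
    return str(len(args))

APPLETS = {"echo": applet_echo, "wc": applet_wc}

def dispatch(argv):
    # BusyBox inspects the name it was invoked under (argv[0], typically a
    # symlink such as /bin/echo -> /bin/busybox) and runs the matching applet.
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        return "unknown applet: " + name
    return applet(argv[1:])
```

On a real system, /bin/echo would simply be a symlink to /bin/busybox, so the same file on disk serves every applet.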
From booters and stressers, to unpatched NTP or DNS internet services, old home routers, 4G/5G handsets, and all variety of IOT devices with basic firmware, flimsy Linux distros and hard-coded secondary passwords: if these types of devices are recruited into a botnet, you can ask them all to send traffic somewhere and the volume scales rather quickly. A brief Internet search for booters and stressers reveals just how easy it is to rent a service to cause annoyance and disruption (not that you would generally want to do that of course, as they are mostly illegal for use in the wild – but for testing your own infrastructure with the correct permissions … maybe).
Unfortunately in a rush to get basic functionality out there, the market has been flooded with a variety of cheap IOT devices (or IOCT – Internet of cheap or crappy things).
As many of these cheap IOT devices never get hardware, firmware or software upgrades, and are rarely swapped out, we shouldn’t expect this problem to go away any time soon. (Mirai, incidentally, means “future” in Japanese.)
This is the general problem, so where typically on your infrastructure does it present?
The Denial of Service problem typically affects several places in the network, most commonly the Internet pipe, the firewall and the actual server under attack, but SQL servers, load-balancers and IPS/IDS are also affected. The failure of firewalls, IDS/IPS or application delivery controllers to address the problem (they tend to get hosed in a serious attack) has led to dedicated DDOS solutions in the market. As Internet pipes tend to be hit the most, a lot of focus has ended up there.
So what is the solution, and is there a 100% effective solution today? (I am guessing we already know the answer).
Historically, the first way to mitigate an attack would be for an enterprise being DOS’d to contact their friendly Service Provider upon detection and ask for help. The SP would then perhaps install an ACL or provide ad-hoc filtering. This could be a short conversation, and ultimately money based – so not altogether practical long term.
Another way would be to use DRTBH (destination-based remote-triggered black hole), which essentially needs SP acceptance of BGP advertisements from the Enterprise as part of an agreement. Upon detection of an attack, the Enterprise would advertise, say, a /32 tagged with an agreed BGP community attribute to the SP, and traffic to it would then be blocked. Of course this effectively completes the DOS for that host, but it can be of use for investigation and damage limitation.
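The SP side of that agreement can be sketched as follows – a minimal model, assuming the well-known BLACKHOLE community from RFC 7999 (65535:666; many providers historically used their own values) and an illustrative discard address statically routed to Null0:

```python
# Sketch of the SP-side decision for destination-based RTBH: if a customer
# /32 arrives tagged with the agreed blackhole community, rewrite its next
# hop to a discard address routed to Null0 on every router.
BLACKHOLE_COMMUNITY = "65535:666"   # well-known BLACKHOLE value (RFC 7999)
DISCARD_NEXT_HOP = "192.0.2.1"      # illustrative; statically routed to Null0

def process_advertisement(prefix, prefix_len, communities):
    """Return the next hop to install for this customer route."""
    if BLACKHOLE_COMMUNITY in communities and prefix_len == 32:
        # Traffic to this host is now dropped network-wide.
        return DISCARD_NEXT_HOP
    return "normal-best-path"
```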
A second method following this is SRTBH (source-based remote-triggered black hole), which again involves BGP interaction, but this time the SP looks at blocking the source of the attack, typically in combination with something like uRPF (unicast Reverse Path Forwarding). Remember BCP 38?
There are two forms of uRPF – strict mode and loose mode. In strict mode the packet must be received on the interface that the router would use to forward the return packet. This has the potential to drop legitimate traffic if you have asymmetric routing, for example – i.e. you receive traffic on an interface that is not the router’s choice for forwarding the return traffic.
In loose mode, the source address must appear in the routing table, not necessarily the actual interface, and allows default routes. You can also configure filters to permit or deny certain sources. There is also an option for the ISP-to-ISP edge (uRPF-VRF) to allow uRPF to query the VRF table containing all routes for a specific eBGP peering session over the interface, verifying the source addresses of packets matching the advertised routes from the peered ISP.
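A toy model of the two checks, with a made-up two-route table and interface names, might look like this:

```python
import ipaddress

# Toy RIB to illustrate strict vs loose uRPF. Routes and interface names
# are invented; the routes don't overlap, so a simple scan stands in for
# longest-prefix match.
RIB = {
    ipaddress.ip_network("10.0.0.0/8"): "eth0",
    ipaddress.ip_network("172.16.0.0/12"): "eth1",
}

def lookup(src):
    """Return the interface the router would use to reach src, if any."""
    for net, iface in RIB.items():
        if ipaddress.ip_address(src) in net:
            return iface
    return None

def urpf_strict(src, in_iface):
    # Strict: the return path for the source must point out the interface
    # the packet arrived on.
    return lookup(src) == in_iface

def urpf_loose(src):
    # Loose: any route back to the source is enough, regardless of interface.
    return lookup(src) is not None
```

The strict check fails exactly in the asymmetric-routing case described above, which is why loose mode is the safer default at complex edges.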
Not so much DDOS mitigation as a fairly manual or prescriptive dropping / blackholing of traffic really. I tend to see this as damage limitation.
Of course if you can identify an attack manually or by notifying your service provider then you don’t actually have to drop the traffic. You can redirect to a scrubbing or cleaning service, e.g. maybe a manual PBR redirect to drop the traffic onto a VPN/VRF, but generally it is better if there is a method to inform and push policy to redirect real-time, based on an attack in progress.
To this end we now consider BGP Flowspec, which is a little like enhanced PBR (Policy Based Routing) for more granular policy decision making and policy distribution across the infrastructure.
If you are going to drop traffic, or at least differentiate between what you think is bad traffic with some granularity of control, then BGP Flowspec becomes a good option. If you know the type of traffic you should not be seeing, you could do worse than Flowspec to both identify traffic and distribute policy to the routers in your network that need to drop it. Indeed, based on policy, you can also determine which traffic you need to redirect to a dedicated DDOS scrubbing device or service, where “clean” traffic is returned to the user and service continues despite an attack taking place.
So what does BGP flowspec do?
Well you can effectively create a policy to match a particular flow with source AND destination, and L4 parameters with packet specifics such as length, fragment etc, and allow for a dynamic installation of an action at the border routers.
If this policy is matched then you can perform an action:
- Drop the traffic
- Redirect the traffic – e.g. inject it into a different VRF (for analysis)
- Allow it, but police it at a specific defined rate
Much like rate-limiting policies and QOS, but applied to the specific malicious flows that you recognise.
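As a sketch only – the field names and the rate value below are invented, loosely mirroring what RFC 5575 encodes on the wire – a Flowspec-style rule and its evaluation might look like:

```python
import ipaddress

# Hypothetical in-memory form of a Flowspec-style rule: match on
# source/destination prefixes plus L4 details, then apply an action
# (discard, rate-limit, or redirect).
RULE = {
    "dst": ipaddress.ip_network("192.0.2.0/24"),
    "src": ipaddress.ip_network("0.0.0.0/0"),
    "protocol": 17,                        # UDP
    "dst_port": 53,                        # DNS
    "action": ("rate-limit", 1_000_000),   # bytes/sec; 0 would mean discard
}

def apply_rule(rule, pkt):
    """Return the action for a packet dict, or None if no match."""
    if ipaddress.ip_address(pkt["dst"]) not in rule["dst"]:
        return None
    if ipaddress.ip_address(pkt["src"]) not in rule["src"]:
        return None
    if pkt["protocol"] != rule["protocol"] or pkt["dst_port"] != rule["dst_port"]:
        return None
    return rule["action"]
```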
BGP Flowspec basically adds a new NLRI into BGP (AFI=1, SAFI=133), NLRI is a field in BGP that, at its simplest, is used to identify the prefix for BGP advertisements (literally Network Layer Reachability Information), but as a variable field, you can use NLRIs in BGP to represent pretty much anything you wish (BGP is being overloaded with all sorts nowadays with this NLRI field, think EVPN etc.).
In this case you add information about a flow, matching on components such as:
1. Destination IP prefix
2. Source IP prefix
3. IP protocol
4. Port
5. Destination port
6. Source port
7. ICMP type
8. ICMP code
9. TCP flags
10. Packet length
11. DSCP
12. Fragment
Now you can match based on the above and define what characteristics of traffic are most likely DDOS. Indeed it is typical to have a Netflow feed off to an analysis engine to determine which traffic needs cleaning or is DDOS traffic and then use BGP Flowspec to instruct the network which precise flows to redirect to a cleaning service based on specific defined parameters as per the above – real time! (redirect/next hop modification, DSCP remark, drop or police, VRF leaking).
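The analyser side of that loop can be sketched roughly as below; the sampling rate and threshold are invented for illustration, and a real detector would track far more than packet counts:

```python
from collections import defaultdict

# Sketch of the analyser: aggregate sampled flow records per destination
# and flag any destination whose estimated rate implies a flood.
SAMPLING_RATE = 1000          # 1-in-N packet sampling (illustrative)
THRESHOLD_PPS = 500_000       # flag destinations above this estimate

def detect(flow_records, window_secs):
    """flow_records: iterable of (dst_ip, sampled_packet_count) tuples."""
    packets = defaultdict(int)
    for dst, count in flow_records:
        packets[dst] += count * SAMPLING_RATE   # scale the sample back up
    return [dst for dst, p in packets.items()
            if p / window_secs > THRESHOLD_PPS]
```

Destinations flagged here would then be the candidates for a Flowspec redirect towards the scrubber.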
BGP flow-spec is a client-server model, so you can have the analyser as the server dictating to the client what you would like to match and action on, and then instruct the client (router for example) what to drop, police or redirect.
Flowspec is therefore a useful tool for policy distribution, drilling into specific actions on specific flows with traffic characteristics, and as a method to inform redirection, rate-limiting or dropping, real-time. Now let’s have a look at some typical types of DDOS attack.
Types of attack
Amplification attacks (DNS, NTP, SSDP, SNMP, Chargen, QOTD etc.). The idea here is that the source address of the traffic is spoofed to be the victim’s. Because these connectionless protocols need no full handshake, a small request triggers a large answer sent straight to the victim’s address, taking advantage of vulnerable protocols on large servers. DNS is a prime example, where DNS responses can be much larger than the initial request – up to 4096 bytes with EDNS.
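The arithmetic behind the amplification is worth spelling out. A quick sketch using the 4096-byte EDNS response figure above and an assumed ~64-byte query:

```python
# Back-of-the-envelope amplification: every byte the attacker sends is
# multiplied by the request/response size ratio before it hits the victim.
def amplification_factor(request_bytes, response_bytes):
    return response_bytes / request_bytes

def reflected_volume_gbps(attacker_gbps, request_bytes, response_bytes):
    # Traffic the victim receives for a given spoofed sending rate.
    return attacker_gbps * amplification_factor(request_bytes, response_bytes)
```

So one gigabit per second of spoofed queries can, in the worst case, land as tens of gigabits per second on the victim.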
An example mitigation for this is using rate limiters for traffic and ports that have no business crossing network boundaries – SSDP UDP 1900, NetBIOS UDP 138, NTP UDP 123, Chargen UDP 19 (the character generator stream – not that prevalent, but there have been cases), fragments, and large TCP SYN packets.
On some platforms you can be even more granular. Instead of rate-limiting per class of traffic per interface, you can rate-limit per user (micro-flow policing) e.g. for DNS and NTP – if you see excessive traffic of this type then it is likely an amplification attack. Understanding your traffic patterns as best you can is of course key with these techniques.
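One rough model of micro-flow policing is a per-source token bucket, so a single reflector or amplifier cannot monopolise the link – the rates below are purely illustrative:

```python
from collections import defaultdict

RATE = 100        # tokens (packets) per second per source (illustrative)
BURST = 200       # bucket depth (illustrative)

class MicroFlowPolicer:
    """Per-source token bucket: each source IP gets its own allowance."""
    def __init__(self):
        self.tokens = defaultdict(lambda: float(BURST))
        self.last = defaultdict(float)

    def allow(self, src, now):
        # Refill proportionally to elapsed time, capped at the burst size.
        elapsed = now - self.last[src]
        self.last[src] = now
        self.tokens[src] = min(BURST, self.tokens[src] + elapsed * RATE)
        if self.tokens[src] >= 1:
            self.tokens[src] -= 1
            return True        # packet conforms
        return False           # packet exceeds this source's rate: drop
```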
At a Service Provider level, many of these amplification attacks can be blocked with router config. There is no desperate need to send off to a cleaner, as routers can do a quicker job without the redirect if patterns are identified at the edge router with ACLs or BGP Flow-spec (maybe informed by an analyser). You do however need to be extremely careful and know exactly what you are doing.
At the Enterprise level it is usually a bit late by the time the attack reaches you, so BGP flow-spec or cloud services might suit best.
It also makes sense to implement a bunch of security best practices on your infrastructure itself, e.g. the Generalized TTL Security Mechanism (GTSM) for TTL security (RFC 5082 if you are interested). Protect the control plane and management plane on devices through policing and protections, use MD5 authentication for your routing protocols, key chaining, SSH/SFTP/SCP, and of course AAA where possible.
Layer 3/4 stateless volumetric attacks (UDP fragments, ICMP floods) are usually filtered at the edge router in an SP, or by an inline device on premise (a DDOS inline box). If using on-premise or inline (e.g. a scrubbing device inline, maybe on a firewall), and the pipe is already hosed before you get a chance on prem, then seeking help from your service provider is a priority.
Stateful volumetric attacks – TCP SYN, HTTP, SSL, SIP (e.g. looking deep into SYN packets for legitimate replies). For these attacks you need a more intelligent scrubbing device, deployed ideally as close to the edge as possible. These attacks are typically scrubbed at the PE or in an SP data centre (say an SP partnered with one of the leading DDOS appliance vendors). Hosting this locally in the provider gets around the tricky external BGP routing and the GRE tunnels otherwise needed to return scrubbed traffic – much easier when it is your own infrastructure and routing. Another option would be cloud scrubbing of traffic, with clean traffic returned. If you are tackling this on premise then maybe you can collapse this function into a firewall or DDOS box, as long as the DDOS inspection and mitigation is done before other security or traffic controls.
Finally there are the slow-paced attacks that can be stopped in the cloud or require an inline solution (Slowloris-style “slow and low”, HTTP floods, SSL floods, SQL injection, XSS, CSRF, app misuse, brute force, server resource hogs etc.), where an SP doesn’t typically have much visibility of the types of attack that slowly exhaust the resources on the server. For example Slowloris, which sends HTTP headers in tiny chunks as slowly as possible, waiting to send the next chunk until just before the server times out the request. The server is forced to wait for the headers to arrive, so if you quickly open enough connections, it struggles to handle legitimate requests. Or R.U.D.Y (“R U Dead Yet?”), which sends HTTP POST requests with an abnormally long Content-Length header field and then starts injecting the form with information one byte-sized packet at a time, every 10 seconds (or at random intervals, in some DDOS-protection evasion techniques) – the long content length stops the connection being closed and exhausts the connection table.
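Detection of this style of attack usually comes down to watching the server’s connection table for many long-lived connections that never finish their headers. A hedged sketch, with thresholds invented for illustration:

```python
# Spot Slowloris-style behaviour: many connections from few sources that
# have been open a long time without ever completing their HTTP headers.
MAX_HEADER_SECS = 30     # a benign client finishes its headers quickly
SUSPECT_CONNS = 50       # per-source count worth acting on (illustrative)

def suspect_sources(conns, now):
    """conns: list of (src_ip, opened_at, headers_complete) tuples."""
    counts = {}
    for src, opened, done in conns:
        if not done and now - opened > MAX_HEADER_SECS:
            counts[src] = counts.get(src, 0) + 1
    return {src for src, n in counts.items() if n >= SUSPECT_CONNS}
```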
You should consider enhanced reporting on these attacks. Netflow is essentially a sampling technology, so in order to spot these types of attack you might need an inline device. They need some pretty deep inspection of each packet for anomalies, so it is nigh on impossible to do this effectively at several hundred Gbps. Keep this in mind when paying for cloud solutions and any premiums. The rule in general is to do this as close to the resources you would like to protect as possible – hence inline – and understand the performance limitations.
Types of redirection
DNS redirection to the cloud is one method, and very popular at the moment (everyone loves clouds in IT but, oddly, no-one expects rain). With DNS redirection, when you resolve to an IP address, you actually resolve to the DDOS protection service address and therefore go through the scrubbing protection before being served. This can be on-demand or always on.
A second method is BGP-based “inter-AS” DDOS protection, similar to BGP hijacking. Effectively your Service Provider advertises a more specific /24 covering your site address (e.g. 1.2.3.0/24), and as this is more specific, routed traffic is magnetically drawn to this advertisement, which in turn redirects it to a scrubbing site first, before clean traffic is returned to you over a GRE tunnel. You need this tunnel of course to prevent a routing loop – otherwise the scrubbed traffic sent back to the original address again gets picked up by the /24 and ends up back at the scrubber for ever and ever. It is also possible to tie up a /24 permanently to specifically cater for DDOS.

One other method some providers use is to effectively act as an IP proxy – they give you dedicated public IP addresses of their own to obfuscate yours, in an “always on” type service (for L3/L4 volumetric attacks). It is a bit like DNS redirection, but with a newly advertised IP that belongs to the cloud. When you get the traffic back you need to NAT etc. for your own range. The caveat here is that it is always on, and all your traffic goes through the cloud provider first.
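The “magnetic” pull is simply longest-prefix matching at work. A small illustration, where private-range addresses stand in for a customer’s /16 and the provider’s more-specific /24:

```python
import ipaddress

def best_route(dst, routes):
    """Pick the longest (most specific) matching prefix, as routers do."""
    matches = [net for net in routes if ipaddress.ip_address(dst) in net]
    return max(matches, key=lambda n: n.prefixlen)

routes = [
    ipaddress.ip_network("10.0.0.0/16"),  # customer's normal advertisement
    ipaddress.ip_network("10.0.1.0/24"),  # scrubbing provider's more-specific
]
```

Any destination inside the /24 is drawn to the scrubber; the rest of the /16 routes as before.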
Typically the tunnel is what you are paying for from the cloud provider. Remember, if you are doing a /24 BGP advertisement, you are essentially doing BGP hijacking. BGP origin validation could make this approach more difficult in future.
There are various flavours here – your edge router can act as a detector sending sflow or Netflow to your cloud provider, who, upon detection of DDOS, takes over the routing via BGP advertisement to redirect to the Cloud DDOS provider for the duration of the attack – on demand service (GRE back to the edge router for scrubbed traffic), or a permanent IP advertisement so everything goes to the cloud by default – always on service.
Service providers can of course bypass this and provide a hosted DDOS solution, where all the redirection happens internal to the service provider in a local SP Data Centre where they scrub the traffic for your Internet connectivity. As above, this can be a premium “always on” service, or a service that kicks in under attack notification either automatically based on suspicious detected traffic or on request from the customer.
Finally let’s have a brief look at places in the network
Places in the network
Once you understand the type of attack you are protecting against, you can look at which service and place in the network is most effective for each, and whether you have covered what you need. To be honest, whenever most people talk about DDOS they jump straight to volumetric – my pipe gets filled, redirect and clean please.
To summarise some of what I have mentioned above, you have a bunch of options:
- “In the cloud” services from a DDOS vendor, where traffic is redirected either permanently or upon detection of an attack, cleaned, and returned (the method of redirection being either DNS based or BGP inter-AS based).
- An ISP-hosted DDOS service: your SP can redirect you to a cloud vendor’s DDOS service, or stand up their own detectors and scrubbing service in their own Data Centres to monitor traffic at their Internet peering points and provide a service back to the customer – a “Clean Pipes” solution.
- On prem (centralised, distributed, mixed, inline).

Finally you can of course mix and match these approaches to make sure you cover as many conceivable denial of service attacks as possible, at a latency that suits your applications, even down to deep packet inspection. There is no one size fits all. One thing to note: if you use cloud-provided DDOS, then to protect web sites/L7 a proxy-based DNS service usually works for most things, with L3/L4 attack prevention on demand should an attack circumvent the DNS and head straight for the real IP. In truth, multi-terabit, deep-L7, slow-and-low cleaning at any kind of reasonable latency and cost doesn’t exist today, which is why you protect per asset, per web address, or as a percentage of traffic.
Ultimately in future I expect DDOS protection to be a given as part of any Internet service consumed (it mostly already is), and premiums for this kind of service will likely come down, depending on volume of attacks, complexity and scale. Cloudflare, for example, have just announced that DDOS is bundled at any scale within their Internet services as part of the usual rate.
New vectors tend to mean new solutions to problems so I would expect this to be charged for, wherever significant investment has been needed to solve the problem.
So there you have it, a brief tour of DDOS approaches being used in the market today. Implement the above, scrub yourself down, redirect yourself to the nearest good cup of coffee, and relax knowing your traffic flows gloriously uninterrupted…for now.