Damian Zaremba Blog

What's Anycast and why would I use it?

Recently I've heard some confusion as to what Anycast is, ranging from it being part of the IPv6 specification to it being a form of multicast.

While I'm not a network dude, hopefully this post will give you an overview of what Anycast is, how it's useful and what to watch out for.

Overview

Firstly, to clear up some confusion: Anycast is not a protocol (TCP/UDP) or a protocol version (IPv4/IPv6), but a type of addressing, like Unicast or Broadcast.

As far as the 'client' is concerned, they are talking unicast (to a single node), but this may be routed to one of many nodes depending on the routing table.

A quick example of this can be seen if you traceroute to an Anycast IP address from different locations in the world, as below

traceroute to ian.ns.cloudflare.com (173.245.59.118), 30 hops max, 60 byte packets
 1  router1 (10.44.200.254)  15.334 ms  17.279 ms  19.183 ms
 2  host-92-23-160-1.as13285.net (92.23.160.1)  29.204 ms  32.556 ms  36.016 ms
 3  host-78-151-225-101.static.as13285.net (78.151.225.101)  38.386 ms  40.704 ms  42.598 ms
 4  host-78-151-225-84.static.as13285.net (78.151.225.84)  63.415 ms xe-11-2-0-bragg002.bir.as13285.net (78.151.225.72)  44.347 ms  46.697 ms
 5  xe-11-1-0-rt001.the.as13285.net (62.24.240.6)  49.117 ms xe-11-1-0-rt001.sov.as13285.net (62.24.240.14)  51.454 ms  54.374 ms
 6  host-78-144-1-61.as13285.net (78.144.1.61)  56.102 ms  48.605 ms  41.366 ms
 7  host-78-144-0-180.as13285.net (78.144.0.180)  42.400 ms host-78-144-0-116.as13285.net (78.144.0.116)  39.599 ms host-78-144-0-164.as13285.net (78.144.0.164)  37.310 ms
 8  195.66.225.179 (195.66.225.179)  38.239 ms  73.818 ms  72.494 ms
 9  ian.ns.cloudflare.com (173.245.59.118)  38.989 ms  37.828 ms  38.971 ms
traceroute to ian.ns.cloudflare.com (173.245.59.118), 30 hops max, 60 byte packets
 1  router1-dal.linode.com (67.18.7.161)  19.229 ms  19.291 ms  19.392 ms
 2  xe-2-0-0.car03.dllstx2.networklayer.com (67.18.7.89)  0.197 ms  0.220 ms  0.202 ms
 3  po101.dsr01.dllstx2.networklayer.com (70.87.254.73)  0.513 ms  0.537 ms  0.624 ms
 4  po21.dsr01.dllstx3.networklayer.com (70.87.255.65)  0.902 ms  0.983 ms  1.020 ms
 5  ae16.bbr01.eq01.dal03.networklayer.com (173.192.18.224)  16.145 ms  16.138 ms  16.112 ms
 6  141.101.74.253 (141.101.74.253)  0.602 ms  0.566 ms  0.549 ms
 7  ian.ns.cloudflare.com (173.245.59.118)  0.523 ms  0.670 ms  0.493 ms

Note that the DNS name (ian.ns.cloudflare.com) resolves to the same IP (173.245.59.118), but the traffic goes via different routers (141.101.74.253 vs 195.66.225.179).

The 'magic' here happens at the routing layer: a router will have multiple 'paths' to the IP (173.245.59.118) and chooses the best one based on its metrics.

A large (public) example of Anycast in use is the root nameservers - there are 13 (v4) IP addresses serving the root zone, with ~350 servers behind those 13 IPs distributed throughout the world.
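
As an aside, you can often see which Anycast instance of a root server you've hit by asking for its identity via the hostname.bind (or id.server) CHAOS TXT record - support and naming vary between operators, so treat this as a rough illustration rather than something guaranteed to work everywhere

dig @k.root-servers.net hostname.bind chaos txt +short

Run from different networks, the answer will usually name a different physical node, even though you queried the same IP address.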

For those used to dealing with internal networks

Imagine you have 3 offices configured in a L2 triangle for redundancy:

Your gateway is in England; Scotland and Wales need to access it even if one of their links fails. You also have servers in each of the offices and want to access them as quickly as possible.

Keeping the links at layer 2 and blocking one (via something like STP) would give you gateway redundancy - if link A failed, link B would unblock and start forwarding traffic.

The problem here is that you have to go via Scotland to access Wales from England, even though they have a direct link. You can't have both links forwarding because you'd create a loop and no traffic would pass.

The solution is to make both links layer 3 and use a routing protocol to advertise the ranges over the top of them - there are reasons you might want to keep both links layer 2, or have one layer 2 and one layer 3, but that's outside the scope of this example.

I'm not going to get into how to configure a routing protocol - there are a few to choose from and for this use case it doesn't hugely matter. Let's assume both links are the same speed and are direct (1 hop away).

The topology looks something like

This means if any of the offices want to talk to another office, they are only 1 hop away and don't have to go through a middleman gateway - but if the direct link fails, they can be re-routed the long (2 hops) way around.

In summary, using routing between the 3 offices allows for transparent failover and best-performance access. In a more complicated topology this is very useful.

Essentially, this is also how 'Anycast' works - the main difference being that we're dealing with multiple nodes advertising the same address, rather than just re-advertising a single node (route). The routing doesn't care either way - it will choose the 'best' path based on what it knows about.

Why is this useful?

Anycast gives you a number of benefits such as

  • Higher reliability/availability
  • Higher performance
  • Localisation/mitigation of DoS/DDoS attacks
  • Client agnostic

Why doesn't everyone use it?

There are some drawbacks to using Anycast addressing

  • Can be complex to deploy - requires an AS, addressing/routing management, more routes being advertised
  • Can be expensive - IP space, network infrastructure, training
  • Harder to troubleshoot
  • Harder to monitor

How would I use it?

One of the most common uses is for DNS services, however any stateless protocol can take advantage of Anycast - it is possible to use stateful protocols, but you need a much higher level of control and risk 'breaking' connections across 2 nodes.

Imagine you were serving users out of Europe, the US and Asia-Pacific.

A common thing to do would be to create 3 NS entries, each pointing to a server in one location

myawesomedns.something.  360 IN  SOA ns1.myawesomedns.something. root.myawesomedns.something. (1303702991 14400 14400 1209600 86400)

myawesomedns.something. 360 IN  NS  ns1.myawesomedns.something.
myawesomedns.something. 360 IN  NS  ns2.myawesomedns.something.
myawesomedns.something. 360 IN  NS  ns3.myawesomedns.something.

; EU
ns1 360 IN  A   10.0.0.1

; APAC
ns2 360 IN  A   192.168.0.1

; US
ns3 360 IN  A   172.16.0.1

These NS records would then be "glued" in the parent zone (at the TLD nameservers), so a resolver can find the NS servers' IP addresses.

Now there are 2 problems with this: if you lose a server the resolver (client) will be slow to fail over (probably timing out a few requests), and a user in the US might end up going to the server in APAC rather than the one 'local' to them.

Anycast helps to solve both these issues.

If a server goes down you can withdraw the route and (as long as you're not flapping routes) your peers should pick it up in a few minutes.

In the case of a user making a request to the Anycast IP, the request will be routed to the 'nearest' node - in normal operation this would be the 'local' one, if that had failed it would automatically (after the route withdrawal) go to the next 'nearest'.

I'm quoting 'nearest' and 'local' because these are determined by your routing protocol, normally based on metrics such as weight, hop count or link speed, rather than physical location.

The routing before we start using Anycast

Requests to ns{1..3} (192.168.0.1, 172.16.0.1, 10.0.0.1) before we start using Anycast

So let's start using Anycast. We're going to keep the original IPs as 'maintenance' IPs - ones we know only go to that single host - and create a new block.

myawesomedns.something.  360 IN  SOA ns1.myawesomedns.something. root.myawesomedns.something. (1303702991 14400 14400 1209600 86400)

myawesomedns.something. 360 IN  NS  ns1.myawesomedns.something.
myawesomedns.something. 360 IN  NS  ns2.myawesomedns.something.
myawesomedns.something. 360 IN  NS  ns3.myawesomedns.something.

; EU
srv-ns1 360 IN  A   10.0.0.1

; APAC
srv-ns2 360 IN  A   192.168.0.1

; US
srv-ns3 360 IN  A   172.16.0.1

; Anycast
ns1 360 IN A 10.1.0.1
ns2 360 IN A 10.1.0.2
ns3 360 IN A 10.1.0.3

10.1.0.1, 10.1.0.2 & 10.1.0.3 now need to be advertised from the EU, APAC and US routers as /32 routes - with no summary route for the 10.1.0.0/24 block. The same IPs also need to be brought up on loopback interfaces on the servers, and the DNS server configured to bind to them.
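
As a rough sketch of the server side (Linux ip syntax and a BIND-style listen-on are assumptions here - adjust for your OS and DNS server)

# Bring the Anycast IPs up on the loopback interface (same on every node)
ip addr add 10.1.0.1/32 dev lo
ip addr add 10.1.0.2/32 dev lo
ip addr add 10.1.0.3/32 dev lo

# Have the DNS daemon listen on them (BIND-style named.conf fragment)
options {
    listen-on { 10.1.0.1; 10.1.0.2; 10.1.0.3; };
};

Putting the /32s on loopback means the host answers for the addresses without ARPing for them on any LAN segment; the routers then deliver traffic for each /32 to whichever site is 'nearest'.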

Once this propagates out, routing will take care of sending your traffic to the right place. In the event of an outage or maintenance, just withdraw the routes so they are no longer advertised and traffic will be re-routed.

This is slightly oversimplified as there is a certain level of tuning needed to stop flapping, which can cause your updates to be delayed (due to dampening timers etc).

I won't go into the BGP configuration in detail in this post as it's a little lengthy to explain/draw how the routing tables look; essentially you configure your neighbours and send them the /32 routes - Cisco have a nice example for configuring BGP on IOS. Be careful (use filters) when dealing with BGP/EGP protocols as you can do really weird things ;)
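
For a flavour of what the advertisement side looks like, here's a very rough Quagga/FRR-style sketch - the AS numbers and neighbour address are made up and a real deployment would need filters, communities and so on

! bgpd.conf fragment - advertise the Anycast /32s to an upstream peer
router bgp 64512
 neighbor 192.0.2.1 remote-as 64511
 network 10.1.0.1/32
 network 10.1.0.2/32
 network 10.1.0.3/32
!
! For maintenance, withdraw a route, e.g.
!   no network 10.1.0.1/32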

The routing after we start using 'Anycast' IPs

Requests to ns{1..3} (10.1.0.{1..3}) after we start using 'Anycast' IPs

In the case of the EU site failing the routes would withdraw and requests would be re-routed

In the case of the EU and the US sites failing, requests would be re-routed in the same manner

All this happens at layer 3 - in the DNS service scenario we would probably use GeoIP logic at layer 7 to localise the response and then serve the request.

Slow SSH to new CentOS servers

Recently when installing new servers I 'randomly' had an issue where SSH connections would lag out for 20-30 seconds on login then function fine once authenticated.

I noticed this issue on CentOS 5/6 servers, however it probably applies to anything running glibc 2.10+.

Now while there are a few reasons SSH might be doing this (most of which can be seen using verbose flags), it wasn't the usual suspects (GSSAPI etc).

The interesting part was that this wasn't happening on all servers (they're standardised config-wise thanks to Puppet).

It turns out the reverse DNS lookup triggered by the UseDNS setting was failing, very slowly. A simple way to quickly solve the issue is to set UseDNS no in your sshd_config file.
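
For reference the quick fix is just the one directive - the path below is the stock CentOS location and sshd needs a reload afterwards (service sshd reload)

# /etc/ssh/sshd_config
UseDNS no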

This might be acceptable depending on how tightly you have access restricted; however, there is an alternative.

It turns out there was a change in glibc that can be seen on the changelog here.

Under some circumstances (especially when firewalls are involved) the parallel A/AAAA lookups glibc now makes over a single socket will fail, and glibc should fall back, close the socket and start again. It would seem this fallback takes rather a long while when you're trying to SSH!

Thankfully there is a (recently documented) option, single-request-reopen, that you can add to your resolv.conf file. This forces glibc to enter fallback mode straight away and kills the delay.

An example of the 'patched' /etc/resolv.conf file

# This file is managed via Puppet

# Servers
nameserver 8.8.8.8
nameserver 8.8.4.4
nameserver 208.67.222.222
nameserver 208.67.220.220

# Options
options single-request-reopen

Now you might want to dig into your network stack a little more and make sure you can handle IPv6/dual stack properly, but for now: problem solved!

KVIrc update breaks Irssi Proxy

I recently updated KVIrc on OS X from 3.2.0 to 4.2.0. The update appeared to complete successfully, however on trying to connect to my Irssi Proxy server nothing appeared to happen.

The output from the server console was something like below

This is the first connection in this IRC context: using the global server
setting
Attempting connection to myirssiproxy.local (RawrIRC) on port 8000
Looking up the server hostname (myirssiproxy.local)...
Server hostname resolved to 10.44.1.1
Contacting IRC server myirssiproxy.local (10.44.1.1) on port 8000
Connection established [myirssiproxy.local (10.44.1.1:8000)]
Local host address is 10.44.1.2

It turns out that in 4.2.0 the servers automatically had 'Switch to SSL/TLS by using the STARTTLS extension' enabled.

Unfortunately Irssi Proxy doesn't like STARTTLS, so while the logs show a successful connection, no data is transferred over the socket.

The simple solution to get things back up and running properly is to disable the option under each server.

To do this, browse to 'Configure servers' under Settings, double click on the server under the network, tab over to Advanced and disable 'Switch to SSL/TLS by using the STARTTLS extension'.

If you have a large number of servers you can also directly edit the ini file under ~/.config/KVIrc/config/serverdb.kvc. In this file each network is a section and each server is numbered.

To disable STARTTLS for the first server in network A, your config should look something like the below

[A]
0_EnabledSTARTTLS=false

This section will usually contain a number of other items; a full example is below

[FreeNode]
0_Port=8000
0_Hostname=myirssiproxy.local
0_Ip=10.44.1.1
0_Nick=MyNick
NServers=1
RealName=MyName
0_Pass=Mypass
0_AutoConnect=true
0_EnabledSTARTTLS=false
NickName=MyNick
0_Id=Server1
0_EnabledCAP=false
0_RealName=MyName
0_User=MyUser
UserName=Damian

This also makes it quite easy to manage servers in KVIrc generally, beyond wide-scale settings changes.

Running a WSGI app via Gunicorn from Python

Anyone who's familiar with Gunicorn will know just how simple it is to get up and running; gunicorn myapp.wsgi.

For a little project I'm working on I wanted to take this up a level and run Gunicorn from a Python script, passing in any options as required and not relying on sys.argv.

Because of the way most of the Gunicorn application classes are written, they expect to be run via the console_scripts entry points and are rather closely integrated with them.

Unfortunately this made getting to the point of running run.py and having Gunicorn start rather harder than it needed to be.

For the purposes of this post my WSGI application is the example Flask app;

# myapp.wsgi
from flask import Flask

application = Flask(__name__)

@application.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    application.run()

Now, to get this running under Gunicorn we need to create a custom application

# myapp.mycustomapplication
from gunicorn.app.base import Application
from gunicorn import util

class MyCustomApplication(Application):
    '''
    Custom Gunicorn Application
    '''

    def __init__(self, options=None):
        '''__init__ method

        Load the base config and assign some core attributes.
        '''
        self.usage = None
        self.callable = None
        # Avoid a mutable default argument; fall back to an empty dict
        self.options = options or {}
        self.do_load_config()

    def init(self, *args):
        '''init method

        Takes our custom options from self.options and creates a config
        dict which specifies custom settings.
        '''
        cfg = {}
        for k, v in self.options.items():
            if k.lower() in self.cfg.settings and v is not None:
                cfg[k.lower()] = v
        return cfg

    def load(self):
        '''load method

        Imports our application and returns it to be run.
        '''        
        return util.import_app("myapp.wsgi")

As far as I can tell the minimum number of methods you need to define is 3:

  1. __init__ - This sets up our base attributes.
  2. init - This is called to load option settings; we use self.options to set the cfg items.
  3. load - This loads the application so when wsgi() is called the app runs!

To run the application we just need to initialize the MyCustomApplication class, passing any options as required.

#!/usr/bin/env python
# myapp.run

from myapp.mycustomapplication import MyCustomApplication

if __name__ == "__main__":
    options = {
        'bind': 'localhost:8080',
    }

    MyCustomApplication(options).run()

In practice you'd probably load the options from a config file in your program.
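
For example, a minimal sketch of that (the gunicorn.json filename and its keys are made up for illustration) might look something like the below

#!/usr/bin/env python
# myapp.run - loading Gunicorn options from a JSON config file
import json

from myapp.mycustomapplication import MyCustomApplication

if __name__ == "__main__":
    # e.g. {"bind": "localhost:8080", "workers": 4}
    with open("gunicorn.json") as fh:
        options = json.load(fh)

    MyCustomApplication(options).run()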

Hopefully this helps you get Gunicorn running smoothly in a more integrated manner :)

RADIUS authentication on a Dell PowerConnect M6220 switch

A while back I posted the 'how-to' on setting up HP switches for RADIUS authentication (here for GbE2c L2/L3 switches and here for ProCurve switches).

I also had to set up some Dell PowerConnect M6220 switches; these proved to be slightly more complicated than the HPs, but follow a Cisco-style config.

First we need to add a custom login and enable method

aaa authentication login "LineName" radius local
aaa authentication enable "LineName" none

These are used for connections over SSH - "LineName" can be anything; I tend to use the company name to keep things standard.

On the second line you could also use radius local - this would require a password to enter enable mode. Personally I control the privilege level from the RADIUS server and find having to enter a password again to get to enable mode a hassle.

Next we need to set up the RADIUS server

radius-server host auth serverIpHere
name "server1"
usage login
key "SecretKeyHere"
exit

Now we just need to configure the ssh line to use our aaa methods

login authentication LineName
enable authentication LineName
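
For context, these go under the ssh line configuration, so from config mode the full sequence looks roughly like the below - exact mode names can vary between firmware revisions

configure
line ssh
login authentication LineName
enable authentication LineName
exit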

Once complete you should be able to log in via ssh using your RADIUS details.

If you require access to the switch over http(s) you can also configure the HTTP server to authenticate against RADIUS

ip http authentication radius local
ip https authentication radius local

We include local on every method to ensure we don't get locked out of the switch if RADIUS is unavailable.