The 5-hour CDN

A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers. 

The term "CDN" ("content delivery network") conjures Google-scale companies managing huge racks of hardware, wrangling hundreds of gigabits per second. But CDNs are just web applications. That's not how we tend to think of them, but that's all they are. You can build a functional CDN on an 8-year-old laptop while you're sitting at a coffee shop. I'm going to talk about what you might come up with if you spend the next five hours building a CDN.

It's useful to define exactly what a CDN does. A CDN hoovers up files from a central repository (called an origin) and stores copies close to users. Back in the dark ages, the origin was a CDN's FTP server. These days, origins are just web apps and the CDN functions as a proxy server. So that's what we're building: a distributed caching proxy.

HTTP defines a whole infrastructure of intricate and fussy caching features. It's all very intimidating and complex. So we're going to resist the urge to build from scratch and use the work other people have done for us. We have choices. We could use Varnish (scripting! edge side includes! PHK blog posts!). We could use Apache Traffic Server (being the only new team this year to use ATS!). Or we could use NGINX (we're already running it!). The only certainty is that you'll come to hate whichever one you pick. Try them all and pick the one you hate the least.

What we're talking about building is not basic. But it's not so bad. All we have to do is take our antique Rails setup and run it in multiple cities. If we can figure out how to get people in Australia to our server in Sydney and people in Chile to our server in Santiago, we'll have something we could reasonably call a CDN.

Routing people to nearby servers is a solved problem. You basically have three choices:

  1. Anycast: acquire routable address blocks, advertise them in multiple places with BGP4, and then pretend that you have opinions about "communities" and "route reflectors" on Twitter. Let the Internet do the routing for you. Downside: it's harder to do, and the Internet is sometimes garbage. Upside: you might become insufferable.
  2. DNS: Run trick DNS servers that return specific server addresses based on IP geolocation. Downside: the Internet is moving away from geolocatable DNS source addresses. Upside: you can deploy it anywhere without help.
  3. Be like a game server: Ping a bunch of servers and use the best. Downside: gotta own the client. Upside: doesn't matter, because you don't own the client.

DNS load balancing is pretty simple. You don't really even have to build it yourself; you can host DNS on companies like DNSimple, and then define rules for returning addresses. Off you go!

Anycast is more difficult. We have more to say about this — but not here. In the meantime, you can use us, and deploy an app with an Anycast address in about 2 minutes. This is bias. But also: true. Boom, CDN. Put an NGINX in each of a bunch of cities, run DNS or Anycast for traffic direction, and you're 90% done. The remaining 10% will take you months.

Full Version

The goal is to provide high availability and performance by distributing the service spatially relative to end users.