How Content Delivery Networks decide which server is closest to you

January, 2024

Have you ever wondered how a Content Delivery Network (CDN) decides which server is closest to you? CDNs are known for shortening request times, and they are marketed for their ability to improve the user experience of serving static content. Under the hood, requests for content stored on a CDN are the same as every other web request.

For a typical web request, the Domain Name System (DNS) translates the website's hostname into the corresponding IP address. When we try to download a file through a link in our browser, the browser first checks its own cache for the IP address associated with the hostname. If the browser can't find it there, it asks the operating system, which checks whether the hostname's IP address is in the operating system's cache. If the operating system cannot resolve the hostname either, it sends the query to the DNS resolver configured on the computer.
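The lookup order above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser or operating system implements it: the function name, the dictionaries standing in for caches, the hostname and the IP address (taken from a range reserved for documentation) are all made up for the example, and real caches also expire entries after a time-to-live, which is left out here.

```python
from typing import Callable, Dict, Tuple


def resolve_with_caches(
    hostname: str,
    browser_cache: Dict[str, str],
    os_cache: Dict[str, str],
    ask_resolver: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (ip, source), checking each layer in the order described above."""
    # 1. The browser looks in its own cache first.
    if hostname in browser_cache:
        return browser_cache[hostname], "browser cache"

    # 2. Next, the operating system's cache.
    if hostname in os_cache:
        ip = os_cache[hostname]
        browser_cache[hostname] = ip
        return ip, "os cache"

    # 3. Only now does the query leave the machine for the DNS resolver.
    ip = ask_resolver(hostname)
    os_cache[hostname] = ip
    browser_cache[hostname] = ip
    return ip, "dns resolver"


def fake_resolver(hostname: str) -> str:
    # Hypothetical stand-in for the configured DNS resolver.
    return "203.0.113.10"


browser_cache: Dict[str, str] = {}
os_cache: Dict[str, str] = {}

first = resolve_with_caches("cdn.example.com", browser_cache, os_cache, fake_resolver)
second = resolve_with_caches("cdn.example.com", browser_cache, os_cache, fake_resolver)
print(first)   # ('203.0.113.10', 'dns resolver')
print(second)  # ('203.0.113.10', 'browser cache')
```

The second lookup never reaches the resolver, which is why repeat visits to the same hostname feel faster.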

The DNS resolver configured on the computer, usually one run by the internet provider or a public DNS service, is also called a recursive resolver. It finds the IP address of the hostname by querying different levels of the DNS hierarchy. It starts with a root name server, asking whether it knows the IP address of the hostname so that the user will be able to download the file. Root name servers do not keep addresses for individual hostnames, so the root server responds with the addresses of the top-level domain (TLD) name servers responsible for the hostname's ending (for example, .com). The recursive resolver can now ask the same question of a TLD name server. If the recursive resolver already had the answer cached from an earlier lookup, the search would end there and the browser could use the IP address to request the file directly.

Armed with the addresses of the TLD name servers that are responsible for our hostname's domain, the recursive resolver sends its request to one of them. The TLD name server does not hold the IP address of the hostname either. Instead, it responds with a referral to the authoritative name server for the domain, which should have the information about the IP address of the hostname.

At this point, it may seem that our request is spending a lot of time before it even gets to the question of which CDN server is closest to us. With the address of the authoritative name server in hand, the DNS resolver asks it for the IP address of the hostname that we are trying to resolve. Thankfully, the authoritative name server can answer because it holds the records for that hostname. It is normally operated by the CDN (or acts on behalf of the CDN), so it knows all the IP addresses that the hostname can point to. The authoritative name server returns the IP address that the web content hostname maps to.
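To make the chain of referrals easier to follow, here is a toy simulation of it in Python. Nothing here touches the network: the three dictionaries stand in for the root, top-level and authoritative name servers, and every server name, hostname and IP address in them is hypothetical (the addresses come from ranges reserved for documentation). A real resolver also caches each answer and handles failures, which this sketch skips.

```python
from typing import List, Tuple

# Hypothetical data standing in for the three levels of name servers.
ROOT_REFERRALS = {"com": "tld-server-for-com"}
TLD_REFERRALS = {"tld-server-for-com": {"example.com": "ns.example-cdn"}}
AUTHORITATIVE_RECORDS = {
    "ns.example-cdn": {"cdn.example.com": ["203.0.113.10", "198.51.100.20"]}
}


def resolve_iteratively(hostname: str) -> Tuple[List[str], List[str]]:
    """Follow root -> top level -> authoritative and return (addresses, steps)."""
    steps: List[str] = []
    labels = hostname.split(".")
    tld = labels[-1]              # e.g. "com"
    zone = ".".join(labels[-2:])  # e.g. "example.com"

    # 1. The root level only knows who is responsible for the top-level domain.
    tld_server = ROOT_REFERRALS[tld]
    steps.append(f"root: ask {tld_server}")

    # 2. The top level only knows which name server is authoritative for the zone.
    auth_server = TLD_REFERRALS[tld_server][zone]
    steps.append(f"{tld_server}: ask {auth_server}")

    # 3. The authoritative name server holds the actual addresses.
    addresses = AUTHORITATIVE_RECORDS[auth_server][hostname]
    steps.append(f"{auth_server}: {hostname} -> {addresses}")
    return addresses, steps


addresses, steps = resolve_iteratively("cdn.example.com")
for step in steps:
    print(step)
```

Notice that the authoritative name server holds more than one address for the hostname. Which one it hands back is where the CDN gets to make its decision.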

However, selecting which IP address to return from the list of IP addresses managed by the authoritative name server may not be straightforward. Remember that we are dealing with a CDN, and the whole aim of a CDN is to serve us content from the server that is closest to us. A CDN can use different approaches to route a user's request to the nearest server. It could decide to use a load balancer with a static IP address, placed in a single central location. The load balancer routes the user's request to the server that is closest to the origin of the request. The drawback is that every request must first travel to the load balancer, wherever the user is. Users close to that central location will get fast responses. However, users far away from it will experience extra latency, because the load balancer is not as close to them as it could be.
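A rough back-of-the-envelope calculation shows why distance matters here. The sketch below assumes signals travel through optical fibre at roughly 200 km per millisecond (about two thirds of the speed of light) and ignores routing detours, queuing and processing time, so real numbers will be higher. The two distances are made-up examples.

```python
# Assumption: roughly 200,000 km/s in fibre, i.e. about 200 km per millisecond.
FIBRE_KM_PER_MS = 200.0


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over the given one-way distance."""
    return 2 * distance_km / FIBRE_KM_PER_MS


# Hypothetical distances: a far-away central load balancer vs. a nearby server.
print(round_trip_ms(9000))  # 90.0
print(round_trip_ms(50))    # 0.5
```

Every round trip to a load balancer 9,000 km away costs at least 90 ms before any work is done, and loading a page usually takes several round trips.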

Basically, the load balancer acts as a reverse proxy and looks for the server that is closest to the origin of a user's request. Hopefully, that server returns the cached content back to the user. However, many CDNs these days use the DNS approach instead: at the point of IP address resolution, the authoritative name server acting on behalf of the CDN determines which server is closest to the user based on the source IP address of the DNS query it receives. That address typically belongs to the user's recursive resolver, which is usually near the user; some resolvers also pass along part of the client's own address through the EDNS Client Subnet extension. From this information, the authoritative name server decides which IP address to resolve the hostname to. It may be able to tell that your request originated from Dublin, Ireland and will return the IP address of a server that's closest to Dublin, Ireland.
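Here is a small Python sketch of that decision, under some loud assumptions. The table mapping address ranges to coordinates is a made-up stand-in for the geolocation data a CDN would maintain, the server list and all IP addresses are hypothetical (taken from documentation ranges), and straight-line distance is used as a simple proxy for "closest". Real CDNs may also weigh measured latency, server load and health.

```python
import ipaddress
import math
from typing import Optional, Tuple

Coords = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical geolocation table: address range -> approximate location.
GEO_TABLE = {
    ipaddress.ip_network("192.0.2.0/24"): (53.3498, -6.2603),  # Dublin
}

# Hypothetical CDN servers: IP address -> location.
EDGE_SERVERS = {
    "198.51.100.20": (51.5074, -0.1278),   # London
    "198.51.100.30": (40.7128, -74.0060),  # New York
    "198.51.100.40": (1.3521, 103.8198),   # Singapore
}
DEFAULT_EDGE_IP = "198.51.100.30"  # used when the requester cannot be located


def locate(ip: str) -> Optional[Coords]:
    """Look up the approximate location of the address that sent the query."""
    address = ipaddress.ip_address(ip)
    for network, coords in GEO_TABLE.items():
        if address in network:
            return coords
    return None


def haversine_km(a: Coords, b: Coords) -> float:
    """Great-circle distance between two points on Earth, in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def answer_query(requester_ip: str) -> str:
    """Return the IP address of the server nearest to the requester."""
    origin = locate(requester_ip)
    if origin is None:
        return DEFAULT_EDGE_IP
    return min(EDGE_SERVERS, key=lambda ip: haversine_km(origin, EDGE_SERVERS[ip]))


print(answer_query("192.0.2.55"))  # 198.51.100.20 (London is nearest to Dublin)
```

The same hostname can therefore resolve to different IP addresses depending on where the query comes from, with no load balancer sitting in the request path.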

Once the IP address for the hostname has been returned to the user's browser, the browser has the address of the server that is closest to it and can start downloading the data. Instead of sending out a generic IP address, the authoritative name server that received the request uses information about the origin of the request to decide which IP address to return from its internal mapping of IP addresses to servers.

The end goal is a reduction in network latency. Because the content travels a shorter path, it also means less bandwidth consumption in the process of delivering the content to the user. CDNs can decide which server is closest to the user because they control the IP address that a hostname maps to at the point of DNS resolution. This is a really quick and clean strategy for delivering content to users.

I found this intriguing while doing my research on how CDNs are able to find the closest server to the origin of a request. Originally, I thought CDNs were just using load balancers. It was brain-racking trying to understand how this all works under the hood. Thankfully, I was able to figure it out and I'm excited to share my learnings with you. If any parts of this article are unclear or incorrect, please feel free to call my attention to them and I will correct them. Thanks for reading.