Web caches are temporary stores of website-related data such as documents, images, and other website media. They are used to reduce network traffic and satisfy network requests more quickly and efficiently. Web caches may be implemented in hardware or in software.
Fast Access to Temporary Data
Web caches temporarily store data from remote hosts so that local client requests can be satisfied by relying more on local resources than on outside network resources. A web cache server sits between clients and remote hosts and is designed to limit the number of requests passed along to the remote server.
In HTTP connections, a client sends a request to a remote host. Assuming all goes well, the host responds with the requested data, which is then transmitted back to the client. Web caches are configured to pass client requests along to hosts, receive the response data from the host server, and then pass that response data back to the requesting client.
Benefits of Web Caches
During the process outlined above, the web cache server stores a copy of the response data in local storage. If that data is then requested again by another local client, the web cache server can respond to the client directly, rather than having to forward the request to the remote server.
Web caches offer two primary benefits:
- Reduction in time for client requests through local data storage;
- Reduction in traffic on access links to the Internet.
As an example, consider a local ISP that has configured a web cache server to serve its local residential customers. One customer could use their web browser to send an HTTP GET request to pbs.org to see if there are any new articles or media posted.
The GET request would first reach the ISP’s web cache server, which checks whether a copy of that webpage exists in local storage. After determining that no such copy exists, the ISP’s cache server would then forward the request to the pbs.org server.
Local Storage
Upon receipt of the HTTP GET request from the local ISP’s web cache server, the pbs.org server would send response data back to the ISP cache. On receiving it, the ISP’s cache server stores a local copy of that data and also answers the original customer’s request with that data.
Now let’s consider the scenario in which another customer from a different home sends an HTTP GET request to the pbs.org server to check for the latest articles and media. That request will also be directed to the ISP’s web cache server.
Reduced Network Traffic
However, this time the ISP’s cache server will already have the response data from the last time it connected to the pbs.org server. In this scenario, the ISP cache server will respond directly to the customer request with the stored response data, often cutting down on customer-perceived network delays significantly.
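The two-customer scenario above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular ISP cache is built: the `WebCache` class, the `fake_origin` function standing in for the pbs.org server, and the `origin_requests` counter are all hypothetical names introduced for this example, and no real network traffic is involved.

```python
from typing import Callable, Dict


class WebCache:
    """Toy forwarding cache: answers from local storage, forwards misses."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        # Hypothetical callable that stands in for a request to the remote host.
        self._fetch_from_origin = fetch_from_origin
        self._store: Dict[str, bytes] = {}
        self.origin_requests = 0  # how many requests were forwarded

    def get(self, url: str) -> bytes:
        if url in self._store:
            # Cache hit: respond directly from local storage.
            return self._store[url]
        # Cache miss: forward the request, then keep a local copy.
        self.origin_requests += 1
        body = self._fetch_from_origin(url)
        self._store[url] = body
        return body


def fake_origin(url: str) -> bytes:
    """Pretend remote server; returns a canned page for any URL."""
    return f"<html>content of {url}</html>".encode()


cache = WebCache(fake_origin)
first_customer = cache.get("http://pbs.org/")   # miss: forwarded to origin
second_customer = cache.get("http://pbs.org/")  # hit: served locally
print(cache.origin_requests)
```

Both customers receive the same page, but only the first request travels over the access link to the remote server. Note that this sketch never checks whether its stored copy is still current.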
Cache Expiration
The above example illustrates how web caches can help reduce network delay time and also reduce the resource consumption of access network links. In that scenario, one very important consideration was omitted: what happens when the pbs.org host updates its data?
How will the web cache know there may be new articles and media since it was last checked? Finally, how will the web cache ensure that customers requesting remote data will be returned the most up-to-date version available? Fortunately, HTTP has a solution for this: the If-Modified-Since header field (RFC 7232).
Consider the following sequence of events in the context of web caching:
- An ISP’s customer uses their web browser to send an HTTP GET request for pbs.org.
- That request is first sent to the ISP’s web cache server, which contains a copy of that website from a previous request. During that previous request, the pbs.org server responded with a Last-Modified header field with the value: Sun, 29 Nov 2020 06:00:00 GMT.
- The web cache then sends a conditional GET request to the pbs.org server with the If-Modified-Since header field, containing the value Sun, 29 Nov 2020 06:00:00 GMT.
- If the requested page has not been modified since that time, the pbs.org server will respond with HTTP 304 Not Modified, indicating to the cache that its local copy is still up to date. Note: the HTTP 304 response carries no message body.
Note: The connection between the customer and the ISP’s cache server remains active while the ISP cache server communicates with the pbs.org server.
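The conditional GET sequence above can be simulated without any network access. The sketch below is a simplified model under stated assumptions: `Origin`, `RevalidatingCache`, and `Response` are hypothetical names invented for this example, the cache holds a single page, and the simulated 304 response carries no headers at all (a real server would normally send some). Only `email.utils.parsedate_to_datetime` from the Python standard library is used, to compare HTTP dates.

```python
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Dict, Optional, Tuple


@dataclass
class Response:
    status: int
    headers: Dict[str, str]
    body: bytes


class Origin:
    """Hypothetical stand-in for the pbs.org server."""

    def __init__(self, body: bytes, last_modified: str) -> None:
        self.body = body
        self.last_modified = last_modified  # HTTP-date string

    def handle_get(self, headers: Dict[str, str]) -> Response:
        since = headers.get("If-Modified-Since")
        if since is not None:
            modified_at = parsedate_to_datetime(self.last_modified)
            if modified_at <= parsedate_to_datetime(since):
                # Unchanged since the cache's copy: 304 with an empty body.
                return Response(304, {}, b"")
        return Response(200, {"Last-Modified": self.last_modified}, self.body)


class RevalidatingCache:
    """Toy single-page cache that revalidates its copy on every request."""

    def __init__(self, origin: Origin) -> None:
        self._origin = origin
        self._entry: Optional[Tuple[bytes, str]] = None  # (body, Last-Modified)

    def get(self) -> Tuple[bytes, int]:
        """Return the page body and the status the origin sent to the cache."""
        headers: Dict[str, str] = {}
        if self._entry is not None:
            headers["If-Modified-Since"] = self._entry[1]
        resp = self._origin.handle_get(headers)
        if resp.status == 304 and self._entry is not None:
            return self._entry[0], 304  # local copy is still current
        self._entry = (resp.body, resp.headers["Last-Modified"])
        return resp.body, 200


origin = Origin(b"articles v1", "Sun, 29 Nov 2020 06:00:00 GMT")
cache = RevalidatingCache(origin)
first = cache.get()    # empty cache: full 200 response is stored
second = cache.get()   # page unchanged: 304, body served from local copy
origin.body, origin.last_modified = b"articles v2", "Mon, 30 Nov 2020 06:00:00 GMT"
third = cache.get()    # page changed: full 200 response replaces the copy
```

In this model the cache contacts the origin on every request, but when the page is unchanged only the small 304 response crosses the network while the customer is still served the full page from local storage.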
In this situation, the web cache server avoids having the entire webpage transmitted again. The ISP customer gets little visible feedback about this process and simply sees the pbs.org website load faster in their browser than it would on a cache miss.
Final Thoughts
Web cache servers are much like other cache servers; they are simply tailored to the demands of network traffic and common HTTP requests. By answering repeat requests from local storage, they reduce network traffic, which means faster responses with fewer network requests overall. For users, this translates into faster page loads and fewer timeouts for temporarily unavailable resources.