Web caches are temporary stores of website-related data such as documents, images, and other website media. They are used to reduce network traffic and satisfy network requests more quickly and efficiently. Web caches may be implemented in hardware or in software.
Web caches temporarily store data from remote hosts such that local client requests can be satisfied by relying more on local resources than outside network resources. A web cache server sits between client requests and host responses and is designed to limit the number of connection requests passed along to the remote server.
In HTTP connections, a client sends a request to a remote host. Assuming all goes well, the host responds with the requested data, which is then transmitted back to the client. Web caches are configured to pass client requests along to hosts, receive the response data from the host server, and then pass that response data back to the requesting client.
Benefits of Web Caches
During the process outlined above, the web cache server stores a copy of the response data in local storage. If that data is then requested again by another local client connection, the web cache server can respond to the client directly, rather than having to forward the request to the remote server.
Web caches offer two primary benefits:
- Reduced response time for client requests, since data is served from local storage
- Reduced traffic on access links to the Internet
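To put rough numbers on these two benefits, here is a minimal sketch of the underlying arithmetic. The function names, the hit rate, and the timings are all hypothetical values chosen for illustration; they are not measurements from any real network.

```python
def expected_response_time(hit_rate, cache_time_ms, origin_time_ms):
    """Average response time seen by clients for a given cache hit rate."""
    return hit_rate * cache_time_ms + (1.0 - hit_rate) * origin_time_ms


def access_link_requests(total_requests, hit_rate):
    """Number of requests that still cross the access link to the Internet."""
    return round(total_requests * (1.0 - hit_rate))


# Hypothetical figures: 10 ms to answer from the cache, 200 ms from the origin.
without_cache = expected_response_time(0.0, cache_time_ms=10, origin_time_ms=200)
with_cache = expected_response_time(0.4, cache_time_ms=10, origin_time_ms=200)

print(f"average without cache: {without_cache:.0f} ms")
print(f"average with 40% hit rate: {with_cache:.0f} ms")
print(f"requests crossing the access link: {access_link_requests(1000, 0.4)} of 1000")
```

Under these assumed figures, a 40% hit rate lowers the average response time and removes the same share of requests from the access link. Real gains depend on how often clients ask for the same content.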
As an example, consider a local ISP having configured a web cache server to service their local residential clients. One customer could use their Internet browser to send an HTTP GET request to pbs.org to see if there are any new articles or media posted.
The GET request would first reach the ISP’s web cache server, which would check whether a copy of that webpage exists in its local storage. After determining that no such copy exists, the ISP’s cache server would then forward the request to the pbs.org server.
Upon receipt of the HTTP GET request from the local ISP’s web cache server, the pbs.org server would then send response data back to the ISP cache. Upon receipt, the ISP’s cache server will store a local copy of that data and also respond to the original customer’s request with that data.
Now let’s consider the scenario in which another customer from a different home sends an HTTP GET request to the pbs.org server to check for the latest articles and media. That request will also be directed to the ISP’s web cache server.
Reduced Network Traffic
However, this time the ISP’s cache server will already have the response data from the last time it connected to the pbs.org server. In this scenario, the ISP cache server will respond directly to the customer request with the stored response data, often significantly reducing the delay the customer perceives.
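The two-customer scenario above can be sketched as a toy in-memory cache. This is a simplified model under stated assumptions: `OriginServer` and `WebCache` are hypothetical class names, the origin is a plain Python object rather than a real network host, and nothing here handles expiry or freshness.

```python
class OriginServer:
    """Stand-in for a remote host; counts how many requests reach it."""

    def __init__(self, pages):
        self.pages = pages
        self.requests_served = 0

    def get(self, url):
        self.requests_served += 1
        return self.pages[url]


class WebCache:
    """Toy cache that sits between clients and the origin server."""

    def __init__(self, origin):
        self.origin = origin
        self.store = {}  # url -> stored response body

    def get(self, url):
        if url in self.store:
            # Cache hit: answer directly, without contacting the origin.
            return self.store[url], "HIT"
        # Cache miss: forward the request, keep a copy, then answer.
        body = self.origin.get(url)
        self.store[url] = body
        return body, "MISS"


origin = OriginServer({"pbs.org/": "<html>latest articles</html>"})
cache = WebCache(origin)

first = cache.get("pbs.org/")   # first customer's request
second = cache.get("pbs.org/")  # second customer's request

print(first[1], second[1], origin.requests_served)  # prints: MISS HIT 1
```

Both customers receive the same body, but only the first request travels to the origin server.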
The above example illustrates how web caches can help reduce network delay and the load on access network links. That scenario, however, omitted one very important consideration: what happens when the pbs.org host updates its data?
How will the web cache know there may be new articles and media since it last checked? And how will it ensure that customers requesting remote data are returned the most up-to-date version available? Fortunately, HTTP has a solution for this: the If-Modified-Since header field (RFC 7232).
Consider the following sequence of events in the context of web caching:
- An ISP’s customer uses their web browser to send an HTTP GET request for the pbs.org website.
- That request is first sent to the ISP’s web cache server, which contains a copy of that website from a previous request. During that previous request, the pbs.org server responded with a Last-Modified header field with the value Sun, 29 Nov 2020 06:00:00 GMT.
- The web cache then sends a conditional GET request to the pbs.org server with the If-Modified-Since header field, containing the value Sun, 29 Nov 2020 06:00:00 GMT.
- If the requested page has not been modified since that date, the pbs.org server will respond with HTTP 304 Not Modified, indicating to the cache that its local copy is still up to date. Note: the HTTP 304 response has no message body.
Note: The connection between the customer and the ISP’s cache server remains active while the ISP cache server communicates with the pbs.org server.
In this situation, the web cache server avoids having the entire webpage transmitted again. The ISP customer sees little of this process and simply experiences the pbs.org website loading in their browser faster than it would on a cache miss.
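The conditional GET sequence described above can be sketched as a small in-process simulation. This is an illustration under stated assumptions rather than a real proxy: `OriginServer` and `RevalidatingCache` are hypothetical names, the cache holds a single page, and the "server" is a Python object, so no network traffic takes place. The HTTP date strings are produced and parsed with the standard library's `email.utils` helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


class OriginServer:
    """Stand-in for the remote host; answers plain and conditional GETs."""

    def __init__(self, body, last_modified):
        self.body = body
        self.last_modified = last_modified  # timezone-aware UTC datetime

    def get(self, headers=None):
        headers = headers or {}
        since = headers.get("If-Modified-Since")
        if since is not None and self.last_modified <= parsedate_to_datetime(since):
            # Not modified since the cache's copy: status 304, no body.
            return 304, {}, b""
        last_modified = format_datetime(self.last_modified, usegmt=True)
        return 200, {"Last-Modified": last_modified}, self.body


class RevalidatingCache:
    """Toy single-page cache that revalidates its copy on every request."""

    def __init__(self, origin):
        self.origin = origin
        self.body = None
        self.last_modified = None  # Last-Modified value as received

    def get(self):
        headers = {}
        if self.last_modified is not None:
            headers["If-Modified-Since"] = self.last_modified
        status, response_headers, body = self.origin.get(headers)
        if status == 304:
            return self.body, "revalidated"  # local copy is still current
        self.body = body
        self.last_modified = response_headers.get("Last-Modified")
        return body, "fetched"


origin = OriginServer(
    b"<html>old articles</html>",
    datetime(2020, 11, 29, 6, 0, 0, tzinfo=timezone.utc),
)
cache = RevalidatingCache(origin)

results = [cache.get()[1]]        # nothing stored yet: full response
stored_header = cache.last_modified
results.append(cache.get()[1])    # unchanged at the origin: 304

# The origin publishes new content one day later.
origin.body = b"<html>new articles</html>"
origin.last_modified = datetime(2020, 11, 30, 6, 0, 0, tzinfo=timezone.utc)
results.append(cache.get()[1])    # modified: full response again

print(stored_header)  # prints: Sun, 29 Nov 2020 06:00:00 GMT
print(results)        # prints: ['fetched', 'revalidated', 'fetched']
```

This sketch revalidates on every request to keep the flow easy to follow. A production cache would typically also use freshness information such as Cache-Control or Expires to decide when revalidation is needed at all.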