Computer networks use a variety of data transmission protocols. Popular protocols like the Transmission Control Protocol (TCP) operate at the transport layer and offer reliable services while running over fundamentally unreliable lower layers. A core part of providing these reliability services is the transport layer assessing network traffic conditions and acting accordingly.
When considering congestion control measures it’s first essential to determine whether network congestion feedback is available from the network layer. For example, TCP is an end-to-end transmission protocol that receives no feedback from the network layer regarding network congestion. This means TCP must infer network conditions.
Note: Congestion Control is not the same thing as Flow Control. Flow control keeps a sender from overwhelming the receiver; congestion control keeps senders from overwhelming the network itself.
Network-Assisted Congestion Control
Network-assisted congestion control (NACC) can be provided directly by network hardware such as routers. A classic example is the Available Bit-Rate (ABR) service of Asynchronous Transfer Mode (ATM) networks. These services offer network-condition feedback in one of the following ways:
Packet Marking – routers set a bit in packet headers; the end systems that receive the marked packets can then make holistic judgments about network conditions.
Choke Packets – routers send feedback directly back toward the sender, reporting link-level congestion. This method takes at least one RTT for detection.
End-to-End Congestion Control
Network congestion must be inferred by end systems through observing events like duplicate ACKs and timeouts. When these events are perceived, protocols like TCP take actions to reduce send rates dynamically until more favorable feedback is detected.
TCP is a prime example of a protocol that makes use of end-to-end congestion control. A sender has no direct communication with the network layer regarding changing network conditions. Packet loss, bit corruption, and network delays all have to be inferred from end-system observations.
TCP Congestion Control
TCP's congestion control measures are a great illustration of how two network end systems can dynamically change their communication based on network conditions. TCP's congestion control is constructed to address three primary questions:
- How is transmission rate limiting achieved?
- How is network congestion perceived?
- What algorithm(s) should control the sender’s transmission rate?
To help address these questions, TCP implements a congestionWindow variable. This is a dynamic variable whose value reflects changing network congestion. TCP uses this variable to calculate, and adjust accordingly, an appropriate transmission rate.
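As a rough sketch of the idea (hypothetical helper names, not real TCP code), the sender's usable window is bounded by both the congestion window and the receiver's advertised window, and the resulting send rate is roughly one window per RTT:

```python
# Sketch only: illustrates how cwnd and rwnd jointly bound a sender.
MSS = 1460  # bytes; a common maximum segment size

def send_window(cwnd: int, rwnd: int) -> int:
    """Bytes the sender may have in flight at once."""
    return min(cwnd, rwnd)

def approx_send_rate(cwnd: int, rtt: float) -> float:
    """Rough transmission rate in bytes/second: one window per RTT."""
    return cwnd / rtt

print(send_window(8 * MSS, 4 * MSS))   # flow control is the bottleneck here
print(approx_send_rate(4 * MSS, 0.1))  # 4 segments per 100 ms RTT
```

Note that `send_window` captures the flow-control bound from the receiver, while congestion control is what drives `cwnd` up and down over time.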
Congestion Perception
TCP perceives congestion as either the receipt of three duplicate ACKs or a timeout event. Each of these events indicates that packets are not arriving at their destination. These types of events cause a TCP sender to decrease the congestionWindow size, thus reducing its transmission rate.
When a sender receives ACKs for previously un-ACKed packets, this is an indication that network conditions are favorable. In such cases, the TCP sender increases the congestionWindow slightly.
TCP also perceives congestion through a method known as bandwidth probing: a TCP sender continually increases its congestionWindow variable while ACKs keep arriving. When a timeout event or packet loss occurs, TCP then starts reducing the congestionWindow size.
TCP’s Congestion Control Algorithm
The above actions of perceiving, and acting on, signals of network congestion demonstrate the beginnings of a system of congestion control. Once congestion can be detected, it can be addressed. Formal guidance for a TCP congestion control algorithm was described in the 2009 RFC 5681, titled TCP Congestion Control, which outlined the following four components:
- slow start
- congestion avoidance
- fast retransmit
- fast recovery
Slow Start
TCP's slow start algorithm must be used by a sender to control the transmission rate. This algorithm relies on the sender-side variable congestionWindow (cwnd) and the receiver-side variable receiverWindow (rwnd). Making use of each of these, TCP can effectively determine the maximum transmission bandwidth, detect changes, and act accordingly.
Slow start is characterized by the TCP sender creating a cwnd of some initial size and incrementally increasing it until packet loss or timeouts occur. Slow start selects an initial cwnd of 1 maximum segment size (MSS). Because each ACK received grows the window, cwnd doubles every round: 1 MSS becomes 2 MSS, then 2 x 2 MSS = 4 MSS, then 2 x 2 x 2 MSS = 8 MSS, and so on.
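The doubling described above can be sketched as follows (a simplified model, with cwnd measured in units of MSS rather than bytes):

```python
def slow_start_growth(rounds: int) -> list[int]:
    """cwnd (in units of MSS) at the start of each RTT round during
    slow start: it doubles each round, since every ACKed segment
    grows the window by one MSS."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        cwnd *= 2
    return history

print(slow_start_growth(4))  # [1, 2, 4, 8]
```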
The slow start algorithm is meant to quickly find the maximum network transmission rate. After its initial value, cwnd increases exponentially. The value at which packet loss or a timeout was experienced gets saved via another variable named ssthresh (slow-start threshold).
The ssthresh gets a value of cwnd/2 when cwnd experiences a timeout or packet loss. Once this variable is set, slow start begins again with cwnd at 1 MSS. When the value of cwnd reaches ssthresh, TCP exits slow start and transitions into congestion avoidance.
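Putting the pieces together, this loss-then-restart behavior can be modeled with a small sketch (hypothetical function names; cwnd and ssthresh in units of MSS):

```python
def ssthresh_after_loss(cwnd_at_loss: int) -> int:
    """On timeout or packet loss, ssthresh becomes half the cwnd
    at which congestion was observed."""
    return max(cwnd_at_loss // 2, 1)

def slow_start_until_threshold(ssthresh: int) -> tuple[int, str]:
    """Restart at 1 MSS, double each round, and hand off to
    congestion avoidance once cwnd reaches ssthresh."""
    cwnd = 1
    while cwnd < ssthresh:
        cwnd *= 2
    return cwnd, "congestion avoidance"

ssthresh = ssthresh_after_loss(16)           # loss at 16 MSS -> ssthresh of 8
print(slow_start_until_threshold(ssthresh))  # (8, 'congestion avoidance')
```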
Slow start can also end if the sender receives three duplicate ACKs in a row. In this scenario, the sender perceives packet loss and transitions into fast retransmit.
Congestion Avoidance
Upon entering congestion avoidance mode, the sender's cwnd variable is equal to ssthresh, which is roughly half the value at which congestion was detected. Rather than doubling cwnd, TCP now transitions to increasing cwnd by a single MSS per round.
At some point, this approach will still reach a point at which network congestion is perceived. When a timeout or packet loss is detected, ssthresh is again set to half the cwnd value at which congestion was experienced.
When three duplicate ACKs are received, congestion avoidance sets ssthresh to half the cwnd value, but the new cwnd also gets 3 x MSS added to it. This reflects the perception that the network is less likely to be the culprit of failure, considering the receiver is still ACKing packets. In this condition, the fast-recovery state is entered.
Fast Recovery
Fast recovery increases the cwnd value by 1 MSS for every duplicate ACK received for the segment that caused entry into this state. When the ACK for the next segment finally arrives, cwnd is deflated back to ssthresh and the congestion avoidance state is once again entered.
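This inflate-then-deflate behavior can be sketched as follows (simplified; cwnd and ssthresh in units of MSS, function names hypothetical):

```python
def fast_recovery_inflate(ssthresh: int, extra_dup_acks: int) -> int:
    """Entering fast recovery, cwnd starts at ssthresh + 3; each
    additional duplicate ACK inflates it by one more MSS."""
    return ssthresh + 3 + extra_dup_acks

def fast_recovery_exit(ssthresh: int) -> tuple[int, str]:
    """A non-duplicate ACK deflates cwnd back to ssthresh and
    returns the sender to congestion avoidance."""
    return ssthresh, "congestion avoidance"

print(fast_recovery_inflate(10, 2))  # cwnd after two more dup ACKs: 15
print(fast_recovery_exit(10))        # (10, 'congestion avoidance')
```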
Fast Re-Transmit
Fast retransmit works hand in hand with fast recovery. Per RFC 5681, the pair proceeds in three stages, described below:
- The fast retransmit algorithm uses the arrival of three duplicate ACKs as an indication that a segment has been lost.
- After receiving three duplicate ACKs, TCP performs a retransmission of what appears to be the missing segment, without waiting for the retransmission timer to expire.
- After the fast retransmit algorithm sends what appears to be the missing segment, the “fast recovery” algorithm governs the transmission of new data until a non-duplicate ACK arrives.
Note: Fast Recovery is recommended but not required. When implemented, it is considered part of the Fast Recovery state.
Discussion
Congestion control mechanisms are essential for supporting TCP's reliable data transmission services. A combination of the cwnd and rwnd variables, ACKs, and timeouts allows TCP to accurately perceive shifts in network conditions and act accordingly. This works to avoid receiver buffer overflow and also to ensure TCP utilizes the maximum bandwidth a network offers at a particular time. The TCP congestion control algorithm attacks this in several ways, each triggered by, and resulting in, different levels of severity with respect to transmission rates.