Figure 3.4-1 illustrates the framework for our study of reliable data transfer. The service abstraction provided to the upper layer entities is that of a reliable channel through which data can be transferred. With a reliable channel, no transferred data bits are corrupted (flipped from 0 to 1, or vice versa) or lost, and all are delivered in the order in which they were sent. This is precisely the service model offered by TCP to the Internet applications that invoke it.
It is the responsibility of a reliable data transfer protocol to implement this service abstraction. This task is made difficult by the fact that layer below the reliable data transfer protocol may be unreliable. For example, TCP is a reliable data transfer protocol that is implemented on top of an unreliable (IP) end-end network layer. More generally, the layer beneath the two reliably-communicating endpoints might consist of a single physical link (e.g., as in the case of a link-level data transfer protocol) or a global internetwork (e.g., as in the case of a transport-level protocol). For our purposes, however, we can view this lower layer simply as an unreliable point-to-point channel.
In this section, we will incrementally develop the sender and receiver sides of a reliable data transfer protocol, considering increasingly complex models of the underlying channel. Figure 3.4-1(b) illustrates the interfaces for our data transfer protocol. The sending side of the data transfer protocol will be invoked from above by a call to rdt_send(). It will be passed the data to be delivered to the upper-layer at the receiving side. (Here rdt stands for ``reliable data transfer'' protocol and _send indicates that the sending side of rdt is being called. The first step in developing any protocol is to choose a good name!) On the receiving side, rdt_rcv() will be called when a packet arrives from the receiving side of the channel. When the rdt protocol wants to deliver data to the upper-layer, it will do so by calling deliver_data(). In the following we use the terminology "packet" rather than "segment" for the protocol data unit.. Because the theory developed in this section applies to computer networks in general, and not just to the Internet transport layer, the generic term "packet" is perhaps more appropriate here.
In this section we consider only the case of unidirectional data
transfer, i.e., data transfer from the sending to receiving side. The case
of reliable bidirectional (i.e., full duplex) data transfer is conceptually
no more difficult but considerably more tedious. Although we consider only
unidirectional data transfer, it is important to note that the sending
and receiving sides of our protocol will nonetheless need to transmit packets
in both directions, as indicated in Figure 3.4-1. We will see shortly
that in addition to exchanging packets containing the data to be transferred,
the sending and receiving sides of rdt will also need to exchange
control packets back and forth. Both the send and receive sides of rdt
send packets to the other side by a call to udt_send() (unreliable
data
transfer).
The sending side of rdt simply accepts data from the upper-layer via the rdt_send(data)event, puts the data into a packet (via the action make_pkt(packet,data)) and sends the packet into the channel. In practice, the rdt_send(data)event would result from a procedure call (e.g., to rdt_send()) by the upper layer application.
On the receiving side, rdt receives a packet from the underlying channel via the rdt_rcv(packet) event, removes the data from the packet (via the action extract(packet,data)) and passes the data up to the upper-layer. In practice, the rdt_rcv(packet)event would result from a procedure call (e.g., to rdt_rcv()) from the lower layer protocol.
In this simple protocol, there is no difference between a unit of data and a packet. Also, all packet flow is from the sender to receiver - with a perfectly reliable channel there is no need for the receiver side to provide any feedback to the sender since nothing can go wrong!
Before developing a protocol for reliably communicating over such a channel, first consider how people might deal with such a situation. Consider how you yourself might dictate a long message over the phone. In a typical scenario, the message taker might say ``OK'' after each sentence has been heard, understood, and recorded. If the message taker hears a garbled sentence, you're asked to repeat the garbled sentence. This message dictation protocol uses both positive acknowledgements (``OK'') and negative acknowledgements (``Please repeat that''). These control messages allow the receiver to let the sender know what has been received correctly, and what has been received in error and thus requires repeating. In a computer network setting, reliable data transfer protocols based on such retransmission are known ARQ (Automatic Repeat reQuest) protocols.
Fundamentally, two additional protocol capabilities are required in ARQ protocols to handle the presence of bit errors:
The send side of rdt2.0 has two states. In one state, the send-side protocol is waiting for data to be passed down from the upper layer. In the other state, the sender protocol is waiting for an ACK or a NAK packet from the receiver. If an ACK packet is received (the notation rdt_rcv(rcvpkt) && isACK(rcvpkt) in Figure 3.4-3 corresponds to this event), the sender knows the most recently transmitted packet has been received correctly and thus the protocol returns to the state of waiting for data from the upper layer. If a NAK is received, the protocol retransmits the last packet and waits for an ACK or NAK to be returned by the receiver in response to the retransmitted data packet. It is important to note that when the receiver is in the wait-for-ACK-or-NAK state, it can not get more data from the upper layer; that will only happen after the sender receives an ACK and leaves this state. Thus, the sender will not send a new piece of data until it is sure that the receiver has correctly received the current packet. Because of this behavior, protocols such as rdt2.0 are known as stop-and-wait protocols.
The receiver-side FSM for rdt2.0 still has a single state. On packet arrival, the receiver replies with either an ACK or a NAK, depending on whether or not the received packet is corrupted. In Figure 3.4-3, the notation rdt_rcv(rcvpkt) && corrupt(rcvpkt) corresponds to the event where a packet is received and is found to be in error.
Protocol rdt2.0 may look as if it works but unfortunately has a fatal flaw. In particular, we haven't accounted for the possibility that the ACK or NAK packet could be corrupted! (Before proceeding on, you should think about how this problem may be fixed.) Unfortunately, our slight oversight is not as innocuous as it may seem. Minimally, we will need to add checksum bits to ACK/NAK packets in order to detect such errors. The more difficult question is how the protocol should recover from errors in ACK or NAK packets. The difficulty here is that if an ACK or NAK is corrupted, the sender has no way of knowing whether or not the receiver has correctly received the last piece of transmitted data.
Consider three possibilities for handling corrupted ACKs or NAKs:
Protocol rdt2.1 uses both positive and negative acknowledgements from the receiver to the sender. A negative acknowledgement is sent whenever a corrupted packet, or an out of order packet, is received. We can accomplish the same effect as a NAK if instead of sending a NAK, we instead send an ACK for the last correctly received packet. A sender that receives two ACKs for the same packet (i.e., receives duplicate ACKs) knows that the recevier did not correctly receive the packet following the packet that is being ACKed twice. Many TCP implementations use the receipt of so-called "triple duplicate ACKs" (three ACK packets all ACK'ing the same packet) to trigger a retransmission at the sender. Our NAK-free reliable data transfer protocol for a channel with bit errors is rdt2.2, shown in Figure 3.4-6 and 3.4-7.
There are many possible approaches towards dealing with packet loss (several more of which are explored in the exercises at the end of the chapter). Here, we'll put the burden of detecting and recovering from lost packets on the sender. Suppose that the sender transmits a data packet and either that packet, or the receiver's ACK of that packet, gets lost. In either case, no reply is forthcoming at the sender from the receiver. If the sender is willing to wait long enough so that it is certain that a packet has been lost, it can simply retransmit the data packet. You should convince yourself that this protocol does indeed work.
But how long must the sender wait to be certain that something has been lost? It must clearly wait at least as long as a round trip delay between the sender and receiver (which may include buffering at intermediate routers or gateways) plus whatever amount of time is needed to process a packet at the receiver. In many networks, this worst case maximum delay is very difficult to even estimate, much less know with certainty. Moreover, the protocol should ideally recover from packet loss as soon as possible; waiting for a worst case delay could mean a long wait until error recovery is initiated. The approach thus adopted in practice is for the sender to ``judiciously'' chose a time value such that packet loss is likely, although not guaranteed, to have happened. If an ACK is not received within this time, the packet is retransmitted. Note that if a packet experiences a particularly large delay, the sender may retransmit the packet even though neither the data packet nor its ACK have been lost. This introduces the possibility of duplicate data packets in the sender-to-receiver channel. Happily, protocol rdt2.2 already has enough functionality (i.e., sequence numbers) to handle the case of duplicate packets.
From the sender's viewpoint, retransmission is a panacea. The sender does not know whether a data packet was lost, an ACK was lost, or if the packet or ACK was simply overly delayed. In all cases, the action is the same: retransmit. In order to implement a time-based retransmission mechanism, a countdown timer will be needed that can interrupt the sender after a given amount of timer has expired. The sender will thus need to be able to (i) start the timer each time a packet (either a first time packet, or a retransmission) is sent, (ii) respond to a timer interrupt (taking appropriate actions), and (iii) stop the timer.
The existence of sender-generated duplicate packets and packet (data, ACK) loss also complicates the sender's processing of any ACK packet it receives. If an ACK is received, how is the sender to know if it was sent by the receiver in response to its (sender's) own most recently transmitted packet, or is a delayed ACK sent in response to an earlier transmission of a different data packet? The solution to this dilemma is to augment the ACK packet with an acknowledgement field. When the receiver generates an ACK, it will copy the sequence number of the data packet being ACK'ed into this acknowledgement field. By examining the contents of the acknowledgment field, the sender can determine the sequence number of the packet being positively acknowledged.
Figure 3.4-8 shows the sender FSM for rdt3.0, a protocol that reliably transfers data over a channel that can corrupt or lose packets. Figure 3.4-9 shows how the protocol operates with no lost or delayed packets, and how it handles lost data packets. In Figure 3.4-9, time moves forward from the top of the diagram towards the bottom of the diagram; note that a receive time for a packet is neccessarily later than the send time for a packet as a result of transmisison and propagation delays. In Figures 3.4-9(b)-(d), the send-side brackets indicate the times at which a timer is set and later times out. Several of the more subtle aspects of this protocol are explored in the exercises at the end of this chapter. Because packet sequence numbers alternate between 0 and 1, protocol rdt3.0 is sometimes known as the alternating bit protocol.
We have now assembled the key elements of a data transfer protocol.
Checksums, sequence numbers, timers, and positive and negative acknowledgement
packets each play a crucial and necessary role in the operation of the
protocol. We now have a working reliable data transfer protocol!
To appreciate the performance impact of this stop-and-wait behavior, consider an idealized case of two end hosts, one located on the west coast of the United States and the other located on the east cost. The speed-of-light propagation delay, Tprop, between these two end systems is approximately 15 milliseconds. Suppose that they are connected by a channel with a capacity, C, of 1 Gigabit (10**9 bits) per second. With a packet size, SP, of 1K bytes per packet including both header fields and data, the time needed to actually transmit the packet into the 1Gbps link is
With our stop and wait protocol, if the sender begins sending the packet at t = 0, then at t = 8 microsecs the last bit enters the channel at the sender side. The packet then makes its 15 msec cross country journey, as depicted in Figure 3.4-10a, with the last bit of the packet emerging at the receiver at t = 15.008 msec. Assuming for simplicity that ACK packets are the same size as data packets and that the receiver can begin sending an ACK packet as soon as the last bit of a data packet is received, the last bit of the ACK packet emerges back at the receiver at t = 30.016 msec. Thus, in 30.016 msec, the sender was only busy (sending or receiving) for .016 msec. If we define the utilization of the sender (or the channel) as the fraction of time the sender is actually busy sending bits into the channel, we have a rather dismal sender utilization, Usender, of
That is, the sender was busy only 1.5 hundredths of one percent of the time. Viewed another way, the sender was only able to send 1K bytes in 30.016 milliseconds, an effective throughput of only 33KB/sec - even thought a 1Gigabit per second link was available! Imagine the unhappy network manager who just paid a fortune for a gigabit capacity link but manages to get a throughput of only 33KB! This is a graphic example of how network protocols can limit the capabilities provided by the underlying network hardware. Also, we have neglected lower layer protocol processing times at the sender and receiver, as well as the processing and queueing delays that would occur at any intermediate routers between the sender and receiver. Including these effects would only serve to further increase the delay and further accentuate the poor performance.
The solution to this particular performance problem is a simple one: rather than operate in a stop-and-wait manner, the sender is allowed to send multiple packets without waiting for acknowledgements, as shown in Figure 3.4-10(b). Since the many in-transit sender-to-receiver packets can be visualized as filling a pipeline, this technique is known as pipelining. Pipelining has several consequences for reliable data transfer protocols:
In a Go-Back-N (GBN) protocol, the sender is allowed to transmit multiple packets (when available) without waiting for an acknowledgment, but is constrained to have no more than some maximum allowable number, N, of unacknowledged packets in the pipeline. Figure 3.4-11 shows the sender's view of the range of sequence numbers in a GBN protocol. If we definebase to be the sequence number of the oldest unacknowledged packet and nextseqnum to be the smallest unused sequence number (i.e., the sequence number of the next packet to be sent), then four intervals in the range of sequence numbers can be identified. Sequence numbers in the interval [0,base-1] correspond to packets that have already been transmitted and acknowledged. The interval [base,nextseqnum-1] corresponds to packets that have been sent but not yet acknowledged. Sequence numbers in the interval [nextseqnum,base+N-1] can be used for packets that can be sent immediately, should data arrive from the upper layer. Finally, sequence numbers greater than or equal to base+N can not be used until an unacknowledged packet currently in the pipeline has been acknowledged.
As suggested by Figure 3.4-11, the range of permissible sequence numbers for transmitted but not-yet-acknowledged packets can be viewed as a ``window'' of size N over the range of sequence numbers. As the protocol operates, this window slides forward over the sequence number space. For this reason, N is often referred to as the window size and the GBN protocol itself as a sliding window protocol. You might be wondering why even limit the number of outstandstanding, unacknowledged packet to a value of N in the first place. Why not allow an unlimited number of such packets? We will see in Section 3.5 that flow conontrol is one reason to impose a limt on the sender. We'll examine another reason to do so in section 3.7, when we study TCP congestion control.
In practice, a packet's sequence number is carried in a fixed length field in the packet header. If k is the number of bits in the packet sequence number field, the range of sequence numbers is thus [0,2k-1]. With a finite range of sequence numbers, all arithmetic involving sequence numbers must then be done using modulo 2k arithmetic. (That is, the sequence number space can be thought of as a ring of size 2k, where sequence number 2k-1 is immediately followed by sequence number 0.) Recall that rtd3.0 had a 1-bit sequence number and a range of sequence numbers of [0,1].Several of the problems at the end of this chapter explore consequences of a finite range of sequence numbers. We will see in Section 3.5 that TCP has a 32-bit sequence number field, where TCP sequence numbers count bytes in the byte stream rather than packets.
Figures 3.4-12 and 3.4-13 give an extended-FSM description of the sender and receiver sides of an ACK-based, NAK-free, GBN protocol. We refer to this FSM description as an extended-FSM since we have added variables (similar to programming language variables) for base and nextseqnum, and also added operations on these variables and conditional actions involving these variables. Note that the extended-FSM specification is now beginning to look somewhat like a programming language specification. [Bochman 84] provides an excellent survey of additional extensions to FSM techniques as well as other programming language-based techniques for specifying protocols.
The GBN sender must respond to three types of events:
In our GBN protocol, the receiver discards out-of-order packets. While it may seem silly and wasteful to discard a correctly received (but out-of-order) packet, there is some justification for doing so. Recall that the receiver must deliver data, in-order, to the upper layer. Suppose now that packet n is expected, but packet n+1 arrives. Since data must be delivered in order, the receiver could buffer (save) packet n+1 and then deliver this packet to the upper layer after it had later received and delivered packet n. However, if packet n is lost, both it and packet n+1 will eventually be retransmitted as a result of the GBN retransmission rule at the sender. Thus, the receiver can simply discard packet n+1. The advantage of this approach is the simplicity of receiver buffering - the receiver need not buffer any out-of-order packets. Thus, while the sender must maintain the upper and lower bounds of its window and the position of nextseqnum within this window, the only piece of information the receiver need maintain is the sequence number of the next in-order packet. This value is held in the variable expectedseqnum, shown in the receiver FSM in Figure 3.4-13. Of course, the disadvantage of throwing away a correctly received packet is that the subsequent retransmission of that packet might be lost or garbled and thus even more retransmissions would be required.
Figure 3.4-14 shows the operation of the GBN protocol for the case of a window size of four packets. Because of this window size limitation, the sender sends packets 0 through 3 but then must wait for one or more of these packets to be acknowledged before proceeding. As each successive ACK (e.g., ACK0 and ACK1) is received, the window slides forwards and the sender can transmit one new packet (pkt4 and pkt5, respectively). On the receiver side, packet 2 is lost and thus packets 3, 4, and 5 are found to be out-of-order and are discarded.
Before closing our discussion of GBN, it is worth noting that an implementation of this protocol in a protocol stack would likely be structured similar to that of the extended FSM in Figure 3.4-12. The implementation would also likely be in the form of various procedures that implement the actions to be taken in response to the various events that can occur. In such event-based programming, the various procedures are called (invoked) either by other procedures in the protocol stack, or as the result of an interrupt. In the sender, these events would be (i) a call from the upper layer entity to invoke rdt_send(), (ii) a timer interrupt, and (iii) a call from the lower layer to invoke rdt_rcv() when a packet arrives. The programming exercises at the end of this chapter will give you a chance to actually implement these routines in a simulated, but realistic, network setting.
We note here that the GBN protocol incorporates almost all of the techniques that we will enounter when we study the reliable data transfer components of TCP in Section 3.5: the use of sequence numbers, cumulative acknowledgements, checksums, and a time-out/retransmit operation. Indeed, TCP is often referred to as a GBN style of protocol. There are, however, some differences. Many TCP implementations will buffer correctly-received but out-of-order segments [Stevens 1994]. A proposed modification to TCP, the so-called selective acknowledgment [RFC 2018], will also allow a TCP receiver to selectively acknowledge a single out-of-order packet rather than cumulatively acknowledge the last correctly received packet. The notion of a selective acknowledgment is at the heart of the second broad class of pipelined protocols: the so called selective repeat protocols.
As the name suggests, Selective Repeat (SR) protocols avoid unnecessary retransmissions by having the sender retransmit only those packets that it suspects were received in error (i.e., were lost or corrupted) at the receiver. This individual, as-needed, retransmission will require that the receiver individually acknowledge correctly-received packets. A window size of N will again be used to limit the number of outstanding, unacknowledged packets in the pipeline. However, unlike GBN, the sender will have already received ACKs for some of the packets in the window. Figure 3.4-15 shows the SR sender's view of the sequence number space. Figure 3.4-16 details the various actions taken by the SR sender.
The SR receiver will acknowledge a correctly received packet whether
or not it is in-order. Out-of-order packets are buffered until any missing
packets (i.e., packets with lower sequence numbers) are received, at which
point a batch of packets can be delivered in-order to the upper layer.
Figure figsrreceiver itemizes the the various actions taken by the SR receiver.
Figure 3.4-18 shows an example of SR operation in the presence of lost
packets. Note that in Figure 3.4-18, the receiver initially buffers packets
3 and 4, and delivers them together with packet 2 to the upper layer when
packet 2 is finally received.
Figure 3.4-15: SR sender and receiver views of sequence number space
Figure 3.4-19: SR receiver dilemma with too large windows: a new
packet or a retransmission?
The lack of synchronization between sender and receiver windows has important consequences when we are faced with the reality of a finite range of sequence numbers. Consider what could happen, for example, with a finite range of four packet sequence numbers, 0,1,2,3 and a window size of three. Suppose packets 0 through 2 are transmitted and correctly received and acknowledged at the receiver. At this point, the receiver's window is over the fourth, fifth and sixth packets, which have sequence numbers 3, 0, and 1, respectively. Now consider two scenarios. In the first scenario, shown in Figure 3.4-19(a), the ACKs for the first three packets are lost and the sender retransmits these packets. The receiver thus next receives a packet with sequence number 0 - a copy of the first packet sent.
In the second scenario, shown in Figure 3.4-19(b), the ACKs for the first three packets are all delivered correctly. The sender thus moves its window forward and sends the fourth, fifth and sixth packets, with sequence numbers 3, 0, 1, respectively. The packet with sequence number 3 is lost, but the packet with sequence number 0 arrives - a packet containing new data.
Now consider the receiver's viewpoint in Figure 3.4-19, which has a figurative curtain between the sender and the receiver, since the receiver can not ``see'' the actions taken by the sender. All the receiver observes is the sequence of messages it receives from the channel and sends into the channel. As far as it is concerned, the two scenarios in Figure 3.4-19 are identical. There is no way of distinguishing the retransmission of the first packet from an original transmission of the fifth packet. Clearly, a window size that is one smaller than the size of the sequence number space won't work. But how small must the window size be? A problem at the end of the chapter asks you to show that the window size must be less than or equal to half the size of the sequence number space.
Let us conclude our discussion of reliable data transfer protocols by considering one remaining assumption in our underlying channel model. Recall that we have assumed that packets can not be re-ordered within the channel between the sender and rceiver. This is generally a reasonable assumption when the sender and receiver are connected by a single physical wire. However, when the ``channel'' connecting the two is a network, packet reordering can occur. One manifestation of packet ordering is that old copies of a packet with a sequence or acknowledgement number of x can appear, even though neither the sender's nor the receiver's window contains x. With packet reordering, the channel can be thought of as essentially buffering packets and spontaneously emitting these packets at any point in the future. Because sequence numbers may be reused, some care must be taken to guard against such duplicate packets. The approach taken in practice is to insure that a sequence number is not reused until the sender is relatively ``sure'' than any previously sent packets with sequence number x are no longer in the network. This is done by assuming that a packet can not ``live'' in the network for longer than some fixed maximum amount of time. A maximum packet lifetime of approximately three minutes is assumed in the TCP extensions for high-speed networks [RFC 1323]. Sunshine [Sunshine 1978] describes a method for using sequence numbers such that reordering problems can be completely avoided.
References
[Bochman 84] G.V. Bochmann and C.A.
Sunshine, "Formal methods in communication protocol design", IEEE Transactions
on Communicaitons, Vol. COM-28, No. 4, (April 1980), pp 624-631.
[RFC 1323] V.
Jacobson, S. Braden, D. Borman, "TCP Extensions for High Performance,"
RFC
1323, May 1992.
[RFC 2018] M. Mathis, J.
Mahdavi, S. Floyd, A. Romanow, "TCP Selective Acknowledgment Options,"
RFC 2018, October 1996
[Stevens 1994] W.R. Stevens, TCP/IP
Illustrated, Volume 1: The Protocols. Addison-Wesley, Reading, MA, 1994.
[Sunshine 1978] C. Sunshine and
Y.K. Dalal, "Connection Management in Transport Protocols", Computer
Networks, Amsterdam, The Netherlands: North-Holland", 1978.
Copyright 1999 Keith W. Ross and James F. Kurose, All Rights Reserved.