2.5 DNS - The Internet's Directory Service
We human beings can be identified in many ways. For example, we can be
identified by the names that appear on our birth certificates. We can be
identified by our social security numbers. We can be identified by our
driver's license numbers. Although each of these identifiers can be used
to identify people, within a given context, one identifier may be more
appropriate than another. For example, the computers at the IRS (the infamous
tax-collecting agency in the US) prefer to use fixed-length social
security numbers rather than birth-certificate names. On the other hand,
ordinary people prefer the more mnemonic birth-certificate names rather
than social security numbers. (Indeed, can you imagine saying, "Hi. My
name is 132-67-9875. Please meet my husband, 178-87-1146."?)
Just as humans can be identified in many ways, so too can Internet hosts.
One identifier for a host is its hostname. Hostnames -- such as
cnn.com, www.yahoo.com, gaia.cs.umass.edu and surf.eurecom.fr -- are mnemonic
and are therefore appreciated by humans. However, hostnames provide little,
if any, information about the location within the Internet of the host.
(A hostname such as surf.eurecom.fr, which ends with the country code .fr,
tells us that the host is in France, but doesn't say much more.) Furthermore,
because hostnames can consist of variable-length alphanumeric characters,
they would be difficult for routers to process. For these reasons, hosts
are also identified by so-called IP addresses. We will discuss IP
addresses in some detail in Chapter 4, but it is useful to say a few brief
words about them now. An IP address consists of four bytes and has a rigid
hierarchical structure. An IP address looks like 121.7.106.83, where each
period separates one of the bytes, expressed in decimal notation from 0
to 255. An IP address is hierarchical because as we scan the address from
left to right, we obtain more and more specific information about where
(i.e., within which network, in the network of networks) the host is located
in the Internet. (This is just like scanning a postal address from bottom to
top, which gives more and more specific information about where the residence
is located.) An IP address is included in the header of each IP datagram,
and Internet routers use this IP address to route the datagram towards its
destination.
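To make the fixed-length format concrete, here is a minimal Python sketch that converts the dotted-decimal form into the four bytes it stands for, and into a single 32-bit number. The function names parse_ipv4 and to_int are illustrative only; Python's standard ipaddress module offers a complete implementation.

```python
def parse_ipv4(dotted: str) -> bytes:
    """Convert dotted-decimal notation such as '121.7.106.83' into four bytes."""
    parts = dotted.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 dotted fields, got {len(parts)}: {dotted!r}")
    values = []
    for part in parts:
        if not part.isdigit():
            raise ValueError(f"non-numeric field {part!r} in {dotted!r}")
        value = int(part)
        if value > 255:            # each field is exactly one byte
            raise ValueError(f"field {value} is outside 0..255")
        values.append(value)
    return bytes(values)


def to_int(dotted: str) -> int:
    """The same address read as one 32-bit number, leftmost byte most significant."""
    return int.from_bytes(parse_ipv4(dotted), "big")


addr = parse_ipv4("121.7.106.83")
print(list(addr))   # [121, 7, 106, 83]
print(len(addr))    # 4
print(to_int("121.7.106.83"))
```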
2.5.1 Services Provided by DNS
We have just seen that there are two ways to identify a host -- a hostname
and an IP address. People prefer the more mnemonic hostname identifier,
while routers prefer fixed-length, hierarchically-structured IP addresses.
In order to reconcile these different preferences, we need a directory
service that translates hostnames to IP addresses. This is the main task
of the Internet's Domain Name System (DNS). The DNS is
(i) a distributed database implemented in a hierarchy of name servers
and
(ii) an application-layer protocol that allows hosts and name servers to
communicate in order to provide the translation service. Name servers
are usually Unix machines running the Berkeley Internet Name Domain (BIND)
software. The DNS protocol runs over UDP and uses port 53. Following this
chapter we provide interactive
links to DNS programs that allow you to translate arbitrary hostnames,
among other things.
DNS is commonly employed by other application-layer protocols -- including
HTTP, SMTP and FTP -- to translate user-supplied hostnames to IP addresses.
As an example, consider what happens when a browser (i.e., an HTTP client),
running on some user's machine, requests the URL www.someschool.edu/index.html.
In order for the user's machine to be able to send an HTTP request message
to the Web server www.someschool.edu, the user's machine must obtain
the IP address of www.someschool.edu. This is done as follows. The same
user machine runs the client side of the DNS application. The browser extracts
the hostname, www.someschool.edu, from the URL and passes the hostname
to the client side of the DNS application. The DNS client then sends a DNS
query message containing the hostname to a DNS server. The
DNS client eventually receives a reply, which includes the IP address for
the hostname. The browser then opens a TCP connection to the HTTP server
process located at that IP address. All IP datagrams sent from the client
to the server as part of this connection will include this IP address in the
destination address field of the datagrams. In particular, the IP datagram(s)
that encapsulate the HTTP request message use this IP address. We see from
this example that DNS adds an additional delay -- sometimes substantial
-- to the Internet applications that use DNS. Fortunately, as we shall
discuss below, the desired IP address is often cached in a "nearby" DNS
name server, which helps to reduce the DNS network traffic as well as the
average DNS delay.
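The steps in this example can be sketched in a few lines of Python. This is a simplified illustration, not how any particular browser works: the function name plan_http_request is hypothetical, it defaults to the standard library's socket.gethostbyname (which asks the operating system's resolver, normally backed by DNS) for the DNS step, and it accepts a stand-in resolver so that it can be tried with the fictitious host www.someschool.edu. The address 192.0.2.10 is a placeholder from a range reserved for documentation.

```python
import socket
from typing import Callable, Tuple
from urllib.parse import urlsplit


def plan_http_request(url: str,
                      resolve: Callable[[str], str] = socket.gethostbyname
                      ) -> Tuple[str, str, str]:
    """Return (hostname, ip_address, path) for an HTTP request to `url`."""
    # 1. The browser extracts the hostname from the URL.
    if "://" not in url:
        url = "http://" + url            # assume HTTP when no scheme is given
    parts = urlsplit(url)
    hostname = parts.hostname
    if not hostname:
        raise ValueError(f"no hostname in {url!r}")
    # 2. The hostname is handed to the DNS client, which returns an address.
    ip_address = resolve(hostname)
    # 3. A real browser would now open a TCP connection to ip_address
    #    (port 80 by default) and send its HTTP request for this path.
    path = parts.path or "/"
    return hostname, ip_address, path


# www.someschool.edu is fictitious, so use a stand-in resolver here.
fake_dns = {"www.someschool.edu": "192.0.2.10"}.__getitem__
print(plan_http_request("www.someschool.edu/index.html", fake_dns))
# ('www.someschool.edu', '192.0.2.10', '/index.html')
```

Passing the resolver as a parameter keeps the DNS step visible as a separate, replaceable stage, which mirrors the way the browser delegates it to the client side of the DNS application.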
Like HTTP, FTP, and SMTP, the DNS protocol is an application-layer protocol
since (i) it runs between communicating end systems (again using
the client-server paradigm), and (ii) it relies on an underlying
end-to-end transport protocol (i.e., UDP) to transfer DNS messages between
communicating end systems. In another sense, however, the role
of the DNS is quite different from Web, file transfer, and email applications.
Unlike these applications, the DNS is not an application with which a user
directly interacts. Instead, the DNS provides a core Internet function
-- namely, translating hostnames to their underlying IP addresses, for
user applications and other software in the Internet. We noted earlier
in Section 1.2 that much of the "complexity" in the Internet architecture
is located at the "edges" of the network. The DNS, which implements
the critical name-to-address translation process using clients and servers
located at the edge of the network, is yet another example of that
design philosophy.
DNS provides a few other important services in addition to translating
hostnames to IP addresses:
-
Host aliasing: A host with a complicated hostname can have one or
more alias names. For example, a hostname such as relay1.west-coast.enterprise.com
could have, say, two aliases such as enterprise.com and www.enterprise.com.
In this case, the hostname relay1.west-coast.enterprise.com is said to
be a canonical hostname. Alias hostnames, when present, are typically
more mnemonic than a canonical hostname. DNS can be invoked by an application
to obtain the canonical hostname for a supplied alias hostname as well
as the IP address of the host.
-
Mail server aliasing: For obvious reasons, it is highly desirable
that email addresses be mnemonic. For example, if Bob has an account with
Hotmail, Bob's email address might be as simple as bob@hotmail.com. However,
the hostname of the Hotmail mail server is more complicated and much less
mnemonic than simply hotmail.com (e.g., the canonical hostname might be
something like relay1.west-coast.hotmail.com). DNS can be invoked by a
mail application to obtain the canonical hostname for a supplied alias
hostname as well as the IP address of the host. In fact, DNS permits a
company's mail server and Web server to have identical (aliased) hostnames;
for example, a company's Web server and mail server can both be called
enterprise.com.
-
Load distribution: Increasingly, DNS is also being used to perform
load distribution among replicated servers, such as replicated Web servers.
Busy sites, such as cnn.com, are replicated over multiple servers, with
each server running on a different end system, and having a different IP
address. For replicated Web servers, a set of IP addresses is thus associated
with one canonical hostname. The DNS database contains this set of IP addresses.
When clients make a DNS query for a name mapped to a set of addresses,
the server responds with the entire set of IP addresses, but rotates the
ordering of the addresses within each reply. Because a client typically
sends its HTTP request message to the IP address that is listed first in
the set, DNS rotation distributes the traffic among all the replicated
servers. DNS rotation is also used for email so that multiple mail servers
can have the same alias name.
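A minimal Python sketch of the rotation idea follows. The class RotatingAddressSet is hypothetical and the three addresses are placeholders from a documentation range; real name servers may order multi-address replies differently. Each reply carries the entire set, but the starting point advances by one position per query, so clients that take the first address spread their requests across the replicas.

```python
from collections import deque
from typing import List


class RotatingAddressSet:
    """One name mapped to several replica addresses, answered in rotating order."""

    def __init__(self, addresses: List[str]) -> None:
        self._ring = deque(addresses)

    def reply(self) -> List[str]:
        answer = list(self._ring)    # the whole set, in the current order
        self._ring.rotate(-1)        # the next reply starts one address later
        return answer


replicas = RotatingAddressSet(["192.0.2.1", "192.0.2.2", "192.0.2.3"])
# A typical client contacts the first address listed in each reply.
first_choices = [replicas.reply()[0] for _ in range(6)]
print(first_choices)
# ['192.0.2.1', '192.0.2.2', '192.0.2.3', '192.0.2.1', '192.0.2.2', '192.0.2.3']
```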
The DNS is specified in [RFC 1034] and [RFC
1035], and updated in several additional RFCs. It is a complex
system, and we only touch upon key aspects of its operation here.
The interested reader is referred to these RFCs and the book [Abitz
1993].
2.5.2 Overview of How DNS Works
We now present a high-level overview of how DNS works. Our discussion
shall focus on the hostname to IP address translation service. From the
client's perspective, the DNS is a black box. The client sends a DNS query
message into the black box, specifying the hostname that needs to be translated
to an IP address. On many Unix-based machines, gethostbyname() is
the library routine that an application calls in order to issue the query
message. In Section 2.7, we shall present a Java program that begins by
issuing a DNS query. After a delay, ranging from milliseconds to tens of
seconds, the client receives a DNS reply message that provides the desired
mapping. Thus, from the client's perspective, DNS is a simple, straightforward
translation service. But in fact, the black box that implements the service
is complex, consisting of a large number of name servers distributed around
the globe, as well as an application-layer protocol that specifies how
the name servers and querying hosts communicate.
A simple design for DNS would have one Internet name server that contains
all the mappings. In this centralized design, clients simply direct all
queries to the single name server, and the name server responds directly
to the querying clients. Although the simplicity of this design is attractive,
it is completely inappropriate for today's Internet, with its vast (and
growing) number of hosts. The problems with a centralized design include:
-
A single point of failure. If the name server crashes, so too does
the entire Internet!
-
Traffic volumes. A single name server would have to handle all DNS
queries (for all the HTTP requests, email messages, etc. generated from
millions of hosts).
-
Distant centralized database. A single name server cannot be "close"
to all the querying clients. If we put the single name server in New York
City, then all queries from Australia must travel to the other side of
the globe, perhaps over slow and congested links. This can lead to significant
delays (thereby increasing the "world wide wait" for the Web and other
applications).
-
Maintenance. The single name server would have to keep records for
all Internet hosts. Not only would this centralized database be huge, but
it would have to be updated frequently to account for every new host. There
are also authentication and authorization problems associated with allowing
any user to register a host with the centralized database.
In summary, a centralized database in a single name server simply doesn't
scale. Consequently, the DNS is distributed by design. In fact, the
DNS is a wonderful example of how a distributed database can be implemented
in the Internet.
In order to deal with the issue of scale, the DNS uses a large number
of name servers, organized in a hierarchical fashion and distributed around
the world. No one name server has all of the mappings for all of the hosts
in the Internet. Instead, the mappings are distributed across the name
servers. To a first approximation, there are three types of name servers:
local name servers, root name servers, and authoritative name servers.
These name servers, again to a first approximation, interact with each
other and with the querying host as follows:
-
Local name servers: Each ISP -- such as a university, an academic
department, an employee's company or a residential ISP -- has a local name
server (also called a default name server). When a host issues a DNS query
message, the message is first sent to the host's local name server.
The IP address of the local name server is typically
configured by hand in a host. (On a Windows 95/98 machine, you can find
the IP address of the local name server used by your PC by opening the
Control Panel, and then selecting "Network", then selecting an installed
TCP/IP component, and then selecting the DNS configuration folder tab.)
The local name server is typically "close" to the client; in the case of
an institutional ISP, it may be on the same LAN as the client host; for
a residential ISP, the name server is typically separated from the
client host by no more than a few routers. If a host requests a translation
for another host that is part of the same local ISP, then the local name
server will be able to immediately provide the requested IP address.
For example, when the host surf.eurecom.fr requests the IP address for
baie.eurecom.fr, the local name server at Eurecom will be able to provide
the requested IP address without contacting any other name servers.
-
Root name servers: In the Internet there are a dozen or so "root
name servers," most of which are currently located in North America. A
February 1998 map of the root servers is shown in Figure 2.5-1. When
a local name server cannot immediately satisfy a query from a host (because
it does not have a record for the hostname being requested), the local
name server behaves as a DNS client and queries one of the root name servers.
If the root name server has a record for the hostname, it sends a DNS reply
message to the local name server, and the local name server then sends
a DNS reply to the querying host. But the root name server may not have
a record for the hostname. In that case, the root name server knows the IP address
of an "authoritative name server" that has the mapping for that particular
hostname.
-
Authoritative name servers: Every host is registered with an authoritative
name server. Typically, the authoritative name server for a host is a name
server in the host's local ISP. (Actually, each host is required to have
at least two authoritative name servers, in case of failures.) By definition,
a name server is authoritative for a host if it always has a DNS record
that translates the host's hostname to that host's IP address. When an
authoritative name server is queried by a root server, the authoritative
name server responds with a DNS reply that contains the requested
mapping. The root server then forwards the mapping to the local name server,
which in turn forwards the mapping to the requesting host. Many name servers
act as both local and authoritative name servers.
Figure 2.5-1: A February 1998 map of the DNS root servers. Obtained
from the WIA alliance Web site (http://www.wia.org).
Let's take a look at a simple example. Suppose the host surf.eurecom.fr
desires the IP address of gaia.cs.umass.edu. Also suppose that Eurecom's
local name server is called dns.eurecom.fr and that an authoritative name
server for gaia.cs.umass.edu is called dns.umass.edu. As shown in Figure
2.5-2, the host surf.eurecom.fr first sends a DNS query message to its
local name server, dns.eurecom.fr. The query message contains the hostname
to be translated, namely, gaia.cs.umass.edu. The local name server forwards
the query message to a root name server. The root name server forwards
the query message to the name server that is authoritative for all the
hosts in the domain umass.edu, namely, to dns.umass.edu. The authoritative
name server then sends the desired mapping to the querying host, via the
root name server and the local name server. Note that in this example,
in order to obtain the mapping for one hostname, six DNS messages were
sent: three query messages and three reply messages.
Figure 2.5-2: Recursive queries to obtain the mapping for gaia.cs.umass.edu.
Our discussion up to this point has assumed that the root name server
knows the IP address of an authoritative name server for every hostname.
This assumption may be incorrect. For a given hostname, the root name server
may only know the IP address of an intermediate name server that in turn
knows the IP address of an authoritative name server for the hostname.
To illustrate this, consider once again the above example with the host
surf.eurecom.fr querying for the IP address of gaia.cs.umass.edu. Suppose
now that the University of Massachusetts has a name server for the university,
called dns.umass.edu. Also suppose that each of the departments at University
of Massachusetts has its own name server, and that each departmental name
server is authoritative for all the hosts in the department. As shown in
Figure 2.5-3, when the root name server receives a query for a host
with hostname ending with umass.edu, it forwards the query to the name server
dns.umass.edu. This name server forwards all queries with hostnames ending
with .cs.umass.edu to the name server dns.cs.umass.edu, which is authoritative
for all hostnames ending with .cs.umass.edu. The authoritative name server
sends the desired mapping to the intermediate name server, dns.umass.edu,
which forwards the mapping to the root name server, which forwards the
mapping to the local name server, dns.eurecom.fr, which forwards the mapping
to the requesting host! In this example, eight DNS messages are sent. Actually,
even more DNS messages can be sent in order to translate a single hostname
-- there can be two or more intermediate name servers in the chain between
the root name server and the authoritative name server!
Figure 2.5-3: Recursive queries with an intermediate name server
between the root and authoritative name servers.
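The message counts in these two examples can be checked with a small Python simulation. It is a sketch under simplifying assumptions: the chain of servers is supplied up front (a real name server would choose the next server from its records), "root" stands for any root name server, and 192.0.2.80 is a placeholder rather than gaia.cs.umass.edu's actual address. Each hop contributes one query and one reply.

```python
from typing import Dict, List, Optional, Tuple


def recursive_resolve(chain: List[str],
                      zones: Dict[str, Dict[str, str]],
                      hostname: str) -> Tuple[Optional[str], List[str]]:
    """Simulate a query passed recursively along `chain`.

    chain[0] is the querying host; every later entry is a name server.
    zones maps a server to the hostname -> address pairs it is
    authoritative for. Returns the address (or None) and one log line
    per DNS message sent.
    """
    log: List[str] = []

    def ask(i: int) -> Optional[str]:
        me = chain[i]
        address = zones.get(me, {}).get(hostname)
        if address is None and i + 1 < len(chain):
            nxt = chain[i + 1]
            log.append(f"query {me} -> {nxt}")
            address = ask(i + 1)            # nxt does the work on our behalf
            log.append(f"reply {nxt} -> {me}")
        return address

    return ask(0), log


ADDR = "192.0.2.80"     # placeholder for gaia.cs.umass.edu's address
eurecom_chain = ["surf.eurecom.fr", "dns.eurecom.fr", "root"]

# First example: dns.umass.edu is authoritative for gaia.cs.umass.edu.
addr, messages = recursive_resolve(
    eurecom_chain + ["dns.umass.edu"],
    {"dns.umass.edu": {"gaia.cs.umass.edu": ADDR}},
    "gaia.cs.umass.edu")
print(addr, len(messages))    # 192.0.2.80 6

# Second example: dns.umass.edu is an intermediate server.
addr, messages = recursive_resolve(
    eurecom_chain + ["dns.umass.edu", "dns.cs.umass.edu"],
    {"dns.cs.umass.edu": {"gaia.cs.umass.edu": ADDR}},
    "gaia.cs.umass.edu")
print(addr, len(messages))    # 192.0.2.80 8
```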
The examples up to this point assumed that all queries are recursive
queries. When a host or name server A makes a recursive query to a
name server B, then name server B obtains the requested mapping on behalf
of A and then forwards the mapping to A. The DNS protocol also allows for
iterative
queries at any step in the chain between requesting host and authoritative
name server. When a name server A makes an iterative query to name server
B, if name server B does not have the requested mapping, it immediately
sends a DNS reply to A that contains the IP address of the next name server
in the chain, say, name server C. Name server A then sends a query directly
to name server C.
In the sequence of queries that are required to translate a hostname,
some of the queries can be iterative and others recursive. Such a combination
of recursive and iterative queries is illustrated in Figure 2.5-4. Typically,
all queries in the query chain are recursive except for the query from
the local name server to the root name server, which is iterative. (Because
root servers handle huge volumes of queries, it is preferable to use the
less burdensome iterative queries for root servers.)
Figure 2.5-4: A query chain with recursive and iterative queries.
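The mixed chain can be simulated in the same spirit. In this hypothetical Python sketch, the topology table is invented to mirror the surf.eurecom.fr example with the intermediate server dns.umass.edu, the root is marked as answering only with referrals, and 192.0.2.80 is again a placeholder address.

```python
from typing import Dict, List, Optional

# Hypothetical topology for the surf.eurecom.fr -> gaia.cs.umass.edu example.
# "next" is where a server sends a query it cannot answer itself;
# "iterative" marks a server that only replies with a referral.
SERVERS: Dict[str, dict] = {
    "surf.eurecom.fr":  {"next": "dns.eurecom.fr"},
    "dns.eurecom.fr":   {"next": "root"},
    "root":             {"next": "dns.umass.edu", "iterative": True},
    "dns.umass.edu":    {"next": "dns.cs.umass.edu"},
    "dns.cs.umass.edu": {"answers": {"gaia.cs.umass.edu": "192.0.2.80"}},
}


def resolve(me: str, hostname: str, log: List[str]) -> Optional[str]:
    """Obtain the address of `hostname` on behalf of `me`, logging each message."""
    info = SERVERS[me]
    if hostname in info.get("answers", {}):
        return info["answers"][hostname]
    nxt = info.get("next")
    while nxt is not None:
        log.append(f"query {me} -> {nxt}")
        if SERVERS[nxt].get("iterative"):
            # Iterative query: nxt only names the next server to ask,
            # and `me` then queries that server directly.
            referral = SERVERS[nxt]["next"]
            log.append(f"reply {nxt} -> {me} (referral to {referral})")
            nxt = referral
            continue
        # Recursive query: nxt obtains the mapping on our behalf.
        address = resolve(nxt, hostname, log)
        log.append(f"reply {nxt} -> {me}")
        return address
    return None


messages: List[str] = []
print(resolve("surf.eurecom.fr", "gaia.cs.umass.edu", messages))
for line in messages:
    print(line)
print(len(messages), "messages")   # 8 messages
```

In this model the total is still eight messages, the same as in the fully recursive chain with an intermediate server. What changes is that the root sends back a single referral instead of waiting on the downstream servers, which is why the iterative query is lighter on the root.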
Our discussion thus far has not touched on one important feature of
the DNS: DNS caching. In reality, DNS extensively exploits
caching in order to improve the delay performance and to reduce the number
of DNS messages in the network. The idea is very simple. When a name server
receives a DNS mapping for some hostname, it caches the mapping in local
memory (disk or RAM) while passing the message along the name server chain.
Given a cached hostname/IP address translation pair, if another query arrives
at the name server for the same hostname, the name server can provide the
desired IP address, even if it is not authoritative for the hostname. In
order to deal with ephemeral hosts, a cached record is discarded after
a period of time (often set to two days). As an example, suppose that surf.eurecom.fr
queries the DNS for the IP address for the hostname cnn.com. Furthermore,
suppose that a few hours later, another Eurecom host, say baie.eurecom.fr,
also queries DNS with the same hostname. Because of caching, the local
name server at Eurecom will be able to immediately return the IP address
to the requesting host without having to query name servers on another
continent. Any name server may cache DNS mappings.
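The caching behavior can be sketched as a dictionary whose entries carry an expiry time. This Python sketch is illustrative: the class name DnsCache is hypothetical, the clock is injectable so that expiry can be shown without waiting, the two-day lifetime follows the figure mentioned above, and 192.0.2.44 is a placeholder, not cnn.com's real address. Keys are lowercased because DNS names are case-insensitive.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class DnsCache:
    """Hostname -> address cache whose entries expire after `ttl` seconds."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}

    def put(self, hostname: str, address: str, ttl: float) -> None:
        # DNS names are case-insensitive, so normalize the key.
        self._entries[hostname.lower()] = (address, self._clock() + ttl)

    def get(self, hostname: str) -> Optional[str]:
        key = hostname.lower()
        entry = self._entries.get(key)
        if entry is None:
            return None                  # miss: other servers must be queried
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]       # stale: discard the record
            return None
        return address


TWO_DAYS = 2 * 24 * 3600
now = [0.0]                              # fake clock, in seconds
cache = DnsCache(clock=lambda: now[0])

cache.put("cnn.com", "192.0.2.44", ttl=TWO_DAYS)   # after surf.eurecom.fr's query
now[0] = 3 * 3600                                  # a few hours later
print(cache.get("cnn.com"))    # 192.0.2.44 (answered locally for baie.eurecom.fr)
now[0] = TWO_DAYS + 1                              # past the lifetime
print(cache.get("cnn.com"))    # None (a fresh query is needed)
```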
2.5.3 DNS Records
The name servers that together implement the DNS distributed database
store Resource Records (RRs), including RRs that provide hostname-to-IP address
mappings. Each DNS reply message carries one or more resource records.
In this and the following subsection, we provide a brief overview of DNS
resource records and messages; more details can be found in [Abitz 1993]
or in the DNS RFCs [RFC 1034] [RFC
1035].
A resource record is a four-tuple that contains the following fields:
(Name, Value, Type, TTL)
TTL is the time to live of the resource record; it determines the time
at which a resource record should be removed from a cache. In the example records
given below, we will ignore the TTL field. The meanings of Name and Value
depend on Type:
-
If Type=A, then Name is a hostname
and Value is the IP address for the hostname. Thus, a Type A record provides
the standard hostname to IP address mapping. As an example, (relay1.bar.foo.com,
145.37.93.126, A) is a Type A record.
-
If Type=NS, then Name is a domain
(such as foo.com) and Value is the hostname of a server that knows
how to obtain the IP addresses for hosts in the domain. This record is
used to route DNS queries further along in the query chain. As an example,
(foo.com,
dns.foo.com, NS) is a Type NS record.
-
If Type=CNAME, then Value is a
canonical hostname for the alias hostname Name. This record can provide
querying hosts the canonical name for a hostname. As an example, (foo.com,
relay1.bar.foo.com, CNAME) is a CNAME record.
-
If Type=MX, then Value is a hostname
of a mail server that has an alias hostname Name. As an example, (foo.com,
mail.bar.foo.com, MX) is an MX record. MX records allow the hostnames
of mail servers to have simple aliases.
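The four record types can be modeled directly as (Name, Value, Type, TTL) tuples. The Python sketch below reuses the example records from the list above and adds one hypothetical A record (mail.bar.foo.com at the placeholder 192.0.2.25) so that the MX lookup can end in an address. The records are pooled in one list purely for illustration; in a real zone, [RFC 1034] does not allow a CNAME record to coexist with other data at the same name.

```python
from typing import List, NamedTuple, Optional


class RR(NamedTuple):
    name: str
    value: str
    type: str
    ttl: int = 0    # ignored here, as in the text's examples


RECORDS: List[RR] = [
    RR("relay1.bar.foo.com", "145.37.93.126", "A"),
    RR("foo.com", "dns.foo.com", "NS"),
    RR("foo.com", "relay1.bar.foo.com", "CNAME"),
    RR("foo.com", "mail.bar.foo.com", "MX"),
    RR("mail.bar.foo.com", "192.0.2.25", "A"),    # hypothetical record
]


def lookup(name: str, rtype: str) -> List[str]:
    """Values of all records with this Name and Type."""
    return [rr.value for rr in RECORDS if rr.name == name and rr.type == rtype]


def address_of(name: str) -> Optional[str]:
    """Follow a CNAME (alias -> canonical name) if present, then read the A record."""
    canonical = lookup(name, "CNAME")
    if canonical:
        name = canonical[0]
    addresses = lookup(name, "A")
    return addresses[0] if addresses else None


def mail_server_address(domain: str) -> Optional[str]:
    """MX gives the mail server's hostname; an A lookup then gives its address."""
    servers = lookup(domain, "MX")
    return address_of(servers[0]) if servers else None


print(address_of("foo.com"))            # 145.37.93.126, via the CNAME
print(mail_server_address("foo.com"))   # 192.0.2.25
```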
If a name server is authoritative for a particular hostname, then the name
server will contain a Type A record for the hostname. (Even if the name
server is not authoritative, it may contain a Type A record in its cache.)
If a server is not authoritative for a hostname, then the server will contain
a Type NS record for the domain that includes the hostname; it will also
contain a Type A record that provides the IP address of the name server
in the Value field of the NS record. As an example, suppose a root server
is not authoritative for the host gaia.cs.umass.edu. Then the root server
will contain a record for a domain that includes the host gaia.cs.umass.edu,
e.g.,
(umass.edu, dns.umass.edu, NS).
The root server would also contain a Type A record that maps the name
server dns.umass.edu to an IP address, e.g.,
(dns.umass.edu, 128.119.40.111, A).
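How a server might use these two records can be sketched in Python. The root's table holds only the records from the text; the function name answer and its return format are hypothetical. If the server holds a Type A record for the queried name it answers directly; otherwise it strips labels from the left until it finds a domain with an NS record, and returns that server's name together with the address from the matching Type A record.

```python
from typing import Dict, List, Tuple

# Records held by the root server in the example above, keyed by (Name, Type).
ROOT_DB: Dict[Tuple[str, str], List[str]] = {
    ("umass.edu", "NS"): ["dns.umass.edu"],
    ("dns.umass.edu", "A"): ["128.119.40.111"],
}


def answer(db: Dict[Tuple[str, str], List[str]], hostname: str) -> dict:
    """Answer an A query from `db`, or refer the asker to another name server."""
    if (hostname, "A") in db:
        return {"answer": db[(hostname, "A")]}
    # Not authoritative: look for an NS record for an enclosing domain,
    # e.g. cs.umass.edu, then umass.edu, then edu for gaia.cs.umass.edu.
    labels = hostname.split(".")
    for i in range(1, len(labels)):
        domain = ".".join(labels[i:])
        servers = db.get((domain, "NS"))
        if servers:
            ns = servers[0]
            # The NS record names the server; a Type A record gives its address.
            return {"referral": {"domain": domain, "server": ns,
                                 "address": db.get((ns, "A"), [])}}
    return {}    # this server knows nothing useful about the name


print(answer(ROOT_DB, "gaia.cs.umass.edu"))
# {'referral': {'domain': 'umass.edu', 'server': 'dns.umass.edu', 'address': ['128.119.40.111']}}
print(answer(ROOT_DB, "dns.umass.edu"))
# {'answer': ['128.119.40.111']}
```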
2.5.4 DNS Messages
Earlier in this section we alluded to DNS query and reply messages. These
are the only two kinds of DNS messages. Furthermore, both query and reply
messages have the same format, as shown in Figure 2.5-5.
Figure 2.5-5: DNS message format
The semantics of the various fields in a DNS message are as follows:
-
The first 12 bytes are the header section, which has a number of
fields. The first field is a 16-bit number that identifies the query.
This identifier is copied into the reply message to a query, allowing the
client to match received replies with sent queries. There are a number
of flags in the flag field. A one-bit query/reply flag indicates whether
the message is a query (0) or a reply (1). A one-bit authoritative
flag is set in a reply message when a name server is an authoritative
server for a queried name. A one-bit recursion-desired flag is set when
a client (host or name server) desires that the name server perform
recursion when it doesn't have the record. A one-bit recursion-available
flag is set in a reply if the name server supports recursion. In the header,
there are also four "number of" fields. These fields indicate the number
of occurrences of the four types of "data" sections that follow the header.
-
The question section contains information about the query that is
being made. This section includes (i) a name field that contains
the name that is being queried, and (ii) a type field that indicates
the type of question being asked about the name (e.g., a host address associated
with a name - type "A", or the mail server for a name - type "MX").
-
In a reply from a name server, the answer section contains the resource
records for the name that was originally queried. Recall that in
each resource record there is the Type (e.g., A, NS, CNAME and MX),
the Value and the TTL. A reply can return multiple RRs in the answer, since
a hostname can have multiple IP addresses (e.g., for replicated Web servers,
as discussed earlier in this section).
-
The authority section contains records of other authoritative servers.
-
The additional section contains other "helpful" records. For
example, the answer section in a reply to an MX query will contain the hostname
of a mail server associated with the alias name Name. The additional
section will contain a Type A record providing the IP address
for the canonical hostname of the mail server.
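The header layout and question section can be illustrated by building a query by hand. This Python sketch follows the field order described above and the [RFC 1035] encoding (big-endian 16-bit fields, names written as length-prefixed labels ending in a zero byte, type codes A=1, NS=2, CNAME=5, MX=15, and class IN=1). The function names are hypothetical, and the sketch only builds and inspects bytes; it does not send anything over the network.

```python
import struct
from typing import Dict

QTYPE = {"A": 1, "NS": 2, "CNAME": 5, "MX": 15}   # RFC 1035 type codes
CLASS_IN = 1                                       # the Internet class

# Masks for single-bit flags in the 16-bit flags field.
QR, AA, RD, RA = 0x8000, 0x0400, 0x0100, 0x0080


def encode_name(hostname: str) -> bytes:
    """Write each label as a length byte plus its characters; end with a zero byte."""
    out = b""
    for label in hostname.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError(f"bad label {label!r}")
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(query_id: int, hostname: str, qtype: str = "A",
                recursion_desired: bool = True) -> bytes:
    flags = RD if recursion_desired else 0   # QR bit left at 0: this is a query
    # 12-byte header: identifier, flags, then the four "number of" fields
    # (questions, answer RRs, authority RRs, additional RRs).
    header = struct.pack("!6H", query_id, flags, 1, 0, 0, 0)
    question = encode_name(hostname) + struct.pack("!2H", QTYPE[qtype], CLASS_IN)
    return header + question


def parse_header(message: bytes) -> Dict[str, int]:
    query_id, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    return {
        "id": query_id,
        "reply": int(bool(flags & QR)),
        "authoritative": int(bool(flags & AA)),
        "recursion_desired": int(bool(flags & RD)),
        "recursion_available": int(bool(flags & RA)),
        "questions": qd, "answers": an, "authority": ns, "additional": ar,
    }


msg = build_query(0x1A2B, "gaia.cs.umass.edu")
print(len(msg))   # 35: 12-byte header + 19-byte name + 4 bytes of type/class
print(parse_header(msg))
```

Such bytes would travel in a UDP datagram to port 53 of a name server, as noted earlier in this section; a reply has the same layout, with the query/reply bit set and the answer, authority, and additional counts filled in.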
The discussion above has focused on how data is retrieved from the DNS
database. You might be wondering how data gets into the database
in the first place. Until recently, the contents of each DNS server
were configured statically, e.g., from a configuration file created by a
system manager. More recently, an UPDATE option has been added to
the DNS protocol to allow data to be dynamically added or deleted from
the database via DNS messages. [RFC 2136]
specifies DNS dynamic updates.
DNSNet provides a nice collection of documents pertaining to DNS [DNSNet].
The Internet Software Consortium provides many resources for BIND,
a popular public-domain name server for Unix machines [BIND].
References
[Abitz 1993] Paul Albitz and Cricket
Liu, DNS and BIND, O'Reilly & Associates, Petaluma, CA, 1993.
[BIND] Internet Software Consortium page
on BIND, http://www.isc.org/bind.html
[DNSNet] DNSNet page on DNS resources,
http://www.dns.net/dnsrd/docs/
[RFC 1034] P. Mockapetris, "Domain
Names - Concepts and Facilities," RFC
1034, Nov. 1987.
[RFC 1035] P. Mockapetris, "Domain
Names - Implementation and Specification," RFC
1035, Nov. 1987.
[RFC 2136] P. Vixie, S. Thomson, Y.
Rekhter, J. Bound, "Dynamic Updates in the Domain Name System," RFC
2136, April 1997.
Copyright 1996-2000 Keith W. Ross and James F.
Kurose