INTERNET DRAFT EXPIRES JAN 1999 INTERNET DRAFT
Internet-Draft Ryan Moats
Rick Huber
Expires in six months AT&T
June 1998
Building Directories from DNS: Experiences from WWWSeeker
<draft-rfced-info-moats-01.txt>
Status of This Memo
This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its
areas, and its working groups. Note that other groups may also
distribute working documents as Internet-Drafts.
Internet-Drafts are draft documents valid for a maximum of six
months and may be updated, replaced, or obsoleted by other
documents at any time. It is inappropriate to use Internet-
Drafts as reference material or to cite them other than as ``work
in progress.''
To learn the current status of any Internet-Draft, please check
the ``1id-abstracts.txt'' listing contained in the Internet-
Drafts Shadow Directories on ftp.is.co.za (Africa), nic.nordu.net
(Europe), munnari.oz.au (Pacific Rim), ftp.ietf.org (US East
Coast), or ftp.isi.edu (US West Coast).
Abstract
There has been much discussion and several documents written about
the need for an Internet Directory. Recently, this discussion has
focused on ways to discover an organization's domain name without
relying on the use of DNS as a directory service. This draft discusses
lessons that were learned during InterNIC Directory and Database
Services' development and operation of WWWSeeker, an application that
finds a web site given information about the name and location of an
organization. The back end database that drives this application was
built from information obtained from domain registries via WHOIS and
other protocols. We present this information to help future
implementors avoid some of the blind alleys that we have already
explored. This work builds on the Netfind system that was created by
Mike Schwartz and his team at the University of Colorado at Boulder
[1].
Expires 11/30/98 [Page 1]
INTERNET DRAFTBuilding Directories from DNS: Experiences from WWWSeekerJune 1998
1. Introduction
Over time, there have been several RFCs [2, 3, 4] about approaches
for providing Internet Directories. Many of the earlier documents
discussed white pages directories that supply mappings from a
person's name to their telephone number, email address, etc.
More recently, there has been discussion of directories that map from
a company name to a domain name or web site [5]. Many people are
using DNS as a directory today to find this type of information about
a given company. Typically, when DNS is used this way, users guess the
domain name of the company they are looking for and then prepend "www." to it.
This makes it highly desirable for a company to have an easily
guessable name.
There are two major problems here. As the number of assigned names
increases, it becomes more difficult to get an easily guessable name.
Also, the TLD must be guessed as well as the name. While many users
just guess ".COM" as the "default" TLD today, there are many two-
letter country code top-level domains in current use as well as other
gTLDs (.NET, .ORG, and possibly .EDU) with the prospect of additional
gTLDs soon. As the number of TLDs in general use continues to
increase, guessing gets more difficult every day.
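The guessing strategy described above can be sketched as follows. This is a minimal illustration, not part of Netfind or WWWSeeker: the function name guess_web_hosts, the short EXAMPLE_TLDS list, and the rule for turning a company name into a DNS label are all assumptions. It shows that every additional TLD adds another candidate host that a user (or a program) has to try.

```python
from typing import Iterable, List

# Hypothetical, deliberately short TLD list for illustration only; the real
# set of gTLDs plus two-letter country codes is much larger.
EXAMPLE_TLDS = ["com", "net", "org", "edu", "us", "uk", "de"]


def guess_web_hosts(company: str, tlds: Iterable[str] = EXAMPLE_TLDS) -> List[str]:
    """Mimic the user's guess: squash the company name into one label,
    try it under each TLD, and prepend "www."."""
    # Assumed label rule: lowercase and keep only letters and digits,
    # so "Acme Widgets, Inc." becomes "acmewidgetsinc".
    label = "".join(ch for ch in company.lower() if ch.isalnum())
    return [f"www.{label}.{tld}" for tld in tlds]
```

For example, guess_web_hosts("Acme Widgets") starts with "www.acmewidgets.com" and yields one candidate per TLD in the list, before alternative spellings of the name are even considered.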
Between July 1996 and our shutdown in March 1998, the InterNIC
Directory and Database Services project maintained the Netfind search
engine [1] and the associated database that maps organization
information to domain names and thus acts as the type of Internet
directory that associates company names with domain names. We also
built WWWSeeker, a system that used the Netfind database to find web
sites associated with a given organization. The experience gained
from maintaining and growing this database provides valuable insight
into the issues of providing a directory service. We present it here
to allow future implementors to avoid some of the blind alleys that
we have already explored.
2. Directory Population
2.1. Using WHOIS to Populate the Directory
One proposal for populating a directory is to use WHOIS to gather
information about the organization that owns a domain. At the
conclusion of the InterNIC Directory and Database Services project,
our backend database contained about 2.9 million records whose data
could be retrieved via WHOIS. The entire database
contained 3.25 million records, with the additional records coming
from sources other than WHOIS.
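A WHOIS lookup is a single query line sent over TCP port 43, after which the server returns free-form text and closes the connection. The sketch below assumes that simple exchange. The function names, the timeout, and the "Key: value" parsing rule are illustrative assumptions, and the server name is left as a parameter because each registry ran its own server.

```python
import socket
from typing import Dict


def whois_query(domain: str, server: str, port: int = 43, timeout: float = 30.0) -> str:
    """Send one WHOIS query (the name followed by CRLF) and read the
    reply until the server closes the connection."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        sock.sendall(domain.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection: reply is complete
                break
            chunks.append(data)
    # latin-1 maps every byte to a character, so decoding cannot fail.
    return b"".join(chunks).decode("latin-1")


def parse_whois_fields(text: str) -> Dict[str, str]:
    """Collect "Key: value" lines. WHOIS output has no fixed schema, so this
    keeps the first value seen for each lowercased key and skips lines
    without a colon or with an empty value (such as multi-line blocks)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() and value.strip():
            fields.setdefault(key.strip().lower(), value.strip())
    return fields
```

Because the parser only recognizes one loose pattern, anything it extracts still needs the kind of further checking described in the next paragraph.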
In our experience this information contains a significant number of
factual and typographical errors and requires further examination and
processing to improve its quality. Also, those TLDs that have
registrars that support WHOIS typically only support WHOIS
information for second-level domains (e.g., ne.us) as opposed to
lower-level domains (e.g., windrose.omaha.ne.us). Further, there are TLDs
without registrars, TLDs without WHOIS support, and still other TLDs
that use other methods (HTTP, FTP, gopher) for providing
organizational information. Based on our experience, an implementor
of an Internet directory needs to support multiple protocols for
directory population.
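One way to organize multi-protocol population is a per-TLD table that records how each registry publishes its data. The sketch below is hypothetical throughout: the Source type, the table entries (all placeholder servers and URLs, including the made-up "xx" and "yy" TLDs), and the helper names are assumptions, not a description of the InterNIC implementation. It also shows the second-level limitation noted above.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Source:
    protocol: str  # "whois", "http", "ftp", "gopher", or "none"
    location: str  # server name or URL (placeholder values below)


# Hypothetical per-TLD table; entries are placeholders, not real registries.
SOURCES: Dict[str, Source] = {
    "com": Source("whois", "whois.example.net"),
    "us": Source("whois", "whois.example.us"),
    "xx": Source("http", "http://registry.example/xx/domains.txt"),
    "yy": Source("none", ""),
}


def second_level(domain: str) -> str:
    """Keep only the last two labels, e.g. windrose.omaha.ne.us -> ne.us."""
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def plan_lookup(domain: str,
                sources: Dict[str, Source] = SOURCES) -> Optional[Tuple[str, str, str]]:
    """Return (protocol, location, name_to_query), or None when the TLD has
    no registrar or no source we know how to use."""
    tld = domain.lower().rstrip(".").rsplit(".", 1)[-1]
    source = sources.get(tld)
    if source is None or source.protocol == "none":
        return None
    if source.protocol == "whois":
        # WHOIS usually answers only for second-level names, so a deeper name
        # can be queried only as its second-level parent, whose record
        # describes the delegation rather than the organization itself.
        return ("whois", source.location, second_level(domain))
    # For HTTP, FTP, or gopher sources the caller is expected to fetch the
    # published data and look for the full name in it.
    return (source.protocol, source.location, domain.lower())
```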
2.2. Using "Tree Walks" to Populate the Directory
Another proposal is to use a variant of a "Tree Walk" to determine
the domains that need to be added to the directory. Our experience
is that this is neither a reasonable nor an efficient proposal for
maintaining such a directory. Except for some infrequent and long-standing
DNS surveys [6], DNS "tree walks" tend to be discouraged by the Internet
community, especially given that the frequency of DNS changes would
require a new tree walk every month. Also, our experience has shown
that data on allocated DNS domains can usually be retrieved via other
methods (FTP, HTTP, etc.) that are faster and more efficient.
Since existing domains in the database may be verified via direct DNS
lookups rather than a "tree walk," "tree walks" should be the choice
of last resort for directory population.
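Verifying entries already in the directory needs only one lookup per name, not a traversal of the tree. The sketch below assumes that a successful address lookup via socket.gethostbyname is an acceptable proxy for "the domain still exists". That is an approximation: a delegated domain may have NS or MX records but no address record, and a fuller implementation would more likely query NS or SOA records with a DNS library. The function name and the injectable resolve parameter are hypothetical.

```python
import socket
from typing import Callable, Dict, Iterable


def verify_domains(domains: Iterable[str],
                   resolve: Callable[[str], str] = socket.gethostbyname) -> Dict[str, bool]:
    """Check each domain already in the directory with a direct lookup,
    instead of walking the DNS tree to rediscover it."""
    results: Dict[str, bool] = {}
    for domain in domains:
        try:
            resolve(domain)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            results[domain] = False
        else:
            results[domain] = True
    return results
```

Passing a stub resolver makes the function easy to exercise offline, and the cost grows with the number of names being checked rather than with the size of the whole DNS tree.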
3. Directory Updating: Full Rebuilds vs Incremental Updates
Given the size of our database in April 1998 when it was last
generated, a complete rebuild of the database that is available from
WHOIS lookups would require between 11.6 million and 14.5 million
seconds. This estimate does not include other factors
that would increase the amount of time to rebuild the entire
database.
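As a rough check on these figures, assuming the estimate covers the roughly 2.9 million WHOIS-derived records mentioned in Section 2.1 (the per-lookup cost is not stated here, so this is an inference), the range corresponds to 4 to 5 seconds per record, or roughly 134 to 168 days if the lookups were run one after another:

```latex
\[
\frac{11.6\times 10^{6}\ \mathrm{s}}{2.9\times 10^{6}\ \text{records}} = 4\ \mathrm{s/record},
\qquad
\frac{14.5\times 10^{6}\ \mathrm{s}}{2.9\times 10^{6}\ \text{records}} = 5\ \mathrm{s/record}
\]
\[
\frac{11.6\times 10^{6}\ \mathrm{s}}{86\,400\ \mathrm{s/day}} \approx 134\ \text{days},
\qquad
\frac{14.5\times 10^{6}\ \mathrm{s}}{86\,400\ \mathrm{s/day}} \approx 168\ \text{days}
\]
```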
Whether this is feasible depends on the frequency of database updates
provided. Because of the rate of growth of allocated domain names
(150K-200K new allocated domains per month), we provided monthly
updates of the database. To rebuild the database each month would
require between 3 and 5 machines to be dedicated full time to the
task. Instead, we checkpointed the allocated domain list and rebuilt
the database incrementally during one weekend each month. This
allowed us to complete the update on between 1 and 4 machines without
full dedication over a couple of days. Further, by coupling
incremental updates with periodic refresh of existing data (which can
be done during another part of the month, and doesn't require full
dedication of machine hardware), older records would be periodically
updated when the underlying information changes. The tradeoff is
timeliness and accuracy of data (some data in the database may be
old) against hardware and processing costs.
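The checkpoint-and-refresh scheme can be sketched as below. This is a minimal sketch under stated assumptions: the function names are hypothetical, the allocated domain list is assumed to fit in memory as a set, and "oldest first" is assumed as the refresh policy, since the text does not say how records were chosen for refresh. Comparing two checkpoints yields the new domains that need WHOIS lookups and the domains that have disappeared, while each refresh batch revisits the records that have gone longest without being checked.

```python
import datetime as dt
from typing import Dict, List, Set, Tuple


def plan_incremental_update(checkpoint: Set[str], current: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Compare last month's checkpointed domain list with this month's.
    Only the added domains need new lookups; removed ones are dropped."""
    added = current - checkpoint
    removed = checkpoint - current
    return added, removed


def select_for_refresh(last_checked: Dict[str, dt.date], batch_size: int) -> List[str]:
    """Pick the records that have gone longest without a refresh (ties
    broken by name), so bounded batches cycle through the whole database."""
    return sorted(last_checked, key=lambda d: (last_checked[d], d))[:batch_size]
```

With this split, the monthly lookup work scales with the 150K-200K newly allocated domains rather than with the full database, and the refresh batch size is the knob that trades data age against machine time.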
4. Directory Presentation: Distributed vs Monolithic
While a distributed directory is a desirable goal, we maintained our
database as a monolithic structure. Given past growth, it is not
clear at what point migrating to a distributed directory actually
becomes necessary to support customer queries. Our last database
contained over 3.25 million records in a flat ASCII file. Searching
was done by a Perl script that used an inverted tree (itself produced
by a Perl script). While admittedly primitive, this configuration
supported over 200,000 database queries per month from our production
servers.
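The flat-file-plus-inverted-tree arrangement can be approximated as below. This is a Python sketch rather than the original Perl, and it assumes one record per line in a hypothetical "organization|location|domain" layout; the actual record format and the structure of the inverted tree are not described here. Each word maps to the set of record numbers containing it, and a query returns the records that contain every query word.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(lines: List[str]) -> Dict[str, Set[int]]:
    """Map each word to the numbers of the records (lines) containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for lineno, line in enumerate(lines):
        for word in tokenize(line):
            index[word].add(lineno)
    return dict(index)


def search(index: Dict[str, Set[int]], lines: List[str], query: str) -> List[str]:
    """Return records containing every word of the query (AND semantics)."""
    words = tokenize(query)
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return [lines[i] for i in sorted(hits)]
```

As the next paragraph notes, growth mainly costs disk space here: the flat file and the index both grow roughly in proportion to the number of records.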
Increasing the database size only requires more disk space to hold
the database and inverted tree. Of course, using database technology
would probably improve performance and scalability, but we had not
reached the point where this technology was required.
5. Acknowledgments
The work described in this document was partially supported by the
National Science Foundation under Cooperative Agreement NCR-9218179.
6. References
Request For Comments (RFC) documents are available at
http://info.internet.isi.edu/1/in-notes/rfc and from numerous mirror
sites.
[1] M. F. Schwartz, C. Pu, "Applying an Information Gathering
Architecture to Netfind: A White Pages Tool for a Changing and
Growing Internet", University of Colorado Technical Report
CU-CS-656-93, December 1993, revised July 1994.
<URL:ftp://ftp.cs.colorado.edu/pub/cs/techreports/schwartz/Netfind.Gathering.txt.Z>
[2] K. Sollins, "Plan for Internet Directory Services",
RFC 1107, July 1989.
[3] S. Hardcastle-Kille, E. Huizer, V. Cerf, R. Hobby,
S. Kent, "A Strategic Plan for Deploying an Internet
X.500 Directory Service", RFC 1430, February 1993.
[4] J. Postel, C. Anderson, "White Pages Meeting
Report", RFC 1588, February 1994.
[5] J. Klensin, T. Wolf, G. Oglesby, "Domain Names and
Company Name Retrieval", RFC 2345, May 1998.
[6] M. Lottor, "Network Wizards Internet Domain Survey",
available from http://www.nw.com/zone/WWW/top.html
7. Authors' Addresses

Ryan Moats
AT&T
15621 Drexel Circle
Omaha, NE 68135-2358
USA
EMail: jayhawk@att.com

Rick Huber
AT&T
Room 1B-433, 101 Crawfords Corner Road
Holmdel, NJ 07733-3030
USA
EMail: rvh@att.com