Network Working Group                                    Peter Deutsch,
FIRST DRAFT                                          peterd@bunyip.com,
                                            Bunyip Information Systems.
                                                        August 19, 1992
                                         (other contributors to follow)


                  Architecture of the WHOIS++ service
                  -----------------------------------


Part I - Introduction
----------------------

Purpose and Motivation
----------------------

The current NIC WHOIS service is used to provide a very limited
directory service, serving information about a small number of
Internet users registered with the DDN NIC. It has also been expanded
to serve information about a variety of services and other
information.

This service allows users to issue searches for individual strings
within individual records, as well as searches for individual record
handles (that is, unique identifiers associated with each record)
using a very simple protocol. This basic service was described in
RFC 954.

Despite its utility, the current NIC WHOIS service cannot function as
a general White Pages service for the entire Internet. Given the huge
number of users, the obvious problems with reliability and the huge
volume of traffic that a full scale directory service is expected to
generate, such a centralized architecture is not practical for a
generalized Internet directory service.

This document is part of a project to extend the simple WHOIS model
to address the need for a simple, light-weight directory service. A
general outline of the service and a description of the motivation
for this service are included in .

In this document we describe our extensions to the current NIC WHOIS
service. These extensions are intended to allow users to publish and
locate information about other users, services and related
information from hosts operating across the Internet. Throughout this
document, our extensions to the basic service will be referred to as
"WHOIS++" to distinguish them from the original WHOIS service.

The WHOIS++ service is intended to be an extension to the extremely
simple protocol described in RFC 954. These extensions use an
extremely simple data model and a correspondingly simple query
protocol to allow users to satisfy their queries with a minimum of
effort or resources. The basic architecture of WHOIS++ allows
distributed maintenance of the directory contents and the use of an
automated Yellow Pages service for locating WHOIS servers. This
Yellow Pages service is described in a companion document .

The Basic Information Model
---------------------------

Our extensions to the existing WHOIS service are centred upon a
recommendation to structure user information around a series of
standardized templates, similar to those described by . We also offer
a set of extensions to the trivial protocol described in RFC 954 to
allow the user to constrain searches to desired attributes or
template types, in addition to the existing commands for specifying
handles or simple strings.

Although WHOIS++ is not intended as a replacement for the more
elaborate directory services now being deployed, it is expected that
the minimalist approach we have taken will find application where the
high cost of configuring and operating traditional White Pages
services cannot currently be justified.

Note that this system is intended to be easy to set up and operate,
and additional templates may be created and used with little effort.

Also note that the new architecture makes no assumptions about the
search and retrieval mechanisms used within individual servers.
Operators are free to use fast indexing software or even provide
gateways to other directory services to store and retrieve
information. Operators are also free to use other services to
automate the creation and maintenance of databases. The WHOIS++
server simply functions as a known front end, offering a simple data
model and communicating through a well known port and protocol.

The format of replies has been structured to allow the use of more
elaborate clients for generating searches and displaying the results,
but some effort has been made to keep responses at least to some
degree readable by humans.

The actual implementation details of an individual WHOIS search
engine are left to the imagination of the implementor. It is hoped
that this approach will encourage experimentation and the development
of improved search engines.

Scope of this document
-----------------------

In this paper we describe the architecture of the WHOIS++ service and
present details of the WHOIS++ protocol extensions. Details of the
system's data model are also included.

The motivation and scope of this project are described in more detail
in , which also makes recommendations for a minimal set of
information templates to be supported by each server.

A separate paper describes the details of the flooding algorithm and
associated protocols.

The Current WHOIS Service
---------------------------

The existing WHOIS service allows the user to specify simple searches
within a single database. Options allow the user to constrain these
searches. For example, the user may search only on specified handles,
specified mailboxes, or on all strings and handles in the database.

A number of sites have brought up additional WHOIS servers and some
have added additional options to specify further search constraints
or request help. A recent informal survey of Internet sites found
over 100 hosts offering some form of WHOIS service. Unfortunately
there has been little or no coordination of features or command
syntax. In this paper we propose standardizing such extensions.

The current WHOIS Information Model
------------------------------------

The current WHOIS service is based upon an extremely simple data
model. The NIC WHOIS database consists of a series of individual
records, each of which is identified by a single unique identifier
(the "handle"). Each record contains one or more lines of
information.
Currently, there is no structure or implicit ordering of this
information, although by implication each record is concerned with
information about a single user or service.

We have implemented two basic changes to this model. First, we have
structured the information within the database as collections of
attribute/value pairs, with each individual record containing a
specified set of these attributes.

Secondly, we have introduced typing of the database records. In
effect, each record is based upon one of a limited number of
templates, each containing a finite and specified number of attribute
fields. This will allow us to limit our searches to specific
collections, such as information about users, services, abstracts of
papers, descriptions of software, etc.

As a final extension, we require that each individual WHOIS++
database on the Internet be assigned a unique handle, analogous to
the handle associated with each database record. The entire WHOIS++
database structure is shown in Fig. 1.

[* Ideally, these database handles will be registered through the
   IANA, ensuring their uniqueness. This will allow us to specify
   each WHOIS++ entry on the Internet as a unique record
   handle/WHOIS handle pair. *]

A unique registered handle is preferable to using the host's IP
address, since it is conceivable that the WHOIS++ server for a
particular domain may move over time. If we preserve the unique
WHOIS++ handle in such cases we have the option of using it for
resource discovery and networked information retrieval.

We believe that organizing information around a series of such
templates will make it easier for administrators to gather and
maintain this information and thus encourage them to make such
information available. At the same time, as users become more
familiar with the attributes available within specific templates they
will be better able to specify their searches, leading to a more
useful service.

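The information model described above can be sketched in a few lines
of Python. This is only an illustration under stated assumptions, not
part of the specification: the class names (Record, Database), the
method names and the database handle "EXAMPLE-DB1" are hypothetical,
and the attribute sets shown for the USER and SERVICES templates are
taken from the sample responses later in this document rather than
from any agreed template definition.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Record:
    """One database record: a handle, a template type and attribute/value pairs."""
    handle: str
    template: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Database:
    """A single WHOIS++ database, itself identified by a unique handle."""
    handle: str
    # Template type -> attribute names permitted by that template.
    templates: Dict[str, Tuple[str, ...]]
    records: Dict[str, Record] = field(default_factory=dict)

    def add(self, record: Record) -> None:
        """Store a record after checking it against its template."""
        allowed = self.templates.get(record.template)
        if allowed is None:
            raise ValueError("unknown template type: " + record.template)
        extra = set(record.attributes) - set(allowed)
        if extra:
            raise ValueError("attributes not in template: " + ", ".join(sorted(extra)))
        if record.handle in self.records:
            raise ValueError("duplicate record handle: " + record.handle)
        self.records[record.handle] = record

    def by_template(self, template: str) -> List[Record]:
        """Limit a lookup to one collection (template type)."""
        return [r for r in self.records.values() if r.template == template]

    def global_id(self, record_handle: str) -> Tuple[str, str]:
        """Return the (record handle, database handle) pair for a stored record."""
        self.records[record_handle]  # raises KeyError if the record is unknown
        return (record_handle, self.handle)


# Hypothetical database handle and template definitions, for illustration only.
db = Database(
    handle="EXAMPLE-DB1",
    templates={
        "USER": ("First Name", "Last Name", "email"),
        "SERVICES": ("Type", "Location"),
    },
)
db.add(Record("PD45", "USER", {"First Name": "Peter", "Last Name": "Deutsch",
                               "email": "peterd@bunyip.com"}))
db.add(Record("WWW1", "SERVICES", {"Type": "World Wide Web",
                                   "Location": "the world"}))
```

Keeping the database handle separate from the host address, as in
global_id() above, is what lets a record stay addressable if the
server for a domain moves.
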
______________________________________________________________________________
|                                                                            |
|     _________              _________              _________                |
|    | handle3 |.. ..       | handle6 |.. ..       | handle9 |.. ..          |
|    |_________|            |_________|            |_________|               |
|    | handle2 |.. ..       | handle5 |.. ..       | handle8 |.. ..          |
|    |_________|            |_________|            |_________|               |
|    | handle1 |.. ..       | handle4 |.. ..       | handle7 |.. ..          |
|    |         |.. ..       |         |.. ..       |         |.. ..          |
|     ---------              ---------              ---------                |
|     Template               Template               Template                 |
|      Type 1                 Type 2                 Type 3                  |
|                                                                            |
|  + Single unique WHOIS database handle                                     |
|                                                                            |
|  Fig. 1 - Structure of a single WHOIS++ database.                          |
|                                                                            |
| Notes: - Entire database is identified by a single unique WHOIS handle.    |
|        - Each record has a single unique handle and a specific set         |
|          of attributes, determined by the template type used.              |
|        - Each value associated with an attribute can be any ASCII string   |
|          up to a specified length.                                         |
|                                                                            |
------------------------------------------------------------------------------

The WHOIS++ Yellow Pages
-------------------------

Without a functional services registry users will have difficulty in
locating individual Internet services. As part of the complete
WHOIS++ architecture, we describe a simplified Yellow Pages registry
service in . This service features proactive data gathering and
periodic verification of servers so that status information can be
offered to the user.

Beyond WHOIS++
--------------

A flooding algorithm is used to propagate information from individual
records across a distributed database combining information from a
number of WHOIS servers. This mechanism is used to address concerns
about scaling and redundancy.

[* obviously need a lot more on centroids here *]

The WHOIS++ Architecture
--------------------------

----------------------------------------------------------------------

                          ____          ____
   root                  |    |        |    |
   directory             |    |        |    |
   service                ----          ----

                          ____          ____
   whois                 |    |        |    |
   index                 |    |        |    |
   service                ----          ----

                     ____        ____        ____
   individual       |    |      |    |      |    |
   whois servers    |    |      |    |      |    |
                     ----        ----        ----

              Fig. 2 - Overall system architecture.

----------------------------------------------------------------------

Getting Help
------------

Another extension to the basic WHOIS service is a requirement for a
basic HELP command, allowing users to find out information about the
individual server and the entire WHOIS++ service. This is done with a
simple extension to the extended information model by defining a HELP
template format. The operator of each WHOIS service is required to
have, as a minimum, a single general HELP template. Details of the
HELP template are included in .

Minimum HELP Required
-----------------------

Every WHOIS++ server is required to have at least two records of type
HELP, one with Subject "HELP" and one with Subject "HELPHELP". The
first must contain a general introductory help message about the
service and the second a general introductory help message about HELP
itself.

Executing the command:

      HELP

will result in the display of the HELP template with subject "HELP".

Executing the command:

      HELP HELP

will result in the display of the HELP template with subject
"HELPHELP".

Executing the command:

      HELP 

will result in a search through all available help templates for a
record with the matching "searchstring".

Privacy and Security Issues
-----------------------------

WHOIS++ is intended to be a simple, generalized and unauthenticated
template-oriented browsing service available to all Internet users.
Site administrators should NOT make confidential information about
their users available through this service, even if the WHOIS server
is not publicized.

At the same time, given the unauthenticated nature of the service,
users are cautioned against putting too much faith in the information
served.

Users looking for a secure, authenticated and robust service are
advised to check out the work being done on generalized directory
services, where such considerations have been given more weight (and
have consequently added additional weight to the resulting service).

[* editorial comment - I doubt I'll get away with the above
   statement, but it makes me feel good so I'll leave it in for
   now! :-) *]


Part II - The WHOIS++ Protocol
--------------------------------

The WHOIS++ protocol specifies the interactions between a WHOIS
client and a WHOIS server supporting the WHOIS++ extensions. These
extensions are designed to be backwards compatible with existing
servers, in the sense that a new server receiving any of the older
commands specified in RFC 954 will behave in the same manner as the
original NIC WHOIS server.

Obviously, it is not possible to ensure desired behaviour if one of
the extended commands is sent to an older WHOIS server, since the
requested functionality is simply not there. Still, it is possible to
store whether the WHOIS++ command set is supported as an attribute
for each WHOIS server in any services registry. Thus, in practice
this should not be a problem. In addition, any such command sent to
an older WHOIS server would simply be treated as a search term, and
thus no harm should result.

The small number of older servers, and the probability that at least
some of the older servers will be converted as newer servers become
available, means that backwards compatibility is not expected to be a
problem in practice.

The WHOIS++ Command set
------------------------

There are two types of WHOIS++ commands - system commands and the
WHOIS++ search command.

System Commands
----------------

System commands are commands to the server for information or to
control its operation. These include commands to list the template
types available, to obtain a single blank template of any available
type, to obtain a list of the search methods that are supported on
that server and a command to obtain help.
There are also commands to obtain the current version of the WHOIS++
protocol supported and to obtain a brief description of the service,
which is intended to support the automated registration of the
service by yellow pages directory services.

Table I lists the set of valid WHOIS++ system commands.

----------------------------------------------------------------------

Short   Long Form             Functionality
Form
--------------------------------------------------------------------
  ?     HELP [ [',' ]]        System help

        LIST [ [',' ]]        List templates supported by this
                              system

        SHOW [',' ]           Show contents of templates specified

        CONSTRAINTS           List search methods supported by this
                              server or other system constraints
                              (e.g. maxhits)

        VERSION               Return current version of the protocol
                              supported
                              [* is this really needed? *]

        DESCRIBE              Describe this server, formatting the
                              response using the standard IAFA
                              "Services" template

          Table I - Valid WHOIS++ SYSTEM commands.

----------------------------------------------------------------------

Format of the Search Command
-----------------------------

A search command consists of one or more search terms, which act as
specifiers for the selection of records from the WHOIS++ database.
Such specifiers are cumulative, that is, each search term is an
additional specification that a record must satisfy before it will be
returned to the user as a valid response to the query. There are
currently no plans for Boolean operations (logical AND, OR or NOT),
although this capability could be added if there is sufficient
demand.

[* This is obviously still open to debate *]

A search command consists of one or more search terms, followed by an
optional set of global search constraints.

Search constraints that apply to every search term are specified as
global constraints. In addition, the format of server responses may
be changed from the specified default behaviour by setting a specific
global constraint.

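The cumulative behaviour of search terms described above can be
sketched as follows. This is a minimal illustration, not a definitive
implementation: the function names are hypothetical, each term is
assumed to be already parsed into a (specifier, search string) pair
using the long-form specifiers of Table II, and case-insensitive
substring matching is an assumption, since neither the search string
format nor the matching method is fixed by this document. Global
constraints are not modelled.

```python
def term_matches(record, specifier, text):
    """Return True if one search term is satisfied by one record.

    Matching is case-insensitive substring matching (an assumption).
    """
    needle = text.lower()
    fields = {
        "TEMPLATE": [record["template"]],
        "HANDLE": [record["handle"]],
        "ATTRIBUTE": list(record["attributes"].keys()),
        "VALUE": list(record["attributes"].values()),
    }
    if specifier == "SEARCH-ALL":
        # No restriction: look at template name, handle, attribute
        # names and attribute values alike.
        candidates = [item for group in fields.values() for item in group]
    else:
        candidates = fields[specifier]
    return any(needle in candidate.lower() for candidate in candidates)


def search(records, terms):
    """Return the records that satisfy every term (terms are cumulative)."""
    return [
        record for record in records
        if all(term_matches(record, spec, text) for spec, text in terms)
    ]


# Sample records, borrowed from the sample responses in Table IV.
RECORDS = [
    {"template": "USER", "handle": "PD45",
     "attributes": {"First Name": "Peter", "Last Name": "Deutsch",
                    "email": "peterd@bunyip.com"}},
    {"template": "USER", "handle": "AE1",
     "attributes": {"First Name": "Alan", "Last Name": "Emtage",
                    "email": "bajan@bunyip.com"}},
    {"template": "SERVICES", "handle": "WWW1",
     "attributes": {"Type": "World Wide Web", "Location": "the world"}},
]

# Two cumulative terms: both must hold for a record to be returned.
hits = search(RECORDS, [("TEMPLATE", "user"), ("VALUE", "bunyip")])
```

Because every term must hold, adding a term can only narrow the
result set, which is why no explicit AND operator is needed.
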
Format of a Search Term
------------------------

Each search term consists of one of the following:

   1) A search string, followed by an optional comma and set of
      specific search constraints.

   2) A search term specifier (as listed in Table II), followed by
      '=', followed by a search string, followed by an optional comma
      and set of specific search constraints.

   3) An abbreviated search term specifier, followed by a search
      string, followed by an optional comma and set of specific
      search constraints.

   4) A combination of attribute name, followed by '=', followed by a
      search string, followed by an optional comma and set of
      specific search constraints.

In addition to the historical search specifiers provided in RFC 954
to specify a search on content, handle or mailbox, there are also
identifiers to select on template type or attribute name. In keeping
with the spirit of RFC 954, all identifiers have an associated single
character prefix that may be used in place of the "=" format of the
same identifier.

If no term identifier is provided, then the search will be applied to
all template names, handles, attribute names and attribute values.
This corresponds to an identifier of SEARCH-ALL.

When the user specifies the search term using the form:

      " = "

this is treated as equivalent to the combined terms:

      "ATTRIBUTE = ; VALUE = "

Note that in this case, "" cannot be one of the specifiers
"ATTRIBUTE", "VALUE", "HANDLE" or "TEMPLATE".

For discussion of the system reply format, and selecting the
appropriate format, see the section "Server Responses".

Format of a Search String
--------------------------

[* The actual format of a search string is not yet specified, as
there is a discussion to be had concerning the use of non-ASCII
(esp. but not limited to other European languages) in search strings
and even attribute names, etc. We must allow for this, but this
document is merely flagging this need for now. This sounds like
fruitful grounds for Working Group discussions...
*]

Search Term Constraints
-------------------------

Specific search constraints are intended to be hints or
recommendations to the search engine on how to perform that part of
the search. Thus, a user might specify a search constraint as "exact
match", or "substring match". The CONSTRAINTS system command is used
to list the search constraints supported by an individual server.

[* Note: The best way to handle this is probably with either a
   specified list of keywords or by referring to an IANA registry of
   supported search types... *]

If a server cannot satisfy the specified constraint there is a
mechanism for informing the user in the reply, using system messages.
In such cases, the search is still performed, and the server ignores
unsupported constraints.

----------------------------------------------------------------------

Valid specifiers:
-----------------

Short   Long Form     Functionality
--------------------------------------------------------------------
  .     ATTRIBUTE     Confine search to attribute fields
  #     VALUE         Confine search to attribute values
  !     HANDLE        Confine search to handles
  ^     TEMPLATE      Confine search to template names
  *     SEARCH-ALL    Search everything

A search term takes one of the following forms:

   1)  [',' ]
   2)  = [',' ]
   3)  [',' ]
   4)  = [',' ]

       Which is equivalent to the compound terms:

          ATTRIBUTE = ; VALUE = 

          Table II - Search command specifiers.

----------------------------------------------------------------------

Server Response Modes
----------------------

[* we can easily support additional response modes here. *]

There are currently a total of four different response modes possible
for WHOIS++ servers. These are FULL, ABRIDGED, HANDLE and SUMMARY.
The syntax of each output format is specified in more detail in the
following section.

   1) A FULL format response provides the complete contents of each
      template matching the specified query, including the template
      type and handle for each record.
   2) An ABRIDGED format response provides a brief summary, including
      (as a minimum) the record handle and the specific information
      in the corresponding record that matched the query.
      [* this needs work *]

   3) A HANDLE format response returns only a list of handles that
      matched the specified query.

   4) A SUMMARY response provides only a brief summary of information
      about the number of matches and the list of template types in
      which the matches occurred.

By default, a WHOIS++ server will provide a FULL response when there
is a single record matching the specified query, an ABRIDGED response
when there are between two and ten records matching the query and a
SUMMARY response when there are more than ten records matching the
specified query. The user may override these defaults by specifying
the appropriate keywords as global constraints to a search command
(see below).

[* These numbers were chosen pseudo-randomly and are obviously open
   to debate... *]

The server response modes are summarized in Table III.

Format of Responses
--------------------

Each response consists of an optional free form introductory text
message, followed by any optional system generated messages, followed
by a formatted response message, followed by any optional system
generated messages, followed by an optional free form closing text
message. That is:

      [ ]* ['%' ]* ['%' ]* [ ]*

There is no limit on the total length or format of either the
introductory or closing text message, although each line should
consist of no more than 81 characters, including the terminating
newline character.

If there are no matches to a query, the system is not required to
generate any output as a formatted response, although it may still
generate system messages and/or a closing text message.

All optional system generated messages must begin with a '%' as the
first character and must be no more than 81 characters long,
including the terminating newline character.
There is no limit to the number of system messages that may be
generated.

Syntax of a Formatted Response
------------------------------

All formatted responses consist of a START line, followed by a
response-specific section, followed by a TERMINATION line. It is
permissible to insert any number of lines consisting solely of
newlines within a formatted response to improve readability.

A START line consists of a line beginning with a '#' in the first
column, followed by zero or more white space characters (SPACE or
TAB), followed by one of the keywords FULL, ABRIDGED, HANDLE or
SUMMARY. Where the keyword is FULL, ABRIDGED or HANDLE, this is then
followed by one or more white space characters, followed by a count
of the number of matches found for that query, followed by zero or
more white space characters, followed by a newline. A START line must
contain no more than 81 characters, including the terminating newline
character.

A TERMINATION line consists of a line beginning with a '#' in the
first column, followed by zero or more white space characters (SPACE
or TAB), followed by the keyword END, followed by zero or more white
space characters, followed by a newline. A TERMINATION line must
contain no more than 81 characters, including the terminating newline
character.

A response-specific section will be one of the following:

   1) FULL Format Response
   2) ABRIDGED Format Response
   3) HANDLE Format Response
   4) SUMMARY Format Response

The details of each are specified in the following sections.

A FULL format response
------------------------

A FULL format response consists of a series of responses, each
consisting of a FORMAT specifier line, followed by the complete
template information for the matching record.
Each FORMAT specifier line consists of a '#' in the first column,
followed by zero or more white space characters, followed by the name
of the corresponding template type, followed by one or more white
space characters, followed by the handle for that record, followed by
zero or more white space characters, followed by a newline. A FORMAT
specifier must contain no more than 81 characters, including the
terminating newline character.

[* Note this implicitly puts a limit on the length of a template
   name. We will need to set limits for this, and probably want to
   allow lines longer than 80 characters. I've put the 81 char limit
   in as a placeholder. *]

The template information for each record will be returned as a series
of lines consisting of a single space, followed by the corresponding
line of the record.

The line of the record shall consist of the attribute name, followed
by a ':', followed by at least one space, followed by the value of
that attribute, followed by a newline. Each such line shall be
limited to no more than 81 characters, including the terminating
newline. If a line (including the required leading single space)
would exceed 81 characters, it is to be broken into lines of no more
than 81 characters, with each continuation line beginning with a "+"
character.

ABRIDGED Format Response
------------------------

An ABRIDGED format response consists of a single set of responses,
consisting of a single line excerpt of the template information from
each matching record. The excerpt information shall include, as a
minimum, the template type and handle of the record, as well as the
portion of the information that caused the match.

The abridged template information for each record will be returned as
a series of lines, each of which must consist of a single space,
followed by the abridged line of the record. Each line shall be
limited to no more than 81 characters, including the terminating
newline.
If a line (including the required single space) would exceed 81
characters, it is to be broken into lines of no more than 81
characters, with the remainder following on the subsequent line, with
the space replaced by a "+" character.

HANDLE Format Response
-----------------------

A HANDLE format response consists of a single set of responses,
consisting of a single line listing the handle and template type for
each matching record. Each line shall start with at least one space,
followed by the handle, followed by at least one space, followed by
the template type, followed by zero or more white space characters
and terminated by a newline. Each such line must contain no more than
81 characters, including the terminating newline character. If a line
(including the required first space) would exceed 81 characters, it
shall be split into multiple lines, with each continuation line
beginning with a '+' instead of a space.

SUMMARY Format Response
-----------------------

A SUMMARY format response consists of a single set of responses,
consisting of a line listing the number of matches to the specified
query, followed by a list of all template types which satisfied the
query at least once.

The first line shall begin with the string "matches: ", be followed
by the number of responses to the query and terminated by a newline.
The second line shall begin with the string "templates: ", be
followed by the name of the first template type which matched the
query, followed by a newline. Each succeeding line shall include the
name of the next template type matching the query, terminated by a
newline.

System Generated Messages
--------------------------

Any line beginning with a '%' in the first column is to be treated as
a system generated message. System generated messages may occur
immediately before, within or immediately after the formatted
response section of the response.
System generated messages displayed before or after the formatted
response section are expected to refer to operation of the system or
refer to the entire query. System generated messages within the
output of an individual record during a FULL response are expected to
refer to that record only, and could (for example) be used to
indicate problems with that record of the response.

Compatibility with Older WHOIS Servers
---------------------------------------

Note that this format, although potentially more verbose, is still in
a human readable form. Responses from older systems that do not
follow this format are still conformant, since their responses would
be interpreted as being equivalent to optional text messages, without
a formatted response. Clients written to this specification would
display the responses as an advisory text message, where it would
still be readable by the user.

----------------------------------------------------------------------

Format       Functionality
-----------------------------------------------------
FULL         Returns complete template information for each
             record that matches the specified query. Each such
             record is separated by a line that specifies the
             template type and handle for that record.

ABRIDGED     Returns a one line abridged response for each
             record that matches the specified query.

HANDLE       Returns a list of handles and corresponding
             template types for each record that matches the
             specified query.

SUMMARY      Returns only a brief summary of number of matches
             and the corresponding template types which matched
             the specified query.

Note: Default is a FULL response for a single match, an ABRIDGED
      response when there are between two and ten matches and a
      SUMMARY response when there are more than ten matches. These
      may be overridden by specifying the response format desired as
      a global constraint.

       Table III - Summary of WHOIS++ Response Formats.
----------------------------------------------------------------------

Some Examples:

--------------------

1) A FULL format response:

# FULL 3
# USER PD45
 First Name: Peter
 Last Name: Deutsch
 email: peterd@bunyip.com
# USER AE1
 First Name: Alan
 Last Name: Emtage
 email: bajan@bunyip.com
# SERVICES WWW1
 Type: World Wide Web
 Location: the world
# END

--------------------

2) An ABRIDGED format response:

# ABRIDGED 3
 Peter Deutsch (PD45) peterd@bunyip.com
 Alan Emtage (AE1) bajan@bunyip.com
 World Wide Web (WWW1) the world
# END

--------------------

3) A HANDLE format response:

# HANDLE 3
 PD45 User
 AE1 User
 WWW1 Services
# END

--------------------

4) A SUMMARY format response:

# SUMMARY
Matches: 175
Templates: User
           Services
           Abstracts
# END

              Table IV - Some sample responses.

----------------------------------------------------------------------


Appendix A - The WHOIS++ BNF Grammar
---------------------------------------

Here is the complete BNF grammar for the WHOIS++ extensions:

[* Well, almost complete. See "Notes" at the end *]

   WHOIScommand   ::=  | 

   SYScommand     ::=  [ [',' ]]
                     | "show" [',' ]
                     | "constraints"
                     | "version"

   syscmdname1    ::= "help" | "?" | "list"

   WHOISquery     ::=  [';' ]* [':' ]

   term           ::=  | | | 

   generalterm    ::=  [ ',' ]

   specificterm   ::=  '=' [ ',' ]

   specificname   ::= "template" | "handle" | "attribute" | "value"

   shortterm      ::=  [ ',' ]

   shortname      ::= '^' | '!' | '.' | '' | '#' | '*'

   globalcnstrnts ::= [ ]* | [ ]0-1

   responses      ::= "full" | "abridged" | "handle" | "summary"

   aconstraint    ::= (a set of specifiers for constraints. Expect
                      these to be a list of valid search methods... )

   searchstring   ::= TBD (some string that does not include or '?'
                      but does include extended characters)

To come:

   - BNF for response formats.
   - more details on searchstring
   - more details on what a constraint
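
Pending the BNF for response formats, the FULL format rules given in
Part II can be illustrated with a small client-side parser. This is a
sketch under stated assumptions rather than a reference
implementation: the function name parse_full_response and the shape
of the returned dictionary are hypothetical, continuation lines
(those starting with '+') are assumed to be appended to the previous
value exactly as received, and the 81 character line limit is not
enforced.

```python
def parse_full_response(text):
    """Parse a server reply containing a FULL formatted response.

    Returns a dict with the match count from the START line, the list
    of records and any system generated ('%') messages. Free form
    text before the START line or after the TERMINATION line is
    ignored.
    """
    records, messages = [], []
    current = None       # record currently being filled in
    last_attr = None     # attribute a '+' continuation line extends
    count = None
    state = "before"     # "before", "inside" or "after" the response

    for line in text.splitlines():
        if not line.strip():
            continue                      # blank lines are permitted
        if line.startswith("%"):
            messages.append(line[1:].strip())
            continue
        if line.startswith("#") and state != "after":
            words = line[1:].split()
            if state == "before":
                # START line: '#', keyword, match count.
                if len(words) != 2 or words[0].upper() != "FULL":
                    raise ValueError("not a FULL START line: " + line)
                count = int(words[1])
                state = "inside"
            elif [w.upper() for w in words] == ["END"]:
                state = "after"           # TERMINATION line
            elif len(words) == 2:
                # FORMAT specifier line: template type, then handle.
                current = {"template": words[0], "handle": words[1],
                           "attributes": {}}
                records.append(current)
                last_attr = None
            else:
                raise ValueError("malformed '#' line: " + line)
            continue
        if state != "inside":
            continue                      # introductory or closing text
        if line.startswith("+"):
            if current is None or last_attr is None:
                raise ValueError("continuation without a value: " + line)
            current["attributes"][last_attr] += line[1:]
        elif line.startswith(" "):
            name, sep, value = line[1:].partition(":")
            if current is None or not sep:
                raise ValueError("malformed attribute line: " + line)
            last_attr = name.strip()
            current["attributes"][last_attr] = value.strip()
        else:
            raise ValueError("unexpected line in response: " + line)

    return {"count": count, "records": records, "messages": messages}


SAMPLE = "\n".join([
    "Welcome to the example server.",
    "# FULL 3",
    "# USER PD45",
    " First Name: Peter",
    " Last Name: Deutsch",
    " email: peterd@bunyip.com",
    "# USER AE1",
    " First Name: Alan",
    " Last Name: Emtage",
    " email: bajan@bunyip.com",
    "# SERVICES WWW1",
    " Type: World Wide Web",
    " Location: the world",
    "# END",
])

result = parse_full_response(SAMPLE)
```

A reply from an older server contains no START line, so the parser
above returns no records for it; a client would then show the text
unchanged, as described under "Compatibility with Older WHOIS
Servers".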