Arcadia Papers: ABSTRACT


"Maintaining Distributed Hypertext Infostructures: Welcome to MOMspider's Web", by Roy Fielding in Proceedings of the First International World-Wide Web Conference (WWW94), Geneva, May 25-27, 1994. subsequent version of this paper published in Computer Networks and ISDN Systems, 27(2), November 1994.)

Abstract

Most documents made available on the World-Wide Web can be considered part of an infostructure -- an information resource database with a specifically designed structure. Infostructures often contain a wide variety of information sources, in the form of interlinked documents at distributed sites, which are maintained by a number of different document owners (usually, but not necessarily, the original document authors). Individual documents may also be shared by multiple infostructures. Since it is rarely static, the content of an infostructure is likely to change over time and may vary from the intended structure. Documents may be moved or deleted, referenced information may change, and hypertext links may be broken. As it grows, an infostructure becomes complex and difficult to maintain. Such maintenance currently relies upon the error logs of each server (often never relayed to the document owners), the complaints of users (often not seen by the actual document maintainers), and periodic manual traversals by each owner of all the webs for which they are responsible. Since thorough manual traversal of a web can be time-consuming and boring, maintenance is rarely or inconsistently performed and the infostructure eventually becomes corrupted. What is needed is an automated means for traversing a web of documents and checking for changes which may require the attention of the human maintainers (owners) of that web. The Multi-Owner Maintenance spider (MOMspider) has been developed to at least partially solve this maintenance problem. MOMspider can periodically traverse a list of webs (by owner, site, or document tree), check each web for any changes which may require its owner's attention, and build a special index document that lists out the attributes and connections of the web in a form that can itself be traversed as a hypertext document. This paper describes the design of MOMspider and how it was influenced by the nature of distributed hypertext maintenance and requirements for the good behavior of any web-traversing robot. It also includes discussion of the efficiency requirements for maintaining world-wide webs and proposed changes to HTML and HTTP to support distributed maintenance. The paper concludes with a short description of MOMspider's future and pointers to its freeware distribution site.
The Arcadia Project <arcadia-www@ics.uci.edu>
Last modified: Tue Feb 28 11:28:47 1995