Functional Requirements and Framework for Versioning on the WWW

David G. Durand and Fabio Vitali

Abstract

This document describes the functional requirements for integrating versioning into the WWW. Versioning is the fundamental basis of document management systems, with far-reaching effects on the semantics of document identity and meaningful operations. These requirements reflect the basic versioning needs for document management and collaborative authoring. This document does not define the complete set of requirements for these domains where they extend beyond the versioning of resources.

1. Introduction

This document discusses why versioning is needed on the WWW, and the functional requirements for full version support. We have divided the requirements into three sections. This discussion enumerates the requirements for implementing such functionality as a first step to creating a specification that will address these needs.

We first briefly describe the rationale for versioning on the web in Section 2. This rationale enumerates the goals of versioning on the WWW. All specific requirements should support (and certainly should not hinder) the realization of the goals. Section 3 contains global requirements for protocol development. These are things we think are technically justified and that fulfil the rationale. They are separated from the other requirements because their acceptance creates further constraints on other technical requirements. Finally, in Section 4, we specify functional requirements based on the foundation established in the earlier sections.

We have based this effort on David Fiander's suggestion to separate versioning and configuration requirements, and we assume a two-layer architecture for versioning on the web. The first layer, whose requirements are defined in this document, will address the simple problem of handling multiple versions of single resources. The second layer will address the thornier problems of configuration management for multiple resources. This layering simplifies both discussion and design.

2. Rationale

Versioning in the context of the world-wide web offers a variety of benefits:

  1. It provides infrastructure for efficient and controlled management of large evolving web sites.

    Modern configuration management systems are built on some form of repository that can track the revision history of individual resources, and provide the higher-level tools to manage those saved versions. Basic versioning capabilities are required to support such systems.
  2. It allows parallel development and update of single resources

    Since versioning systems register change by creating new objects, they enable simultaneous write access by allowing the creation of variant versions. Many also provide merge support to ease the reverse operation.
  3. It provides a framework for access control over resources.

    While specifics vary, most systems provide some method of controlling or tracking access to enable collaborative resource development.
  4. It allows browsing through past and alternative versions of a resource

    Frequently the modification and authorship history of a resource is critical information in itself.
  5. It provides stable names that can support externally stored links for annotation and link-server support.

    Both annotation and link servers frequently need to store stable references to portions of resources that are not under their direct control. By providing stable states of resources, version control systems allow not only stable pointers into those resources, but also well-defined methods to determine the relationships of those states of a resource.
  6. It allows explicit semantic representation of single resources with multiple states

    A versioning system directly represents the fact that a resource has an explicit history, and a persistent identity across the various states it has had during the course of that history.

3. Global requirements

This section covers the overarching constraints that must inform and direct detailed requirements for versioning support. They encompass compatibility across different implementations, as well as compatibility with current practice. We believe the following to be the general requirements for WWW versioning:

  1. Stableness of versions.
    Most versioning systems are intended to keep an accurate record of the history of a document's evolution. This accuracy is ensured by the fact that a version eventually becomes "frozen" and immutable. Once a version is frozen, further changes will create new versions rather than modifying the original. In order for caching and persistent references to be properly maintained, a client must be able to determine that a version has been frozen. We require that unlocked resource versions be frozen. Locked versions need not be, which permits the common practice of keeping unfrozen "working versions". Any successful attempt to retrieve a frozen version of a resource will always retrieve exactly the same content, or return an error if that version (or the resource itself) is no longer available. Since URLs may be reassigned at a server's discretion, this requirement applies only for that period of time during which a URL identifies the same resource.

  2. User Agent Interoperability.
    All versioning-aware user agents should be able to work with any versioning-aware HTTP server. It is acceptable for some user agent/server combinations to provide special features that are not universally available, but the protocol should be sufficient that a basic level of functionality will be universal.
  3. Style-free Versioning
    The protocol should not unnecessarily restrict version management style to any one paradigm. For instance, locking and version number assignment should be interoperable across servers and clients, even if there are some differences in their preferred models.
  4. Separation of access to resources and access control
    The protocol must separate the reservation and release of versioned resources from their access methods. Provided that consistency constraints are met before, during and after the modification of a versioned resource, no "right way" to access a resource is enforced by the protocol. For instance, a user may declare an intention to write after a GET, may POST a resource without releasing the lock, and might even request a lock via an HTTP connection while getting the document via FTP.
  5. Legacy Resource Support.
    The protocol should enable a versioning-aware server to work with existing resources and URLs. Special versioning information should not become a mandatory part of HTTP protocols except where it is required. Special version information that would break existing clients and servers, such as new mandatory headers, therefore cannot be required for GET (and possibly also for PUT).
  6. Legacy User Agent Support.
    Servers should make versioned resources accessible to versioning-unaware user-agents in a format acceptable to them.
  7. Specific named version URLs that are constructed from a URL and an opaque version string
    Because the notation will be required to operate in the version control environment preferred by the website maintainer, it must be able to properly contain arbitrary strings, which may be used by the VCS as version identifiers. While version information may be intelligible to the human operator, and perhaps to special-purpose clients, the client must be able to treat the version specifier as a black box.
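
As an illustration of requirements 1 and 7 above, the following Python sketch shows one way a server might treat frozen versions as immutable and build a version URL from a resource URL plus an opaque version string. The ";version=" separator, the VersionStore class and all function names are assumptions made for this example only; this document does not define a URL notation or a storage model.

```python
from typing import Dict, Optional, Set, Tuple
from urllib.parse import quote, unquote

SEPARATOR = ";version="  # hypothetical notation, not defined by this document


def version_url(resource_url: str, version_id: str) -> str:
    """Append an opaque version string to a resource URL.

    The identifier is percent-encoded so that arbitrary VCS strings
    survive intact and never need to be interpreted by the client.
    """
    return resource_url + SEPARATOR + quote(version_id, safe="")


def split_version_url(url: str) -> Tuple[str, Optional[str]]:
    """Recover (resource URL, version string); the string stays a black box."""
    base, sep, encoded = url.partition(SEPARATOR)
    return (base, unquote(encoded)) if sep else (base, None)


class FrozenVersionError(Exception):
    """Raised on an attempt to modify a frozen version."""


class VersionStore:
    """Toy in-memory store: working versions are mutable, frozen ones are not."""

    def __init__(self) -> None:
        self._content: Dict[str, bytes] = {}
        self._frozen: Set[str] = set()

    def write(self, version_id: str, data: bytes) -> None:
        if version_id in self._frozen:
            raise FrozenVersionError(version_id)
        self._content[version_id] = data

    def freeze(self, version_id: str) -> None:
        self._frozen.add(version_id)

    def is_frozen(self, version_id: str) -> bool:
        # Lets a client decide whether caching the content is safe.
        return version_id in self._frozen

    def read(self, version_id: str) -> bytes:
        # A KeyError stands in for the "no longer available" error case.
        return self._content[version_id]
```

In this sketch a change to a frozen version is rejected outright; a real server would more likely respond by creating a new version, as the requirement describes.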

4. Functional requirements

The following functional requirements are intended to satisfy the global requirements of Section 3 and enable the benefits listed in Section 2. The mention of possible new HTTP methods is intended to make the discussion clearer and more concrete, not to rule out other methods of meeting the requirements.

The protocol should provide:

  1. Access to specific named versions via a URL
    This is required for version-specific linking, and for legacy user-agent support.
  2. A URL to denote a versioned resource itself, rather than specific versions of it
    This is more important if URL computations are not allowed, since an identifier is needed for queries about the versioning status of a resource. This is used to perform operations (such as adjusting attributes, changing locks, or reassigning URLs) that affect all versions of a resource, rather than any specific version.
  3. Direct access to a server-defined "default", "current" or "tip" version of a resource
    This is one of the simplest ways to guarantee legacy user-agent compatibility and legacy file compatibility. If no special version URLs are used, the server will provide a default. This does not rule out the possibility of a server returning an error in case no such default exists.
  4. A way to access common related URLs from a versioned URL, whether by server query, URL computation, or some other way
    Some versions of a resource are special. It must be possible for a versioning-aware client to access the common versions related to the one it is currently displaying. Possible solutions include, but are not limited to: the server automatically adding header fields to a versioned URL specifying the URLs of the common related versions, the server providing one or more query methods ("which version precedes the one at this URL?"), or a standardized way to compute related URLs when given a versioned URL. We feel that access to the "default" version of a resource is an extremely important operation that a browser should be able to perform whenever a versioned URL is seen.
  5. A way to retrieve the complete version topology for this resource
    There should be a way to retrieve information about all versions of a resource. The format for this information must be standardized so that the basic information can be used by all clients.
  6. Some way to determine that a URL points to a named version of a resource
    This might be implemented as part of the URL format, a server query or additional headers.
  7. Some way to determine a version identification and a resource identification for a versioned resource, given its URL
    This requirement describes the ability to take the URL of a version of a resource and determine an identifier for that version and an identifier for the resource itself. Note that this kind of facility supports only some comparison operations: it enables the determination that two version-containing URLs designate versions of the same resource. However, given the phenomenon of URL aliasing, it is insufficient to determine that they are not versions of the same resource.

    This is a minimal "browsing through time" requirement. It allows a browser to tell that a versioned resource has been accessed and then to invoke special versioning or configuration management operations on the resource. While client performance will be best if this can be done via URL computation (i.e., mangling), it could also be done by an extra query and round-trip to the server.
  8. A way to request exclusive access to a version of a resource (LOCK)
    Since not all systems implement lock-based access, there is a question as to how this should be implemented. Client use of this method could be optional, allowing some relatively strong guarantee on the meaning of acquiring a lock. Alternatively, clients could be expected to take a lock, but servers might implement different locking policies (possibly even including implementation of LOCK and UNLOCK as NOPs).
  9. A way to specify a timeout after which a lock will lapse
    In many cases, locks over a certain duration are due to errors, and their strict enforcement can cause more problems than inadvertent version skew. We should allow locks to have a lifetime. It may prove a good idea to have a finite default lifetime defined by the protocol. If a universal default is too constraining, there should be a way for a server to inform the client what the lifetime of a lock is. Servers should honor client lock lifetime requests, or inform the client if the request is denied.
  10. A way to release exclusive access to a resource (UNLOCK)
    This is the inverse of LOCK.
  11. A way for a client to declare an intention to modify a resource (RESERVE or CHECKOUT?)
    This operation is required before any versioned update. Its effects may vary depending on server policy, from locking a resource, to forking a new variant, to a NOP on servers that do not track sessions or restrict updates. If this operation returns a version number, the client is required to make sure that it uses a copy of the data associated with that version number of the resource for any update operations it carries out. Servers that wish to enforce a mandatory GET operation before update should simply use a fresh version identifier on the return from this operation.
  12. A way to declare the end of an intention to write a resource
    This is the inverse of RESERVE. Typically, servers will commit updates at this time, and return a final version identifier if possible and if it was not already returned.
  13. A way to submit a new version of a resource (PUT)
    The server should be able to attach it to the correct part of the version tree, based on the version number associated with the resource before its modification.
  14. A way for a user-agent to request a version identifier for a checked-out version
    Such an identifier will not be used by any other user-agent in the meantime. The server may refuse the request.
  15. A way for a client to propose a version identifier upon submitting a version of a resource
    The server may refuse to use the client's suggested version identifier.
  16. A way for a client to supply metadata to be associated with a version
    The kinds of data supplied here might be simple textual comments or more structured data. An ability to attach arbitrary fields and content is probably required, but a standard set of attributes that would enable interoperation would be useful. For basic versioning we need only specify, for example, that comments are attached as the message-body of the operation that releases a write intention. The special formats for structured metadata can then be handled by using content-type negotiation, and the content-types defined as part of the Configuration Management layer.
  17. A way for a server to provide a version identifier to be used for a resource in further operations
    This general requirement notes that version-aware clients are responsible for providing the appropriate version identifier for a resource that is being manipulated. In particular, if a resource is being modified, any server-provided version must be used when submitting an update. This allows servers to track active sessions (however they may be implemented by the server) by assigning version identifiers when documents are retrieved, locked, or reserved.
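
To make requirements 8 through 10 more concrete, the following Python sketch models exclusive locks that lapse after a lifetime. It is only one possible policy: the LockTable class, its method names, and the 300-second default are assumptions made for illustration, and nothing here is meant to fix the behavior of LOCK or UNLOCK.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class LockTable:
    """Toy lock manager: exclusive locks that lapse after a lifetime."""

    def __init__(self, default_lifetime: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._default = default_lifetime  # hypothetical server default
        self._clock = clock               # injectable for testing
        # version URL -> (owner, expiry time)
        self._locks: Dict[str, Tuple[str, float]] = {}

    def lock(self, url: str, owner: str,
             lifetime: Optional[float] = None) -> Optional[float]:
        """Return the granted lifetime in seconds, or None if denied."""
        now = self._clock()
        held = self._locks.get(url)
        if held is not None and held[1] > now and held[0] != owner:
            return None  # another owner holds an unexpired lock
        granted = self._default if lifetime is None else lifetime
        self._locks[url] = (owner, now + granted)
        return granted  # the server tells the client what it got

    def unlock(self, url: str, owner: str) -> bool:
        """Release the lock; only the recorded holder may do so."""
        held = self._locks.get(url)
        if held is None or held[0] != owner:
            return False
        del self._locks[url]
        return True
```

Returning the granted lifetime from lock() is one way for a server to inform the client of a lock's lifetime. A lapsed lock is simply overwritten by the next successful request, so no background cleanup is needed in this sketch.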

The following discussion of possible implementations of the requirements above is intended to aid understanding of the requirements. It is not a statement that a particular implementation is a requirement for basic versioning, but an explanation of how the separation of concerns might improve the final implementation architecture.

The requirements on reservation and PUT take care of some key global requirements: version access is logically separated from access control (RESERVE/RELEASE) and updating. In terms of traditional CM, a CHECKOUT is a RESERVE followed by a GET and a CHECKIN is a PUT followed by a RELEASE. By separating access control (locking and unlocking of resources) from modification of resources, we achieve a great deal of versioning-style independence.
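
The decomposition of CHECKOUT and CHECKIN into separate operations can be sketched in Python as follows. The ToyServer class is a hypothetical in-memory stand-in that happens to use an exclusive-reservation policy and to commit on release; a server that forks variants or treats RESERVE as a NOP would be equally consistent with the requirements. Integer version identifiers are used only to keep the example short.

```python
from typing import List, Optional, Tuple


class ToyServer:
    """In-memory stand-in for a versioning-aware server (hypothetical API)."""

    def __init__(self, initial: bytes) -> None:
        self.versions: List[bytes] = [initial]  # list index serves as version id
        self.reserved_by: Optional[str] = None
        self.pending: Optional[bytes] = None

    def reserve(self, client: str) -> int:
        if self.reserved_by not in (None, client):
            raise PermissionError("resource is reserved by another client")
        self.reserved_by = client
        return len(self.versions) - 1  # the version the client must edit

    def get(self, version: int) -> bytes:
        return self.versions[version]

    def put(self, client: str, base: int, data: bytes) -> None:
        if self.reserved_by != client or base != len(self.versions) - 1:
            raise PermissionError("no matching reservation")
        self.pending = data

    def release(self, client: str) -> Optional[int]:
        if self.reserved_by != client:
            raise PermissionError("no matching reservation")
        self.reserved_by = None
        if self.pending is None:
            return None  # nothing was submitted
        self.versions.append(self.pending)  # commit on release
        self.pending = None
        return len(self.versions) - 1  # final version identifier


def checkout(server: ToyServer, client: str) -> Tuple[int, bytes]:
    """CHECKOUT = RESERVE followed by GET."""
    base = server.reserve(client)
    return base, server.get(base)


def checkin(server: ToyServer, client: str, base: int,
            data: bytes) -> Optional[int]:
    """CHECKIN = PUT followed by RELEASE."""
    server.put(client, base, data)
    return server.release(client)
```

Because the four primitives remain individually callable, a client could equally GET first and RESERVE later, or RELEASE without a PUT.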

We also have very flexible options for the negotiation of version identifiers depending on server policy. The version identifier of a new resource can be negotiated between the user-agent and the server at three points in time: when a lock is taken, when the lock is released, or when the resource is POSTed. Session tracking can be implemented by using special version identifiers for RESERVE and RELEASE. All version identifier negotiation follows a simple rule: "the client proposes, but the server disposes."
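
A minimal Python sketch of the "client proposes, server disposes" rule follows. The function name and the "vN" fallback numbering are assumptions chosen for the example; a real server would apply whatever identifier scheme its version control system prefers.

```python
from typing import Optional, Set


def assign_version_id(proposed: Optional[str], in_use: Set[str]) -> str:
    """Accept the client's proposal if it is free; otherwise pick one."""
    if proposed and proposed not in in_use:
        chosen = proposed  # the server accepts the proposal
    else:
        # The server disposes: fall back to its own numbering scheme.
        n = len(in_use) + 1
        while f"v{n}" in in_use:
            n += 1
        chosen = f"v{n}"
    in_use.add(chosen)  # the identifier is now taken
    return chosen
```

The same function could be called at any of the negotiation points mentioned above, since the client must in every case use the identifier the server returns.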

Acknowledgements

This document is a result of the vigorous and valuable discussion on the Versioning on the Web <www-vers-wg-request@ics.uci.edu> and the Distributed Authoring <w3c-dist-auth-request@w3.org> mailing lists. All the interactions on these lists have been helpful, as have several conversations. David Fiander's initial requirements got us started and clarified several points. Jim Whitehead provided useful criticism, some new points, and impetus to get this thing out the door. Yaron Goland and Christopher Seiwald provided extensive commentary and discussion.

The following list includes the above and others who have also helped either with their postings, personal email or face-to-face discussions:

Dan Connolly, World Wide Web Consortium, connolly@w3.org
Ron Fein, Microsoft, ronfe@microsoft.com
David Fiander, Mortice Kern Systems, davidf@mks.com
Roy Fielding, U.C. Irvine, fielding@ics.uci.edu
Yaron Goland, Microsoft, yarong@microsoft.com
Dave Long, America Online, dave@sb.aol.com
Henrik Frystyk Nielsen, World Wide Web Consortium, frystyk@w3.org
Larry Masinter, Xerox PARC, masinter@parc.xerox.com
Murray Maloney, SoftQuad, murray@sq.com
Christopher Seiwald, Perforce Software, seiwald@perforce.com
Judith Slein, Xerox, slein@wrc.xeroc.com

To Do

Mandatory IETF formatting. Proofread. Spell check. Sanity check.