Hi.

2:30 in the morning, I guess it's time for a little session of confessions.

Andre van der Hoek writes:

> * It looks like different users are trying to push different capabilities.
> Although expected, I think we should agree on a common `goal' for
> versioning in HTTP. Are we trying to simply put versioning capabilities
> in HTTP, are we trying to put *a* versioning capability in HTTP, are
> we trying to seamlessly integrate versioning and non-version aware
> clients/servers, etc. It looks like the group needs an objective very
> much; this in itself could be an interesting exercise, because I think we
> will be able to identify multiple layers of functionality that can be
> built on top of eachother.

Today's topic is "what would I like versioning in the WWW for?" Please note how I avoided the phrase "versioning in HTTP".

OK. Everything starts for me with Ted Nelson and his Xanadu project: the whole of the world's literature on line, with users adding to it, and commenting, linking, quoting and including everybody else's stuff in their own documents. A vision on the scale of the WWW, but active, collaborative, participatory.

Two issues are important. First, existing and legacy data shouldn't require particular adaptation in order to be put on line. Second, every user is allowed to add links and comments (maybe just private ones, that only she and her friends can access) to any document she can read. Or, even simpler, every user is allowed to create a live quotation from any existing document into one of her own. By live quotation, I mean a piece of text that is taken from another document, that still allows the reader to access the original home of the piece (the document in which it originated), and that may automatically update to follow the evolving content of that home document.

Because of the first issue, some documents may become quite large.
This, together with any implementation of live quotations, means that we want references (in links and in quotations) to point to specific parts of documents (i.e. point-to-point links). The second issue means that it is impossible for these references to be stored within the host document, as named objects similar to <A NAME> anchors. In fact, on a very large scale a successful document could well have thousands of link end-points.

Possible solutions include asking the author of the document to name all the interesting end-points in her document, which is impractical, imperfect, incomplete, and basically leaves everybody at her mercy, or storing the links in an external link base to which every user may have write access. For the latter to work, a general addressing mechanism must be provided, one which does not rely on explicit names, but depends on the structure of the document itself (for instance, in streams such as text, the character offset).

External link bases are a really good idea, because they allow private links to be created, to be shared with others, and even to be published independently of the data they refer to.

Unfortunately, for any such addressing mechanism there are editing operations that make the references invalid. For instance, if you store a reference to offset 34, and the author adds 20 more characters at the beginning of the text, the offset is no longer valid. Updating the offset therefore becomes an important issue.

One solution has been to require that all existing references to a document are kept outside of the document, but local to it (the Dexter Hypertext Model works in this way, for instance, as do many real hypertext systems). Thus, whenever the changes to a document are committed, it becomes easy to update all the references to the new, correct positions.
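To make the offset-34 example concrete, here is a minimal sketch of a link base kept local to a document and updated when an edit is committed. The `Insertion` record, the `update_offsets` function and the link name are hypothetical, invented for illustration; they are not taken from the Dexter model or from any real system. The sketch also assumes that an insertion exactly at a link's offset pushes the link to the right.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    """Hypothetical record of one edit: `length` characters inserted at `position`."""
    position: int
    length: int


def update_offsets(link_base: dict[str, int], edit: Insertion) -> dict[str, int]:
    """Return a new link base with every offset at or after the edit shifted right."""
    return {
        name: offset + edit.length if offset >= edit.position else offset
        for name, offset in link_base.items()
    }


# A document whose interesting part starts at character offset 34.
text = "x" * 34 + "TARGET" + "y" * 10
links = {"my-comment": 34}

# The author adds 20 characters at the beginning of the text.
edit = Insertion(position=0, length=20)
new_text = "z" * 20 + text

# The stored offset 34 no longer points at the target; the updated one does.
new_links = update_offsets(links, edit)
print(new_text[34:40])                                          # xxxxxx
print(new_text[new_links["my-comment"]:new_links["my-comment"] + 6])  # TARGET
```

Note that this only works because the link base sees every committed edit, which is exactly the assumption that stops holding once links live with their creators rather than with the document.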
Unfortunately, this solution doesn't scale well: presuming that Yahoo keeps a copy of every link that anyone in the world has created to one of its pages, and updates the address values regularly, is absolutely laughable and unbelievable.

Another solution that has been proposed is the use of heuristics to determine the current address of the link. That is, enough context information about the surroundings of the link end-point is kept so that, when the link is discovered to be outdated, it is possible to retrieve the correct position with some approximation. This method is obviously unreliable and imprecise. I believe the Microcosm hypertext system works in this way.

A different solution is given by versioning. By requiring that every document that we want to link to is versioned, and by using a versioning mechanism that keeps track of the individual changes from version to version, we can easily deduce the current position of a stored address (if it still exists), even if there has been no contact or reciprocal awareness between the stored link and the document it refers to. External links are possible and reliable on versioned documents.

This is the main reason for VTML to exist, for it to be a fine-grained change tracking mechanism, and for it to be less interested in other classical versioning issues such as configuration management.

Another situation VTML was designed for is the support of asynchronous collaboration in the writing of shared documents. VTML should let independent writing on a single document naturally branch into different versions, and should help authors to easily distinguish individual contributions, compare them, and merge them into converging final versions. Unlike David Durand, I don't believe in automatic merging, and prefer authors to individually select the changes to be incorporated in the merged document.
This kind of collaboration implies that requesting an older version, or someone else's version, of some document is a fairly common operation. For this reason, I believe that this should be a client-based operation: the server should always provide the complete set of available versions of a document to a versioning-aware client, and should let the client deduce and display the specific version that the user selected. This is why VTML is capable of storing all the individual changes made in all the existing versions of a document in the same data stream, so that the client receives the complete set of changes that have been performed on the document in the most compact form.

External deltas of specific versions should be used for efficient check-ins, efficient updates of client-stored versioned documents ("who has checked in new versions of this document since the last time I asked?"), and efficient storage and concurrency control.

Finally, I believe that all this should be COMPLETELY hidden from the unaware user, who can go on doing her chores absolutely oblivious that the editor is version-aware, and ignore versioning issues until she suddenly needs some of the features, such as recovery of erroneously deleted data, comparison and merging of independently modified versions, or rollback to a previous, safe state. This explains why version creation and management in VTML can be completely automatic and hidden: the user is not even required to decide about version names and freezing parameters unless she needs to.

All this also explains why VTML doesn't actually need any change in the HTTP protocol, doesn't require locking mechanisms on the server (which are STATEFUL!), and basically exploits the SGML notion of putting as much information as possible in the data rather than in the code.

Can somebody find a use for this nonsense?

Fabio Vitali