Minutes

Editor's note: These minutes are the result of merging notes taken by Dennis Hamilton, Keith Dawson, and Jim Whitehead. In particular, the detailed, comprehensive notes taken by Dennis Hamilton form the core of this material. These original notes were edited to make them more readable in a standalone document, and to add context so that people who did not attend the meeting could still understand the discussion. Thus, text attributed to individuals should not be interpreted as a direct transcription of what they said in the meeting, but rather as a representation of their content or position, consistent with what was actually spoken.

Morning - 9AM

The meeting began with all present introducing themselves, and stating their organization.

History of Distributed Web Authoring - Whitehead

Whitehead gave a presentation on the history of distributed web authoring. The first web authoring tool was the Nexus browser (1989/1990), the NeXT machine browser/editor developed by Tim Berners-Lee. In 1993, the Mosaic 2.4 browser achieved critical mass, and created the "publish/browse" technical frame of reference for the WWW. In 1992, the HTTP PUT method disappeared from the HTTP 1.0 specification because this specification was intended to reflect the practices of existing HTTP servers, which did not have write capability. In 1994/95, development took place on NaviPress/NaviServer (now AOLpress/AOLserver), Vermeer (now Microsoft) FrontPage, and the World Wide Web Consortium (W3C) Line Mode Browser, the first generation of distributed web authoring tools.

In December, 1995, there was a breakout session on distributed web authoring tools at the WWW4 conference. The focus of this session was how to achieve interoperability among distributed web authoring tools. This session identified the following interoperability issues: common access control model, "lost update" problem, need for BROWSE and MKDIR HTTP methods, editing of variants, access to "raw" HTML before server-side include (SSI) processing, strong authentication, and placing a link to the HTML Standard Generalized Markup Language (SGML) Document Type Definition (DTD) in the HTML source.

In 1995/96, Rohit Khare at W3C researched a resource leasing and locking mechanism. In March 1996, Dan Connolly of the W3C put out a call for volunteers to coordinate distributed authoring activity, and Jim Whitehead of U.C. Irvine volunteered, forming the Working Group on Distributed Authoring on the World Wide Web, and the Working Group on Versioning and Configuration Management of World Wide Web Content. Though Whitehead received much appreciated assistance from Dan Connolly of the W3C, and Larry Masinter of the IETF HTTP Working Group, neither the distributed authoring nor the versioning and configuration management group has official sponsorship from either the W3C or the IETF.

In June, 1996, the IETF HTTP Working Group completed an internet draft of HTTP version 1.1, including a PUT (write) method. In July 1996, the IETF HTTP working group completed an internet draft on digest authentication for HTTP. On July 10, 1996, the Working Group on Distributed Authoring on the World Wide Web met at AOL Productions, San Mateo, California.

Working Group Descriptions and Purposes - Whitehead

Whitehead next gave a presentation stating the best current description of the purpose and activities of the Working Group on Distributed Authoring on the World Wide Web, and the Working Group on Versioning and Configuration Management of World Wide Web Content. Whitehead stated that the mission of the distributed authoring group is to make distributed authoring as pervasive as browsing is today. This could be achieved by specifying preliminary modifications to existing internet specifications (e.g., HTTP), and by establishing usage conventions (e.g., URL naming conventions). These preliminary specifications and usage conventions would then be forwarded to appropriate bodies, such as the IETF HTTP Working Group, for final discussion and potential incorporation into existing internet standards. (Further discussion of this issue took place in the afternoon.)

The Working Group on Distributed Authoring has a home page at URL:

http://www.ics.uci.edu/~ejw/authoring/

It also has a mailing list, <w3c-dist-auth@w3.org>, hosted by the World Wide Web Consortium.

At this point there was some discussion about membership on the mailing list. Whitehead stated that he hasn't yet turned down anyone who wanted to join the list, and he wants high participation on the list. However, it is a managed list. Whitehead wants the ability to gag (after warning) list participants who are not behaving constructively. Masinter stated that this is a perilous course, that it is better to just leave the list open and be selective in reading and responding.

There was some discussion about the differences between W3C mailing lists, which do not have to have open membership, and IETF working group mailing lists, which do.

Masinter: Irrespective of whether the mailing list is open or restricted, gagging on an individual basis just doesn't work out. I suggest having solid guidelines about the list and what is expected of participation, and have that known in advance.

Whitehead: We will discuss membership and sponsorship further later today.

Masinter: You need to decide on sponsorship of this working group so that the participants who are concerned about legal questions (e.g., anti-trust and intellectual property rights) around participating with others can be satisfied and operate appropriately.

The Working Group on Versioning and Configuration Management of World Wide Web Content has as its goal the development of a preliminary specification for how to provide versioning and configuration management of content served by an HTTP server. This goal can be achieved through the same means used by the distributed authoring group: extension of existing specifications, and establishment of usage conventions. While the issues considered by the distributed authoring group also include versioning and configuration management, the versioning group was created as a sub-group of the distributed authoring group to partition the issue space due to the complexity and number of issues involved.

The Working Group on Versioning and Configuration Management has a home page at URL:

http://www.ics.uci.edu/~ejw/versioning/

It also has a mailing list, <www-vers-wg@ics.uci.edu>, hosted by the Department of Information and Computer Science at the University of California, Irvine.

Seiwald: Is it possible to separate issues of versioning and configuration management from distributed authoring, and how do we reconcile different interests?

Whitehead: There is significant overlap between the two issue spaces. For example, access control is considered by both groups. However, due to the size of the versioning and configuration management issue space, and the different set of parties interested in this issue, such as configuration management companies, it makes sense to have a separate group address this issue. There will be close collaboration between the two groups, and there is currently significant overlap in membership.

Hamilton: The issue spaces aren't orthogonal, but we want to factor these topics (collaboration, versioning, authoring) so that we can proceed in parallel as much as possible.

Seiwald commented that configuration management arises as a result of cooperative authoring, and that versioning and authoring should therefore not be allowed to drift too far apart.

Fein asked about previous work on collaborative editing and the WWW.

Nielsen: There has been some discussion about collaboration in other groups, and there was a workshop on the subject.

Masinter: What are people interested in?

At this point we went around the room and reviewed what people were interested in.

Question: Have any of the people from Netscape Navigator Gold been asked to participate in this meeting? Long had some contact with them, but it had recently gone quiet on their end. Whitehead mentioned that Netscape was contacted prior to the meeting, but had declined to participate. Before the meeting, someone from Netscape did telephone Dan Connolly at the W3C to say they would be unable to attend, but were interested in participating in the working group.

There was a brief discussion about including multiple repository searching, etc. by Long, and Nielsen talked about the meeting recently held about indexing, Harvest, etc.

Collaborative Authoring in Microsoft Word - Fein

Fein gave a presentation on collaborative authoring in Microsoft Word, with brief comparisons to other word processors.

Fein mentioned that the efforts of this group are already too late to impact Word 8, which is scheduled to be released this fall for Windows 95, NT and Macintosh.

Fein listed the five most important collaboration features for word processors:

  1. Revision Marking
  2. Annotations
  3. Merging Documents
  4. Comparing Documents
  5. Access Control

Revision marking slide:

Fein discussed how MS Word and Lotus WordPro both track revisions to a document by storing the revision history in the document file. This has the advantage that moving the document to a floppy doesn't cause the loss of revision information. The desired capability is to know who wrote what, and when they wrote it, at a fine-grain level within the document (i.e., character, word, and paragraph level). It is also desirable to have protection from unauthorized changes by requiring people to submit deltas for approval and selective incorporation. One way of providing this desired capability is to have provisional edits stamped with user/time, which are then incorporated into the document during an accept/reconcile process by the main document author.
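The provisional-edit model described above can be sketched in code. This is a hypothetical illustration, not Word's actual implementation: each proposed edit carries an author/time stamp, and the main document author later accepts (or discards) it during reconciliation.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical revision-marking model: provisional edits are stamped
# with user and time, then applied during an accept/reconcile pass.
@dataclass
class Revision:
    author: str
    timestamp: datetime
    kind: str        # "insert" or "delete"
    offset: int      # character position in the document text
    text: str

@dataclass
class Document:
    text: str
    pending: list = field(default_factory=list)

    def propose(self, rev: Revision):
        # Record the edit without changing the main text.
        self.pending.append(rev)

    def accept(self, rev: Revision):
        # Fold the provisional edit into the main text.
        if rev.kind == "insert":
            self.text = self.text[:rev.offset] + rev.text + self.text[rev.offset:]
        else:
            self.text = (self.text[:rev.offset]
                         + self.text[rev.offset + len(rev.text):])
        self.pending.remove(rev)

doc = Document("Hello world")
r = Revision("fein", datetime(1996, 7, 10), "insert", 5, ",")
doc.propose(r)
doc.accept(r)
print(doc.text)  # -> Hello, world
```

Because the `pending` list travels with the document, who-wrote-what-and-when survives moving the file, which is the advantage Fein noted for storing revision history inside the document itself.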

Experience with LAN usage shows that the access speed hit for storing the revision history in the document is acceptable. However, Fein suspects this will likely not be the case when retrieving or saving a file at 28.8 kbps (the fastest mass-market modem speed).

Experience with Microsoft Internet Explorer has shown that converting a Word file to HTML causes irretrievable loss of revision information. Ideally, it would be nice to have the choice of file format be transparent to the user.

Fein also provided a brief discussion of versioning in Lotus WordPro (not on his slides).

Hamilton: The business of posting revisions or deltas as part of the document file is an approach in OpenDoc.

Burns: The Dynabook DynaWeb server manages deltas within their database, and only delivers the version users want.

There was some discussion about embedding revision information into HTML comments. This has the drawback that the comments then carry encoded data rather than human-readable text. Whitehead mentioned that David Durand and Fabio Vitali are working on a versioned HTML called "VTML."

Annotations slide:

Unlike revisions, annotations are not proposed changes to the document. Within Word, annotations are a separate text stream anchored to regions of the main text. Fein mentioned the desirability for annotations which are not links (i.e., a few lines of text rather than a jump to a new document) to pop up a window within a WWW browser, after a Word file is converted to HTML.

Merging:

Collaborative editing leads to the problem of integration of multiple sets of revisions. Current approach: dump all revisions into the same file, then allow the document editor to choose which revisions to accept. Merging has several implications for the Web. First, merging requires some means of performing revision marking so users can choose between different revisions. Second, until all revisions have been merged, the document is in a half-finished state, and should not be world-readable until the merge has been performed in a rational way.

Comparing versions:

The functionality in Microsoft Word for comparing two different versions of a document was originally developed for the legal market, where there is a need to analyze changes made to a contract by a potentially hostile party who will not disclose what they have changed. This is more difficult than the plain text (e.g., source code) comparison (or 'diff') case, since there is a need to perform differencing on logical parts of a document, such as sentences, paragraphs, and pages, rather than just individual lines of text.
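The idea of differencing logical units rather than lines can be sketched with the standard library. This is a minimal illustration in the spirit of the compare-versions feature, not Word's algorithm; the sentence splitter is deliberately naive.

```python
import difflib
import re

# Sketch: diff on sentences instead of lines. A real implementation
# would use a much smarter sentence/paragraph segmenter.
def sentences(text):
    return [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]

old = "The party of the first part agrees. Payment is due in 30 days."
new = "The party of the first part agrees. Payment is due in 10 days."

a, b = sentences(old), sentences(new)
for op, a1, a2, b1, b2 in difflib.SequenceMatcher(None, a, b).get_opcodes():
    if op != "equal":
        # Reports the changed sentence, not the changed line.
        print(op, a[a1:a2], "->", b[b1:b2])
```

Here only the second sentence is reported as changed, even though a line-oriented diff of a reflowed paragraph might flag the whole thing.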

Access control:

The issue here is how to provide fine-grain (individual revisions within a document) permissions to different users. WordPro has very good fine-grain document access control. A difficult access control situation occurs when one person's revisions aren't visible to all members of a work group (e.g., politically sensitive comments).

It was mentioned that the Adobe PDF document format allows selective modifications to a document.

Masinter: It is hard to see how a document can carry this amount of authorization.

Future plans of Word:

An Intranet Day presentation by Bill Gates demoed two features of an upcoming Word release.

Nielsen: The Amaya structured HTML browser and editor (W3C work performed at INRIA) does this already.

Masinter: The Shared Books system at Xerox might have done this too.

(General acknowledgment that these capabilities have existed in previous research systems.)

Some Word 8 features:

One problem they have been experiencing while working on Word is how to make a user interface which is good for both browsing and authoring simultaneously. Certain authoring features are undesirable during browsing, for example, the feature which marks misspelled words with a wavy red underline -- the user doesn't want to know this for sites they didn't author. One solution is to have a mode-based interface, with distinct browsing and authoring modes. However, Fein would prefer to have editing and browsing occur within the same mode.

Long: We encountered this problem in NaviPress as well. Our solution is: you are in browsing mode until you place your cursor in the window, then you switch to browsing/authoring mode.

Another participant mentioned that Netscape Gold has an authoring mode which is separate from their browsing mode, even popping up a separate authoring window.

Fein discussed the desirability of having the file format (e.g., HTML, Word native) and connection type (e.g. LAN, SLIP) be visible, but still the same operation to the user, for most cases. For example, HTML should be just one option for saving, rather than having a separate 'Save to URL/HTML' command.

Since the native Word document type is a richer format than HTML, there is some information loss going from Word to HTML. There are also some aspects of HTML which don't convert exactly to native Word format. This causes a problem for 'out-of-band' editing of HTML, that is, editing an HTML document that was created by converting a Word document to HTML, and then wanting to recover those changes back into Word. This led into the broader issue of whether the Web should support all word processing features.

Brown mentioned that some of their customers really cared about the format of generated HTML because they were feeding the generated HTML into other tools.

The issue of multi-purpose authoring was also raised: how best to have just one document type which is good for CD-ROM, for the Web, for printing, etc. Especially for printing, bandwidth becomes an issue. It is acceptable to retrieve a high-resolution image over a LAN to produce high-quality printed output, but this is probably not acceptable when working over the Internet, since it is not as fast. Are users willing to wait half a day to download the high-quality files to produce a high-quality printout?

There was some discussion about how best to handle writing a resource which contains many related resources, such as a web page and associated graphics.

Masinter: The Web mail group has been working on a way of agglomerating many files into one file, using the multipart/related MIME type. This allows the sending of a web page (many files together to form one page) via email.
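The agglomeration Masinter describes can be sketched with Python's standard email package (shown here only as an illustration of the multipart/related idea; the message content is made up).

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.image import MIMEImage

# Sketch: bundle a web page and an inline image into a single
# multipart/related message, so "one page, many files" can travel
# as one unit (e.g., via email).
page = MIMEText('<html><body><img src="cid:logo"></body></html>', "html")
logo = MIMEImage(b"GIF89a...", _subtype="gif")  # placeholder bytes, not a real GIF
logo.add_header("Content-ID", "<logo>")

bundle = MIMEMultipart("related")
bundle.attach(page)
bundle.attach(logo)

print(bundle.get_content_type())                            # multipart/related
print([p.get_content_type() for p in bundle.get_payload()]) # parts in order
```

The image part is referenced from the HTML by its Content-ID (`cid:logo`), which is how the receiving agent reassembles the page from the bundle.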

There was some discussion about the need for a "GET for edit" method which is distinct from a "GET of source" or a "GET for browse" (the current behavior of HTTP). The need for this becomes clear when revision information is stored in a resource. In this case, a user performing an edit may not want the full source, because they do not want to incur the time delay of downloading the full revision history, but they may not want the browsable version either, due to server-side include processing.
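The three flavors of GET discussed above might be distinguished by the client stating its purpose. The following is purely hypothetical; the `X-Purpose` header is invented for illustration and existed in no specification.

```python
# Hypothetical sketch: the client declares whether it wants the
# browsable (SSI-processed) form, the raw source with full revision
# history, or an editable form without the revision history.
# "X-Purpose" is a made-up header, not part of HTTP.
def build_get(path, purpose):
    return (f"GET {path} HTTP/1.1\r\n"
            f"Host: example.com\r\n"
            f"X-Purpose: {purpose}\r\n\r\n")

print(build_get("/report.html", "browse"))  # current HTTP behavior
print(build_get("/report.html", "source"))  # raw source, all revisions
print(build_get("/report.html", "edit"))    # source, minus revision history
```

The point of the sketch is only that one URL would serve three different representations, chosen by the client rather than by maintaining three separate resources.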

Interoperability:

Existing tools do not all share the same editing model. For example, with FrontPage, it is possible to have a company name stored in one place. When this company name is modified, the change is then propagated to N leaf nodes. This is considered to be only one change to FrontPage, but it might be considered N separate changes to another tool.

Another issue is defining behaviors for less savvy tools, for example, what should revisions look like to a non-revision savvy browser?

Integrations with existing versioning systems -- this topic should be handled by the versioning working group.

In conclusion, Fein stated that word processing applications already have collaboration features that are not available in existing Web authoring tools. Microsoft is committed to a standards-based approach to collaborative authoring. In the long term, Fein sees Word becoming the main editing engine, while FrontPage will concentrate on its facilities for site management, offering higher-level web functionality.

Schulert: I see FrontPage being a more high-end, more web-focused tool than Word.

Whitehead: Can you tell us which HTTP method will be used in Word 8 for its "Save to URL" capability?

Fein: Sorry, I cannot talk about this.

Nielsen stated that the Word team should use the PUT method rather than the POST method. Schulert responded with a discussion about why the POST method was chosen by FrontPage group. The primary reason is that when they began development of FrontPage, POST was implemented and standard across servers, while PUT was not. Nielsen stated that the HTTP 1.1 PUT is now much more usable for writing content to an HTTP server than the HTTP 1.0 PUT.

Masinter asked whether the HTTP 1.1 version of PUT is adequate for putting versions with metadata, and about the atomicity of PUTs. Long suspects that HTTP 1.1 still has some problems in this area.

Nielsen: Need to aim for December timeframe to get into HTTP 1.2, since there are many products being worked on, and HTTP 1.2 appears to be the last revision of HTTP.

Masinter: This group needs to produce some input for the HTTP Working Group by the end of the summer to have results put into HTTP 1.2.

Distributed Authoring - Nielsen

Nielsen began by reiterating the need to work quickly to get results into HTTP 1.2.

Nielsen would like to be able to create a document once and then have it seamlessly: (1) sent to a friend via email, (2) posted to a newsgroup, and (3) written to a web page, all using a common PUT/POST model. This raises some issues: what transaction model is used, and what quality of service is desired? Nielsen used the example of having icons on the desktop representing people, web servers, news servers, etc. with the capability of dragging a document onto one or more of these icons and then saying "submit". This causes the appropriate email post, newsgroup post, and web page write to occur simultaneously with appropriate error handling when things go wrong.

Nielsen also favors using the LINK mechanism within HTML to have rich relationship semantics among documents. These links could serve a number of purposes, for example, serializing a hyperweb of documents so they can be printed, or linking a table of contents to the chapters of a document. These relationships could also be used to create alternate hierarchies to the containment hierarchy offered by the directory structure of most existing web servers. This more general model can handle web servers implemented on top of a database as well as on top of a file system.

In Nielsen's view, during a PUT, the client may make suggestions about the destination URL of a resource, but is not in charge of the name space of the server. The server ultimately decides where it will place a resource in its name space. The server uses the link relationships to determine the location and hierarchical arrangement of resources within its namespace, rewriting HTML as necessary to preserve the relationships.

Some meeting participants were unaware of the LINK tag, and so there was a brief discussion describing it, and what it can be used for. This led into a discussion of the desirability of having a standard for LINK REL tags (usage convention). It was mentioned that Murray Maloney had previously written a document on standard relationships, and that the time might be ripe to revisit this.
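A LINK-based table-of-contents arrangement of the kind discussed might look like the fragment below. The rel values shown are illustrative conventions in the spirit of Maloney's document; they were not standardized at the time.

```html
<!-- Illustrative only: rel values here are conventions, not a standard. -->
<head>
  <title>Chapter 2</title>
  <link rel="contents" href="toc.html">
  <link rel="previous" href="chapter1.html">
  <link rel="next"     href="chapter3.html">
</head>
```

Because these relationships live in the document head rather than in the server's directory layout, a server could use them to serialize the chapters for printing or to present a hierarchy different from the one on disk.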

Hamilton: What about HyTime? It can project structure onto a document.

Frystyk: Same with Hyper-G.

Hamilton: There is a need for addition of structure that is not embedded into a document, like a linkbase system.

There was some discussion about the advantages of using standard link relationships. Concern was raised that a full consideration of this functionality would take longer than the two months remaining before the end of the summer deadline for impacting HTTP 1.2.

Masinter: The two month timeframe is only for changes to HTTP. However, there is the PEP (Protocol Extension Protocol) which can support changes beyond 1.2.

There was some discussion about whether there is a need for two phase commit transaction semantics for PUT operations in HTTP.

Wills: Should versioning be tied to a particular versioning system, like CVS (Concurrent Versions System)?

Nielsen: Versioning should not be tied to a particular system. Authoring tools should use PEP to get the operations needed to perform versioning with a given system. How a particular versioning system performs locking and access control will be negotiated between the authoring client and HTTP server in some meta language.

Whitehead: Using PEP is not necessary, since it is possible to implement several different versioning styles using a common set of atomic operations.

Masinter: Perhaps the variability between versioning styles can be expressed in standard ways, such as by using a forms-based interface.

Seiwald: Whitehead says we can do a lot with some atomic operations, which allow for basic interoperability across versioning systems, so perhaps we should at least do this.

Nielsen: I don't want to end up with an enumeration of particular version control systems, but by using a negotiation model common methods will emerge.

Masinter: Registration and negotiation is the fallback when standardization is not possible. It is better than chaos, but standardization is better than negotiation.

Whitehead: There are also significant technical problems with negotiation. For example a lock simply prevents writing to a resource by all users except the lock holder. Describing a lock thus requires an access control meta language, the description and implementation of which is a difficult problem.
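Whitehead's point can be illustrated with a deliberately minimal sketch: the lock semantics themselves are trivial, which is exactly why describing them in a general-purpose negotiation meta-language seems like overkill. This is an illustration, not a proposal.

```python
# Minimal whole-resource write lock: only the holder may write.
# The semantics are simple; describing them generically (so a client
# could negotiate them) is the hard part Whitehead refers to.
class LockManager:
    def __init__(self):
        self.locks = {}  # url -> lock holder

    def lock(self, url, user):
        # Succeeds if unlocked, or already held by this same user.
        if self.locks.get(url, user) != user:
            return False
        self.locks[url] = user
        return True

    def can_put(self, url, user):
        holder = self.locks.get(url)
        return holder is None or holder == user

lm = LockManager()
print(lm.lock("/draft.html", "alice"))     # True  (lock acquired)
print(lm.can_put("/draft.html", "alice"))  # True  (holder may write)
print(lm.can_put("/draft.html", "bob"))    # False (everyone else may not)
print(lm.lock("/draft.html", "bob"))       # False (already held)
```

Even this toy model raises the questions from the discussion: how long does the lock last, who may break it, and how would a client discover these rules without a shared access-control vocabulary?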

Nielsen: What goes into HTTP has to be independent of the versioning model used.

(There was agreement on this point.)

Back to presentation:

Nielsen: Caching needs unique names for versions and variants so caches can deal with them uniquely.

Masinter: No, HTTP/1.1 doesn't require unique names. Could perhaps use a version header instead, which the cache could understand.

There was some discussion about the possible use of entity tags in a versioning scheme.

Masinter: There should be some investigation of if-match and PUT usage. There doesn't really need to be that much support for versioning. There is a tradeoff in putting a new version header on the request versus putting version identifiers into the name space.
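The If-Match-plus-PUT combination Masinter suggests investigating amounts to a conditional write: the client supplies the entity tag of the version it last read, and the server applies the PUT only if that tag still matches, which guards against the lost-update problem. A sketch of such a request (the path, tag, and body are invented for illustration):

```python
# Sketch of an HTTP/1.1 conditional write. If the resource's current
# entity tag no longer matches, the server refuses the write
# (412 Precondition Failed) instead of silently overwriting.
def conditional_put(path, etag, body):
    return (f"PUT {path} HTTP/1.1\r\n"
            f"Host: example.com\r\n"
            f'If-Match: "{etag}"\r\n'
            f"Content-Length: {len(body)}\r\n"
            f"\r\n"
            f"{body}")

req = conditional_put("/report.html", "v42", "<html>...</html>")
print(req)
```

This is the trade-off under discussion: version checking can ride on a request header like this, or version identifiers can be pushed into the URL name space.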

Nielsen mentioned that PUT does not support writing to resource byte ranges (i.e., sub parts of a resource).

Nielsen also mentioned that the Jigsaw server has an integration with CVS, and that the Amaya browser/editor will be 1.1 compliant as quickly as possible.

Nielsen also mentioned the need to make PUT reliable in practice, by which he means ensuring that PUT doesn't drop data on the floor during a transaction (lost data is bad).

There was some discussion about the applicability of SHTTP (Secure HTTP) for secure writes. According to Masinter, the SHTTP specification is in last call and will be approved very soon. Nielsen expressed the opinion that SHTTP is very difficult to implement.


University of California, Irvine
Jim Whitehead <ejw@ics.uci.edu>
Department of Information and Computer Science
247 ICS2 #3425
Irvine, CA 92697-3425

Last modified: 24 Jul 1996