Re: Version identifier in URL

Fabio Vitali (vitali@cis.njit.edu)
Fri, 7 Jun 1996 03:51:23 -0500


Hi.

Since I was feeling like I needed some thrills on this boring evening, I
have decided to risk my life and belongings by contributing to the already
heated discussion on URL decoration for versions.

Let me summarize my understanding of the proposals that have been made:

1) http://host/path/resource,x
proposed by Pettengill and Arango, and debated because the "," character
is already a legal character in URLs. Christopher Seiwal, though, prefers
it over ";".

2) http://host/path/resource;v=x
proposed by Jim Whitehead and folded into schema no. 3, which it closely
resembles and which is the more expressive of the two.

3) http://host/path/resource;version=x
proposed by MKS (David Fiander), seconded by Jim Whitehead and strongly
objected to by Daniel Connolly.

4) <A HREF="http://host/path/resource" version=x>
proposed by Robert Pettengill and criticized by David Fiander for requiring a
modification in the HTML DTD. Daniel Connolly likes it though.

5) GET /path/resource HTTP/1.0
   Content-version: x

proposed by Kenji Takahashi and generally criticized for requiring
versioning-aware clients and for being impossible to specify in an anchor.

6) http://host/path/resource/x
also proposed by Kenji Takahashi, never really discussed but used by
Christopher Seiwal for several examples.

7) SET_CONFIG tables-no-frames HTTP/?.?
   followed by simple GETs
proposed by Keith Dawson and generally criticized for being stateful.

8) http://host/path/resource?version=x
proposed by Larry Masinter as a provocation; surprisingly, it has survived
widespread attacks and has been re-proposed by Daniel Connolly.

Well, then, I would like to throw in my 5 Italian lire (~0.3 US cents) to
the discussion. Basically, of the three proposals still alive -- 3, 6 and 8
-- I don't like 3 and 8, and like 6 very much.

Maybe we agree that versions are usually organized in a more or less
hierarchical structure, the configuration. By this I don't mean that the
space is a strict tree, i.e. that every leaf can only be reached by
specifying one and only one path, but that there is a non-ambiguous "path"
to specify the resource that goes from the most general to the most
specific. For instance, file.c in the French version of the Macintosh
version of the second release of product X. The path may not be unique,
because the same file, not containing any language- or platform-related
code, may also appear in the Chinese version of the Windows version.
The configuration hierarchy may be trivial (e.g. version 3.4 of file.html),
but it is still a hierarchy.

http://www.w3.org/pub/WWW/Addressing/URL/uri-spec.html has this to say
about hierarchies:

>The slash ("/", ASCII 2F hex) character is reserved for the delimiting of
>substrings whose relationship is hierarchical.

The fact that the slash is used basically as a way to specify the path of a
file within the file system accessed by the server is NOT a valid reason
for limiting its use to this purpose. Indeed:

>The similarity to unix and other disk operating system filename
>conventions should be taken as purely coincidental, and should not be
>taken to indicate that URIs should be interpreted as file names.

Now it seems to me that this is the best way to go for specifying version
numbers for versioned entities.

- The slash is already reserved for hierarchies. Hierarchies of different
origin can be stacked simply by concatenating their paths. This can't be
said for the other proposed schemas:
        ftp://host/dir/file;A;version=x
or
        wais://host/database/c/A;B;version=x
These, while outside the scope of a discussion on HTTP, look pretty silly
anyway and would make ftp or wais access to these resources impossible
or completely different.

I don't particularly like schema 8 either, and the reason is clear:
        http://host/cgi-bin/tool.pl?version=x?a=b

On the other hand,
        http://host/dir/projectX/1.5/Macintosh/French/file.c
        http://host/dir/projectX/file.c/1.5/Macintosh/French
        http://host/dir/projectX/French/Macintosh/1.5/file.c
        http://host/dir/projectX/French/Macintosh/1.5/file.c/george
        http://host/dir/projectX/French/Macintosh/1.5/file.c/draft/2.1

are all unambiguous and may well refer to exactly the same resource. Hey,
I think it is even possible to simply use the file system for them! And the
schema would also gracefully accept simple hierarchies:

        http://host/dir/file.html/3.4

and even:

        http://host/cgi-bin/x/tool.pl?a=b
        http://host/cgi-bin/tool.pl/x?a=b

Since the file system is opaque at the URL level, it is the server that
determines which part of the URL is the actual path name of the resource,
and which is the specification of the version.
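
As a rough illustration of that server-side decision, here is a minimal
Python sketch. It assumes the server keeps a set of resource paths it
knows about (the name known_resources is hypothetical) and that the version
specification trails the resource name, as in file.html/3.4. It takes the
longest known prefix as the resource and treats the rest as the version.
It does not handle the forms where the version precedes the file name.

```python
def split_resource_and_version(url_path, known_resources):
    """Split a URL path into (resource_path, version_segments).

    known_resources is a hypothetical set of resource paths managed by
    the server. The longest prefix found in that set is the resource;
    whatever follows is interpreted as the version specification.
    """
    segments = [s for s in url_path.split("/") if s]
    # Try the longest prefix first, then progressively shorter ones.
    for cut in range(len(segments), 0, -1):
        candidate = "/" + "/".join(segments[:cut])
        if candidate in known_resources:
            return candidate, segments[cut:]
    # No known resource matched: nothing can be said about the version.
    return None, segments


resources = {"/dir/file.html", "/dir/projectX/file.c"}
print(split_resource_and_version("/dir/file.html/3.4", resources))
# ('/dir/file.html', ['3.4'])
print(split_resource_and_version(
    "/dir/projectX/file.c/1.5/Macintosh/French", resources))
# ('/dir/projectX/file.c', ['1.5', 'Macintosh', 'French'])
```

A request for /dir/file.html alone yields an empty version list, which the
server is free to read as "the default version".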

Basically, we are simply making use of an established concept: that the
specification of version 3.4 of document "foobar" is "in some sense" more
specific than the specification of document "foobar" itself (which could be
either 'the default version of document "foobar"' or, as in VTML, 'all the
versions of document "foobar" put together').

Finally, I find slashes easier to read than adding yet another decoration.
Suppose, for instance, I want to access anchor "chapter1" of the document
described by author="bill" and name="foo" as retrieved by version Y of the
search engine "application". What would be the right syntax in this case?

        http://host/cgi-bin/application;version=Y?author=bill+name=foo#chapter1
        http://host/cgi-bin/application?author=bill+name=foo;version=Y#chapter1
        http://host/cgi-bin/application?author=bill+name=foo#chapter1;version=Y

Compare to the simple:
        http://host/cgi-bin/application/Y?author=bill+name=foo#chapter1
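
To make the comparison concrete, here is a small sketch that runs the four
candidates through a generic URL splitter. It uses Python's standard
urllib.parse.urlsplit, which is only one parser and not part of any of the
proposals; the helper name components is made up for the example. The point
is merely that the ";version=" decoration ends up in a different component
(path, query or fragment) depending on where it is placed, while the slash
form always stays in the path.

```python
from urllib.parse import urlsplit

candidates = [
    "http://host/cgi-bin/application;version=Y?author=bill+name=foo#chapter1",
    "http://host/cgi-bin/application?author=bill+name=foo;version=Y#chapter1",
    "http://host/cgi-bin/application?author=bill+name=foo#chapter1;version=Y",
    "http://host/cgi-bin/application/Y?author=bill+name=foo#chapter1",
]


def components(url):
    """Return (path, query, fragment) as seen by a generic URL splitter."""
    parts = urlsplit(url)
    return parts.path, parts.query, parts.fragment


for url in candidates:
    print(components(url))
# ('/cgi-bin/application;version=Y', 'author=bill+name=foo', 'chapter1')
# ('/cgi-bin/application', 'author=bill+name=foo;version=Y', 'chapter1')
# ('/cgi-bin/application', 'author=bill+name=foo', 'chapter1;version=Y')
# ('/cgi-bin/application/Y', 'author=bill+name=foo', 'chapter1')
```

The same splitter also shows the trouble with schema 8 when the resource
already takes a query: for http://host/cgi-bin/tool.pl?version=x?a=b the
whole string "version=x?a=b" comes back as a single query.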

Drawbacks:
- Well, it is not clear by simply reading the URL which part is the name of
the file and which is the specification of the version. This can actually
be a merit from the URI point of view.
- I don't know if there exists a file system in which a file and a
directory can have exactly the same name, and are distinguished by the fact
that one is a directory and the other is a file. I mean, if it were
possible for a directory to have both a file and a directory called
file.html, then the URL:
        http://host/dir/file.html/1.5
would be ambiguous. If there is such a perverted file system, I will
withdraw my points and leave to become a goatherd in Sardinia (Italy).
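
For anyone who wants to check their own system, here is a quick Python
sketch. It only probes the file system that hosts the temporary directory,
so it proves nothing about file systems in general; the function name is
made up for the example. It creates a file called file.html and then tries
to create a directory with the same name next to it.

```python
import os
import tempfile


def file_and_directory_can_share_name():
    """Return True if a file and a directory named file.html can coexist
    in the same directory on the file system hosting the temp dir."""
    with tempfile.TemporaryDirectory() as root:
        path = os.path.join(root, "file.html")
        # First create a regular file under that name.
        with open(path, "w") as f:
            f.write("default version\n")
        try:
            # Then try to create a directory with exactly the same name.
            os.mkdir(path)
        except OSError:
            return False
        return True


print(file_and_directory_can_share_name())
```

On the common Unix and Windows file systems the mkdir call is refused, so
the function returns False and the goats can wait.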

Comments?

Fabio Vitali