*** draft-fielding-url-syntax-07.txt Mon Sep 22 15:34:29 1997 --- draft-fielding-url-syntax-08.txt Tue Oct 14 23:02:52 1997 *************** *** 1,10 **** Network Working Group T. Berners-Lee, MIT/LCS INTERNET-DRAFT R. Fielding, U.C. Irvine ! draft-fielding-url-syntax-07 L. Masinter, Xerox Corporation ! Expires six months after publication date September 22, 1997 Uniform Resource Locators (URL): Generic Syntax and Semantics Status of this Memo This document is an Internet-Draft. Internet-Drafts are working --- 1,12 ---- Network Working Group T. Berners-Lee, MIT/LCS INTERNET-DRAFT R. Fielding, U.C. Irvine ! draft-fielding-url-syntax-08 L. Masinter, Xerox Corporation ! Expires six months after publication date October 14, 1997 ! Uniform Resource Locators (URL): Generic Syntax and Semantics + Status of this Memo This document is an Internet-Draft. Internet-Drafts are working *************** *** 55,61 **** dealing with characters outside of the US-ASCII character set; those recommendations are discussed in a separate document. ! All significant changes from the prior RFCs are noted in Appendix F. 1.1 Overview of URLs --- 57,63 ---- dealing with characters outside of the US-ASCII character set; those recommendations are discussed in a separate document. ! All significant changes from the prior RFCs are noted in Appendix G. 1.1 Overview of URLs *************** *** 150,157 **** components in the scheme. There is a `relative' form of URL reference which is used in conjunction with a `base' URL (of a hierarchical scheme) to produce another URL. The syntax of hierarchical URLs is ! described in section 4, and the relative URL calculation is described ! in section 5. 1.5. URL Transcribability --- 152,159 ---- components in the scheme. There is a `relative' form of URL reference which is used in conjunction with a `base' URL (of a hierarchical scheme) to produce another URL. The syntax of hierarchical URLs is ! described in Section 4, and the relative URL calculation is described ! in Section 5. 1.5. URL Transcribability *************** *** 459,465 **** and a reference containing only a fragment identifier is a reference to the identified fragment of that document. Traversal of such a reference should not result in an additional retrieval action. - However, if the URL reference occurs in a context that is always intended to result in a new request, as in the case of HTML's FORM element, then an empty URL reference represents the base URL of the --- 461,466 ---- *************** *** 584,609 **** be a security risk in almost every case where it has been used. The host is a domain name of a network host, or its IPv4 address as ! a set of four decimal digit groups separated by ".". A suitable ! representation for IPv6 addresses has not yet been determined. hostport = host [ ":" port ] ! host = hostname | hostnumber hostname = *( domainlabel "." ) toplabel domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum toplabel = alpha | alpha *( alphanum | "-" ) alphanum ! hostnumber = 1*digit "." 1*digit "." 1*digit "." 1*digit port = *digit Hostnames take the form described in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost ! domain label will never start with a digit, though, which ! syntactically distinguishes all domain names from hostnumbers. To ! actually be "Uniform" as a resource locator, a URL hostname should ! be a fully qualified domain name. In practice, however, the host ! component may be a local domain literal. The port is the network port number for the server. Most schemes designate protocols that have a default port number. Another port --- 585,614 ---- be a security risk in almost every case where it has been used. The host is a domain name of a network host, or its IPv4 address as ! a set of four decimal digit groups separated by ".". Literal IPv6 ! addresses are not supported. hostport = host [ ":" port ] ! host = hostname | IPv4address hostname = *( domainlabel "." ) toplabel domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum toplabel = alpha | alpha *( alphanum | "-" ) alphanum ! IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit port = *digit Hostnames take the form described in Section 3.5 of [RFC1034] and Section 2.1 of [RFC1123]: a sequence of domain labels separated by ".", each domain label starting and ending with an alphanumeric character and possibly also containing "-" characters. The rightmost ! domain label of a fully qualified domain name will never start with a ! digit, thus syntactically distinguishing domain names from IPv4 ! addresses. To actually be "Uniform" as a resource locator, a URL ! hostname should be a fully qualified domain name. In practice, ! however, the host component may be a local domain literal. ! ! Note: A suitable representation for including a literal IPv6 ! address as the host part of a URL is desired, but has not yet ! been determined or implemented in practice. The port is the network port number for the server. Most schemes designate protocols that have a default port number. Another port *************** *** 756,762 **** | | `----------------------------------------------' | | | | (5.1.3) URL used to retrieve the entity | | | `----------------------------------------------------' | ! | (5.1.4) Base URL = "this_message:/" | `----------------------------------------------------------' 5.1.1. Base URL within Document Content --- 761,767 ---- | | `----------------------------------------------' | | | | (5.1.3) URL used to retrieve the entity | | | `----------------------------------------------------' | ! | (5.1.4) Default Base URL is application-dependent | `----------------------------------------------------------' 5.1.1. Base URL within Document Content *************** *** 775,812 **** of how the base URL can be embedded in the Hypertext Markup Language (HTML) [RFC1866] is provided in Appendix D. ! MIME messages [RFC2045] are considered to be composite documents. ! The base URL of a message can be specified within the message ! headers (or equivalent tagged metainformation) of the message. For ! protocols that make use of message headers like those described in ! MIME [RFC2045], the base URL can be specified by the Content-Base ! or Content-Location [RFC2068] header fields. ! ! Content-Base = "Content-Base" ":" absoluteURL ! ! Content-Location = "Content-Location" ":" ! ( absoluteURL | relativeURL ) ! ! The field names are case-insensitive and any whitespace inside ! the field value (including that used for line folding) is ignored. ! Content-Base takes precedence over Content-Location when both are ! present within the same header field set. If a Content-Location ! value is relative, it must be resolved to its absolute form (like ! any relative URL) before it can be used as the base URL for other ! references. ! ! For example, the header field ! ! Content-Base: http://www.ics.uci.edu/Test/a/b/c ! ! would indicate that the base URL for that message is the string ! "http://www.ics.uci.edu/Test/a/b/c". The base URL for a message ! serves as both the base for any relative URLs within the message ! headers and the default base URL for documents enclosed within the ! message, as described in the next section. ! ! Protocols which do not use the RFC 822 message header syntax, but ! which do allow some form of tagged metainformation to be included within messages, may define their own syntax for defining the base URL as part of a message. --- 780,789 ---- of how the base URL can be embedded in the Hypertext Markup Language (HTML) [RFC1866] is provided in Appendix D. ! A mechanism for embedding the base URL within MIME container types ! (e.g., the message and multipart types) is defined by MHTML ! [RFC2110]. Protocols that do not use the MIME message header syntax, ! but which do allow some form of tagged metainformation to be included within messages, may define their own syntax for defining the base URL as part of a message. *************** *** 819,837 **** document is the base URL of the entity in which the document is encapsulated. - Composite media types, such as the "multipart/*" and "message/*" - media types defined by MIME [RFC2046], define a hierarchy of - retrieval context for their enclosed documents. In other words, - the retrieval context of a component part is the base URL of the - composite entity of which it is a part. Thus, a composite entity - can redefine the retrieval context of its component parts via the - inclusion of a Content-Base or Content-Location header, and this - redefinition applies recursively for a hierarchy of composite - parts. Note that this might not change the base URL of the - components, since each component may include an embedded base URL - or a Content-Base or Content-Location header field that - would take precedence over the retrieval context. - 5.1.3. Base URL from the Retrieval URL If no base URL is embedded and the document is not encapsulated --- 796,801 ---- *************** *** 844,855 **** 5.1.4. Default Base URL If none of the conditions described in Sections 5.1.1--5.1.3 apply, ! then the base URL can be considered to be the imaginary URL ! ! this_message:/ ! ! which exists for the sole purpose of resolving relative references ! within a multipart entity. It is the responsibility of the distributor(s) of a document containing relative URLs to ensure that the base URL for that --- 808,818 ---- 5.1.4. Default Base URL If none of the conditions described in Sections 5.1.1--5.1.3 apply, ! then the base URL is defined by the context of the application. ! Since this definition is necessarily application-dependent, failing ! to define the base URL using one of the other methods may result in ! the same content being interpreted differently by different types of ! application. It is the responsibility of the distributor(s) of a document containing relative URLs to ensure that the base URL for that *************** *** 942,948 **** result = "" if scheme is defined then - append scheme to result append ":" to result --- 905,910 ---- *************** *** 982,988 **** Resolution examples are provided in Appendix C. - 6. URL Normalization and Equivalence In many cases, different URL strings may actually identify the --- 944,949 ---- *************** *** 1094,1101 **** [ASCII] US-ASCII. "Coded Character Set -- 7-bit American Standard Code for Information Interchange", ANSI X3.4-1986. ! 10. Authors' Addresses Tim Berners-Lee World Wide Web Consortium --- 1055,1109 ---- [ASCII] US-ASCII. "Coded Character Set -- 7-bit American Standard Code for Information Interchange", ANSI X3.4-1986. + 10. Notices ! Copyright (C) The Internet Society 1997. All Rights Reserved. ! ! This document and translations of it may be copied and furnished to ! others, and derivative works that comment on or otherwise explain it ! or assist in its implementation may be prepared, copied, published ! and distributed, in whole or in part, without restriction of any ! kind, provided that the above copyright notice and this paragraph are ! included on all such copies and derivative works. However, this ! document itself may not be modified in any way, such as by removing ! the copyright notice or references to the Internet Society or other ! Internet organizations, except as needed for the purpose of ! developing Internet standards in which case the procedures for ! copyrights defined in the Internet Standards process must be ! followed, or as required to translate it into languages other than ! English. ! ! The limited permissions granted above are perpetual and will not be ! revoked by the Internet Society or its successors or assigns. ! ! This document and the information contained herein is provided on an ! "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING ! TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING ! BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION ! HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF ! MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. ! ! The IETF takes no position regarding the validity or scope of any ! intellectual property or other rights that might be claimed to ! pertain to the implementation or use of the technology described in ! this document or the extent to which any license under such rights ! might or might not be available; neither does it represent that it ! has made any effort to identify any such rights. Information on the ! IETF's procedures with respect to rights in standards-track and ! standards-related documentation can be found in BCP-11. Copies of ! claims of rights made available for publication and any assurances of ! licenses to be made available, or the result of an attempt made to ! obtain a general license or permission for the use of such ! proprietary rights by implementors or users of this specification can ! be obtained from the IETF Secretariat. ! ! The IETF invites any interested party to bring to its attention any ! copyrights, patents or patent applications, or other proprietary ! rights which may cover technology that may be required to practice ! this standard. Please address the information to the IETF Executive ! Director. ! ! 11. Authors' Addresses Tim Berners-Lee World Wide Web Consortium *************** *** 1150,1160 **** userinfo = *( unreserved | escaped | ":" | ";" | "&" | "=" | "+" ) hostport = host [ ":" port ] ! host = hostname | hostnumber hostname = *( domainlabel "." ) toplabel domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum toplabel = alpha | alpha *( alphanum | "-" ) alphanum ! hostnumber = 1*digit "." 1*digit "." 1*digit "." 1*digit port = *digit path = [ "/" ] path_segments --- 1158,1168 ---- userinfo = *( unreserved | escaped | ":" | ";" | "&" | "=" | "+" ) hostport = host [ ":" port ] ! host = hostname | IPv4address hostname = *( domainlabel "." ) toplabel domainlabel = alphanum | alphanum *( alphanum | "-" ) alphanum toplabel = alpha | alpha *( alphanum | "-" ) alphanum ! IPv4address = 1*digit "." 1*digit "." 1*digit "." 1*digit port = *digit path = [ "/" ] path_segments *************** *** 1242,1248 **** Within an object with a well-defined base URL of ! Content-Base: http://a/b/c/d;p?q the relative URLs would be resolved as follows: --- 1250,1256 ---- Within an object with a well-defined base URL of ! http://a/b/c/d;p?q the relative URLs would be resolved as follows: *************** *** 1336,1350 **** http:g = http:g http: = http: D. Embedding the Base URL in HTML documents It is useful to consider an example of how the base URL of a document can be embedded within the document's content. In this appendix, we describe how documents written in the Hypertext Markup Language (HTML) [RFC1866] can include an embedded base URL. This ! appendix does not form a part of the relative URL specification and ! should not be considered as anything more than a descriptive ! example. HTML defines a special element "BASE" which, when present in the "HEAD" portion of a document, signals that the parser should use --- 1344,1358 ---- http:g = http:g http: = http: + D. Embedding the Base URL in HTML documents It is useful to consider an example of how the base URL of a document can be embedded within the document's content. In this appendix, we describe how documents written in the Hypertext Markup Language (HTML) [RFC1866] can include an embedded base URL. This ! appendix does not form a part of the URL specification and should not ! be considered as anything more than a descriptive example. HTML defines a special element "BASE" which, when present in the "HEAD" portion of a document, signals that the parser should use *************** *** 1415,1431 **** attempt to recognize and strip both delimiters and embedded whitespace. ! Examples: ! Yes, Jim, I found it under "http://www.w3.org/pub/WWW/", but you can probably pick it up from . Note the warning in . ! F. Summary of Non-editorial Changes ! F.1. Additions Section 3 (URL References) was added to stem the confusion regarding "what is a URL" and how to describe fragment identifiers --- 1423,1473 ---- attempt to recognize and strip both delimiters and embedded whitespace. ! For example, the text: ! Yes, Jim, I found it under "http://www.w3.org/Addressing/", but you can probably pick it up from . Note the warning in . ! ! contains the URL references ! ! http://www.w3.org/Addressing/ ! ftp://ds.internic.net/rfc/ ! http://www.ics.uci.edu/pub/ietf/uri/historical.html#WARNING ! + F. Abbreviated URLs ! The URL syntax was designed for unambiguous reference to network ! resources and extensibility via the URL scheme. However, as URL ! identification and usage have become commonplace, traditional media ! (television, radio, newspapers, billboards, etc.) have increasingly ! used abbreviated URL references. That is, a reference consisting of ! only the site and path portions of the identified resource, such as ! www.w3.org/Addressing/ ! ! or simply the DNS hostname on its own. Such references are primarily ! intended for human interpretation rather than machine, with the ! assumption that context-based heuristics are sufficient to complete ! the URL (e.g., most hostnames beginning with "www" are likely to have ! a URL prefix of "http://"). Although there is no standard set of ! heuristics for disambiguating abbreviated URL references, many ! client implementations allow them to be entered by the user and ! heuristically resolved. It should be noted that such heuristics may ! change over time, particularly when new URL schemes are introduced. ! ! Since an abbreviated URL has the same syntax as a relative URL path, ! abbreviated URL references cannot be used in contexts where relative ! URLs are expected. This limits the use of abbreviated URLs to places ! where there is no defined base URL, such as dialog boxes and off-line ! advertisements. ! ! ! G. Summary of Non-editorial Changes ! ! G.1. Additions Section 3 (URL References) was added to stem the confusion regarding "what is a URL" and how to describe fragment identifiers *************** *** 1439,1445 **** Section 2.4 was rewritten to clarify a number of misinterpretations and to leave room for fully internationalized URLs. ! F.2. Modifications from both RFC 1738 and RFC 1808 Confusion regarding the terms "character encoding", the URL "character set", and the escaping of characters with % --- 1481,1491 ---- Section 2.4 was rewritten to clarify a number of misinterpretations and to leave room for fully internationalized URLs. ! Appendix F on abbreviated URLs was added to describe the shortened ! references often seen on television and magazine advertisements and ! explain why they are not used in other contexts. ! ! G.2. Modifications from both RFC 1738 and RFC 1808 Confusion regarding the terms "character encoding", the URL "character set", and the escaping of characters with % *************** *** 1489,1495 **** corresponds to actual practice) or creating a separate component just to hold that slash. We chose the former. ! F.3. Modifications from RFC 1738 The definition of specific URL schemes and their scheme-specific syntax and semantics has been moved to separate documents. --- 1535,1541 ---- corresponds to actual practice) or creating a separate component just to hold that slash. We chose the former. ! G.3. Modifications from RFC 1738 The definition of specific URL schemes and their scheme-specific syntax and semantics has been moved to separate documents. *************** *** 1506,1512 **** The recommendations for delimiting URLs in context (Appendix E) have been adjusted to reflect current practice. ! F.4. Modifications from RFC 1808 RFC 1808 (Section 4) defined an empty URL reference (a reference containing nothing aside from the fragment identifier) as being a --- 1552,1558 ---- The recommendations for delimiting URLs in context (Appendix E) have been adjusted to reflect current practice. ! G.4. Modifications from RFC 1808 RFC 1808 (Section 4) defined an empty URL reference (a reference containing nothing aside from the fragment identifier) as being a *************** *** 1519,1526 **** correctly interpret an empty reference has been added in Section 3. The description of the mythical Base header field has been replaced ! with the Content-Base and Content-Location header fields defined by ! HTTP/1.1 and MHTML [RFC2110]. RFC 1808 described various schemes as either having or not having the properties of the generic-URL syntax. However, the only requirement --- 1565,1572 ---- correctly interpret an empty reference has been added in Section 3. The description of the mythical Base header field has been replaced ! with a reference to the Content-Base and Content-Location header ! fields defined by MHTML [RFC2110]. RFC 1808 described various schemes as either having or not having the properties of the generic-URL syntax. However, the only requirement