Science Fair Project Encyclopedia
Uniform Resource Locator
A Uniform Resource Locator, or URL (either pronounced as "earl", IPA /ɝl/ (American) or /ɜːl/ (British), or spelled out), also called a Web address, is a standardized address for some resource (such as a document or image) on the Internet (or elsewhere). URLs were first created by Tim Berners-Lee for use on the World Wide Web; the currently used forms are detailed by Internet standard RFC 1738.
The URL was a fundamental innovation in the history of the Internet. The syntax is designed to be generic, extensible, and able to express addresses in any character set using a limited subset of ASCII characters (for instance, whitespace is never used in a URL). URLs are classified by their "scheme", which typically identifies the network protocol used to retrieve the resource over a computer network.
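The usual way to fit arbitrary characters into that limited ASCII subset is percent-encoding, which the "%20" codes mentioned in the links at the end of this article refer to. The following is a minimal sketch using Python's standard urllib.parse module; the file name is made up for illustration, and UTF-8 as the underlying byte encoding is an assumption (it is the library's default).

```python
from urllib.parse import quote, unquote

# Hypothetical file name containing a space and non-ASCII letters.
raw_name = "my résumé.txt"

# quote() encodes the text as UTF-8 and replaces every byte outside the
# safe ASCII subset with a %XX escape, so no whitespace remains.
encoded = quote(raw_name)

# unquote() reverses the transformation.
decoded = unquote(encoded)

print(encoded)
print(decoded == raw_name)
```

The encoded form contains only ASCII characters, yet the original name can be recovered exactly.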
URIs and URLs
Every URL is a URI; more precisely, the set of URLs is a proper subset of the set of URIs. A URI identifies a particular resource, while a URL both identifies a resource and indicates how to locate it. To illustrate the distinction, consider the URI urn:ietf:rfc:1738, which identifies IETF RFC 1738 without indicating where to find the text of this RFC. Now consider three URLs for three separate documents containing the text of this RFC:
Each of these URLs uniquely identifies its document and is therefore itself a URI, but URL syntax is such that the identifier also allows one to locate the document.
Historically, the terms have been almost synonymous as almost all URIs have also been URLs. For this reason, many definitions in this article mention URIs instead of URLs; the discussion applies to both URIs and URLs.
A URL begins with the name of its scheme, followed by a colon, followed by a scheme-specific part.
Some examples of URL schemes:
- http - HTTP resources
- https - HTTP over SSL
- ftp - File Transfer Protocol
- mailto - E-mail address
- ldap - Lightweight Directory Access Protocol lookups
- file - resources available on the local computer or over a local file sharing network
- news - Usenet newsgroups
- gopher - the Gopher protocol
- telnet - the telnet protocol
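The split between scheme and scheme-specific part described above can be sketched in a few lines of Python. The helper name split_scheme and the example addresses are hypothetical and chosen only for illustration.

```python
def split_scheme(url):
    """Return (scheme, scheme_specific_part), split at the first colon."""
    scheme, colon, rest = url.partition(":")
    if not colon:
        raise ValueError(f"no scheme found in {url!r}")
    # Scheme names are case-insensitive, so normalize to lower case.
    return scheme.lower(), rest

# Made-up example addresses for a few of the schemes listed above.
examples = [
    "http://www.example.org/index.html",
    "mailto:someone@example.org",
    "news:comp.infosystems.www",
    "telnet://example.org:23/",
]

for url in examples:
    print(split_scheme(url))
```

Note that only the first colon separates the scheme; later colons (such as the one before a port number) belong to the scheme-specific part.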
Generic URI syntax
The syntax of the scheme-specific part depends on the requirements of the scheme. Schemes using typical connection-based protocols use a common "generic URI" syntax, defined below:
scheme://authority/path?query
The authority typically consists of a hostname or IP address of a server, optionally followed by a colon and a port number. It may also contain a username and password for authenticating to the server.
The path is a specification of a location in some hierarchical structure, using a slash ("/") as delimiter between components.
The query part is typically intended to express parameters of a dynamic query to some database residing on the server.
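As a rough illustration of the authority, path and query parts, the sketch below takes one address apart with urlsplit from Python's standard library. The address, including the user name and password, is invented for the example.

```python
from urllib.parse import urlsplit

# Invented address that exercises every part of the generic syntax.
url = "http://user:secret@www.example.org:8080/docs/manual.html?chapter=2"

parts = urlsplit(url)

print(parts.scheme)     # scheme
print(parts.netloc)     # full authority, including user info and port
print(parts.username)   # optional user name inside the authority
print(parts.password)   # optional password inside the authority
print(parts.hostname)   # host name inside the authority
print(parts.port)       # port number as an integer, or None if absent
print(parts.path)       # hierarchical path
print(parts.query)      # query string, without the leading "?"

# The path is hierarchical: its components are separated by slashes.
print(parts.path.split("/"))
```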
The complete, authoritative URI syntax is specified in RFC 3986.
The term URI reference means a particular instance of a URI as it is used in, for instance, an HTML document. It introduces two new concepts: the distinction between absolute and relative references, and the fragment identifier.
An absolute URL is just like a URL as defined above. A relative URL omits the scheme, and possibly the authority and the leading part of the path as well; the missing parts are inferred from the context in which the URL reference appears (the base URI).
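How a relative reference is completed from its base URI can be tried out with urljoin from Python's standard library. This is only a sketch; the base address and the references are made up.

```python
from urllib.parse import urljoin

# Made-up base URI: the address of the document containing the references.
base = "http://www.example.org/docs/guide/intro.html"

references = [
    "chapter2.html",                   # same directory as the base document
    "../images/logo.png",              # one directory level up
    "/index.html",                     # path from the server root
    "//mirror.example.org/docs/",      # different host, same scheme
    "ftp://ftp.example.org/file.txt",  # already absolute, used unchanged
]

resolved = [urljoin(base, ref) for ref in references]

for ref, url in zip(references, resolved):
    print(f"{ref!r:35} -> {url}")
```

In each case the parts missing from the reference (scheme, host, leading directories) are taken from the base.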
A URI reference can also consist of a URI followed by a hash sign ("#") and a pointer to within the resource referenced by the URI as a whole. This is not a part of the URI as such, but is intended for the "user agent" (browser) to interpret after the resource has been retrieved. Therefore it is never sent to the server in HTTP GET requests.
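A small sketch of this separation, using urldefrag from Python's standard library: the part before the hash sign is what a client would request, and the fragment stays on the client side. The address is hypothetical.

```python
from urllib.parse import urldefrag

# Hypothetical URI reference pointing at a section inside a document.
reference = "http://www.example.org/docs/manual.html#installation"

# urldefrag() splits the reference at the hash sign.
url_to_request, fragment = urldefrag(reference)

print(url_to_request)  # what is sent to the server
print(fragment)        # kept by the user agent to locate the section
```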
The scheme and the host name of a URL are case-insensitive, but the remainder (path and query) is in general case-sensitive. It is, however, up to the server administrator to decide whether to respect case when responding to requests; for convenience, some web servers send the same page for URLs differing only in case.
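When comparing URLs it is common to lower-case only the scheme and the host name, which are case-insensitive, and to leave the path untouched. The helper below is a simplified sketch under that assumption: normalize_case is a hypothetical name, and any user name or password in the authority is deliberately dropped to keep the example short.

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url):
    """Lower-case scheme and host; keep path, query and fragment as given.

    Simplification for this sketch: user info in the authority is ignored.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""          # .hostname is already lower-cased
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )

a = normalize_case("HTTP://EN.Wikipedia.ORG/wiki/Train")
b = normalize_case("http://en.wikipedia.org/wiki/train")

print(a)
print(b)
print(a == b)  # the paths still differ in case
```

Whether the two resulting addresses lead to the same page is then a decision of the server, not of the URL syntax.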
URLs in everyday use
An HTTP URL combines into one simple address the four basic items of information necessary to retrieve a resource from anywhere on the Internet:
- the protocol to use to communicate,
- the host (server) to communicate with,
- the network port on the server to connect to,
- the path to the resource on the server (for example, its file name).
A typical URL can look like:
http://en.wikipedia.org:80/wiki/Special:Search?search=train&go=Go
where
- http is the protocol,
- en.wikipedia.org is the host,
- 80 is the network port number on the server (as 80 is the default value for the HTTP protocol, this portion could have been omitted entirely),
- /wiki/Special:Search is the resource path,
- ?search=train&go=Go is the query string; this part is optional.
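Putting the parts listed above back together gives the address used in the sketch below, which recovers the same items with Python's standard urllib.parse module and additionally decodes the query string into name/value pairs.

```python
from urllib.parse import urlsplit, parse_qs

# Address assembled from the parts listed above.
url = "http://en.wikipedia.org:80/wiki/Special:Search?search=train&go=Go"

parts = urlsplit(url)

protocol = parts.scheme
host = parts.hostname
port = parts.port            # explicit here; None when left out of the URL
path = parts.path
query = parse_qs(parts.query)  # maps each name to a list of values

print(protocol, host, port, path)
print(query)
```

parse_qs returns a list for every parameter because the same name may legitimately appear more than once in a query string.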
Most web browsers do not require the user to enter "http://" to address a webpage, as HTTP is by far the most common protocol used in web browsers. Likewise, since 80 is the default port for http it is not usually specified. One usually just enters a partial URL such as www.wikipedia.org/wiki/Train. To go to a homepage one usually just enters the host name, such as www.wikipedia.org.
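The way a browser fills in the missing pieces of such a partial URL can be approximated as follows. This is a deliberately naive sketch, not how any particular browser works: complete_partial_url is a hypothetical helper that only prepends a default scheme and supplies an empty path with "/".

```python
from urllib.parse import urlsplit, urlunsplit

def complete_partial_url(text, default_scheme="http"):
    """Turn address-bar input into a full URL (naive sketch)."""
    text = text.strip()
    # If no scheme was typed, assume the default one.
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parts = urlsplit(text)
    # A bare host name addresses the root of the server.
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )

print(complete_partial_url("www.wikipedia.org/wiki/Train"))
print(complete_partial_url("www.wikipedia.org"))
```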
Since the HTTP protocol allows a server to respond to a request by redirecting the web browser to a different URL, many servers additionally allow users to omit certain parts of the URL, such as the "www." part, or the trailing slash if the resource in question is a directory. However, these omissions technically make it a different URL, so the web browser cannot make these adjustments, and has to rely on the server to respond with a redirect. It is possible, but due to tradition rare, for a web server to serve two different pages for URLs that differ only in a trailing slash.
Note that in en.wikipedia.org/wiki/Train, the hierarchical order of the five elements is org (generic top-level domain) - wikipedia (second-level domain) - en (subdomain) - wiki - Train; i.e. before the first slash from right to left, then the rest from left to right.
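The reading order described above (host labels from right to left, then path segments from left to right) can be reproduced with a short Python function. The name hierarchy is hypothetical, and the function assumes an address without scheme, port or query, as in the example.

```python
def hierarchy(address):
    """List the elements of a host/path address from most to least general."""
    # Everything before the first slash is the host name.
    host, _, path = address.partition("/")
    # Domain labels become more specific from right to left.
    domain_labels = list(reversed(host.split(".")))
    # Path segments become more specific from left to right.
    path_segments = [segment for segment in path.split("/") if segment]
    return domain_labels + path_segments

print(hierarchy("en.wikipedia.org/wiki/Train"))
```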
The big picture
The term URL is also used outside the context of the World Wide Web. Database servers accept URLs as a parameter for making connections to them. Similarly, any client-server application following a particular protocol may specify a URL format as part of its communication process.
Example of a database URL:
If a webpage is uniquely and more or less permanently identified by a URL, it can be linked to (see also permalink, deep linking). This is not always the case: for example, a menu option may change the contents of a frame within the page without the new combination having its own URL, and a webpage may also depend on temporarily stored information. Even if the webpage or frame has its own URL, this is not always obvious to someone who wants to link to it: the URL of a frame is not shown in the address bar of the browser, and a page may have been opened without an address bar. In such cases the URL may be derivable from the source code or from the "properties" of the various components of the page.
Apart from linking to a page or page component, one may want to know its URL in order to show the component alone, or to lift restrictions such as a browser window without toolbars or of a small, non-adjustable size.
- RFC 1738 - Uniform Resource Locators (URL).
- RFC 3986 - Uniform Resource Identifier (URI): Generic Syntax.
- URL Encoding (or: 'What are those "%20" codes in URLs?') by Brian Wilson - some URL encoding charts and converter.
- URLEncode Code Chart (from i-Technica) - URL encoding chart.
The contents of this article are licensed from www.wikipedia.org under the GNU Free Documentation License.