Anatomy of a URL

URLs (Uniform Resource Locators) are the standardized means of addressing pages on
the Web.
Technically, a URL (presumably pronounced like the name of the psychic known for
bending spoons) is a short string that identifies a resource available on the Web and also
specifies the protocol used to retrieve it. The related terms URI, URN, URP, URT, and URV
have slightly different meanings.
In these acronyms, the "U" is sometimes construed as standing for "Universal" rather
than "Uniform".
URLs have the following form:

http://www.sitename.com/x/y.html
The first part, separated from the rest of the URL by a colon (:), is the protocol: usually
http, for HyperText Transfer Protocol, though other protocols such as ftp and gopher are
sometimes used. Secure-server sites using an encrypted protocol use https as the
protocol identifier.
Next comes the hostname (domain name or IP address), preceded by a double slash (//). It
seems to be a common misconception that the colon and double slash together form an
inseparable delimiter terminating the protocol; for instance, the Mozilla team posted an
online document about their implementation of irc:// URLs. Actually, the colon
terminates the protocol section, and the double slash introduces a hostname or other site
identifier (this varies somewhat by protocol, as some less common protocols take things
other than domain names here). The double slash is absent in URIs that lack a hostname,
such as mailto: and news: URLs.
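Python's standard `urllib.parse.urlsplit` function draws the same boundaries, which makes it a convenient way to check where each part of a URL begins. Python is used here purely for illustration, and `show_parts` is a made-up helper name:

```python
from urllib.parse import urlsplit

def show_parts(url: str) -> None:
    """Print the protocol (scheme), host section, and path of a URL."""
    parts = urlsplit(url)
    # The scheme ends at the first colon; the host section (netloc) is only
    # filled in when the text after that colon begins with "//".
    print(f"scheme={parts.scheme!r} host={parts.netloc!r} path={parts.path!r}")

show_parts("http://www.sitename.com/x/y.html")
# scheme='http' host='www.sitename.com' path='/x/y.html'
show_parts("mailto:webmaster@webtips.dan.info")
# scheme='mailto' host='' path='webmaster@webtips.dan.info'
```

The mailto: address ends up in the path rather than the host section, because no double slash introduces a hostname.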
After that comes the directory path to the Web page you're accessing, with forward slashes
(/) separating directory levels (not backslashes (\) as in DOS/Windows systems). In other
words, the slash-separated path usually mirrors how the site's files are organized into
folders at different levels.
Pedantic Note: Actually, as many purists will tell you, it's not true that the "path" portion
of a URL is necessarily a directory path. Servers can be configured to interpret a URL
path any way they like, which might not necessarily correspond to any actual
subdirectory tree. Sites generated dynamically from databases may use URL paths that
have nothing to do with directory structures. However, most Web servers do use URLs
corresponding to the file structure, so that's what I'll assume for this document.
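To make the distinction concrete, the sketch below splits a URL path into its slash-separated levels and then shows a server treating some paths as files and others as database lookups. The `route` function, the `/sites/docroot` directory, and the `/articles/<number>` convention are hypothetical, invented only for this illustration:

```python
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Hypothetical document root on the server's disk.
DOC_ROOT = PurePosixPath("/sites/docroot")

def path_levels(url: str) -> list[str]:
    """Split the path portion of a URL into its slash-separated levels."""
    return [seg for seg in urlsplit(url).path.split("/") if seg]

def route(url: str) -> str:
    """Hypothetical server logic: not every path maps to a file on disk."""
    levels = path_levels(url)
    if len(levels) == 2 and levels[0] == "articles" and levels[1].isdigit():
        # Dynamically generated page: the "path" is really a database key.
        return f"database lookup for article {levels[1]}"
    # Conventional case: the path mirrors the directory tree on disk.
    # (A real server would also reject ".." segments before touching the disk.)
    return f"file {DOC_ROOT.joinpath(*levels)}"

print(path_levels("http://www.sitename.com/x/y.html"))  # ['x', 'y.html']
print(route("http://www.sitename.com/x/y.html"))        # file /sites/docroot/x/y.html
print(route("http://www.sitename.com/articles/42"))     # database lookup for article 42
```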
There are a few special protocols with URLs of differing syntax. mailto: is followed by
an e-mail address to create a link allowing users to send mail to that address. news: is
followed by the name of a newsgroup (e.g., comp.infosystems.www.authoring.html) to
let the user follow the link to see the newsgroup's messages (if the user's browser is
configured to access a news server). Neither of these URL types contains slashes (single
or double); the syntax looks like mailto:webmaster@webtips.dan.info, not
mailto://webmaster@webtips.dan.info/. Developers used to the more common http:
syntax often put extra slashes in these URLs, which makes them fail. (More information on
mailto: URLs is in my page on e-mail.)
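A link checker could catch this mistake mechanically. The following is a rough heuristic, not a full validator: it flags mailto: and news: links that contain any slash after the colon. The function name `has_extra_slashes` and the choice of schemes to check are assumptions made for this example:

```python
# Schemes that, per the rule above, take no slashes after the colon.
NO_SLASH_SCHEMES = ("mailto:", "news:")

def has_extra_slashes(url: str) -> bool:
    """Return True if a mailto: or news: URL wrongly contains slashes."""
    lowered = url.lower()  # protocol names are case-insensitive
    for scheme in NO_SLASH_SCHEMES:
        if lowered.startswith(scheme):
            # These URL types contain no slashes at all after the colon.
            return "/" in url[len(scheme):]
    return False  # other protocols are not checked by this sketch

print(has_extra_slashes("mailto:webmaster@webtips.dan.info"))          # False
print(has_extra_slashes("mailto://webmaster@webtips.dan.info/"))       # True
print(has_extra_slashes("news:comp.infosystems.www.authoring.html"))   # False
```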
Note that you can't leave out the protocol and use www.somewhere.com as a link URL
without the http://. This syntax works when you're typing in a URL in most browsers, but
in a link within your Web site it will be interpreted as a relative URL to a file named
"www.somewhere.com" in the current directory.
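You can watch this happen with Python's `urllib.parse.urljoin`, which, for simple cases like this, resolves a link against the address of the page containing it the same way a browser does. The page address below is just an example:

```python
from urllib.parse import urljoin

page = "http://www.yoursite.com/stuff/one.html"  # example page holding the link

# Without a protocol, the link is treated as a file in the current directory.
print(urljoin(page, "www.somewhere.com"))
# http://www.yoursite.com/stuff/www.somewhere.com

# With the protocol, it points at the other site as intended.
print(urljoin(page, "http://www.somewhere.com/"))
# http://www.somewhere.com/
```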
Are URLs case sensitive?
Technically, yes. You should always be consistent in your use of upper or lower case in
your URLs. Even in cases where the upper and lower case versions go to the same
resource, you're imposing an unnecessary burden on browsers that need to retrieve and
cache two copies of the same thing if they go to two variants of the same URL.
Whether you can vary the case and still get the same resource depends on which part of
the URL you change. The protocol and hostname are not case sensitive, so you can write
http://www.dan.info/ or HTTP://www.dan.info/ and they'll work identically. However,
directory and file names may be case sensitive, depending on the operating system the
server runs (UNIX file systems are case sensitive, while Windows file systems aren't).
Fragment names are case sensitive. So be careful to match the directory, file, and anchor
names in your links to the case of the actual files and anchors.
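One way to apply these rules when comparing links is to lowercase only the case-insensitive parts. This is a minimal sketch with a hypothetical `normalize_case` helper; it assumes the host section holds no user:password@ prefix, which lowercasing would corrupt:

```python
from urllib.parse import urlsplit, urlunsplit

def normalize_case(url: str) -> str:
    """Lowercase the case-insensitive parts of a URL, leaving the rest alone."""
    parts = urlsplit(url)  # urlsplit already lowercases the scheme
    # Assumes the host section is just a hostname (and maybe a port),
    # with no user:password@ prefix, so lowercasing it is safe.
    host = parts.netloc.lower()
    # Path, query, and fragment keep their case, since they may be case sensitive.
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))

print(normalize_case("HTTP://WWW.Dan.Info/Tips/Page.html#Top"))
# http://www.dan.info/Tips/Page.html#Top
print(normalize_case("HTTP://www.dan.info/") == normalize_case("http://www.dan.info/"))      # True
print(normalize_case("http://www.dan.info/Tips/") == normalize_case("http://www.dan.info/tips/"))  # False
```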
Can I include spaces in my URLs?
No, the space is not a legal character in URLs. Spaces, and a number of other special
characters, must be encoded using a percent sign (%) followed by a two-character
hexadecimal number giving the character's position in the ASCII or ISO Latin-1
encoding. A space is represented as %20.
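Python's `urllib.parse.quote` and `unquote` perform this encoding and decoding. The byte values depend on the character encoding: current browsers encode non-ASCII characters as UTF-8, so such a character may become more than one %XX group. A short illustration:

```python
from urllib.parse import quote, unquote

# A space becomes %20 (hex 20 = decimal 32, the space's ASCII position).
print(quote("/my docs/file name.html"))  # /my%20docs/file%20name.html
print(unquote("file%20name.html"))       # file name.html

# A non-ASCII character depends on the encoding: one byte in ISO Latin-1,
# two bytes in UTF-8 (quote's default).
print(quote("café", encoding="latin-1"))  # caf%E9
print(quote("café"))                      # caf%C3%A9
```

By default `quote` leaves slashes alone, so a whole path can be encoded at once without breaking its directory structure.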
Some Web servers might have file systems that allow documents with names containing
spaces, but if you use files with such names, their URLs will contain %20, which is rather
ugly. So it's best to avoid such names and stick to safer characters like letters, numbers,
dashes, and underscores. Mac users in particular tend to create directory structures
including spaces, producing awkward URLs.
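If you generate file or directory names automatically, a small helper can enforce this advice. The helper `safe_name`, and its choice to turn spaces into dashes and to keep dots for file extensions, are assumptions for illustration only:

```python
import re

def safe_name(name: str) -> str:
    """Turn a proposed file or directory name into a URL-friendly one."""
    name = name.strip().replace(" ", "-")        # spaces would otherwise become %20
    # Keep letters, digits, dots (for extensions), underscores, and dashes.
    name = re.sub(r"[^A-Za-z0-9._-]", "", name)
    return re.sub(r"-{2,}", "-", name)           # collapse runs of dashes

print(safe_name("My Summer Photos.html"))  # My-Summer-Photos.html
print(safe_name("Q&A  notes.txt"))         # QA-notes.txt
```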
Relative URLs
Definition: Relative URLs are context-sensitive, giving a path with respect to your
current location.
There are several types of relative URL.
1. A URL with no slashes, like "junk.html", references another page in the same
directory as the current page. So if you're currently at
"http://www.yoursite.com/stuff/one.html" and encounter the relative URL "two.html",
this is addressing the page "http://www.yoursite.com/stuff/two.html".
2. A URL with no leading slashes, but slashes within, references a subdirectory beneath
the current one. "subdir/test.html", encountered from the same page as the above
example, would reference "http://www.yoursite.com/stuff/subdir/test.html".
3. A URL with double dots at the start, like "../another.html", references the parent
directory of the current one. This URL, accessed from the same page as the above
examples, would lead to "http://www.yoursite.com/another.html". Double dots can be
repeated, like "../../grandparent.html", to go up additional levels, or combined with
subdirectory references like "../sister/" to go to a sibling directory.
4. A URL with a single dot at the start, like "./stuff.html", references another file in the
same directory, just like a URL with no slashes. It's better to use the form of URL
without the dot and slash, since a few old browsers and indexing robots don't seem to
understand this syntax properly and end up expanding the URLs into bizarre things like
"http://www.yoursite.com/././stuff/../junk/". These work with most servers, but look
weird in your access logs. Double dots produce this effect too, but they're too useful to
give up; the single dot is unnecessary (except in the special case of linking back to the
index of the current directory, where "./" is the best URL, as described elsewhere).
5. A URL with a slash at the start, like "/dir1/dir2/stuff.html", references a page at a
path starting from the root of the server. To be more precise, it starts at the root of the
domain name you're in. Be careful using this if your site is in a virtual domain on an
Internet provider's system. If you have a domain yoursite.com which points at the
directory /sites/yours/ within the ISP's domain provider.com, then your page
silly/stuff.html can be reached via two different URLs:
http://www.yoursite.com/silly/stuff.html and
http://www.provider.com/sites/yours/silly/stuff.html. Maybe you had your site up for a
long time before getting your own domain, so your users regularly arrive via both
addresses. In this case, a URL like "/silly/morestuff.html" can be interpreted as
"http://www.yoursite.com/silly/morestuff.html" or
"http://www.provider.com/silly/morestuff.html" depending on which domain the user is
in. Thus, you should avoid this form of URL if there's any doubt about how the user is
accessing your site.
6. In an uncommon but legal URL form, a URL with a double slash at the start, like
"//www.yoursite.com/stuff.html", keeps only the protocol identifier from the current URL
and gets the full sitename and path from the new URL. I actually found a use for this
form recently, in a piece of HTML code that was being accessed under both the secure
https: protocol and the nonsecure http: protocol, and under more than one domain name. I
wanted to access a particular graphic in all cases, using a protocol (secure or nonsecure)
matching that with which the main page was accessed. Using relative URLs of the forms
given above would require the graphic to be placed in all the different domains; and using
an absolute URL would force the protocol to be specified. I deftly avoided these
problems by using a double-slashed relative URL.
7. Finally, a URL beginning with a pound sign (#) specifies a link to a fragment
identifier (anchor) in the current page.
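Python's `urllib.parse.urljoin` resolves relative references by the rules of the URL standard (RFC 3986), so it offers a quick way to check the forms above. This sketch resolves each example against the page used in the list; the `#section` fragment name and `page.html` are made up for illustration:

```python
from urllib.parse import urljoin

base = "http://www.yoursite.com/stuff/one.html"  # the page from the examples above

examples = [
    "two.html",                       # 1. same directory
    "subdir/test.html",               # 2. subdirectory
    "../another.html",                # 3. parent directory
    "../sister/",                     # 3. sibling directory
    "./stuff.html",                   # 4. explicit current directory
    "./",                             # 4. index of the current directory
    "/dir1/dir2/stuff.html",          # 5. from the root of the domain
    "//www.yoursite.com/stuff.html",  # 6. keep only the protocol
    "#section",                       # 7. fragment in the current page
]
for link in examples:
    print(f"{link:32} -> {urljoin(base, link)}")

# Form 5 depends on which domain the visitor used (the virtual-domain problem):
print(urljoin("http://www.yoursite.com/silly/stuff.html", "/silly/morestuff.html"))
# http://www.yoursite.com/silly/morestuff.html
print(urljoin("http://www.provider.com/sites/yours/silly/stuff.html", "/silly/morestuff.html"))
# http://www.provider.com/silly/morestuff.html  (outside /sites/yours/)

# Form 6 follows the protocol of the current page:
print(urljoin("https://www.yoursite.com/page.html", "//www.yoursite.com/stuff.html"))
# https://www.yoursite.com/stuff.html
```

Each printed result should match the resolution described in the corresponding list item; for instance, "../sister/" resolves to http://www.yoursite.com/sister/.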
Which Type of URL Should You Use?

TIP: Use absolute URLs when linking to a different site, and relative URLs when linking
within your site.
Within your site, it's best to use relative URLs, because this will allow you to move the
entire site to a different location without having to change all the internal links. Avoid the
forms of relative URL starting with slashes, as they are relative only to the root of the
server and will become incorrect if you move to a different place in the full directory tree.
However, the forms without leading slashes will work identically no matter where the site
is relocated.
Use absolute URLs when linking to other sites. You may wish to consider even some
other pages you created yourself to be "other sites" for this purpose, if they're part of a
completely different logical grouping from the current site and there's a chance one set of
your pages will be relocated while the other stays put. So, if you have two sites, at
http://www.yoursite.com/literature/ and http://www.yoursite.com/music/, and you think
you might eventually move the latter to http://www.yourmusic.com/, then any link from
the music site to the literature site should use the full URL instead of a relative URL like
"../literature/", which would stop working after this site move.
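The site-move example can be checked with Python's `urllib.parse.urljoin`. The `index.html` page names below are assumptions; only the directory and domain names come from the text above:

```python
from urllib.parse import urljoin

link = "../literature/"  # relative link on a music page, as in the example above

before = urljoin("http://www.yoursite.com/music/index.html", link)
after = urljoin("http://www.yourmusic.com/index.html", link)
print(before)  # http://www.yoursite.com/literature/   (correct)
print(after)   # http://www.yourmusic.com/literature/  (no such page after the move)

# An absolute link is unaffected by where the linking page lives.
absolute = "http://www.yoursite.com/literature/"
print(urljoin("http://www.yourmusic.com/index.html", absolute))
# http://www.yoursite.com/literature/
```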
