2. Web publishing
In this section, we look more closely into the basic concepts behind Web publishing. The
first subsections give basic background information. This information is not essential to
follow the book but it may help you to get the big picture and understand how the various
parts fit together. Such an understanding can be vital in abnormal situations, where
something does not work as it should. The last sections are essential for the remaining
parts of the book.

2.1. HTTP
Today, almost all Web publishing uses HTTP (HyperText Transfer Protocol) [RFC
2616] or its "secured" version HTTPS [RFC 2818], which is HTTP over TLS (Transport
Layer Security) [RFC 2246] or its predecessor SSL (Secure Sockets Layer).

HTTP is a very simple protocol: a client sends a request to a server; the server processes
the request and replies with a response. Requests and responses are sent as messages
over a TCP connection. The message format is essentially MIME (Multipurpose Internet
Mail Extensions) [RFC 2045-2049], a format that is also used to transfer multimedia mail
messages across the Internet. A MIME message consists of a set of headers and a body,
also known as the message entity. The body is optional for HTTP messages. The standard
speaks of it as the HTTP entity. For HTTP, a request line or a response line, respectively,
is prepended to the message.

A request line consists of the request method, the resource locator and the protocol
version. A resource can be anything: an HTML page, an image, a file, a database, a
service, an application. It is identified by the resource locator, a path that locates the
resource in a hierarchical structure such as a file system or Zope's folder structure.
HTTP uses the URL syntax for the resource locator. It is up to the receiving HTTP server
to determine what resource the resource locator actually identifies.
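
To make the message layout concrete, here is a minimal sketch in Python (the language Zope itself is written in) that splits a raw request into its request line, its headers and its body. The function name parse_request and the sample request are assumptions made for this illustration; they are not part of Zope or of any HTTP library, and a real server has to handle many details that this sketch ignores.

```python
def parse_request(raw: bytes):
    """Split a raw HTTP request into (method, locator, version, headers, body).

    Hypothetical helper for illustration only.
    """
    # The header section ends at the first empty line (CRLF CRLF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The request line: method, resource locator, protocol version.
    method, locator, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive; normalize them to lower case.
        headers[name.strip().lower()] = value.strip()
    return method, locator, version, headers, body


# A made-up GET request without a body.
sample = (
    b"GET /folder/index_html?lang=en HTTP/1.1\r\n"
    b"Host: www.example.org\r\n"
    b"Accept: text/html\r\n"
    b"\r\n"
)
method, locator, version, headers, body = parse_request(sample)
print(method, locator, version)
print(headers)
```

The request line is simply the first line of the message; everything up to the first empty line is header material, and whatever follows is the optional body.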

HTTP knows a set of request methods. The most essential are GET, POST, PUT and HEAD.
GET requests all information about the resource to be transferred in the response. HTML's
link traversal and image references are mapped to GET requests. HTML form submission
may use a GET request. Additional request headers can turn the request into a
conditional request or a request to transfer only part of the information. Both request
modifications are intended to reduce communication traffic. A GET request should not
have side effects and should be idempotent. Idempotence means that repeating the
same request has the same effect as issuing it once, so a repetition can be expected to
yield the same response. HTTP clients often rely on this and assume
that GET requests can be cached. They save the response to such a request and take the
response from their cache when a later GET request targets the same resource. For a Zope
Web site, many requests do have side effects. This is especially true if a session
management device is employed. In such cases, it may be necessary to fight the
caching behavior.

A POST request sends information to a resource, which the resource should integrate as
something subordinate to itself. POST requests usually have side effects. You would use a
POST request, for example, to create a new database record, update the properties of a
Zope object or post a news item to a discussion board. HTML form submission often uses
POST requests.

A PUT request sends information that the server should use to create a resource located by
the request resource locator. If there is already such a resource, it may be overwritten.
PUT requests are not used from HTML. HTML editing tools use PUT requests to publish
an object on a Web site.

A HEAD request is similar to a GET request. It is expected to have no side effects and be
idempotent. It should return the same headers as a GET targeted to the same resource but
should not transfer the message body. HEAD is not used from HTML. It is, however,
used by link validation and indexing tools, to efficiently check the existence, the type,
size and other meta information for a resource. It is difficult for Zope to meet the
requirements for this request type. Most of its objects are templates that need to be
rendered in order to obtain full header information. Rendering, however, can have
unwanted side effects. Zope, therefore, returns only approximate, sometimes even wrong
information in response to HEAD requests.

A resource may be finer grained than the location of an object in a hierarchical structure.
Resource locators for GET and HEAD requests may have a trailing query string that
provides additional parameters. The query string starts with a ?, which is followed by
a sequence of parameter definitions of the form name=value, separated by &. For POST and
PUT requests, parameters may be present in the request body. Their packaging there is
indicated by the request's Content-Type header (e.g. application/x-www-form-urlencoded
or multipart/form-data). In Zope, ZPublisher takes care of the
parameters, regardless of whether they are provided as part of the resource locator or in
the request body, and makes them accessible in a standard way.
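
The query string format can be illustrated with Python's standard urllib.parse module. This is a sketch of the general mechanism, not of ZPublisher's own parameter handling; the locator and the parameter names are made up for the example.

```python
from urllib.parse import parse_qs, urlsplit

# A made-up resource locator with a trailing query string.
locator = "/shop/search?category=books&author=Smith&author=Jones"

parts = urlsplit(locator)
path = parts.path               # the hierarchical part of the locator
params = parse_qs(parts.query)  # maps each name to the list of its values

print(path)
print(params)

# The same name=value syntax appears in the body of a POST request that uses
# the application/x-www-form-urlencoded content type.
body_params = parse_qs("category=books&author=Smith")
print(body_params)
```

Each name maps to a list because a parameter may occur more than once, as author does here.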

A response begins with a response line. This consists of the protocol version, the
numerical status code and the textual status phrase. The status code, a three digit decimal
number, tells the client what happened to the request. The codes are divided into classes
based on the first digit.

The codes 1xx represent informational responses.

The codes 2xx tell the client that the request has been successfully executed. Usually, the
remainder of the response contains an entity as the primary request result. The usual return code
is 200. Other codes indicate that special information is available in response headers or
that the browser should behave in a special way.

The 3xx class calls for redirections. The request is not completed but requires special
actions from the user or the user agent. For security reasons, HTTP requires that
redirections are only performed automatically for GET and HEAD requests. All other
request types require user confirmation. The redirect method of Zope's RESPONSE
object uses a 302 status code with the Location header set to the new URL.

For some objects, especially files and images, Zope responds with a 304 response to GET
requests made conditional with an If-Modified-Since header, if the object has not been
modified since the given date. In fact, this response is not a redirection. It completes the
response without sending the entity data. Conditional requests of this type are usually
sent for objects in a client's cache when the validity of the cache entry should be checked.
A 304 response indicates in this case that the cache entry is still valid and the request can be
served from the cache. If the object has been changed since the given date, Zope responds
with a 200 response that contains the new information. By default, Zope does not employ
this mechanism for template objects, as their modification date does not
determine whether or not the generated page remains the same. As it is too difficult to get
this right in the general case, Zope always processes such requests as unconditional.
However, applications with special efficiency concerns may explicitly generate a 304
response if they can guarantee validity. As of version 2.3, Zope provides an integrated
cache manager that can help you to control caches both inside and outside of Zope.
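
The decision between a 304 and a 200 response can be sketched as follows. This is not Zope's implementation; the function conditional_status is a hypothetical helper that only uses Python's standard library to parse the HTTP date, and it assumes that the object's modification time is available as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if a cached copy is still valid, otherwise 200.

    Hypothetical helper; last_modified must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200  # unconditional request
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparsable date: treat the request as unconditional
    if since.tzinfo is None:
        # Dates without zone information are interpreted as GMT here.
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have a resolution of one second.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


modified = datetime(1994, 10, 29, 19, 43, 31, tzinfo=timezone.utc)
print(conditional_status(modified, "Sat, 29 Oct 1994 19:43:31 GMT"))
print(conditional_status(modified, "Fri, 28 Oct 1994 19:43:31 GMT"))
```

Falling back to 200 whenever the header is missing or unreadable is the safe choice: the client then simply receives the full entity again.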

The 4xx status codes indicate a client error. Usually, the response contains an entity that
explains the problem and what can be done about it. The most essential codes are:

400 (Bad Request)

The request is malformed or otherwise not understood.

401 (Unauthorized)

The resource is protected. Authentication may allow access to the resource.

404 (Not Found)

The requested resource is not found. HTTP allows the server to cheat: the code also
serves as a catch-all for any client error that the server does not want to describe
in more detail.

5xx status codes indicate a server error. Zope uses code 500 (Internal Server Error) when
some application code tries to set an invalid status code or when it raises an exception
that Zope is unable to map to another status code (such as redirect or unauthorized).
When Zope is accessed through a proxy, a client may observe other status codes from
this class. They usually indicate that either Zope or another proxy on the way to Zope died
or that a connection broke down.
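
The classification by first digit is easy to express in code. The following sketch uses the HTTPStatus enumeration from Python's standard library for the status phrases; the table STATUS_CLASSES and the function status_class are names invented for this example.

```python
from http import HTTPStatus

# Status classes keyed by the first digit of the code.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a three digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 302, 304, 401, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```

Note that 304 falls into the redirection class by its number although, as explained above, it does not redirect anything.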

HTTP is a stateless protocol. This means that a request must contain all information
necessary to process it. The server is not expected to have saved state information from
previous requests that may be necessary to process this one. This HTTP property makes it
quite hard to build more complex Web applications. Of course, users expect from most
applications that they are aware of their preferences and remember essential facts from
previous interactions. A whole mess of kludges have been defined and implemented to
work around this limitation: authentication headers, hidden form variables, cookies,
session products. We will learn about these concepts later in this book. I expect future
HTTP versions to remove this limitation.

2.2. URL
The URL (Uniform Resource Locator) is one of the most essential Web publishing
concepts. As the name says, it is used to locate a resource. As we explained in the last
section, the term resource is used in a very wide sense: it can be almost anything, a person, an
object, a service, an application etc. Almost the only requirement is that it can be
identified with an identifier, a URI (Uniform Resource Identifier). There are different
kinds of identifiers. Some kinds contain a description of how to locate the resource. These
form the subclass of the resource locators; the others are the resource names, URNs. The
URI syntaxes for the various kinds have much in common. Therefore, the common
aspects can be described in a single URI syntax standard [RFC 2396]. Each URI kind is
identified by a scheme. Although the scheme determines the precise syntax, URIs,
especially URLs, usually consist of up to 4 components: the scheme, an authority, a path
and a query. The scheme is always present. It determines which of the other components
may or must be present. This means that the generic syntax looks like:

scheme: [authority] [path] [?query]

For resource locators, the scheme usually identifies a protocol which can be used to
access the resource. The remaining URL parts provide the parameters necessary for this
access. The most prominent protocols in the Web publishing domain are http and its
secured version https, as well as mailto and ftp. The mailto protocol accesses the
resource, usually a mailbox or mail group, by sending an email to it. The URLs use only
the authority part which usually has the format user@host.

The other listed protocols belong to the family of hierarchical URI schemes. What they
have in common is the use of the path component, a sequence of (path) segments separated
by /. Paths can be used to navigate in a (typically) hierarchical structure: to locate
path/segment, segment is used as a local selector in the context of the resource located
by path. You know this type of navigation from your file system, and indeed the
resources are often folders and files in a standard file system, with the URL path
component mapped directly to the file hierarchy.

A segment has the form of an id optionally followed by a sequence of parameters, each
one preceded by a ;. These parameters are a recent extension to the URI syntax.
Formerly, the path component as a whole could have a single parameter section as suffix.
Zope's URI parsing algorithm[20] still follows this older specification and terminates the
path as soon as it sees the first ;. It does not interpret parameters or make them
accessible to the application.

For hierarchical URI schemes, the authority component typically has the form
//[userinfo@]host[:port]

The host identifies the name or Internet address of a host where a service at port should
resolve the URI into a resource. If port is not specified, a protocol specific default is
used. This is 80 for HTTP, 443 for HTTPS and 21 for FTP. If present, userinfo
identifies the user for whom the resolution and maybe an associated request should be
performed. It has the form username[:password].

The query component is a string of additional information, interpreted by the resource.
For HTTP, it usually is a sequence of name=value components, separated by &. Zope
interprets them as argument definitions and makes them available directly via their name.
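
The components described above can be inspected with urlsplit from Python's standard urllib.parse module. The URL, the user name and the password below are made up, and effective_port is a hypothetical helper that applies the protocol specific default ports mentioned in the text.

```python
from urllib.parse import urlsplit

# Protocol specific default ports, as listed in the text.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21}


def effective_port(url: str):
    """Return the explicit port of a URL or the default for its scheme."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port
    return DEFAULT_PORTS.get(parts.scheme)


parts = urlsplit(
    "https://jdoe:secret@www.example.org:8443/docs/book/chapter2?lang=en&page=3"
)
print(parts.scheme)                    # scheme
print(parts.username, parts.password)  # userinfo
print(parts.hostname, parts.port)      # host and port
print(parts.path)                      # path
print(parts.query)                     # query

print(effective_port("http://www.example.org/index_html"))
```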

What is usually used in documents are not URIs but rather a more general concept, a
URI reference. A URI reference is essentially a URI, but two aspects make it more
general than a URI. First, a URI reference may have an attached fragment identifier,
introduced with #. It identifies a fragment, a part of the resource identified by the URI.
Second, the URI reference may be relative, i.e. it may specify only part of the complete URI,
with the missing parts given by a base URI. The fragment part is only interpreted by user
agents, usually to position the view onto the displayed resource. It is never sent in a
request. Likewise, relative URI references are resolved with respect to their base URI to
form an absolute URI, and only these absolute URIs are sent in a request. A relative URI
reference does not follow the URI syntax mentioned above; it is rather a suffix thereof.
In particular, it does not have the scheme component. The rules to resolve a relative URI
reference with respect to its base into an effective absolute URI are as follows:

1. If the relative reference is empty or consists just of a fragment, then it references
the current resource.
2. If the relative reference starts with two slash characters, then the effective
reference gets its scheme from the base URI and everything else from the relative
reference.
3. Otherwise, if the relative reference starts with a single slash, then the effective
reference gets its scheme and authority from the base URI and everything else
from the relative reference.
4. Otherwise, the effective reference gets scheme and authority from the base URI,
query and fragment from the relative reference. To construct its path, the path of
the base URI and the relative reference are merged as follows. All characters in
the base path following the last / are discarded, then the path of the relative
reference is appended. All segments consisting of . are discarded and then, from
left to right, all segments consisting of .. are discarded together with their
preceding segment.
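
The resolution rules can be tried out with urljoin from Python's standard urllib.parse module. Note that urljoin follows the newer URI standard RFC 3986, the successor of RFC 2396; for the simple cases shown here the results agree with the rules listed above. The base URI is an arbitrary example.

```python
from urllib.parse import urljoin

base = "http://a/b/c/d;p?q"

# Rule 1: an empty reference or a bare fragment refers to the current resource.
same = urljoin(base, "")
fragment = urljoin(base, "#s")

# Rule 2: two leading slashes keep only the scheme of the base URI.
network = urljoin(base, "//g")

# Rule 3: a single leading slash keeps scheme and authority.
absolute_path = urljoin(base, "/g")

# Rule 4: otherwise the paths are merged; . and .. segments are resolved.
sibling = urljoin(base, "g")
dot = urljoin(base, "./g")
parent = urljoin(base, "../g")

for result in (same, fragment, network, absolute_path, sibling, dot, parent):
    print(result)
```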

As we speak about Web publishing, most resource references embedded in pages will be
URL references: references to images, other pages, mail addresses. If the resources are
local, you should usually try to reference them by relative references with respect to the
current page. Relative references have the advantage that you can often rename or move a
substructure without the need to change all your references. Using relative URLs with
respect to the current page works because the default base URI for resolution of relative
URIs is the URI of the current document. HTML provides the base tag, a header
component, to explicitly specify the base URI. Under some circumstances, the URL used
by the HTTP request is not the canonical URL of an object, as would be necessary for
correct resolution of relative URLs. Zope knows about many of these circumstances and
automatically generates a corresponding base tag. For cases where this is not possible,
Zope provides a method that allows the application to set the base.

As we have seen, there are many characters that have special meaning for URI parsing. If
they need to be used literally, i.e. as part of one of the URI components rather than as
component separator, then they must be encoded. Furthermore, some characters have
platform dependent representations. This induces problems for cross platform
applications such as Web publishing. There are other problems with some control
characters. The URI standard therefore severely restricts the set of characters that can
occur unencoded as part of the various URI components. The only characters that can be
used without restriction are the ASCII letters (upper and lower case), the ASCII digits and the
characters from the set -_.!~*'(). Depending on the URI component, other characters
may be allowed, too. For example, the characters :@&=+$, are additionally allowed in
path segments. However, you should think twice before relying on such
details. Any character not allowed in a context must be encoded. The encoding consists of
% followed by the two hex digits representing the character's code in the ISO-8859-1
encoding, also known as Latin-1. This is a superset of the ASCII character set. You do
not need to worry about the coding details: Zope provides a function url_quote that
encodes strings correctly to be used as URI components. You must be aware, however,
that encoding is necessary at some places and use url_quote at these places. Zope
decodes URLs automatically. Thus, there is usually no need to worry about this aspect.
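
To show what the encoding looks like, the following sketch uses quote and unquote from Python's standard urllib.parse module rather than Zope's url_quote. The strings are made-up examples. Python encodes non-ASCII characters as UTF-8 by default, so the Latin-1 encoding described above has to be requested explicitly.

```python
from urllib.parse import quote, unquote

# A space is not allowed in a URI and becomes %20.
segment = quote("annual report.html")

# With safe="" even component separators such as & and = are encoded,
# which is what you want for a literal value inside a query string.
value = quote("a&b=c", safe="")

# A non-ASCII character, encoded via its ISO-8859-1 (Latin-1) code.
name = quote("Müller", encoding="latin-1")

# Decoding reverses the process.
decoded = unquote(name, encoding="latin-1")

print(segment, value, name, decoded)
```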

2.3. HTML FORMs
Currently, most Web published pages use HTML, the HyperText Markup Language.
There are tools that allow you to create and modify HTML pages WYSIWYG or at least
guided by menus. These are so-called HTML editors. Examples are DreamWeaver from
Macromedia, Microsoft's FrontPage or Netscape's Composer. However, these tools are
designed to build static, stand-alone pages. They are of less practical use for working on
page templates, with their additional tags not understood by the tool and their usually
non-HTML-conformant macro structure[21]. This implies that for page template design you
need a basic understanding of HTML and a tool that lets you use unknown tags and build
non-standard pages. Such a tool would usually not support full WYSIWYG editing but
may provide menu guidance. Personally, I use XEmacs' HTML mode. But there are
many other simple HTML editors for this task, such as HTMLKit.

Although a basic HTML understanding is necessary to build dynamic Web sites with
Zope, it is beyond this book's scope to provide a thorough HTML introduction. I,
personally, look into the HTML4.0 specification when I need information about HTML.
In my view, it is a very good specification which provides introductions, well structured
overviews and detailed information combined with good navigation support such as
element and attribute indexes. In this book, we will only look at HTML forms, as they are
especially important for dynamic Web sites.

An HTML form is the major device that allows users to provide input for Web
applications. It is implemented by the HTML form tag. The form tag contains special
form controls beside normal text and HTML markup. Controls have a name, an initial
value and a current value. The user interacts with the form by changing the current value
of its controls, either directly or through script invocations. He may then submit the form.
Form submission results in a request being sent to an agent, e.g. an email server or an
HTTP server. Depending on the context of submission, some controls are
considered successful. For each successful control, the request contains an association
control_name=current_value. The associations appear in the same order as the controls
appear in the document.

Controls are implemented as HTML tags. Their name is given by a name attribute. Their
initial value is usually given by a value attribute, for some controls by their content
(textarea; option if no value attribute is present). The current value is initially set to
the initial value and can later be changed either by the user or a script. Values are strings.

There are controls for (single line) text input, image, submit, check, radio and reset
buttons and file input, all implemented by the input tag. Menus are implemented by
select, which is a container for options and is available both as a single and a multiple
selection. Multi line text input is implemented by textarea. HTML 4 provides additional
button and object controls. As a special case, there are hidden controls, also
implemented by input. They are used not for user interaction but to transfer information
between the page generation process and the request processing after form submission.
Such a transfer may be necessary to work around HTTP's lack of state which requires that
each request is self describing. Cookies provide an alternative to the use of hidden
controls.

Whether a control is successful during form submission is usually determined by its type
and its current value. Text input and hidden controls are always successful. Check and
radio button controls are only successful if checked. For selections, each selected option
defines a successful control associated with the selection's name. Thus, the submitted
form data may contain one, several or no successful controls for a single (multiple) menu
control. A submit or image button is only successful if it was used to submit the
form[22].
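
The rules for successful controls can be summarized in a short sketch. The representation of controls as dictionaries and the function form_data_set are assumptions made for this illustration; they do not correspond to any Zope or browser API. Image buttons are left out to keep the sketch short.

```python
def form_data_set(controls, submitted_by=None):
    """Return the (name, value) pairs of the successful controls.

    Hypothetical helper: controls is a list of dicts in document order,
    submitted_by is the submit button used to send the form (if any).
    """
    pairs = []
    for control in controls:
        kind, name = control["type"], control.get("name")
        if not name:
            continue  # a control without a name is never submitted
        if kind in ("text", "hidden", "textarea"):
            pairs.append((name, control.get("value", "")))
        elif kind in ("checkbox", "radio"):
            if control.get("checked"):
                pairs.append((name, control["value"]))
        elif kind == "select":
            # One pair per selected option; possibly none at all.
            pairs.extend((name, v) for v in control.get("selected", []))
        elif kind == "submit":
            if control is submitted_by:
                pairs.append((name, control.get("value", "")))
    return pairs


controls = [
    {"type": "text", "name": "firstname", "value": "Ann"},
    {"type": "radio", "name": "sex", "value": "Male"},
    {"type": "radio", "name": "sex", "value": "Female", "checked": True},
    {"type": "select", "name": "interests", "selected": ["1", "3"]},
    {"type": "checkbox", "name": "info", "value": "yes"},  # not checked
    {"type": "submit", "name": "send", "value": "Send"},
    {"type": "hidden", "name": "sessionId", "value": "2417369"},
]
print(form_data_set(controls, submitted_by=controls[5]))
```

The unchecked checkbox info and the unchecked radio button leave no trace in the result, while the multiple selection contributes two pairs with the same name.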

It should be noted that unsuccessful checkbox controls can cause problems during form
processing. Similar problems result from multiple selections when no option has been
selected. In all these cases, the submitted form data does not contain a definition for the
associated control name. The application must take care to interpret this lack of a value
correctly. Zope provides various facilities to handle these cases.
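
The following sketch shows the problem and a generic remedy using only Python's standard library; it does not show Zope's own facilities. The submitted data is a made-up example in which neither the checkbox info nor any option of the multiple selection interests was chosen.

```python
from urllib.parse import parse_qs

# Form data as it might arrive: no "info" and no "interests" entry at all.
submitted = "firstname=Ann&sex=Female&sessionId=2417369"
form = parse_qs(submitted)

# Accessing form["info"] directly would raise a KeyError, so the application
# has to supply the meaning of "missing" itself.
wants_info = "info" in form            # missing checkbox means "not checked"
interests = form.get("interests", [])  # missing selection means "nothing selected"

print(wants_info, interests)
```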

The form tag has one required attribute, action. Its value is a URI reference and
specifies the resource that should process the form data when it is submitted. Usually, it
is either a mailto or an http/https URI. In the first case, the form data is sent by email to
the given URI; in the second case, an HTTP request is sent. form has several optional
attributes, the most essential being method and enctype.

method's value is either GET (the default) or POST. When the GET method is used, the
form data is provided as the query string in the resource locator of an HTTP GET request.
As we have noted, the allowed characters inside a URI are severely restricted. Characters
not allowed must be encoded, which results in a three byte code for each single byte
character. Therefore, this method is inefficient for non-ASCII strings or binary data. You
should use POST when your form transmits large non-ASCII strings or even files. If the
action specifies an HTTP URI, then an HTTP POST request is used to transfer the form
data. Here, the form data is contained not in the resource locator but in the request body.

The enctype (encoding type) determines in this case the content type of the request body
(and thereby the encoding of the form data). The default enctype value is
application/x-www-form-urlencoded, which uses the same encoding as URL encoding and
is therefore inefficient in the same cases as the GET method. Do not use it when your form
contains files; use multipart/form-data in these cases. This uses a multipart MIME
message to encode the form data. It can contain binary parts and therefore transfer binary
data efficiently. If the form data is sent as email to a human, then text/plain may be
appropriate as the value for enctype. With this encoding, each successful control usually
results in a line of the form name=value without any encoding of the characters in name or
value. This is adequate for humans. If the email recipient is a program, one of the other
encodings may be more appropriate, as they present no parsing ambiguity and there are
standard tools for parsing them.

If a form is submitted to Zope, any of the request methods and encoding types (except
text/plain) are handled transparently and the form data is made accessible in a
convenient way.
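
The cost of the default encoding can be made visible with Python's standard urllib.parse module. The field names and values are made up; the second part measures how much a block of binary data grows when every byte has to be written as a three character escape.

```python
from urllib.parse import quote_from_bytes, urlencode

# Form data as a list of (name, value) pairs, in document order.
fields = [
    ("firstname", "Ann"),
    ("interests", "1"),
    ("interests", "3"),
    ("remarks", "R&D, 100%"),
]
# This string is what a GET request puts after the ? and what a POST request
# with the default enctype puts into the request body.
encoded = urlencode(fields)
print(encoded)

# 128 bytes outside the ASCII range: each one becomes %XX.
binary = bytes(range(128, 256))
escaped = quote_from_bytes(binary)
print(len(binary), "bytes grow to", len(escaped), "characters")
```

A threefold growth is exactly the inefficiency that multipart/form-data avoids for files and other binary data.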

Example 3.3. Example HTML form

<FORM action="http://somesite.com/prog/adduser" method="post">
<P>
<!-- Text input: the name attribute is needed for the value to be submitted. -->
<LABEL for="firstname">First name: </LABEL>
<INPUT type="text" id="firstname" name="firstname"><BR>
<LABEL for="lastname">Last name: </LABEL>
<INPUT type="text" id="lastname" name="lastname"><BR>
<LABEL for="email">email: </LABEL>
<INPUT type="text" id="email" name="email"><BR>
<!-- Radio buttons sharing a name form one group. -->
<INPUT type="radio" name="sex" value="Male"> Male<BR>
<INPUT type="radio" name="sex" value="Female"> Female<BR>
Interests:
<!-- Multiple selection menu. -->
<SELECT name="interests" multiple>
<OPTION value="1">Sports</OPTION>
<OPTION value="2">Politics</OPTION>
<OPTION value="3">Arts</OPTION>
<OPTION value="4">Economics</OPTION>
<OPTION value="5">Family</OPTION>
</SELECT><BR>
Origin continent:
<!-- Single selection menu; the option content serves as value. -->
<SELECT name="origin">
<OPTION>North America</OPTION>
<OPTION>South America</OPTION>
<OPTION>Asia</OPTION>
<OPTION>Australia</OPTION>
<OPTION>Europe</OPTION>
</SELECT><BR>
Interested in further information:
<INPUT name="info" type="checkbox" checked>
</P>
<H4>Remarks:</H4>
<P><TEXTAREA name="remarks" cols="60" rows="10"></TEXTAREA></P>
<P>
<INPUT type="submit" value="Send"> <INPUT type="reset">
<!-- Hidden control carrying state from page generation to request processing. -->
<INPUT type="hidden" name="sessionId" value="2417369">
</P>
</FORM>

This example (partly stolen from the HTML 4.0 specification) shows a simple form
containing most available controls.

The Forms chapter of the HTML specification contains more detailed information about
forms and form processing. It is highly recommended reading.

2.4. Authentication
Unlike a static Web site where visitors usually can only retrieve data, a dynamic Web site
built with Zope allows in principle all types of site extensions and modifications
performed through the Web. It is clear that an administrator wants to control who is
entitled to perform such operations. Authentication, the determination of the identity of a
requesting agent, is vital for a dynamic Web site.

HTTP provides for an elementary authentication scheme. It is rightfully termed basic
authentication [RFC 2617]. HTTP requests can contain an Authorization header. The
Authorization header names the authentication scheme and carries an authentication token
and, optionally, parameters necessary to interpret the authentication token. For basic
authentication, the scheme is Basic, the token is the base64 encoding of
username:password and parameters are not used. When an HTTP server receives a
request requiring authorization, it will examine the Authorization header. If it is missing
or does not provide the required authorization, the server will return an
Unauthorized response to the client. The response contains a WWW-Authenticate header
with one or more challenges. Each challenge consists of an authentication scheme and a
sequence of parameter definitions. For basic authentication, a single parameter is defined:
realm. Its value is a quoted string. The server URL and the realm value define the
protection space of the required authentication. An interactive user agent (a browser) will
usually pop up a login dialog for the server and realm when it receives such a challenge,
unless login information for the given protection space is already available. The login
dialog will ask the user for his username and password with respect to the protection
space and will remember this information at least for the duration of the session. The browser
will then construct an Authorization header from the login information and include it in all
requests sent to this server until it receives a new Unauthorized response from this
server, maybe for a different realm.
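
The construction of the Authorization header for basic authentication can be sketched with Python's standard base64 module. The functions basic_authorization and decode_basic are hypothetical helpers written for this illustration; the user name and password are the well-known example values from RFC 2617.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of an Authorization header for basic authentication."""
    credentials = f"{username}:{password}".encode("iso-8859-1")
    return "Basic " + base64.b64encode(credentials).decode("ascii")


def decode_basic(header_value: str):
    """Recover (username, password) from a Basic Authorization header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    credentials = base64.b64decode(token).decode("iso-8859-1")
    # The first colon separates the user name from the password.
    username, _, password = credentials.partition(":")
    return username, password


header = basic_authorization("Aladdin", "open sesame")
print("Authorization:", header)
print(decode_basic(header))
```

That decode_basic needs no secret at all illustrates that base64 is an encoding, not an encryption.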

Basic authentication has two weak points. The first is security: username and password
are essentially sent in clear text with every request. (They are sent base64 encoded.
However, it is trivial to reconstruct the original information from the encoding.) Anyone who
intercepts such a request can extract the username and password and use them to assume a
false identity. The second is comfort: the lifetime of the login information is controlled
by the browser. It usually maintains it during the current session (i.e. the lifetime of the
browser process)[23]. In this case, the user has to reauthenticate each time he restarts his
browser.

Recently, a more secure authentication scheme has been defined for HTTP: digest
authentication. However, it is not widely implemented. In particular, Zope does not yet
support it (but neither do many browsers).

The authentication scheme used by a Zope Web site is not hard-wired into Zope. Instead,
a component, the so-called UserFolder, decides all authentication aspects. The
standard UserFolder which comes as part of Zope supports only basic HTTP
authentication. There are, however, products that use cookie authentication.

With cookie authentication, the authentication token is not sent in an Authorization
header but in a cookie. The token usually is not the clear text username and password but
some meaningless hash value that is associated with the user only inside the server.
Therefore, this scheme appears to be more secure. However, security is increased only
marginally. Every request carries the cookie. An interceptor of such a request can fetch
the cookie and use it himself to steal the associated identity with respect to the
corresponding server. The scheme is a bit safer, as it is more difficult to get at the clear text
login information. This matters if the same password is used for other purposes or
passwords for different purposes are constructed according to some scheme. On the other
hand, cookie authentication gives the application much more control. It can specify
the cookie's lifetime. This can be used to limit the validity period of the
authentication token and thereby of a potentially stolen identity. It can also be used to
save the cookie for a long time and eliminate the need for logins (almost) altogether. This
can be seen as a big advantage by casual users who might otherwise tend to forget their
passwords.

Some people (I am one of them) do not like cookies because of privacy concerns.
Cookies are often used by Web sites to identify their visitors across visits, collect long
term information about their visits and visit patterns and use this information in various
ways: to improve their Web site (good), to analyze their visitors' interests and use them for
personalized marketing (I do not like that), maybe even to sell this information (I hate that).
Therefore, I look regularly in my cookie file (where the browser maintains long living
cookies). When I detect cookies with a lifetime of more than a month or so, I get very
suspicious about the site's intentions. I delete such cookies and may disable cookies
altogether when visiting such a site.

2.5. Cookies
HTTP is a stateless protocol. This means that each request must be self-contained. There
is nothing like a context built from previous requests that can be used to interpret the
current request. On the other hand, many applications need to be stateful. Think of a
shopping cart. When you look at your cart, it must of course contain the items you have
sent to it in previous requests. Or think of a form with a complex form field. To fill it,
you may need to look at supporting information. When you come back, the form fields
you have filled previously must of course retain their values even though the new visit is
a new request to the server. How can such applications be implemented despite the
stateless HTTP protocol?

There are several workarounds for this HTTP deficiency. Usually they combine two
strategies: first, store information on the server associated with an id, and second, encode
the id somehow in the URI or the HTML content. To encode something in the URI, either
a path segment or a query parameter might be appropriate. Hidden form controls are
appropriate to encode state information inside HTML forms. Usually, these workarounds
are tedious and several encoding techniques must be used in combination, for example
hidden variables for pages with forms and ids encoded in URI references for link
traversal. That's where cookies come in.

The cookie mechanism is very similar to the HTTP authentication scheme we have seen
in the last section. Authentication is a typical example where you want stateful behavior.
You should not need to authenticate for each request separately. After you have logged in
once, all following requests should use the login information you provided during this
first login. This is possible despite the stateless HTTP protocol, because the user agent
provides this information with each request. With a cookie, it is very similar.

A cookie is a named value, defined by the HTTP server and sent to the user agent. The
user agent stores the cookie and automatically includes all cookies defined by a given
server when it sends a request to this server. Looking at its cookies, the server can access
information effectively determined by earlier requests. That's a bit simplified but it gives
the general idea. Cookies are great for providing state information for HTTP processing
without the need to pass such information back and forth through query strings and
hidden variables.
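
The round trip can be sketched with the http.cookies module from Python's standard library. This only illustrates the headers involved; the cookie names session_id and theme and their values are made up for the example.

```python
from http.cookies import SimpleCookie

# Server side: define a cookie and render the response header for it.
response_cookies = SimpleCookie()
response_cookies["session_id"] = "abc123"
set_cookie_header = response_cookies.output()
print(set_cookie_header)  # Set-Cookie: session_id=abc123

# User agent side: on later requests, all stored cookies for this
# server are sent back in a single Cookie header (value shown here).
cookie_header = "session_id=abc123; theme=dark"

# Server side, later request: parse the header to get the values back.
request_cookies = SimpleCookie()
request_cookies.load(cookie_header)
print(request_cookies["session_id"].value)  # abc123
```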

Cookies were invented by Netscape. The cookie specification can be found on
Netscape's web site. As cookies solve a fundamental problem with HTTP, they were soon
implemented by other browsers. Nowadays, almost all browsers support cookies.

Earlier I said a cookie was a named value and that all cookies defined by a server were
sent with any request to this server. As already mentioned, this was a simplification.
Actually, a cookie is described by the following attributes:

name

the cookie's name. The name must not contain white space, equal sign, comma or
semicolon.

value

the cookie's value. The value is a string not containing white space, comma or
semicolon. The value is usually encoded to prevent such forbidden characters from
slipping in.

expires

the cookie's expiration date. This is an HTTP datetime, also known as an RFC
822 time [RFC 822]. The time zone is fixed to GMT. The format is
Wdy, DD-Mon-YYYY HH:MM:SS GMT. The user agent should delete the cookie when
this time arrives. If the cookie creating server does not specify an expiration date,
the cookie lives only as long as the browser process. It is not stored persistently.

domain

The domain controls to which servers the cookie may be sent. A cookie may be
sent to a server when domain is a suffix of the server's host name. This implies
that a cookie can also be sent to a server different from the one that defined the
cookie, as long as it is in the domain given by domain. To make abuse more difficult,
a server that sets a cookie can only specify a domain it belongs to. Moreover, the
domain must be sufficiently specific: domain must contain at least 2 periods for
generic top level domains such as com or org, and at least 3 periods otherwise. If the
cookie creating server does not specify a domain, the server's host name is used.

path

A cookie is included in a request only when path is a prefix of the path
component of the request URI. This feature allows a cookie to be restricted to
a subset of the web site. If the cookie creating server does not specify a
path, then the path component of the URI of the request whose response
created the cookie is used.

secure

If the cookie is marked secure, the browser will only send it over secure
connections. This currently means an HTTPS connection, that is, HTTP over SSL
or TLS.
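
The attribute rules above can be sketched as a small check that a user agent might perform before sending a cookie. This is a simplified illustration of the plain suffix and prefix rules as described here, not the code of any real browser; the function names and the dictionary layout of a cookie are assumptions for this example.

```python
import time

def format_expires(timestamp):
    """Render seconds since the epoch as Wdy, DD-Mon-YYYY HH:MM:SS GMT.

    Assumes the default C locale, so day and month names are English.
    """
    return time.strftime("%a, %d-%b-%Y %H:%M:%S GMT", time.gmtime(timestamp))

def cookie_applies(cookie, host, path, is_secure):
    """Decide whether a stored cookie is included in a request.

    cookie is a dict with the keys "domain", "path" and "secure".
    """
    # domain must be a suffix of the server's host name
    if not host.lower().endswith(cookie["domain"].lower()):
        return False
    # path must be a prefix of the request URI's path component
    if not path.startswith(cookie["path"]):
        return False
    # secure cookies are only sent over secure connections
    if cookie["secure"] and not is_secure:
        return False
    return True

cookie = {"domain": ".example.com", "path": "/shop", "secure": True}
print(cookie_applies(cookie, "www.example.com", "/shop/cart", True))   # True
print(cookie_applies(cookie, "www.example.org", "/shop/cart", True))   # False
print(cookie_applies(cookie, "www.example.com", "/about", True))       # False
print(cookie_applies(cookie, "www.example.com", "/shop/cart", False))  # False
print(format_expires(0))  # Thu, 01-Jan-1970 00:00:00 GMT
```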

User agents usually impose limits on the number and complexity of cookies. There is a
limit on the total number of cookies (300) and a limit on the number of cookies per
server or domain (20). The name and value of a cookie together must not exceed 4 kB.

In Zope, the RESPONSE object provides methods setCookie, appendCookie and
expireCookie to set or unset a cookie. setCookie has the parameters name and value
and the optional parameters expires, domain, path and secure, which can be provided
as keyword parameters. It sets the cookie specified by the parameters. appendCookie has
the parameters name and value. It sets a cookie with this name and value. If the current
response object already has a cookie with this name, the new value is appended to the old
one separated by a colon. expireCookie has the parameters of setCookie with the
exception of value and expires. It deletes the cookie by setting a cookie with an
expiration time in the past. The cookies sent with a request are made accessible in the
cookies component of Zope's REQUEST object. They can be accessed directly through
their name in document templates.
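
As a rough sketch of how these methods might be used together, the following function counts visits in a cookie and expires the cookie once a limit is reached. The function count_visit, the cookie name visits and the limit are made up for this illustration. StubRequest and StubResponse are hypothetical stand-ins that merely record calls so the sketch runs outside Zope; inside Zope you would pass the real REQUEST and RESPONSE objects instead.

```python
class StubResponse:
    """Hypothetical stand-in for RESPONSE; it only records the calls."""
    def __init__(self):
        self.calls = []

    def setCookie(self, name, value, **kw):
        self.calls.append(("setCookie", name, value, kw))

    def appendCookie(self, name, value):
        self.calls.append(("appendCookie", name, value))

    def expireCookie(self, name, **kw):
        self.calls.append(("expireCookie", name, kw))


class StubRequest:
    """Hypothetical stand-in for REQUEST with a cookies mapping."""
    def __init__(self, cookies):
        self.cookies = cookies


def count_visit(REQUEST, RESPONSE, limit=3):
    """Increment the 'visits' cookie; expire it once the limit is passed."""
    visits = int(REQUEST.cookies.get("visits", "0")) + 1
    if visits > limit:
        # Delete the cookie; path must match the one used when setting it.
        RESPONSE.expireCookie("visits", path="/")
        return 0
    RESPONSE.setCookie("visits", str(visits), path="/")
    return visits


response = StubResponse()
print(count_visit(StubRequest({"visits": "2"}), response))  # 3
print(response.calls)  # [('setCookie', 'visits', '3', {'path': '/'})]
```

No expires parameter is given here, so the cookie would live only as long as the browser process.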

Cookies can pose a significant threat to privacy. Be aware that some potential users will
disable cookies in their browser.
