
HTTP: HyperText Transfer Protocol
By:
Manish Kumar
Pilani (Rajasthan) India
manishkulhari@gmail.com

Common Protocols
In order for two remote machines to understand each other, they should:
speak the same language
coordinate their talk

The solution is to use protocols


Examples:
FTP: File Transfer Protocol
SMTP: Simple Mail Transfer Protocol
NNTP: Network News Transfer Protocol
HTTP: HyperText Transfer Protocol

Why Was HTTP Needed?

According to Tim Berners-Lee (1991), a protocol was needed with the following features:
A subset of the file transfer protocol
The ability to request an index search
Automatic format negotiation
The ability to refer the client to another server

[Figure: the browser sends an HTTP request for http://www.cs.huji.ac.il/~dbi to a proxy server, which forwards it to the Web server at www.cs.huji.ac.il:80; the Web server reads the resource from its file system, and the HTTP response travels back through the proxy server]

[Figure: a request can pass through a chain of proxy servers (department, university, Israel) before reaching the Web server at www.w3.org:80]

Terminology
User agent: client which initiates a request (browser, editor, Web robot, etc.)
Origin server: the server on which a given resource resides (Web server, a.k.a. HTTP server)
Proxy: acts as both a server and a client
Gateway: server which acts as an intermediary for other servers
Tunnel: acts as a blind relay between two applications; a custom protocol can be implemented using HTTP tunneling

Resources
A resource is a chunk of information that can be identified by a URL (Uniform Resource Locator)
A resource can be
A file
A dynamically created page
What we see on the browser can be a combination of several resources

Uniform Resource Locator

protocol://host:port/path?parameters#anchor
http://www.cs.huji.ac.il/~dbi/index.html#info
http://www.google.com/search?hl=en&q=blabla

There are other types of URLs


mailto:<account@site>
news:<newsgroup-name>
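
As a rough illustration of the URL parts listed above, Python's standard urllib.parse module can split a URL into its components. This is only a sketch and not part of the original slides; the third URL, with an explicit port, is an added example.

```python
from urllib.parse import urlsplit

# Two URLs from the slide, plus one with an explicit port (an added example).
page = urlsplit("http://www.cs.huji.ac.il/~dbi/index.html#info")
search = urlsplit("http://www.google.com/search?hl=en&q=blabla")
with_port = urlsplit("http://www.cs.huji.ac.il:80/~dbi")

print(page.scheme)     # the protocol part
print(page.hostname)   # the host part
print(page.path)       # the path part
print(page.fragment)   # the anchor part
print(search.query)    # the parameters part
print(with_port.port)  # the port part (None when the URL gives no port)
```

Note that the parameters come before the anchor when both are present in the same URL.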

In a URL
Spaces in the parameters are represented by +
Characters such as &, + and % are encoded in the form %xx, where xx is the ASCII value in hexadecimal; for example, & = %26
The inputs to the parameters are given as a list of pairs of a parameter and a value:
var1=value1&var2=value2&var3=value3

Example: a search for the text war&peace Tolstoy is sent as the URL:

http://www.google.com/search?hl=en&q=war%26peace+Tolstoy
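
The encoding rules can be tried out with Python's standard urllib.parse module. The sketch below assumes the two parameters hl and q from the URL above; urlencode applies the + and %xx rules, and parse_qs reverses them.

```python
from urllib.parse import urlencode, parse_qs

# Parameter names and values taken from the example URL.
params = {"hl": "en", "q": "war&peace Tolstoy"}

query = urlencode(params)            # '&' becomes %26, the space becomes '+'
url = "http://www.google.com/search?" + query
decoded = parse_qs(query)            # back to the original text

print(url)
print(decoded["q"][0])
```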


An HTTP Session
A basic HTTP session has four phases:
1. Client opens the connection (a TCP connection)
2. Client makes a request
3. Server sends a response
4. Server closes the connection
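
The four phases can be sketched with plain sockets. The example below is an illustration only: serve_once is a hypothetical toy server running on the local machine (so that no real Web server is needed), and fetch plays the role of the client.

```python
import socket
import threading

def serve_once(listener):
    """Toy server (hypothetical): answer one request, then close."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        while b"\r\n\r\n" not in request:   # read up to the blank line
            chunk = conn.recv(1024)
            if not chunk:
                break
            request += chunk
        body = b"hello"
        head = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
        )
        conn.sendall(head.encode("ascii") + body)   # phase 3: response
    # leaving the with-block closes the connection (phase 4)

def fetch(host, port, path):
    # Phase 1: the client opens a TCP connection.
    with socket.create_connection((host, port), timeout=5) as sock:
        # Phase 2: the client makes a request.
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Phase 3: read the response until the server closes (phase 4).
        response = b""
        while True:
            chunk = sock.recv(1024)
            if not chunk:
                break
            response += chunk
    return response

listener = socket.socket()
listener.settimeout(5)
listener.bind(("127.0.0.1", 0))          # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]

server_thread = threading.Thread(target=serve_once, args=(listener,))
server_thread.start()
response = fetch("127.0.0.1", port, "/index.html")
server_thread.join()
listener.close()

status_line = response.split(b"\r\n", 1)[0].decode("ascii")
print(status_line)
```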


Nesting in Page
[Figure: the page Index.html with a left frame and a right frame, containing a jumping fish, a fairy icon and a HUJI icon]
What we see on the browser can be a combination of several resources

Nested Objects
Suppose a client accesses a page containing 10 inline images. How many sessions will be required to display the page completely?
The answer is 11 HTTP sessions. Why?
Some browsers/servers support a feature called keep-alive, which can keep the connection open until it is explicitly closed
How can this help?

Stateless Protocol
HTTP is a stateless protocol, which means that once a server has delivered the requested data to a client, the server retains no memory of what has just taken place (even if the connection is keep-alive)
What are the difficulties in working with a stateless protocol?
How would you implement a site for buying some items?
So why don't we have states in HTTP?

The Format of HTTP Requests and Responses

An initial line
Zero or more header lines
A blank line (i.e., a CRLF by itself), and
An optional message body (e.g., a file, query data, or query output)
Note: CRLF = \r\n (ASCII 13 followed by ASCII 10)
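
The four parts can be separated with a few lines of Python. This is a minimal sketch: split_message is a hypothetical helper, it assumes the whole message is already in memory, and it ignores details such as repeated or folded headers.

```python
def split_message(raw: bytes):
    """Split an HTTP message into (initial line, headers, body)."""
    # The first blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    initial_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return initial_line, headers, body

raw = (
    b"GET /index.html HTTP/1.1\r\n"
    b"Host: www.cs.huji.ac.il:80\r\n"
    b"Connection: Keep-Alive\r\n"
    b"\r\n"
)
initial_line, headers, body = split_message(raw)
print(initial_line)
print(headers)
print(body)
```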

Headers
HTTP 1.0 defines 16 headers
None are required
HTTP 1.1 defines 46 headers
One header (Host:) is required in requests that are sent to Web servers
A request that is sent to a proxy carries the full URL in its request line (HTTP 1.1 still requires Host: in every request)
A response does not have to include any header
Question: how do we know who the host is when there is no Host header?

HTTP Requests


The Format of a Request
method sp URL sp version cr lf      (request line)
header : value cr lf                (header lines)
...
header : value cr lf
cr lf                               (blank line)
Entity Body

Request Example
GET /index.html HTTP/1.1 [CRLF]
Accept: image/gif, image/jpeg [CRLF]
User-Agent: Mozilla/4.0 [CRLF]
Host: www.cs.huji.ac.il:80 [CRLF]
Connection: Keep-Alive [CRLF]
[CRLF]
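
The same request can be assembled in code. The sketch below uses a hypothetical helper, build_request, that joins the request line and the header lines with CRLF and ends the header block with a blank line.

```python
def build_request(method, path, headers, version="HTTP/1.1"):
    """Return the bytes of a request that has no message body."""
    lines = [f"{method} {path} {version}"]                     # request line
    lines += [f"{name}: {value}" for name, value in headers]   # header lines
    # Each line ends with CRLF; one extra CRLF is the blank line.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")

request = build_request("GET", "/index.html", [
    ("Accept", "image/gif, image/jpeg"),
    ("User-Agent", "Mozilla/4.0"),
    ("Host", "www.cs.huji.ac.il:80"),
    ("Connection", "Keep-Alive"),
])
print(request.decode("ascii"))
```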


Request Example
GET /index.html HTTP/1.1          (method, request URL, version)
Accept: image/gif, image/jpeg     (headers)
User-Agent: Mozilla/4.0
Host: www.cs.huji.ac.il:80
Connection: Keep-Alive
[blank line here]

Request Methods


Common Request Methods

GET: returns the contents of the indicated document
HEAD: returns the header information for the indicated document
Useful for finding out info about a resource without retrieving it
POST: treats the document as an application and sends some data to it

More Request Methods

PUT: replaces the content of the document with some data
DELETE: deletes the indicated document
TRACE: invokes a remote loop-back of the request. The final recipient SHOULD reflect the message back to the client
Usually these methods are not allowed

GET Request
A request to get a resource from the Web
The most frequently used method
The request has no message body, but parameters can be sent in the request URL (i.e., the URL without the host part)

HEAD Request
A HEAD request asks the server to return the response headers only, and not the actual resource (i.e., no message body)
This is useful for checking characteristics of a resource without actually downloading it, thus saving bandwidth
Used for testing hypertext links for validity, accessibility and recent modification
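
The difference between HEAD and GET can be observed with Python's standard http.server and http.client modules. This is a sketch under stated assumptions: Handler and PAGE are hypothetical, and the server runs on the local machine only so that the example does not depend on a real site.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAGE = b"<html>hello</html>"   # hypothetical resource

class Handler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(PAGE)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(PAGE)      # headers and message body

    def do_HEAD(self):
        self._send_headers()        # same headers, no message body

    def log_message(self, *args):   # keep the example quiet
        pass

def request(method, port):
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    conn.request(method, "/index.html")
    resp = conn.getresponse()
    body = resp.read()
    length = resp.getheader("Content-Length")
    conn.close()
    return resp.status, length, body

server = HTTPServer(("127.0.0.1", 0), Handler)   # port 0: any free port
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

head_result = request("HEAD", port)
get_result = request("GET", port)
server.shutdown()
server.server_close()

print(head_result)   # headers describe the resource, body is empty
print(get_result)    # same headers, plus the resource itself
```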

Post Request
A POST request can send data to the server
POST is mostly used in form-filling
The data filled into the form are translated by the browser into a special format and sent to a program on the server using the POST command

Post Request (cont.)


There is a block of data sent with the request, in the message body
There are usually extra headers to describe this message body, like Content-Type: and Content-Length:
The request URL is a URL of a program to handle the sent data, not a file
The HTTP response is normally the output of a program, not a static file

Post Example
Here's a typical form submission, using POST:
POST /path/register.cgi HTTP/1.0
From: frog@cs.huji.ac.il
User-Agent: HTTPTool/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 35

home=Ross+109&favorite+flavor=flies
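
As a sketch of what the browser does with a filled-in form, the message body and its Content-Length can be computed with Python's standard urllib.parse module. The field names and values are assumed from the example body; the request is only built here, not sent.

```python
from urllib.parse import urlencode

# Form fields as the user typed them (reconstructed from the example body).
form = {"home": "Ross 109", "favorite flavor": "flies"}

body = urlencode(form).encode("ascii")   # spaces become '+'
head = (
    "POST /path/register.cgi HTTP/1.0\r\n"
    "From: frog@cs.huji.ac.il\r\n"
    "User-Agent: HTTPTool/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"                               # blank line before the body
)
message = head.encode("ascii") + body
print(message.decode("ascii"))
```

Content-Length counts the bytes of the encoded body, which is why it has to be computed after the form data are translated.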

Request Headers


HTTP 1.1 Request Headers


The common request headers of HTTP 1.1 are described in the following slides

Accept
Accept-Encoding
Authorization
Connection
Cookie
Host
If-Modified-Since
Referer
User-Agent

Accept Request Headers


Accept
Specifies the MIME types that the client can handle (e.g., text/html, image/gif)
Server can send different content to different clients

Accept-Encoding
Indicates encodings (e.g., gzip) client can handle
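
How a server might use the Accept header to send different content to different clients can be sketched as follows. The helpers parse_accept and choose_type are hypothetical, and the sketch is deliberately simplified: it ignores quality values (q=...) and partial wildcards such as image/*.

```python
def parse_accept(value):
    """Return the MIME types listed in an Accept header value."""
    types = []
    for item in value.split(","):
        media_type = item.split(";", 1)[0].strip()   # drop any parameters
        if media_type:
            types.append(media_type)
    return types

def choose_type(accept_value, available):
    """Pick the first type the server can offer that the client accepts."""
    accepted = parse_accept(accept_value)
    for media_type in available:
        if media_type in accepted or "*/*" in accepted:
            return media_type
    return None   # nothing acceptable

# The server (hypothetically) holds the same picture in two formats.
print(choose_type("image/gif, image/jpeg", ["image/png", "image/jpeg"]))
print(choose_type("text/html", ["image/png", "image/jpeg"]))
```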

More Accept Request Headers


Accept-Charset
Accept-Language


Authorization Request Header


Authorization
User identification for password-protected pages
Alternative: instead of HTTP authorization, use HTML forms to send the username/password and store them in state (e.g., a session object)

Connection Request Header


Connection
Connection: keep-alive means that the browser can handle persistent connections
Keep-alive is the default in HTTP 1.1
In a persistent connection, the server can reuse the same socket for requests that are very close together from the same client
Connection: close means that the connection is closed after each request

Content-Length Request Header
This header is applicable to requests that carry a message body, typically POST requests
It specifies the size of the POST data in bytes

Cookie Request Header


Gives cookies previously sent to the client
Not in the HTTP 1.1 specification, but widely supported (originally a Netscape extension)

Host Request Header


Indicates host and port as given in the original URL
Required in HTTP 1.1
Needed due to request forwarding and machines that have multiple hostnames
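
A machine with multiple hostnames can use the Host header to decide which site a request is meant for. The sketch below is hypothetical throughout: the hostnames, the directories and the helper document_root are made up for illustration.

```python
# Hypothetical mapping from hostname to the directory that holds its site.
SITES = {
    "www.example.org": "/srv/www",
    "docs.example.org": "/srv/docs",
}

def document_root(host_header, default="/srv/default"):
    """Choose a site directory from the value of the Host header."""
    # The header may carry a port ("host:port"); only the host part matters here.
    hostname = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(hostname, default)

print(document_root("www.example.org:80"))
print(document_root("docs.example.org"))
print(document_root("unknown.example.org"))
```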

If-Modified-Since Request Header
This header indicates that the client wants the page only if it has been changed after the specified date
If-Unmodified-Since is the reverse of If-Modified-Since
It is used for PUT requests ("update this document only if nobody else has changed it since I generated it")

The Format of the Date in If-Modified-Since and in If-Unmodified-Since
Greenwich Mean Time should be used and the format is:
If-Modified-Since: Fri, 31 Dec 1999 23:59:59 GMT
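
Dates in this format can be produced and compared with Python's standard email.utils module. In the sketch below, http_date and modified_since are hypothetical helpers; the comparison shows the check a server could make before deciding whether to send the page again.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def http_date(moment):
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)

def modified_since(last_modified, header_value):
    """True if the resource changed after the date given in the header."""
    return last_modified > parsedate_to_datetime(header_value)

header_value = http_date(datetime(1999, 12, 31, 23, 59, 59, tzinfo=timezone.utc))
print(header_value)

changed_later = datetime(2000, 1, 1, tzinfo=timezone.utc)
changed_earlier = datetime(1999, 1, 1, tzinfo=timezone.utc)
print(modified_since(changed_later, header_value))     # the page should be sent
print(modified_since(changed_earlier, header_value))   # the client's copy is current
```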


Referer Request Header

URL of the referring Web page
Useful for tracking traffic
It is logged by many servers
Can be easily spoofed
Note the spelling error: the correct spelling is Referrer, but the header name is Referer

User-Agent Request Header


The value of this header is a string identifying the browser making the request
Use sparingly
Again, can be easily spoofed
