
Part of Chapter 1: HTTP Handout

What is HTTP?
HTTP, the Hypertext Transfer Protocol, is the application-level protocol that is used to transfer data on the Web. HTTP comprises the rules by which Web browsers and servers exchange information. Although most people think of HTTP only in the context of the World-Wide Web, it can be, and is, used for other purposes, such as distributed object management systems.

How Does HTTP Work?


HTTP is a request-response protocol. For example, a Web browser initiates a request to a server, typically by opening a TCP/IP connection. The request itself comprises
o a request line,
o a set of request headers, and
o an entity.
The server sends a response that comprises
o a status line,
o a set of response headers, and
o an entity.
The entity in the request or response can be thought of simply as the payload, which may be binary data. The other items are readable ASCII characters. When the response has been completed, either the browser or the server may terminate the TCP/IP connection, or the browser can send another request.
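The three-part message layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a complete HTTP implementation: the helper names build_request and split_message are made up for this handout, and the sketch assumes lines end in CRLF and that a blank line separates the headers from the entity.

```python
CRLF = "\r\n"


def build_request(method, uri, headers, entity=b""):
    """Assemble request line, headers, blank line, and entity into bytes."""
    lines = [f"{method} {uri} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = CRLF.join(lines) + CRLF + CRLF  # the blank line ends the headers
    return head.encode("ascii") + entity


def split_message(raw):
    """Split a raw message into (start line, headers dict, entity bytes)."""
    head, _, entity = raw.partition(b"\r\n\r\n")
    start_line, *header_lines = head.decode("ascii").split(CRLF)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return start_line, headers, entity


request = build_request("GET", "/", {"Host": "www.silicon-press.com"})
start_line, headers, entity = split_message(request)
print(start_line)
print(headers)
```

The same split_message helper works for a response, because a response has the same shape with a status line in place of the request line.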

An Example
As an illustration of HTTP, here is an example exchange between a Web browser and the Silicon Press server, www.silicon-press.com. In response to a user request to go to the URL http://www.silicon-press.com, the browser sends the following HTTP request to www.silicon-press.com:

GET / HTTP/1.1
Connection: Keep-Alive
User-Agent: Mozilla/5.0 (compatible; Konqueror/2.2-11; Linux)
Accept: text/*, image/jpeg, image/png, image/*, */*
Accept-Encoding: x-gzip, gzip, identity
Accept-Charset: Any, utf-8, *
Accept-Language: en, en_US
Host: www.silicon-press.com
-- blank line --

A brief explanation:
o The first line is the request line, which comprises three fields:
  1. a method: the GET method indicates that the server is supposed to return an entity,
  2. a request-URI (Uniform Resource Identifier): the / indicates the root of the document system on the server, and
  3. the HTTP protocol version: 1.1 in this case.
o The second line is the optional Connection header, which informs the server that the browser would like to leave the connection open after the response.
o The third line is the optional User-Agent header, which identifies the kind of browser that is sending the request, its version, and its operating system.
o The Accept headers specify the type, language, and encoding of the returned entity that the browser would prefer to receive from the server.

Responding to the browser, the www.silicon-press.com server sends the following response:

HTTP/1.1 200 OK
Date: Thu, 24 Jan 2002 17:33:52 GMT
Server: Apache/1.3.14
Last-Modified: Mon, 21 Jan 2002 22:08:33 GMT
Etag: 47bc6-25e0-3c4c9161
Accept-Ranges: bytes
Content-Length: 9696
Connection: close
Content-Type: text/html
-- blank line --
-- HTML entity --

A brief explanation:
o The first line is the status line, which consists of three fields:
  1. the HTTP protocol version of the response: 1.1 in this case,
  2. a three-digit numeric status code, and
  3. a short description of the status code.
o The Content-Length, Content-Type, Etag, and Last-Modified header lines describe the entity returned.
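To make the three fields of the request line and the status line concrete, here is a small Python sketch that splits each line into its parts. The class and function names (RequestLine, StatusLine, parse_request_line, parse_status_line) are illustrative choices for this handout, and the sketch assumes well-formed lines with single spaces between fields.

```python
from typing import NamedTuple


class RequestLine(NamedTuple):
    method: str
    uri: str
    version: str


class StatusLine(NamedTuple):
    version: str
    code: int
    reason: str


def parse_request_line(line):
    """Split 'GET / HTTP/1.1' into method, request-URI, and version."""
    method, uri, version = line.split(" ")
    return RequestLine(method, uri, version)


def parse_status_line(line):
    """Split 'HTTP/1.1 200 OK' into version, numeric code, and description."""
    # Split at most twice: the description itself may contain spaces.
    version, code, reason = line.split(" ", 2)
    return StatusLine(version, int(code), reason)


print(parse_request_line("GET / HTTP/1.1"))
print(parse_status_line("HTTP/1.1 200 OK"))
```

Limiting the status-line split to two separators matters for descriptions such as "Not Found", which would otherwise be broken into two fields.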

More Examples
When you type a URL in your address bar, your browser sends an HTTP request that may look like this:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
Host: net.tutsplus.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Cookie: PHPSESSID=r2t5uvjq435r4q7ib3vtdjq120
Pragma: no-cache
Cache-Control: no-cache

The first line is the "Request Line", which contains some basic information about the request. The rest are the HTTP headers. After that request, your browser receives an HTTP response that may look like this:

HTTP/1.x 200 OK
Transfer-Encoding: chunked
Date: Sat, 28 Nov 2009 04:36:25 GMT
Server: LiteSpeed
Connection: close
X-Powered-By: W3 Total Cache/0.8
Pragma: public
Expires: Sat, 28 Nov 2009 05:36:25 GMT
Etag: "pub1259380237;gz"
Cache-Control: max-age=3600, public
Content-Type: text/html; charset=UTF-8
Last-Modified: Sat, 28 Nov 2009 03:50:37 GMT
X-Pingback: http://net.tutsplus.com/xmlrpc.php
Content-Encoding: gzip
Vary: Accept-Encoding, Cookie, User-Agent

<title>Top 20+ MySQL Best Practices - Nettuts+</title>

Request Methods
The two most commonly used request methods are GET and POST. You're probably already familiar with them from writing HTML forms.

GET: Retrieve a Document
This is the main method used for retrieving HTML, images, JavaScript, CSS, etc. Most data that loads in your browser was requested using this method. For example, when loading a Nettuts+ article, the very first line of the HTTP request looks like this:

GET /tutorials/other/top-20-mysql-best-practices/ HTTP/1.1
...

Once the HTML loads, the browser will start sending GET requests for images, which may look like this:

GET /wp-content/themes/tuts_theme/images/header_bg_tall.png HTTP/1.1
...

Web forms can be set to use the GET method. Here is an example:

<form method="GET" action="foo.php">

First Name: <input name="first_name" type="text"> <br />
Last Name: <input name="last_name" type="text"> <br />

<input type="submit" name="action" value="Submit" />

</form>

When that form is submitted, the HTTP request begins like this:

GET /foo.php?first_name=John&last_name=Doe&action=Submit HTTP/1.1
...
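The query string in that request line can be reproduced with Python's standard urllib.parse module. This is only a sketch of what the browser does when it submits the form. The field values are the ones from the example, and the variable names are illustrative.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Form inputs in the order they appear in the form.
fields = {"first_name": "John", "last_name": "Doe", "action": "Submit"}

# Encode the inputs as name=value pairs joined by '&'.
query = urlencode(fields)
request_line = f"GET /foo.php?{query} HTTP/1.1"
print(request_line)

# The server side can recover the inputs from the request-URI.
target = request_line.split(" ")[1]
received = parse_qs(urlsplit(target).query)
print(received)
```

Note that parse_qs returns a list of values for each name, because a query string may repeat the same name more than once.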

You can see that each form input was added to the query string.

POST: Send Data to the Server
Even though you can send data to the server using GET and the query string, in many cases POST is preferable. Sending large amounts of data using GET is not practical and has limitations. POST requests are most commonly sent by web forms. Let's change the previous form example to use the POST method:

<form method="POST" action="foo.php">

First Name: <input type="text" name="first_name" /> <br />
Last Name: <input type="text" name="last_name" /> <br />

<input type="submit" name="action" value="Submit" />

</form>

Submitting that form creates an HTTP request like this:

POST /foo.php HTTP/1.1
Host: localhost
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.5) Gecko/20091102 Firefox/3.5.5 (.NET CLR 3.5.30729)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Referer: http://localhost/test.php
Content-Type: application/x-www-form-urlencoded
Content-Length: 43

first_name=John&last_name=Doe&action=Submit

There are three important things to note here:
o The path in the first line is simply /foo.php, and there is no query string anymore.
o Content-Type and Content-Length headers have been added, which provide information about the data being sent.
o All the data is now sent after the headers, in the same format as the query string.
POST requests can also be made via AJAX, applications, cURL, etc. All file upload forms are required to use the POST method.
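The three points above can be checked with a short Python sketch that assembles a form POST by hand. The function name build_form_post is hypothetical, and only the headers needed to describe the body are included. A real browser sends many more, as the example request shows.

```python
from urllib.parse import urlencode


def build_form_post(path, host, fields):
    """Build a raw POST request whose body carries the form fields."""
    # Same name=value&name=value format as a GET query string.
    body = urlencode(fields).encode("ascii")
    head = "\r\n".join([
        f"POST {path} HTTP/1.1",            # no query string in the path
        f"Host: {host}",
        "Content-Type: application/x-www-form-urlencoded",
        f"Content-Length: {len(body)}",     # size of the body in bytes
        "",
        "",                                 # blank line ends the headers
    ])
    return head.encode("ascii") + body


fields = {"first_name": "John", "last_name": "Doe", "action": "Submit"}
message = build_form_post("/foo.php", "localhost", fields)
print(message.decode("ascii"))
```

For these fields the computed Content-Length is 43, which matches the value in the example request.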

HTTPS (Secure HTTP)


HTTPS denotes the use of HTTP over the SSL (Secure Sockets Layer) protocol or its successor, Transport Layer Security (TLS). Either of these protocols, which use encryption, can be used to create a secure connection between two machines. The browser uses SSL or TLS when connecting to a secure part of a website, which is indicated by an HTTPS URL, that is, a URL with the prefix https://. The browser then uses HTTP to send requests and receive responses over this secure connection.
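As a rough illustration of how a client picks between plain HTTP and HTTPS from the URL prefix, the following Python sketch uses the standard http.client and ssl modules. The function name connection_for is an assumption made for this example, and the sketch only constructs the connection object. It does not contact any server.

```python
import http.client
import ssl
from urllib.parse import urlsplit


def connection_for(url):
    """Return an HTTP or HTTPS connection object based on the URL prefix."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        # Default context: verifies the server certificate and host name.
        context = ssl.create_default_context()
        return http.client.HTTPSConnection(
            parts.hostname, parts.port, context=context
        )
    return http.client.HTTPConnection(parts.hostname, parts.port)


secure = connection_for("https://www.silicon-press.com/")
plain = connection_for("http://www.silicon-press.com/")
print(type(secure).__name__)
print(type(plain).__name__)
```

Once such an object exists, ordinary HTTP requests are sent through it in the same way in both cases. The only difference is that the HTTPS variant wraps the TCP connection in TLS first.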

HTTP Status Codes


o 200's are used for successful requests.
o 300's are for redirections.
o 400's are used if there was a problem with the request.
o 500's are used if there was a problem with the server.
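Because the class of a status code is given by its first digit, a client can classify any code with integer division. The sketch below is illustrative only. The function name status_class and the label strings are choices made for this handout, and codes outside the four ranges listed above are simply reported as "other".

```python
def status_class(code):
    """Map a three-digit status code to the class named by its first digit."""
    classes = {
        2: "success",
        3: "redirection",
        4: "problem with the request",
        5: "problem with the server",
    }
    return classes.get(code // 100, "other")


for code in (200, 206, 301, 404, 500):
    print(code, status_class(code))
```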

200 OK
As mentioned before, this status code is sent in response to a successful request.

206 Partial Content


If an application requests only a range of the requested file, the 206 code is returned. It's most commonly used with download managers that can stop and resume a download, or split the download into pieces.
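A download manager asks for part of a file by sending a Range header with the first and last byte positions it wants. The following Python sketch shows how a download might be split into pieces. The function name range_headers and the piece size are assumptions for illustration, and the total size of 9696 bytes is borrowed from the Content-Length in the earlier Silicon Press example.

```python
def range_headers(total_size, piece_size):
    """Return one Range header per piece of a file of total_size bytes."""
    headers = []
    for start in range(0, total_size, piece_size):
        # Byte positions are inclusive, so the last byte is end - 1.
        end = min(start + piece_size, total_size) - 1
        headers.append(f"Range: bytes={start}-{end}")
    return headers


for header in range_headers(9696, 4000):
    print(header)
```

To resume an interrupted download instead, a client can send an open-ended range such as "Range: bytes=4000-", which asks for everything from byte 4000 to the end of the file.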

404 Not Found

When the requested page or file was not found, a 404 response code is sent by the server.

401 Unauthorized
Password-protected web pages send this code. If you don't enter a login correctly, your browser shows an authorization error page.

Note that this applies only to pages protected by HTTP authentication, where the browser pops up its own login prompt.

403 Forbidden
If you are not allowed to access a page, this code may be sent to your browser. This often happens when you try to open a URL for a folder that contains no index page. If the server settings do not allow the display of the folder contents, you will get a 403 error.

For example, on my local server I created an images folder. Inside this folder I put an .htaccess file with this line: "Options -Indexes". Now when I try to open http://localhost/images/, the server responds with a 403 Forbidden error.

There are other ways in which access can be blocked and a 403 sent. For example, you can block by IP address with the help of some .htaccess directives:

order allow,deny
deny from 192.168.44.201
deny from 224.39.163.12
deny from 172.16.7.92
allow from all
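The effect of those directives is that the three listed addresses are refused and every other client is let through. The Python sketch below mirrors that outcome only. It is not how the web server evaluates its rules internally, and the names DENIED and status_for are made up for this example.

```python
import ipaddress

# Addresses taken from the "deny from" lines above.
DENIED = {
    ipaddress.ip_address(address)
    for address in ("192.168.44.201", "224.39.163.12", "172.16.7.92")
}


def status_for(client_ip):
    """Return 403 for a denied address and 200 for any other client."""
    if ipaddress.ip_address(client_ip) in DENIED:
        return 403
    return 200


print(status_for("192.168.44.201"))
print(status_for("10.0.0.5"))
```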

302 (or 307) Moved Temporarily & 301 Moved Permanently


These two codes are used for redirecting a browser. For example, when you use a URL-shortening service such as bit.ly, that's exactly how it forwards the people who click on its links.

Both 302 and 301 are handled very similarly by the browser, but they can have different meanings to search engine spiders. For instance, if your website is down for maintenance, you may redirect to another location using 302. The search engine spider will then check your page again later.

But if you redirect using 301, it will tell the spider that your website has moved to that location permanently. To give you a better idea: http://www.nettuts.com redirects to http://net.tutsplus.com/ using a 301 code instead of 302.
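The difference between the two kinds of redirect can be pictured with a toy model of a spider deciding which address to keep on file. This is a simplified sketch under the assumptions stated in the text above, not a description of how any real search engine works, and the function name url_to_keep is hypothetical.

```python
def url_to_keep(indexed_url, status, location):
    """Return the URL a spider should remember after seeing a response."""
    if status == 301:
        # Permanent move: replace the old address with the new one.
        return location
    # 302 or 307 (temporary): keep the old address and check it again later.
    return indexed_url


print(url_to_keep("http://www.nettuts.com", 301, "http://net.tutsplus.com/"))
print(url_to_keep("http://www.nettuts.com", 302, "http://net.tutsplus.com/"))
```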

500 Internal Server Error

This code is usually seen when a web script crashes. Unlike PHP, most CGI scripts do not output errors directly to the browser. If there are any fatal errors, they will just send a 500 status code, and the programmer then needs to search the server error logs to find the error messages.

Complete List
The complete list of HTTP status codes, with their explanations, is given in the HTTP/1.1 specification listed below.

Where Can I Find More Information?


o HTTP/1.1 Specification (http://www.rfc-editor.org/rfc/rfc2616.txt)
o TLS Protocol (http://www.rfc-editor.org/rfc/rfc2246.txt)