
Research Proposal

James Collins
May 25, 2006

Title: Improving Low Bandwidth Web Browsing Using a Proxy Server


Author: James Collins (10220952)
Supervisor: Mr Peter Jones

Contents
1 Background

2 Aim

3 Method
  3.1 Experiment: HTML Compression Using GZIP
  3.2 Experiment: Image Compression vs Image Removal
  3.3 User Survey
  3.4 Proposed Time Line

4 Software and Hardware Requirements

5 References

1 Background
Web browsing forms a vital part of a typical internet user’s online activity.
Since its conception, web browsing has progressed from a small collection
of text-based web pages, to an almost limitless collection of visually appeal-
ing web sites. This transformation has meant that the download size of a
typical web page has drastically increased. However, despite the increasing
availability of high bandwidth broadband connections, over two thirds (69%)
of Australian internet users are still connected via dialup [6]. In addition,
the advent of wireless internet technologies such as GPRS has created further
demand for download optimisation techniques.
It has been shown that the main contributor to perceived network latency
is limited modem bandwidth [5]. There is therefore considerable demand for
improving web browsing performance over slow connections. Achieving this
requires decreasing the amount of data transferred, as Formula 1 illustrates:

downloadTime = downloadSize / downloadSpeed (1)

In this case downloadSpeed is constant, meaning that in order to reduce
downloadTime, downloadSize must be reduced. There are many ways in
which the size of a web page can be reduced. For example:

• Compressing images (<img ..> tags).

• Removing images (<img ..> tags), and selectively showing them when
requested.

• Compressing the HTML code using an algorithm such as GZIP [4].
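The third option is straightforward to sketch with Python's standard gzip module; the repetitive HTML fragment below is an illustrative stand-in for a real page, which compresses well because tag names and attributes repeat heavily:

```python
import gzip

# Illustrative stand-in for a real web page; real HTML compresses well
# because markup repeats heavily.
html = b"<html><body>" + b"<p>Lorem ipsum dolor sit amet.</p>" * 100 + b"</body></html>"

compressed = gzip.compress(html)
print(f"original: {len(html)} bytes, gzipped: {len(compressed)} bytes")
```

A browser that advertises Accept-Encoding: gzip decompresses such a response transparently, so the saving requires no intervention from the user.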

These filtering mechanisms could be applied in several ways. Firstly, the
web content could be filtered on the user's machine (client-side). However,
a client-side solution is far from ideal because the full web page would still
have to be downloaded over the slow link before being filtered. The content
must therefore be filtered before the data is transmitted over the user's low
bandwidth connection. Figure 1 illustrates how this can be achieved using a
proxy server to filter the content.

Figure 1: A network employing the use of a proxy server

This diagram demonstrates that only the filtered content (and thus less
data) is passed over the low bandwidth connection, whereas the original
content is passed over the high bandwidth connection.
Filtering web content using a proxy server has several advantages over a
client-side solution:

• The user's computer requires no special software; an existing web browser
can be used provided its proxy settings are configured correctly.

• The filtering process is completely automatic and transparent to the
user.

• The filtering process is centralised; modifications to the filtering algorithms
can be made easily.
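As a sketch of the first point, on most systems a browser or command-line tool can be pointed at a proxy with nothing more than its host and port; the address below is hypothetical:

```shell
# Route HTTP traffic through a (hypothetical) filtering proxy on port 3128.
# Graphical browsers expose the same host/port in their connection settings.
export http_proxy="http://proxy.example.com:3128"
export https_proxy="$http_proxy"
# Tools such as wget and curl now send their requests via the proxy.
```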

The concept of reducing download times whilst browsing the web is not
only limited to desktop computers; it is also applicable to other devices such
as PDAs and 3G phones. Previous research has been performed in this area
[1, 2, 3]. The ideas presented in this project will be aimed not only at dialup
users, but also at users of wireless internet enabled devices such as PDAs.

2 Aim
This project will involve the development of a customised proxy server, capable
of filtering content with the primary aim of reducing download sizes whilst
browsing the web. The system will be compatible with typical desktop
computers, as well as with portable devices such as PDAs and mobile phones.
Once the system is developed, experiments designed to measure performance
and user satisfaction will be performed.

3 Method
In order to simplify development, an open source proxy such as Squid [8] or
RabbIT [7] will be extended with the specific aim of reducing the download
times of web pages. At this stage the implementation will probably be written
in C or Java; other languages may also be considered if appropriate.
The development process will adhere to common software engineering
practices, such as:

• A development methodology such as the waterfall model [9] will be
used.

• Functional requirements will be clearly listed.

• Design decision rationale will be included.

• Verification and validation techniques will help ensure the system achieves
its objectives.

Once the proxy server is developed, experiments will be run in order to
measure the effectiveness of the system. Proposed experiments are detailed
below.

3.1 Experiment: HTML Compression Using GZIP


Hypothesis: Running a compression algorithm such as GZIP on HTML
pages will result in at least a 10% reduction in download size.

Linux's wget command will be used to download various popular Australian
web sites that already implement GZIP compression: each page will be
downloaded first without GZIP compression, and then with it. These two
download sizes will be compared for each site, allowing an average compression
ratio to be determined.
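The comparison can be prototyped without any network access by applying Python's gzip module to saved page source; the sample below is stand-in data, with the real inputs coming from the wget downloads described above:

```python
import gzip

def gzip_saving(raw: bytes) -> float:
    """Fraction of the download size saved by gzip-compressing the page."""
    return 1 - len(gzip.compress(raw)) / len(raw)

# In the experiment, `raw` would be page source saved by wget; a repetitive
# HTML table stands in for a real page here.
sample = b"<tr><td>cell</td></tr>\n" * 500
print(f"saving: {gzip_saving(sample):.0%}")
```

Averaging gzip_saving over the surveyed sites would give the figure tested against the 10% hypothesis.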

3.2 Experiment: Image Compression vs Image Removal


Hypothesis #1: Compressing images found on web pages will reduce down-
load size by at least 20%.

Hypothesis #2: Removing images found on web pages will reduce down-
load size by at least 80%.

Linux’s wget command will be used to download an existing (image inten-
sive) web site. The total time required and total bytes downloaded will be
recorded. This operation will be performed three times:

• Without the proxy server.

• With the proxy server set to remove all images and replace them with
links.

• With the proxy server set to compress all images before they are sent
over the low bandwidth connection.

These results will then be compared in order to determine the overall effec-
tiveness of each method.
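Once the three byte totals are recorded, each hypothesis reduces to a simple percentage comparison; the figures below are hypothetical placeholders for the totals logged by wget:

```python
def reduction(before: int, after: int) -> float:
    """Fractional reduction in download size relative to the unfiltered run."""
    return (before - after) / before

# Hypothetical byte totals for one crawl of an image-intensive site;
# the real values would come from wget's logs for each of the three runs.
baseline   = 2_400_000   # without the proxy server
compressed = 1_700_000   # proxy compresses images
removed    =   350_000   # proxy replaces images with links

print(f"compression: {reduction(baseline, compressed):.0%} smaller")
print(f"removal:     {reduction(baseline, removed):.0%} smaller")
```

Hypothesis #1 would hold if the compression figure exceeds 20%, and Hypothesis #2 if the removal figure exceeds 80%.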

3.3 User Survey


Hypothesis: Over 75% of users will be satisfied with the performance of the
filtering system.

A survey will be conducted on ten or more users of the proxy system in
order to measure their level of satisfaction. The users will also be asked if
they can think of any areas where the system could be improved. The survey
will have to be submitted for UWA Human Ethics approval; this is factored
into the time line, as detailed below.

3.4 Proposed Time Line


Below is a proposed time line detailing the milestones to be achieved in this
project. Please note that the following is an estimate only, and may change
during the early stages of the project.

Deliverables are shown in bold.

Date Milestone
Semester 1
Week 2 Background research begins
Week 3 Project proposal and summary
Week 4 Research previous developments
Week 5 Literature review
Week 6 Proposal talk presented to research group
Week 9 Conduct GZIP compression experiment
Weeks 10-11 Modify initial proposal if required
Week 12 Revised project proposal
Mid-year break User survey(s) written
Begin development and implementation
Submit survey for UWA ethics approval
Testing and refinement of implementation
Semester 2
Week 1 Survey(s) approved and administered
Weeks 2-3 Other experiments conducted
Week 4 Analysis of survey results
Week 4 Performance testing of implementation
Week 7 Conclusions drawn
Week 9 Draft dissertation due
Week 10 Seminar title and abstract
Week 12 Final dissertation
Week 13 Poster
Seminar
Study break Marked dissertation available for collection
After exams Corrected dissertation

4 Software and Hardware Requirements


The proxy server will be developed on a Linux operating system. As the
only software requirement is a C or Java compiler (depending on which
programming language is used), almost any PC can be used. For the purposes
of development, my personal machine running Ubuntu Linux 5.10 will be
used to run the proxy server. Due to the network privileges required to
run a proxy server, the proxy is not suitable for the CSSE computer labs.
The client machine can be almost any computer, such as my laptop running
Windows XP or the CSSE lab computers.

5 References

[1] Hassan Artail and Mackram Raydan. Device-aware desktop web page
transformation for rendering on handhelds. Personal Ubiquitous Com-
puting, 9(6):368–380, 2005.

[2] Staffan Bjork, Lars Erik Holmquist, Johan Redstrom, Ivan Bretan, Rolf
Danielsson, Jussi Karlgren, and Kristofer Franzen. West: a web browser
for small terminals. In UIST ’99: Proceedings of the 12th annual ACM
symposium on User interface software and technology, pages 187–196,
New York, NY, USA, 1999. ACM Press.

[3] Orkut Buyukkokten, Hector Garcia-Molina, Andreas Paepcke, and Terry
Winograd. Power browser: efficient web browsing for PDAs. In CHI ’00:
Proceedings of the SIGCHI conference on Human factors in computing
systems, pages 430–437, New York, NY, USA, 2000. ACM Press.

[4] Peter Deutsch. RFC 1952: GZIP file format specification version 4.3.
http://www.gzip.org/zlib/rfc1952.pdf.

[5] Li Fan, Pei Cao, Wei Lin, and Quinn Jacobson. Web prefetching be-
tween low-bandwidth clients and proxies: potential and performance. In
SIGMETRICS ’99: Proceedings of the 1999 ACM SIGMETRICS inter-
national conference on Measurement and modeling of computer systems,
pages 178–187, New York, NY, USA, 1999. ACM Press.

[6] Australian Bureau of Statistics. Household use of information
technology, Australia. http://www.abs.gov.au/Ausstats/abs@.nsf/
0/BB43A65B94202A57CA2570D800169243?Open.

[7] Robert Olofsson. RabbIT web proxy. http://rabbit-proxy.sourceforge.net/.

[8] Squid Development Team. Squid web proxy cache. http://www.squid-cache.org/.

[9] Wikipedia. Waterfall model. http://en.wikipedia.org/wiki/Waterfall_model.
