by
Marliza Ramly
Zurina Saaya
Wahidah Md Shah
Mohammad Radzi Motsidi
Haniza Nahar
Faculty of Information and Communication Technology
Universiti Teknikal Malaysia Melaka (UTeM)
May 2007
TABLE OF CONTENTS
1. PROXY SERVERS
2. INTERNET CACHING
   2.1 HIERARCHICAL CACHING
3. INTRODUCTION TO SQUID
4. ACL CONFIGURATION
   4.1 ACCESS CONTROLS
   4.2 EXERCISES
5. CACHING
   5.1 CONCEPTS
   5.2 CONFIGURING A CACHE FOR PROXY SERVER
7. ANALYZER
   7.1 STRUCTURE OF LOG FILE
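ABBREVIATIONS
Abbreviation   Details
ACL            Access Control List
CARP           Cache Array Routing Protocol
CD             Compact Disk
DNS            Domain Name System
FTP            File Transfer Protocol
GB             Gigabyte
HTCP           Hypertext Caching Protocol
HTTP           Hypertext Transfer Protocol
I/O            Input/Output
ICP            Internet Cache Protocol
IP             Internet Protocol
LAN            Local Area Network
MAC            Media Access Control
MB             Megabyte
RAM            Random Access Memory
RPM
RTT            Round Trip Time
SNMP           Simple Network Management Protocol
SSL            Secure Sockets Layer
UDP            User Datagram Protocol
URL            Uniform Resource Locator
UTeM           Universiti Teknikal Malaysia Melaka
WCCP           Web Cache Communication Protocol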
Chapter
1. Proxy Servers
A proxy server is an intermediary server between the Internet browser and the remote server. It acts as a "middleman" between the two ends of a client/server network connection and works with browsers, servers and other applications by supporting underlying network protocols such as HTTP. Furthermore, it stores downloaded documents in its local cache, so that later downloads are faster because the document is served from a local server instead of being fetched again from the Internet.
For example, imagine a user who wants to download a document from a specific URL such as http://www.yahoo.com; the document must then be transferred to the user's workstation (e.g. through the UTeM proxy to a local workstation). In that situation, the Internet browser communicates directly with the UTeM proxy server to get the document.
In addition, combining a cache with a proxy server makes transfers quicker. In this arrangement, the Internet browser no longer contacts the remote server directly but requests the document from the proxy server.
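The request flow just described can be sketched in a few lines of Python. This is a simplified illustration, not how Squid is implemented: the cache is an in-memory dictionary with no size limit or expiry, and fake_origin is a hypothetical stand-in for the remote web server.

```python
# Sketch of a caching proxy's lookup logic (illustration only, not Squid's code).
# Assumption: fetch_from_origin is a hypothetical stand-in for the HTTP
# request that a real proxy would send to the remote web server.
from typing import Callable, Dict


class CachingProxy:
    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, bytes] = {}  # URL -> stored copy of the document
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._cache:
            # Cache hit: serve the local copy, no contact with the remote server.
            self.hits += 1
            return self._cache[url]
        # Cache miss: fetch from the remote server and keep a copy.
        self.misses += 1
        body = self._fetch(url)
        self._cache[url] = body
        return body


origin_requests = []


def fake_origin(url: str) -> bytes:
    # Hypothetical origin server that records each request it receives.
    origin_requests.append(url)
    return b"<html>" + url.encode() + b"</html>"


proxy = CachingProxy(fake_origin)
proxy.get("http://www.yahoo.com/")  # miss: fetched from the origin
proxy.get("http://www.yahoo.com/")  # hit: served from the cache
print(proxy.hits, proxy.misses, len(origin_requests))  # 1 1 1
```

Only the first request reaches the origin; the second is answered from the cache. A real cache must also decide how long a stored copy stays valid and what to discard when space runs out.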
[Figure: Proxy server services: connection sharing, administrative control and caching service]
[Figure: Several clients connected to the Internet through a proxy server]
Chapter
2. Internet Caching
2.1 Hierarchical Caching
Cache hierarchies are a logical extension of the caching concept. Sharing cached content can benefit both a group of Web caches and the group of Web clients they serve. Figure 2-1 shows how it works. However, there are some disadvantages as well; whether the advantages outweigh the disadvantages depends on the specific situations discussed below.
[Figure: The client sends a request to the proxy server. If the requested page is in the proxy server cache (Yes), the proxy server returns it; if not (No), the proxy server requests the page from the web server over the Internet.]
[Figure: Cache hierarchy with a parent cache, a sibling cache and the origin server]
The querying cache now collects the ICP replies from its peers.
If the cache does not receive any ICP_HIT reply, all replies were ICP_MISS, and the request has to be forwarded elsewhere, normally to a parent cache or directly to the origin server.
The origin server can also be included in the ICP pinging, so that if the origin server's reply arrives before any ICP_HIT, the request is forwarded there directly.
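The decision described above can be modelled as a small function. This is a hypothetical simplification for illustration only: Squid's real peer selection also involves timeouts, parent/sibling rules and peer weights, and the reply tuple layout, the peer name "origin" and the fallback argument are assumptions of this sketch.

```python
# Sketch of how a querying cache might act on the ICP replies it collects.
# Hypothetical model, not Squid's peer-selection code: each reply is
# (peer_name, reply_type, arrival_ms); a reply from the peer named "origin"
# stands for the origin server answering the ping.
from typing import List, Tuple

Reply = Tuple[str, str, float]


def choose_next_hop(replies: List[Reply], fallback: str) -> str:
    # Examine replies in the order in which they arrived.
    for peer, reply_type, _arrival_ms in sorted(replies, key=lambda r: r[2]):
        if reply_type == "ICP_HIT":
            return peer  # first peer reporting a hit gets the request
        if peer == "origin":
            return peer  # origin answered before any ICP_HIT: go there directly
    # No ICP_HIT and no origin reply: every peer said ICP_MISS,
    # so use the fallback (for example a parent cache).
    return fallback


replies = [
    ("sibling1", "ICP_MISS", 4.0),
    ("sibling2", "ICP_HIT", 9.0),
    ("parent1", "ICP_HIT", 15.0),
]
print(choose_next_hop(replies, fallback="parent1"))  # sibling2
```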
Chapter
3. Introduction to Squid
Squid is a high-performance proxy caching server for Web clients, supporting FTP, gopher and HTTP data objects. It has two basic purposes;
caching DNS lookups, supports non-blocking DNS lookups, and
Squid supports cache hierarchies and transparent caching.
Hardware requirements:
RAM: minimum recommended 128 MB (scales with the user count and the size of the disk cache)
Disk: 512 MB to 1 GB for a small user count; 16 GB to 24 GB for a large user count
Directory      Explanation
/var/cache
/etc/squid     Squid configuration files, including squid.conf
/var/log       Log files (Squid writes its logs to /var/log/squid)
[Figure: Clients on a LAN connected to the Internet through a proxy (labels: 10.1.1.1, port 80)]
[Figure: Client connected to the Internet]
[Figure: Clients connected to the Internet through routers]
To do so, copy the installation folder to your local drive and run the following commands:
# ./configure
# make
# make install
NOTE: Make sure all the dependencies are already installed on your machine before you start installing Squid.
The rest of this chapter works through the options that may need to be changed to get Squid running. Most people will not need to change all of these settings. However, at least one part of the configuration file usually needs to change: the default squid.conf denies access to browsers. If you don't change this, Squid will not be very useful.
Basic Configuration
All of Squid's configuration goes in one file, squid.conf. This section details the configuration of Squid as a caching proxy only, not as an HTTP accelerator.
Some basic configuration needs to be done. First, uncomment and edit the following lines in the configuration file, which by default is /etc/squid/squid.conf.
To configure the Squid server, do the following tasks:
1. Log in to the machine as root.
2. Type the following command:
# vi /etc/squid/squid.conf
The above command opens the Squid configuration file for editing.
Next, set the port on which Squid listens. By default, Squid listens on port 3128. While it may be convenient to use this port, network administrators often configure the proxy to listen on port 8080 as well. This is not a well-known port (ports below 1024 are well-known ports and are restricted from use by ordinary user processes), and it is therefore not going to conflict with services on ports such as 80, 443, 22 and 23. Squid need not be restricted to one port; it can easily listen on two or more ports.
In squid.conf, find the following line and change it, or leave it at the default if the port is 3128.
http_port
http_port 3128    (the default)
or, to listen on several ports, use one http_port line per port:
http_port 8080
http_port 3128
10.1.5.49:8080
10.0.5.50:3128
http_access
By default, http_access denies access. The Access Control List (ACL) rules should be modified to allow access only to trusted clients. This is important because it prevents people from stealing your network resources.
ACLs are discussed in Chapter 4.
cache_dir
This directive specifies the cache storage format, the cache directory and its size, as given below.
cache_dir ufs /var/spool/squid 100 16 256
The value 100 denotes a 100 MB cache; this can be adjusted to the required size. The values 16 and 256 are the numbers of first-level and second-level subdirectories. (Caching is discussed further in Chapter 5.)
cache_effective_user
cache_effective_group
NOTE: You can also edit squid.conf with a graphical editor such as gedit instead of the command line.
# squid -k parse
If an error is detected, the output looks like this, for example:
# squid -k parse
FATAL: could not determine fully qualified hostname, Please
set visible hostname
Squid Cache (Version 2.6.STABLE4): Terminated abnormally.
CPU Usage: 0.0004 seconds = 0.0004 user + 0.000 sys
Maximum Resident Size: 0 KB
Page faults with physical i/o: 0
Aborted.
Solution: add the following line to squid.conf:
visible_hostname localhost
If no error is detected, continue with the following command to start Squid. (This is a temporary way of starting Squid.)
[OK]
Stopping squid: .
[OK]
# squid -k shutdown
- causes Squid to exit after waiting briefly for current connections to close
# squid -k interrupt
- shuts down Squid immediately, without waiting for connections to close
# squid -k kill
- kills Squid immediately, without closing connections or log files (use this option only if the other methods don't work)
The following section explains the steps to configure the proxy server in Internet Explorer, Mozilla Firefox and Opera.
Internet Explorer 7.0
1. Select the Tools menu.
2. Select Internet Options.
3. Click the Connections tab.
4. Click LAN settings.
5. Under Proxy server, check the box to use a proxy server for your LAN.
6. Type the proxy IP address in the Address field and the port number in the Port field.
Example:
Mozilla Firefox
1. Click Tools > Options > Advanced.
2. Click the Network tab, then click Settings under Connection.
3. Under Configure Proxies to Access the Internet:
Port: 3128
6. Check the box to use this proxy server for all protocols.
7. Then click OK.
8. The client can now access the Internet.
Opera 9.1
1. Click Tools > Preferences > Advanced.
2. Choose Network.
3. Click Proxy Servers, then check each protocol and enter the proxy address and port:
HTTP     10.0.5.10   Port 3128
HTTPS    10.0.5.10   Port 3128
FTP      10.0.5.10   Port 3128
Gopher   10.0.5.10   Port 3128
4. Then click OK.
Chapter
4. ACL Configuration
4.1 Access controls
Access control lists (ACLs) are the most important part of configuring Squid. Their main use is to implement simple access control, restricting people from using the cache infrastructure without permission. Rules can be written for almost any type of requirement; they can be very complex for large organisations or a simple configuration for home users.
An ACL is written in squid.conf using the following format:
acl name type (string|"filename") [string2] ["filename2"]
name is a user-defined label, which should be descriptive, while type is one of the ACL types described in the next section.
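ACLs are applied with http_access allow|deny rules, as shown in the examples below. The ACL types are:
Type            Details
src             Client IP address
srcdomain       Client domain name
dst             Destination IP address
dstdomain       Destination domain name
srcdom_regex    Regular expression describing the client domain name
dstdom_regex    Regular expression describing the destination domain name
time            Time of day and day of the week
url_regex       Regular expression describing the whole URL
urlpath_regex   Regular expression describing the path of the URL
port            Destination (server) port
proto           Specify protocol
method          Specify method
browser         Specify browser
proxy_auth      User authentication through an external program
maxconn         Maximum number of connections from a single client IP address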
src
Description
This ACL allows the server to recognize a client (the computer that uses the server as a proxy to access the Internet) by its IP address. The addresses can be given as a single IP address, a range of IP addresses, or a list of addresses in an external file.
Syntax
acl aclname src ip-address/netmask
acl aclname src addr1-addr2/netmask
acl aclname src "filename"
Example 1
acl fullaccess src "/etc/squid/fullaccess.txt"
http_access allow fullaccess
This ACL uses an external file, fullaccess.txt, which contains the list of client IP addresses, one per line. The file name must be enclosed in double quotes so that Squid reads the addresses from the file.
Example of fullaccess.txt
198.123.56.12
198.123.56.13
198.123.56.34
Example 2
acl office.net src 192.123.56.0/255.255.255.0
http_access allow office.net
This ACL matches source addresses in the range 192.123.56.x, and the http_access allow rule lets these clients access the Internet.
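To illustrate how the src forms match a client, the sketch below uses Python's ipaddress module. It is not Squid's parser: it assumes IPv4 and handles single addresses, address/netmask values and a file listing addresses, but not the addr1-addr2 range form.

```python
# Sketch of matching a client IP against src ACL values (not Squid's parser).
# Handles a single address, address/netmask, and a list read from a file;
# the addr1-addr2 range form is not handled here.
import ipaddress
from typing import Iterable, List


def parse_src_values(values: Iterable[str]) -> List[ipaddress.IPv4Network]:
    networks = []
    for value in values:
        value = value.strip()
        if not value:
            continue  # skip blank lines, e.g. at the end of a file
        # "192.123.56.0/255.255.255.0" keeps its netmask;
        # a bare address such as "198.123.56.12" becomes a single-host network.
        networks.append(ipaddress.IPv4Network(value, strict=False))
    return networks


def src_matches(client_ip: str, networks: List[ipaddress.IPv4Network]) -> bool:
    addr = ipaddress.IPv4Address(client_ip)
    # Several values in one ACL are alternatives: any one of them may match.
    return any(addr in net for net in networks)


office_net = parse_src_values(["192.123.56.0/255.255.255.0"])
fullaccess_txt = "198.123.56.12\n198.123.56.13\n198.123.56.34\n"
fullaccess = parse_src_values(fullaccess_txt.splitlines())

print(src_matches("192.123.56.77", office_net))  # True
print(src_matches("198.123.56.14", fullaccess))  # False
```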
27
ACL Configuration
srcdomain
Description
This ACL allows the server to recognize a client by the client's computer name. To do so, Squid must perform a reverse DNS lookup (from the client IP address to the client domain name) before this ACL is interpreted, which can cause processing delays.
Syntax
acl aclname srcdomain domain-name
Example 1
acl staff.net srcdomain staff20 staff21
http_access allow staff.net
This ACL is for clients with the computer names staff20 and staff21. The http_access rule allows the ACL named staff.net to access the Internet. This option is not very efficient, since Squid must do a reverse name lookup to determine the source name.
dst
Description
This is the same as src, except that it refers to the server's IP address (the destination). Squid first performs a DNS lookup to find the IP address of the domain name given in the request header, and then interprets the ACL.
Syntax
acl aclname dst ip-address/netmask
Example 1
acl tunnel dst 209.8.233.0/24
http_access deny tunnel
This ACL denies access to any destination with IP address 209.8.233.x.
Example 2
acl allow_ip dst 209.8.233.0-209.8.233.100/255.255.255.255
http_access allow allow_ip
This ACL allows destinations with IP addresses in the range 209.8.233.0 to 209.8.233.100.
dstdomain
Description
This ACL recognizes the destination by its domain name. This is an effective method to control access to specific domains.
Syntax
acl aclname dstdomain domain.com
Example 1
acl banned_domain dstdomain www.terrorist.com
http_access deny banned_domain
This ACL denies access to the domain www.terrorist.com.
srcdom_regex
Description
This ACL is similar to srcdomain: the server needs to do a reverse DNS lookup (from the client IP address to the client domain name) before this ACL is interpreted. The difference is that this ACL allows a regular expression to be used to define the client's domain.
Syntax
acl aclname srcdom_regex -i source_domain_regex
Example 1
acl staff.net srcdom_regex -i staff
http_access allow staff.net
This ACL allows all nodes whose domain name contains the word staff to access the Internet. The -i option makes the expression case-insensitive.
dstdom_regex
Description
This ACL allows the server to recognize the destination using a regular expression on its domain name.
Syntax
acl aclname dstdom_regex -i dst_domain_regex
Example 1
acl banned_domain dstdom_regex -i terror porn
http_access deny banned_domain
This ACL denies clients access to destinations whose domain name contains the word terror or porn. For example, access to the domains www.terrorist.com and www.pornography.net will be denied by the proxy server.
time
Description
This ACL allows the server to control the service by time. Access to the network can be restricted to the times scheduled in the ACL.
Syntax
acl aclname time [day-abbreviations] [h1:m1-h2:m2]
where h1:m1 must be less than h2:m2, and the days are given using the abbreviations in Table 4-1.
Table 4-1: Day abbreviations
Day         Abbreviation
Sunday      S
Monday      M
Tuesday     T
Wednesday   W
Thursday    H
Friday      F
Saturday    A
Example 1
acl SABTU time A 9:00-17:00
ACL SABTU refers to day of Saturday from 9:00 to 17:00
Example 2
acl pagi time 9:00-11:00
acl office.net src 10.2.3.0/24
http_access deny pagi office.net
pagi refers to the time from 9:00 to 11:00, while office.net refers to the clients' IP addresses. This combination of ACLs denies access to office.net between 9:00 a.m. and 11:00 a.m.
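A time ACL is simply a test on the day of the week and the time of day. The sketch below is a simplified model, not Squid's code: it assumes a value made of an optional day-letter string from Table 4-1 followed by a single h1:m1-h2:m2 range, and treating both ends of the range as inclusive is an assumption of this sketch.

```python
# Sketch of evaluating a time ACL value such as "A 9:00-17:00" or "9:00-11:00".
# Simplified model (assumption): an optional string of day letters followed by
# one h1:m1-h2:m2 range; both ends of the range are treated as inclusive.
from datetime import datetime

# Day letters from Table 4-1, ordered like datetime.weekday() (Monday = 0).
DAY_LETTERS = "MTWHFAS"


def time_acl_matches(spec: str, when: datetime) -> bool:
    parts = spec.split()
    days = parts[0] if len(parts) == 2 else DAY_LETTERS  # no days given: every day
    start, end = parts[-1].split("-")
    h1, m1 = (int(x) for x in start.split(":"))
    h2, m2 = (int(x) for x in end.split(":"))
    if DAY_LETTERS[when.weekday()] not in days:
        return False
    minute_of_day = when.hour * 60 + when.minute
    return h1 * 60 + m1 <= minute_of_day <= h2 * 60 + m2


# 5 May 2007 was a Saturday, 7 May 2007 a Monday.
print(time_acl_matches("A 9:00-17:00", datetime(2007, 5, 5, 10, 30)))  # True
print(time_acl_matches("A 9:00-17:00", datetime(2007, 5, 7, 10, 30)))  # False
print(time_acl_matches("9:00-11:00", datetime(2007, 5, 7, 9, 15)))     # True
```

In the second example above, the deny rule applies only when both pagi and office.net match, so a client in 10.2.3.0/24 is blocked only while time_acl_matches("9:00-11:00", ...) would be true.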
url_regex
Description
The url_regex type searches the entire URL for the regular expression you specify. Note that these regular expressions are case-sensitive; to make them case-insensitive, use the -i option.
Syntax
acl aclname url_regex [-i] url_regex ...
Example 1
acl banned_url url_regex -i terror porn
http_access deny banned_url
This ACL denies any URL that contains the word terror or porn.
For example, the following destinations will be denied by the proxy server:
http://www.google.com/pornography
http://www.news.com/terrorist.html
http://www.terror.com/
urlpath_regex
Description
The urlpath_regex type matches a regular expression against the URL, excluding the protocol and hostname. If the URL is http://www.free.com/latest/games/tetris.exe, then this ACL type only looks at /latest/games/tetris.exe; it leaves out the http protocol and the www.free.com hostname.
Syntax
acl aclname urlpath_regex pattern
Example 1
acl blocked_free urlpath_regex free
http_access deny blocked_free
This ACL blocks any URL whose path contains "free" (but not "Free"); the protocol and hostname are not checked.
These regular expressions are case-sensitive. To make them case-insensitive, add the -i option.
Example 2
acl blocked_games urlpath_regex -i games
http_access deny blocked_games
blocked_games matches any URL path containing the word games, whether it is spelled in upper or lower case.
Example 3
To block several URL patterns listed in a file:
acl block_site urlpath_regex -i "/etc/squid/acl/block_site"
http_access deny block_site
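The difference between url_regex, urlpath_regex and dstdom_regex is which part of the URL the pattern is applied to. The sketch below illustrates this with the free.com example. It approximates Squid's POSIX regular expressions with Python's re module and assumes that urlpath_regex sees the path together with any query string; it is a model, not Squid's matcher.

```python
# Sketch contrasting which part of a URL url_regex, urlpath_regex and
# dstdom_regex look at. Assumption: Python's re module is used as a stand-in
# for the POSIX regular expressions that Squid uses.
import re
from typing import List
from urllib.parse import urlsplit


def regex_acl_matches(acl_type: str, patterns: List[str], url: str,
                      ignore_case: bool = False) -> bool:
    parts = urlsplit(url)
    if acl_type == "url_regex":
        subject = url  # the whole URL, protocol and hostname included
    elif acl_type == "urlpath_regex":
        subject = parts.path or "/"  # path (plus any query), no protocol or host
        if parts.query:
            subject += "?" + parts.query
    elif acl_type == "dstdom_regex":
        subject = parts.hostname or ""  # destination domain name only
    else:
        raise ValueError("unsupported ACL type: " + acl_type)
    flags = re.IGNORECASE if ignore_case else 0  # the -i option
    # Several patterns on one acl line are alternatives: any match counts.
    return any(re.search(p, subject, flags) for p in patterns)


url = "http://www.free.com/latest/games/tetris.exe"
print(regex_acl_matches("url_regex", ["free"], url))             # True
print(regex_acl_matches("urlpath_regex", ["free"], url))         # False
print(regex_acl_matches("urlpath_regex", ["GAMES"], url, True))  # True
print(regex_acl_matches("dstdom_regex", ["terror", "porn"],
                        "http://www.terrorist.com/"))            # True
```

Note that urlpath_regex with the pattern free does not block http://www.free.com/latest/games/tetris.exe, because "free" appears only in the hostname.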
port
Description
Access can be controlled by the destination (server) port number.
Syntax
acl aclname port port-number
Example 1
Deny requests to unknown ports:
acl Safe_ports port 80          # http
acl Safe_ports port 21          # ftp
acl Safe_ports port 443 563     # https, snews
http_access deny !Safe_ports
proto
Description
This specifies the transfer protocol.
Syntax
acl aclname proto protocol
Example 1
acl protocol proto HTTP FTP
This ACL matches the HTTP and FTP protocols.
Example 2
acl manager proto cache_object
http_access allow manager localhost
http_access deny manager
Only allow cachemgr access from localhost.
Example 3
acl ftp proto FTP
http_access deny ftp
http_access allow all
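These rules block every FTP request and allow all other requests. Because the rules are checked in order, the deny line must come before http_access allow all.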
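The order of http_access lines matters because Squid stops at the first rule that matches. The sketch below models this evaluation order in Python, using Example 3 as input. It is an illustration under stated assumptions (each ACL is modelled as a predicate on a request dictionary, and "!" negates an ACL), not Squid's code; the fallback when no rule matches follows Squid's documented behaviour of applying the opposite of the last rule.

```python
# Sketch of http_access evaluation: rules are checked top to bottom and the
# first rule whose ACLs all match decides. Modelling assumptions: each ACL is a
# Python predicate over a request dict, and "!name" negates an ACL.
from typing import Callable, Dict, List, Tuple

Request = Dict[str, str]
Acl = Callable[[Request], bool]
Rule = Tuple[str, List[str]]  # ("allow" or "deny", ["aclname", "!aclname", ...])


def http_access(rules: List[Rule], acls: Dict[str, Acl], request: Request) -> str:
    for action, names in rules:
        matched = True
        for name in names:  # ACLs on the same line must ALL match
            negate = name.startswith("!")
            if acls[name.lstrip("!")](request) == negate:
                matched = False
                break
        if matched:
            return action  # first matching rule wins; later rules are ignored
    if not rules:
        return "deny"  # no http_access lines at all
    # No rule matched: the opposite of the last rule's action applies.
    return "deny" if rules[-1][0] == "allow" else "allow"


acls = {
    "ftp": lambda r: r["proto"] == "FTP",  # acl ftp proto FTP
    "all": lambda r: True,                  # matches every request
}
rules = [("deny", ["ftp"]), ("allow", ["all"])]  # Example 3 above
print(http_access(rules, acls, {"proto": "FTP"}))   # deny
print(http_access(rules, acls, {"proto": "HTTP"}))  # allow
```

Swapping the two rules would make allow all match first, so FTP requests would no longer be blocked.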
method
Description
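This specifies the method of the request.
Syntax
acl aclname method method-type
Example 1
acl connect method CONNECT
http_access allow localhost
http_access allow allowed_clients
http_access deny connect
The last rule denies the CONNECT method, to prevent outside people from trying to connect through the proxy server. Because http_access rules are checked in order, requests from localhost and allowed_clients are allowed by the earlier rules before the CONNECT rule is reached.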
browser
Description
Regular expression pattern matching on the request's User-Agent header. To record the User-Agent header information, add this line to squid.conf:
useragent_log /var/log/squid/useragent.log
Then run the Mozilla browser and check the log; the User-Agent header for Mozilla should match the pattern used in the example.
Syntax
acl aclname browser pattern
Example 1
acl mozilla browser ^Mozilla/5\.0
http_access deny mozilla
This rule denies Mozilla browsers, and any other browser whose User-Agent header begins with Mozilla/5.0.
proxy_auth
Description
User authentication via external processes. proxy_auth requires an external authentication program to check the username and password.
Syntax
acl aclname proxy_auth username ...
Example 1
To validate a list of users, do the following steps.
Creating the passwd file
# touch /etc/squid/passwd
# chown root.squid /etc/squid/passwd
# chmod 640 /etc/squid/passwd
Adding users
# htpasswd /etc/squid/passwd shah
You will be prompted to enter a password for the user; in this example, the user is shah.
Setting rules
auth_param basic program /usr/lib/squid/ncsa_auth /etc/squid/passwd
auth_param basic children 5
auth_param basic realm Squid proxy-caching web-server
auth_param basic credentialsttl 2 hours
These lines are already in the configuration file but need to be adjusted to suit your environment.
Authentication configuration
acl LOGIN proxy_auth REQUIRED
http_access allow LOGIN
These rules allow only users who have been authenticated to access the network.
CAUTION !! proxy_auth can't be used in a transparent proxy.
maxconn
Description
A limit on the maximum number of connections from a single client IP address. The ACL is true if the client has more than maxconn connections open.
Syntax
acl aclname maxconn number_of_connections
Example 1
acl someuser src 10.0.5.0/24
acl 5conn maxconn 5
http_access deny someuser 5conn
These rules restrict each client in the 10.0.5.0/24 subnet to a maximum of five (5) connections at once; if a client exceeds this, an error page appears. Because the deny rule on the last line combines both ACLs, clients outside this subnet are not restricted.
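The combined effect of the two ACLs in this example can be sketched as follows. This is a hypothetical model for illustration: the caller supplies the number of open connections per client IP, whereas Squid tracks these counts itself.

```python
# Sketch of the maxconn example above: deny a request when the client is in
# 10.0.5.0/24 AND already has more than 5 connections open. Assumption: the
# caller supplies the open-connection counts, which Squid tracks per client IP.
import ipaddress
from typing import Dict

SOMEUSER = ipaddress.ip_network("10.0.5.0/24")  # acl someuser src 10.0.5.0/24
MAX_CONN = 5                                    # acl 5conn maxconn 5


def is_denied(client_ip: str, open_conns: Dict[str, int]) -> bool:
    in_someuser = ipaddress.ip_address(client_ip) in SOMEUSER
    over_limit = open_conns.get(client_ip, 0) > MAX_CONN  # maxconn is true above the limit
    # http_access deny someuser 5conn: both ACLs must match for the deny to apply.
    return in_someuser and over_limit


counts = {"10.0.5.7": 6, "10.0.5.8": 3, "192.168.1.20": 40}
print(is_denied("10.0.5.7", counts))      # True
print(is_denied("10.0.5.8", counts))      # False
print(is_denied("192.168.1.20", counts))  # False
```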
CAUTION !!
Do not include HTML close tags </HTML></BODY>
4.2 Exercises
1. Why can users still download files with the following configuration?
acl download urlpath_regex -i \.exe$
acl office_hours time 09:00-17:00
acl GET method GET
acl it_user1 src 192.168.1.88
acl it_user2 src 192.168.1.89
acl nodownload1 src 192.168.1.10
acl nodownload2 src 192.168.1.11
http_access allow it_user1
http_access allow it_user2
http_access allow nodownload1
http_access allow nodownload2
2.
3.
Chapter
5. Caching
5.1 Concepts
The proxy server can simply send the content requested by the client from its copy in the cache.
The assumption is that later requests for the same data can be serviced more quickly by not having to go all the way back to the original server.
However, only the first two groups are covered in the following subsections.
A. Cache Size
The following are common parameters that control the cache size.
i. cache_mem
Syntax
cache_mem size(MB)
This parameter specifies the amount of memory (RAM) used to store in-transit objects (ones currently being transferred), hot objects (ones that are used often) and negative-cached objects (recent failed requests). The default value is 8 MB.
Example:
cache_mem 16 MB
ii. maximum_object_size
Syntax
maximum_object_size size(MB)
Objects larger than this size are not cached. The default value is 4 MB.
Example:
maximum_object_size 8 MB
iii. ipcache_size
Syntax
ipcache_size number_of_entries
ipcache_low percentage
Example:
ipcache_low 90
B. Cache Directories
i. cache_dir
Syntax
cache_dir type dir size(MB) L1 L2
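Example:
$$x = 6\ \text{GB} = 6 \times 1024 \times 1024\ \text{KB} = 6291456\ \text{KB}$$
$$\frac{x}{y \cdot z} = \frac{6291456}{13 \times 256} \approx 1890$$
$$L1 \times L2 = \frac{x}{y \cdot z} \quad\Rightarrow\quad L1 \times 256 = 1890 \quad\Rightarrow\quad L1 = \frac{1890}{256} \approx 7$$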
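The same calculation can be written as a small Python function. Because the text that defines x, y and z is missing from this copy, the sketch assumes, as the numbers suggest, that x is the cache size in KB, y = 13 KB is the average object size, z = 256 is the number of objects per second-level directory, and L2 is fixed at 256.

```python
# Worked version of the L1 calculation. Assumptions (the defining text is
# missing): x = cache size in KB, y = 13 KB average object size,
# z = 256 objects per second-level directory, and L2 = 256.


def first_level_dirs(cache_kb: int, avg_object_kb: int = 13,
                     objects_per_dir: int = 256, l2: int = 256) -> int:
    total_second_level = cache_kb / avg_object_kb / objects_per_dir  # x / y / z = L1 * L2
    # L1 = (x / y / z) / L2, rounded down, but never fewer than one directory.
    return max(1, int(total_second_level / l2))


cache_kb = 6 * 1024 * 1024         # 6 GB expressed in KB
print(cache_kb)                    # 6291456
print(int(cache_kb / 13 / 256))    # 1890
print(first_level_dirs(cache_kb))  # 7
```

Under these assumptions, a 6 GB cache would use L1 = 7 and L2 = 256 in its cache_dir line.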
ii. access_log
Syntax
access_log path
This parameter specifies the file where HTTP and ICP accesses are logged. The default is /var/log/squid/access.log.
Example:
access_log /var/log/squid/access.log
Chapter
This will copy all of the Webmin files to the appropriate locations and
run the install script with appropriate default values. For example, the
Webmin perl files will be installed in /usr/libexec/webmin while the
configuration files will end up in /etc/webmin. Webmin will then be
started on port 10000. You may log in using root as the login name
and your system root password as the password. It's unlikely you will
need to change any of these items from the command line, because
they can all be modified using Webmin. If you do need to make any
changes, you can do so in miniserv.conf in /etc/webmin.
After Installation
After installation, your Webmin installation will behave nearly
Proxy port
Sets the network port on which Squid listens. This option is 3128 by default and can almost always be left at this value, except when multiple Squids are running on the same system, which is usually ill-advised. This option corresponds to the http_port option in squid.conf.
ICP port
This is the port on which Squid listens for Internet Cache Protocol (ICP) messages. ICP is a protocol used by web caches to communicate and share data. Using ICP, multiple web caches can share cached entries, so that if any one local cache has an object, the distant origin server does not have to be queried for it. Further, cache hierarchies can be built from multiple caches at multiple privately interconnected sites to provide improved hit rates and higher-quality web responses for all sites. More on this in later sections. This option correlates to the icp_port directive.
subnets, Squid host. This option correlates to the udp_incoming_address directive.
Multicast groups
The multicast groups that Squid will join to receive multicast ICP
requests. This option should be used with great care, as it is used to
configure your Squid to listen for multicast ICP queries. Clearly if your
server is not on the MBone, this option is useless. And even if it is, this
may not be an ideal choice.
though usually not recommended, to implement
All other things being equal, a cache that is not overloaded will
perform better (with regard to hit ratio) with a larger number of
clients. Simply put, a larger client population leads to a higher quality
of cache content, which in turn leads to higher hit ratios and improved
bandwidth savings. So, whenever it is possible to increase the client
population without overloading the cache, such as in the case of a
cache mesh, it may be worth considering. Again, this type of hierarchy
can be improved upon by the use of Cache Digests, but ICP is usually
simpler to implement and is a widely supported standard, even on
non-Squid caches.
Finally, ICP is also sometimes used for load balancing multiple caches
at the same site. ICP, or even Cache Digests for that matter, are
almost never the best way to implement load balancing. Using ICP for
load balancing can be achieved in a few ways.
One way is to have several local siblings, each of which can provide hits to the others' clients, while the client load is divided evenly across the caches.
Proxy port is the port where the neighbor cache normally listens for
client traffic, which defaults to 3128.
Hostname
The name or IP address of the neighbor cache you want your cache to communicate with. Note that this will be one-way traffic. Access Control Lists, or ACLs, are used to allow ICP requests from other caches; ACLs are covered later. This option, plus most of the other options on this page, corresponds to cache_peer lines in squid.conf.
Type
The type of relationship you want your cache to have with the neighbor
cache. If the cache is upstream, and you have no control over it, you
will need to consult with the administrator to find out what kind of
relationship you should set up. If it is configured wrong, cache misses
will likely result in errors for your users. The options here are sibling,
parent, and multicast.
Proxy port
The port on which the neighbor cache is listening for standard HTTP
requests. Even though the caches transmit availability data via ICP,
actual web objects are still transmitted via HTTP on the port usually
used for standard client traffic. If your neighbor cache is a Squid-based
cache, then it is likely to be listening on the default port of 3128. Other
common ports used by cache servers include 8000, 8888, 8080, and
even 80 in some circumstances.
ICP port
The port on which the neighbor cache is configured to listen for ICP
traffic. If your neighbor cache is a Squid-based proxy, this value can
be found by checking the icp_port directive in the squid.conf file on
the neighbor cache. Generally, however, the neighbor cache will listen
on the default port 3130.
Proxy only?
A simple yes or no question to tell whether objects fetched from the
neighbor cache should be cached locally. This can be used when all
caches are operating well below their client capacity, but disk space is
at a premium or hit ratio is of prime importance.
Default cache
This is switched to Yes if this neighbor cache is to be the last-resort
parent cache to be used in the event that no other neighbor cache is
present as determined by ICP queries. Note that this does not prevent
it from being used normally while other caches are responding as
expected. Also, if this neighbor is the sole parent proxy, and no other
route to the Internet exists, this should be enabled.
Round-robin cache?
Choose whether to use round-robin scheduling between multiple
parent caches in the absence of ICP queries. This should be set on all
parents that you would like to schedule in this way.
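Round-robin scheduling hands successive requests to the parents in turn. The sketch below shows the idea with itertools.cycle; the parent names are hypothetical placeholders, and Squid itself tracks this with its own per-parent counters, so this is only a model of the behaviour.

```python
# Sketch of round-robin scheduling between parent caches (illustration only).
# Assumption: the parent names are hypothetical placeholders.
from itertools import cycle
from typing import Iterator, List


def round_robin(parents: List[str]) -> Iterator[str]:
    # Each next() call returns the parent that should receive the next request.
    return cycle(parents)


scheduler = round_robin(["parent1.example.net", "parent2.example.net"])
for request_number in range(1, 5):
    print(request_number, next(scheduler))
# 1 parent1.example.net
# 2 parent2.example.net
# 3 parent1.example.net
# 4 parent2.example.net
```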
ICP time-to-live
Defines the multicast TTL for ICP packets. When using multicast ICP, it is usually wise, for security and bandwidth reasons, to use the minimum TTL suitable for your network.
Cache weighting
Sets the weight for a parent cache. When using this option it is
possible to set higher numbers for preferred caches. The default value
is 1, and if left unset for all parent caches, whichever cache responds
positively first to an ICP query will be sent a request to fetch that
object.
Closest only
Allows you to specify that your cache wants only
No digest?
Chooses whether this neighbor cache should send cache digests.
No NetDB exchange
When using ICP, it is possible for Squid to keep a database of network information about the neighbor caches, including availability and RTT (Round Trip Time) information. This usually allows Squid to choose more wisely which caches to make requests to when multiple caches have the requested object.
No delay?
Prevents accesses to this neighbor cache from affecting delay pools.
Delay pools, discussed in more detail later, are a means by which
Squid can regulate bandwidth usage. If a neighbor cache is on the
local network, and bandwidth usage between the caches does not need
to be restricted, then this option can be used.
Login to proxy
Select this if you need to send authentication information when
challenged by the neighbor cache. On local networks, this type of
security is unlikely to be necessary.
Multicast responder
Allows Squid to know where to accept multicast ICP replies. Because
multicast is fed on a single IP to many caches, Squid must have some
way of determining which caches to listen to and what options apply to
that particular cache. Selecting Yes here configures Squid to listen for
multicast replies from the IP of this neighbor cache.
section provides configuration options for general ICP from your neighbor caches. This option sets the hierarchy_stoplist directive.
Memory Usage
This page provides access to most of the options available for
configuring the way Squid uses memory and disks (Figure 6-4). Most
values on this page can remain unchanged, except in very high load or
low resource environments, where tuning can make a measurable
difference in how well Squid performs.
[Figure 6-4: Memory usage]
6.7 Logging
Squid provides a number of logs that can be used when debugging problems, when measuring the cache's effectiveness, and when identifying users and the sites they visit (Figure 6-5). Because Squid can be used to snoop on users' browsing habits, you should carefully consider the privacy laws in your region and, more importantly, be considerate to your users. That being said, logs can be very valuable tools in ensuring that your users get the best service possible from your cache.
useful when tracking cache usage and solving problems. Because there
are several effective tools for parsing and generating reports from the
Squid standard access logs, it is usually preferable to leave this at its
default of being off. This option configures the emulate_httpd_log
directive. The Calamaris cache access log analyzer does not work if
this option is enabled.
Logging netmask
Defines what portion of the requesting client IP is logged in the
access.log. For privacy reasons it is often preferred to only log the
network or subnet IP of the client. For example, a netmask of
255.255.255.0 will log the first three octets of the IP, and fill the last
octet with a zero. This option configures the client_netmask directive.
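The effect of a logging netmask can be checked with Python's ipaddress module. The sketch below applies 255.255.255.0 to a client address, as the text describes for client_netmask; the function name masked_client_ip is hypothetical.

```python
# Sketch of what a logging netmask does to the client address before it is
# written to access.log. masked_client_ip is a hypothetical helper name.
import ipaddress


def masked_client_ip(client_ip: str, netmask: str = "255.255.255.0") -> str:
    # Keep the network part of the address and zero the host part.
    network = ipaddress.IPv4Network(f"{client_ip}/{netmask}", strict=False)
    return str(network.network_address)


print(masked_client_ip("10.0.5.123"))  # 10.0.5.0
```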
The directive is cache_dir while the options are the type of filesystem,
the path to the cache directory, the size allotted to Squid, the number
of top level directories, and finally the number of second level
directories. In the example, I've chosen the filesystem type ufs, which
is a name for all standard UNIX filesystems. This type includes the
standard Linux ext2 filesystem as well. Other possibilities for this
option include aufs and diskd.
The next field is simply the space, in megabytes, of the disk that you
want to allow Squid to use. Finally, the directory fields define the upper
and lower level directories for Squid to use.
Edit an ACL
To edit an existing ACL, simply click on the highlighted name. You will
then be presented with a screen containing all relevant information
about the ACL. Depending on the type of the ACL, you will be shown
different data entry fields. The operation of each type is very similar,
so for this example, you'll step through editing of the localhost ACL.
Clicking the localhost button presents the page shown in Figure 6-8.
The title of the table is Client Address ACL which means the ACL is of
the Client Address type, and tells Squid to compare the incoming IP
address with the IP address in the ACL. It is possible to select an IP
based on the originating IP or the destination IP. The netmask can also
be used to indicate whether the ACL matches a whole network of
addresses, or only a single IP. It is possible to include a number of
addresses, or ranges of addresses in these fields. Finally, the Failure
URL is the address to send clients to if they have been denied access
due to matching this particular ACL. Note that the ACL by itself does nothing; there must also be a proxy restriction or ICP restriction rule that uses the ACL before Squid will apply it.
External Auth
This ACL type calls an external authenticator process to decide whether
the request will be allowed. Note that authentication cannot work on a
transparent proxy or HTTP accelerator. The HTTP protocol does not
provide for two authentication stages (one local and one on remote
Web sites). Therefore, to use an authenticator, your proxy must
operate as a traditional proxy, where a client responds appropriately
to proxy authentication requests as well as to external Web server
authentication requests. This corresponds to the proxy_auth ACL type.
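For illustration, a minimal sketch of an external authenticator in Python. It assumes the basic-authentication helper protocol used by Squid 2.5 and later (started with auth_param basic program): Squid writes one "username password" line per request and expects "OK" or "ERR" in reply. The USERS table, check and serve are hypothetical; check your Squid version's helper documentation before relying on this.

```python
from urllib.parse import unquote

# Hypothetical user table; a real helper would check a password file,
# LDAP, PAM or similar.
USERS = {"alice": "secret"}


def check(line: str) -> str:
    """Answer one request line of the form 'username password'."""
    try:
        user, password = line.rstrip("\n").split(" ", 1)
    except ValueError:  # no space: malformed request
        return "ERR"
    # Assumption: Squid URL-escapes both fields (recent versions do);
    # unquote() leaves plain text unchanged unless it contains '%'.
    if USERS.get(unquote(user)) == unquote(password):
        return "OK"
    return "ERR"


def serve(stdin, stdout) -> None:
    """Helper main loop: one reply per request line."""
    for line in stdin:
        stdout.write(check(line) + "\n")
        stdout.flush()  # Squid waits for each answer; never buffer it


# As a helper script, end with:
#     import sys; serve(sys.stdin, sys.stdout)
print(check("alice secret"), check("alice guess"))  # OK ERR
```

Keeping the decision in check() and the I/O in serve() makes the logic easy to test without a running Squid.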
Proxy IP Address
The local IP address on which the client connection exists. This allows
ACLs to be constructed that only match one physical network, if
multiple interfaces are present on the proxy, among other things. This
option configures the myip directive.
Request Method
This ACL type matches on the HTTP method in the request headers,
such as GET or PUT. This corresponds to the method ACL type.
information. It does not include, for example, the in the form
"192.168.1.1-192.168.1.25". This option
6.10 Administrative Options
options correlate to the cache_effective_user and
cache_effective_group directives.
realm that will be reported to clients when performing
Unique hostname
Configures the unique_hostname directive, and sets a unique host
name for Squid to report in cache clusters in order to allow detection of
forwarding loops. Use this if you have multiple machines in a cluster
with the same Visible Hostname.
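A simplified sketch of why the name must be unique: when Squid forwards a request it adds its own host name to the Via header, so finding that name in an incoming Via header means the request has looped back. This Python function is a hypothetical illustration of the idea, not Squid's code, and the host names are made up.

```python
def is_forwarding_loop(via: str, unique_hostname: str) -> bool:
    """Return True if unique_hostname already appears in a Via header.

    Each comma-separated Via entry looks like
    'protocol-version host[:port] [(comment)]'.
    Simplification: comments containing commas are not handled.
    """
    for hop in via.split(","):
        fields = hop.split()
        if len(fields) >= 2 and fields[1].split(":")[0] == unique_hostname:
            return True
    return False


via = ("1.0 cache1.example.net:3128 (squid/2.6), "
       "1.0 cache2.example.net:3128 (squid/2.6)")
print(is_forwarding_loop(via, "cache1.example.net"))  # True
print(is_forwarding_loop(via, "cache3.example.net"))  # False
```

If two machines in a cluster reported the same name, each would wrongly see the other's Via entry as its own and report a loop, which is the situation unique_hostname avoids.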
Cache announce host, port and file
The host address and port that Squid will use to announce its
availability to participate in a cache hierarchy. The cache announce file
is simply a file containing a message to be sent with announcements.
Announcement period
Configures the announce_period directive, and refers to the frequency
at which Squid will send announcement messages to the announce
host.
Most of the content in Chapter 6 is taken from Unix System Administration with
Webmin by Joe Cooper (2002), available online at
http://www.swelltech.com/support/webminguide/
Chapter
7. Analyzer
7.1 Structure of log file
In Fedora, the Squid log files are stored in the /var/log/squid directory
by default. Squid writes three log files:
Access log
Cache log
Store log
Access log
Location : /var/log/squid/access.log
Description
It contains an entry for every client request for HTTP content,
recording whether the request was a cache hit or a miss.
Analyzer
It also records the identity of the host making the request (its IP
address) and the content being requested.
Format
Option 1: used when emulate_httpd_log is off.
Native format (emulate_httpd_log off)
Timestamp Elapsed Client Action/Code Size Method URI Ident Hierarchy/From Content
Option 2: used when emulate_httpd_log is on.
Common format (emulate_httpd_log on)
Client Ident - [Timestamp1] "Method URI" Type Size
With:
Timestamp
The time when the request is completed (socket closed). The format is
"Unix time" (seconds since Jan 1, 1970) with millisecond resolution.
Timestamp1
When the request is completed
(Day/Month/CenturyYear:Hour:Minute:Second GMT-Offset)
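The relationship between the two timestamp styles can be checked with a short Python sketch that turns a native Timestamp into the Timestamp1 layout. It formats the time in UTC (offset +0000), which is an assumption: Squid normally writes Timestamp1 in the proxy's local time zone, so a real log would show that zone's offset. native_to_common is a hypothetical helper.

```python
from datetime import datetime, timezone


def native_to_common(ts: str) -> str:
    """Convert a native Timestamp (Unix seconds.milliseconds) to the
    Day/Month/Year:Hour:Minute:Second Offset form, in UTC."""
    dt = datetime.fromtimestamp(float(ts), tz=timezone.utc)
    # strftime drops the milliseconds, as the common format does
    return dt.strftime("%d/%b/%Y:%H:%M:%S %z")


print(native_to_common("1173680297.727"))  # 12/Mar/2007:06:18:17 +0000
```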
Elapsed
The elapsed time of the request, in milliseconds. This is the time
between the accept() and close() of the client socket.
Client
The IP address of the connecting client, or the FQDN if the 'log_fqdn'
option is enabled in the config file.
Action
The Action describes how the request was treated locally (hit, miss,
etc).
Code
The HTTP reply code taken from the first line of the HTTP reply header.
For ICP requests this is always "000." If the reply code was not given,
it will be logged as "555."
Size
The size in bytes: for TCP requests, the amount of data written to the
client; for UDP requests, the size of the request.
Method
The HTTP request method (GET, POST, etc), or ICP_QUERY for ICP
requests.
URI
The requested URI.
Ident
The result of the RFC931/ident lookup of the client username. If
RFC931/ident lookup is disabled (default: ident_lookup off), it is
logged as "-".
Hierarchy
A description of how and where the requested object was fetched.
From
The host name or IP address of the machine from which the object was
fetched.
Content
The Content-Type of the object (from the HTTP reply header).
Figure 7-1 shows an example access.log file, written in the native
format. Mapping the first line of Figure 7-1 onto the format fields
gives the values in Table 7-1.
Format      Value
Timestamp   1173680297.727
Elapsed     450
Client      10.0.5.10
Action      TCP_MISS
Code        302
Size        786
Method      GET
URI         http://www.google.com/search?
Ident       -
Hierarchy   DIRECT
From        64.233.189.104
Content     text/html
Table 7-1 The format and its value
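The field-by-field reading in Table 7-1 can be automated. The Python sketch below splits a native-format line into named fields; the sample line is rebuilt from the values in Table 7-1 (with the URI shortened exactly as printed there), and AccessEntry and parse_native are hypothetical names.

```python
from typing import NamedTuple


class AccessEntry(NamedTuple):
    timestamp: float    # Unix time, millisecond resolution
    elapsed_ms: int
    client: str
    action: str
    code: int
    size: int           # bytes
    method: str
    uri: str
    ident: str
    hierarchy: str
    peer: str           # the "From" field
    content_type: str


def parse_native(line: str) -> AccessEntry:
    """Split one native-format access.log line into its fields.

    Action/Code and Hierarchy/From are single tokens joined by '/',
    so they are split a second time.
    """
    f = line.split()
    if len(f) < 10:
        raise ValueError(f"expected 10 fields, got {len(f)}: {line!r}")
    action, code = f[3].split("/", 1)
    hierarchy, peer = f[8].split("/", 1)
    return AccessEntry(float(f[0]), int(f[1]), f[2], action, int(code),
                       int(f[4]), f[5], f[6], f[7], hierarchy, peer,
                       " ".join(f[9:]))


# First line of Figure 7-1, rebuilt from Table 7-1 (URI as printed there)
sample = ("1173680297.727 450 10.0.5.10 TCP_MISS/302 786 GET "
          "http://www.google.com/search? - DIRECT/64.233.189.104 text/html")
entry = parse_native(sample)
print(entry.action, entry.code, entry.peer)  # TCP_MISS 302 64.233.189.104
```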
Action
The TCP_ codes (Table 7-2) refer to requests on the HTTP port
(usually 3128), while the UDP_ codes refer to requests on the ICP
port (usually 3130).
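Codes                    Explanation
TCP_HIT                  A valid copy of the requested object was in the cache.
TCP_MISS                 The requested object was not in the cache.
TCP_REFRESH_HIT          The requested object was cached but stale; the IMS
                         query for it resulted in "304 Not Modified".
TCP_REF_FAIL_HIT         The requested object was cached but stale; the IMS
                         query failed and the stale object was delivered.
TCP_REFRESH_MISS         The requested object was cached but stale; the IMS
                         query returned the new content.
TCP_CLIENT_REFRESH_MISS  The client issued a "no-cache" pragma, or some
                         analogous cache control command, along with the
                         request, so the cache has to refetch the object.
TCP_SWAPFAIL_MISS        The object was believed to be in the cache but
                         could not be accessed.
TCP_NEGATIVE_HIT         Request for a negatively cached object (e.g. "404
                         Not Found") that the cache believes to be
                         inaccessible. Also refer to negative_ttl in
                         squid.conf.
TCP_MEM_HIT              A valid copy of the requested object was in the
                         cache and in memory, so no disk access was needed.
TCP_DENIED               Access was denied for this request.
TCP_OFFLINE_HIT          The requested object was retrieved from the cache
                         during offline mode, which never validates any
                         object (see offline_mode in squid.conf).
UDP_HIT                  A valid copy of the requested object was in the cache.
UDP_DENIED               Access was denied for this request.
UDP_INVALID              An invalid request was received.
UDP_MISS_NOFETCH         Returned instead of UDP_MISS by a cache in hit-only
                         mode (during "-Y" startup or frequent failures), so
                         neighbours fetch only hits.
NONE                     Seen with errors and cachemgr requests.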
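A common use of these codes is a cache hit ratio. The Python sketch below computes it from native-format lines; hit_ratio is a hypothetical helper, and treating every TCP_ code containing "HIT" as a hit is an assumption you may want to tighten.

```python
def hit_ratio(lines) -> float:
    """Fraction of TCP_ (HTTP port) requests answered from the cache.

    Assumption: every TCP_ action whose name contains "HIT" counts as a
    hit. That includes TCP_REF_FAIL_HIT (stale copy served) and
    TCP_NEGATIVE_HIT (cached error); drop them for a stricter measure.
    UDP_ (ICP) and NONE entries are ignored.
    """
    requests = hits = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or truncated line
        action = fields[3].split("/", 1)[0]  # "TCP_MEM_HIT/200" -> "TCP_MEM_HIT"
        if not action.startswith("TCP_"):
            continue
        requests += 1
        if "HIT" in action:
            hits += 1
    return hits / requests if requests else 0.0


# Usage:
# with open("/var/log/squid/access.log") as log:
#     print(hit_ratio(log))
```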
Code
These codes are taken from RFC 2616 and verified for Squid.
Squid-2 uses almost all of them except 307 (Temporary Redirect),
416 (Requested Range Not Satisfiable) and 417 (Expectation
Failed), which are shown in brackets. Codes marked with * are
WebDAV extensions (RFC 2518).
Code    Explanation
000     Used for ICP (UDP) requests
100     Continue
101     Switching Protocols
*102    Processing
200     OK
201     Created
202     Accepted
203     Non-Authoritative Information
204     No Content
205     Reset Content
206     Partial Content
*207    Multi-Status
300     Multiple Choices
301     Moved Permanently
302     Moved Temporarily
303     See Other
304     Not Modified
305     Use Proxy
[307    Temporary Redirect]
400     Bad Request
401     Unauthorized
402     Payment Required
403     Forbidden
404     Not Found
405     Method Not Allowed
406     Not Acceptable
407     Proxy Authentication Required
408     Request Timeout
409     Conflict
410     Gone
411     Length Required
412     Precondition Failed
413     Request Entity Too Large
414     Request-URI Too Long
415     Unsupported Media Type
[416    Requested Range Not Satisfiable]
[417    Expectation Failed]
*422    Unprocessable Entity
*423    Locked
*424    Failed Dependency
500     Internal Server Error
501     Not Implemented
502     Bad Gateway
503     Service Unavailable
504     Gateway Timeout
505     HTTP Version Not Supported
*507    Insufficient Storage
600     Squid header parsing error
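When reviewing a log it is often enough to know how many replies fall into each class. The Python sketch below groups reply codes into 1xx to 5xx buckets; status_classes is a hypothetical helper, and giving 000, 555 and 600 their own buckets is a design choice, since they are not ordinary HTTP classes.

```python
from collections import Counter


def status_classes(codes) -> Counter:
    """Count reply codes by class (1xx ... 5xx).

    000 (ICP), 555 (no reply code) and 600 (Squid header parsing error)
    are not ordinary HTTP classes, so each keeps its own bucket.
    """
    counts = Counter()
    for code in codes:
        if code in (0, 555, 600):
            counts[f"{code:03d}"] += 1
        else:
            counts[f"{code // 100}xx"] += 1
    return counts


print(status_classes([200, 302, 304, 404, 0]))
```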
Method
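Method     Defined     Cacheability   Meaning
GET        HTTP/0.9    possibly       object retrieval and simple searches
HEAD       HTTP/1.0    possibly       metadata retrieval
POST       HTTP/1.0    CC or Exp.     submit data (to a program)
PUT        HTTP/1.1    never          upload data (e.g. to a file)
DELETE     HTTP/1.1    never          remove a resource (e.g. a file)
TRACE      HTTP/1.1    never          application-layer trace of the
                                      request route
OPTIONS    HTTP/1.1    never          request the available
                                      communication options
CONNECT    HTTP/1.1r3  never          tunnel an SSL connection
ICP_QUERY  Squid       never          used for ICP-based exchanges
PURGE      Squid       never          remove an object from the cache
PROPFIND   rfc2518     ?              retrieve the properties of an object
PROPPATCH  rfc2518     ?              change the properties of an object
MKCOL      rfc2518     never          create a new collection
COPY       rfc2518     never          create a duplicate of src in dst
MOVE       rfc2518     never          atomically move src to dst
LOCK       rfc2518     never          lock an object against modifications
UNLOCK     rfc2518     never          unlock an object
"CC or Exp." means cacheable only when the reply carries suitable
Cache-Control or Expires headers; "?" means undetermined.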
Hierarchy
The following hierarchy codes are used in Squid-2 (Table 7-4):
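Codes                   Explanation
NONE                    No hierarchy information (TCP hits, TCP failures,
                        cachemgr requests and all UDP requests).
DIRECT                  The object was fetched from the origin server.
SIBLING_HIT             The object was fetched from a sibling cache that
                        replied with UDP_HIT.
PARENT_HIT              The object was requested from a parent cache that
                        replied with UDP_HIT.
DEFAULT_PARENT          No ICP queries were sent; this parent was chosen
                        because it is marked "default" in the config file.
SINGLE_PARENT           The object was requested from the only parent
                        appropriate for the given URL.
FIRST_UP_PARENT         The object was fetched from the first available
                        parent in the list of parents.
NO_PARENT_DIRECT        The object was fetched from the origin server
                        because no parent exists for the given URL.
FIRST_PARENT_MISS       The object was fetched from the parent with the
                        fastest (possibly weighted) round trip time.
CLOSEST_PARENT_MISS     This parent was chosen because it reported the
                        lowest RTT measurement to the origin server.
CLOSEST_DIRECT          Squid's own RTT measurement to the origin server
                        was shorter than that of any parent.
NO_DIRECT_FAIL          The object could not be requested because direct
                        access is forbidden (e.g. by never_direct) and no
                        parent was available.
ROUNDROBIN_PARENT       No ICP replies were received from any parent; this
                        parent was chosen because it is marked for round
                        robin and had the lowest usage count.
CACHE_DIGEST_HIT        The peer was chosen because its cache digest
                        predicted a hit.
CD_PARENT_HIT           The parent was chosen because its cache digest
                        predicted a hit.
NO_CACHE_DIGEST_DIRECT  Appears to be unused.
CARP                    The peer was selected by CARP.
ANY_PARENT              part of src/peer_select.c:hier_strings[].
INVALID CODE            part of src/peer_select.c:hier_strings[].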
Cache log
Location : /var/log/squid/cache.log
Description
It contains the debugging and error messages that Squid generates.
Format
[Timestamp1]| Message
With
Timestamp1
When the event occurred (Year/Month/Day Hour:Minute:Second)
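A small Python sketch that splits a cache.log line into its timestamp and message. It assumes lines are written as "YYYY/MM/DD HH:MM:SS| message" and tolerates the square brackets shown in the format above; parse_cache_log is a hypothetical helper, and the sample message is illustrative.

```python
from datetime import datetime


def parse_cache_log(line: str):
    """Split a cache.log line into (timestamp, message).

    Returns None for lines without a recognisable timestamp
    (for example continuation lines).
    """
    stamp, sep, message = line.partition("| ")
    if not sep:
        return None
    try:
        when = datetime.strptime(stamp.strip("[] "), "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None
    return when, message.rstrip("\n")


print(parse_cache_log("2007/03/22 10:15:00| Ready to serve requests."))
```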
Message
Errors
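ERR_READ_TIMEOUT        The remote site or network is unreachable (it may
                        be down).
ERR_NO_CLIENTS_BIG_OBJ  All clients went away before transmission
                        completed, and the object is too big to cache.
ERR_CLIENT_ABORT        Client dropped connection before transmission
                        completed.
ERR_INVALID_REQ         Invalid HTTP request
ERR_UNSUP_REQ           Unsupported request
ERR_INVALID_URL         Invalid URL syntax
ERR_NO_FDS              Out of file descriptors
ERR_DNS_FAIL            DNS name lookup failure
ERR_NOT_IMPLEMENTED     Protocol not supported
ERR_CANNOT_FETCH        The requested URL cannot currently be retrieved.
ERR_NO_RELAY            No WAIS relay host is defined for this cache.
ERR_DISK_IO             The system disk is out of space or failing.
ERR_ZERO_SIZE_OBJECT    The remote server closed the connection before
                        sending any data.
ERR_FTP_DISABLED        This cache is configured NOT to retrieve FTP
                        objects.
ERR_PROXY_DENIED        Access denied. The user must authenticate before
                        accessing this cache.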
Store log
Location : /var/log/squid/store.log
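Description
It records objects as they are saved to disk, swapped into memory from
disk, or released from the cache, and is mainly useful for debugging.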
Format
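Timestamp Tag Dir File Hash Code Date LM Expires Content Expect/Length Method Key
With:
Timestamp
The time the entry was logged (Unix time with millisecond resolution,
i.e. seconds since 00:00:00 UTC, January 1, 1970).
Tag
SWAPIN (swapped into memory from disk), SWAPOUT (saved to disk)
or RELEASE (removed from cache)
Dir, File
The cache directory number and the file number within it where the
object is stored. An object that was never saved to disk is logged
with -1 and FFFFFFFF.
Hash
The MD5 hash of the object's cache key.
Code
The HTTP reply code when available. For ICP requests this is always
"0". If the reply code was not given, it will be logged as "555."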
The following three fields are timestamps parsed from the HTTP reply
headers. All are expressed in Unix time (i.e. seconds since 00:00:00
UTC, January 1, 1970). A missing header is represented as -2 and an
unparsable header as -1.
Date
The time captured from the HTTP Date reply header. If the Date
header is missing or invalid, the time of the request is used
instead.
LM
The value of the HTTP Last-Modified: reply header.
Expires
The value of the HTTP Expires: reply header.
Content
The HTTP Content-Type reply header.
Expect
The value of the HTTP Content-Length reply header. Zero is logged
if the Content-Length header was missing.
/Length
The number of bytes of content actually read. If Expect is non-zero
and not equal to Length, the object will be released from the cache.
Method
The request method (GET, POST, etc).
Key
The cache key. Often this is simply the URL. Cache objects which never
become public will have cache keys that include a unique integer
sequence number, the request method, and then the URL.
( /[post|put|head|connect]/URI )
Figure 7-3 shows an example store.log file. Mapping the fields of its
second line onto the format gives the values in Table 7-6.
Format      Value
Timestamp   1173680297.727
Tag         RELEASE
Dir         -1
File        FFFFFFFF
Hash        7832CBDDD1604B89D0F75A2437F37AD7
Code        302
Date        1173680306
LM          -1
Expires     -1
Content     text/html
Expect      -1
Length      278
Method      GET
Key         http://www.google.com/search?
Table 7-6 Format in Store.log
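The same mapping can be scripted. This Python sketch assumes the 13-field store.log layout written by Squid 2.6, which the values of Figure 7-3 fit: timestamp, tag, swap directory number, swap file number, MD5 cache-key hash, then the reply fields. The sample line is rebuilt from the values in Table 7-6, and FIELDS and parse_store_line are hypothetical names.

```python
FIELDS = ("timestamp", "tag", "dir", "file", "hash", "code", "date",
          "lm", "expires", "content", "expect_length", "method", "key")


def parse_store_line(line: str) -> dict:
    """Map one store.log line onto named fields.

    maxsplit keeps the trailing Key (URL) in one piece. A Content-Type
    containing spaces would shift the fields; this sketch ignores that.
    """
    parts = line.strip().split(maxsplit=len(FIELDS) - 1)
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields: {line!r}")
    rec = dict(zip(FIELDS, parts))
    expect, _, length = rec.pop("expect_length").partition("/")
    rec["expect"], rec["length"] = int(expect), int(length)
    return rec


sample = ("1173680297.727 RELEASE -1 FFFFFFFF 7832CBDDD1604B89D0F75A2437F37AD7 "
          "302 1173680306 -1 -1 text/html -1/278 GET http://www.google.com/search?")
rec = parse_store_line(sample)
print(rec["tag"], rec["code"], rec["length"])  # RELEASE 302 278
```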
7.2 Methods
Log Analysis Using Grep Command
The log files can also be analysed with Linux or UNIX commands such
as grep, which filters the required information from any log file.
Open a terminal and run a command such as the following to start
analysing the relevant log file.
For example:
# cat /var/log/squid/access.log | grep www.google.com
Figure 7-4 shows the output of this grep command for the access.log
file. The same technique can be applied to the cache.log and store.log
files.
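grep only selects lines; answering a question such as "how much did each client download from this site?" needs a little more. This Python sketch filters native-format access.log lines by a plain substring, as grep -F would, and totals the Size field per client. bytes_per_client is a hypothetical helper.

```python
from collections import defaultdict


def bytes_per_client(lines, pattern: str = "") -> dict:
    """Total the Size field per client for native access.log lines that
    contain pattern as a plain substring (like grep -F)."""
    totals = defaultdict(int)
    for line in lines:
        if pattern not in line:
            continue
        fields = line.split()
        if len(fields) < 5:
            continue  # not a complete native-format line
        totals[fields[2]] += int(fields[4])  # Client, Size
    return dict(totals)


# Usage:
# with open("/var/log/squid/access.log") as log:
#     print(bytes_per_client(log, "www.google.com"))
```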
/installer
cd /installer/sar-2.2.3.1
./configure
make
make install
NOTE: Make sure Squid is already running before you run the following
script.
2. The file contains many important status and debugging messages.
Figure 7-12 shows that three reports were generated in this example.
The latest report has no number at the end of its name. Each time the
access log file is analysed, the existing report is renamed and an
incremental number is appended automatically to the end of its name.
For example, 2007Mar22-2007Mar22.2 was the first report generated,
whereas 2007Mar22-2007Mar22 is the latest report.
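The naming rule can be captured in a small Python sketch that orders report directories from oldest to newest. It assumes, as in the example above, that an unnumbered name is the newest and that a larger numeric suffix means an older report; oldest_first is a hypothetical helper, not part of Sarg.

```python
import re


def oldest_first(report_dirs):
    """Order Sarg report names from oldest to newest.

    Assumption: no suffix = newest; .1, .2, ... = progressively older.
    """
    def age(name: str) -> int:
        m = re.search(r"\.(\d+)$", name)
        return int(m.group(1)) if m else 0

    return sorted(report_dirs, key=age, reverse=True)


print(oldest_first(["2007Mar22-2007Mar22",
                    "2007Mar22-2007Mar22.2",
                    "2007Mar22-2007Mar22.1"]))
```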
As Figure 7-13 shows, the index.html file lists the reports that Sarg
has generated. For more detailed information about a specific report,
click its file name.
2.
Figure 7-15 Index html
3. Denied html
4. Download html