One of the wonders of Unix is the ability to create scripts that reduce the number of commands
you have to type to get something done. I have a short script, called rvi, on all the machines I
maintain. Using rvi instead of vi lets me edit files under RCS with one command (as opposed to
the customary four). Put this file somewhere in your path and make it executable with chmod
+x rvi. You can then simply use a command like rvi squid.conf to edit files that are under revision
control. This is a lot quicker than running the co, rcsdiff and ci commands individually.
#!/bin/sh
# Check out and lock the file, edit it, show the changes, then check it back in.
co -l "$1"
${VISUAL:-vi} "$1"
rcsdiff -u "$1"
ci -u "$1"
The first option to the cache_dir tag sets the directory where data will be stored. The prefix value
simply has /cache/ appended to it, and the result is used as the default directory. This directory is
also created by the make install command that we ran earlier.
The next option to cache_dir is straightforward: it's a size value. Squid will store up to that amount
of data in that directory. The value is in megabytes, and sets the maximum size of the cache store.
The default is 100 megabytes.
The other two options are more complex: they set the number of subdirectories (first and second
tier) to create in this directory. Squid makes lots of directories and stores a few files in each of them
in an attempt to speed up disk access (finding the correct entry in a directory with one million files
in it is not efficient: it's better to split the files up into lots of smaller sets of files... don't worry too
much about this for the moment). I suggest that you use the default values for these options in the
mean time: if you have a very large cache store you may want to increase these values, but this is
covered in the section on
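Putting these pieces together, a complete cache_dir line might look like the following. The path and values here are only illustrative, and note that more recent Squid versions expect a storage type (such as ufs) as an extra first argument:

cache_dir /usr/local/squid/cache 100 16 256

Here 100 is the store size in megabytes, and 16 and 256 are the first- and second-tier subdirectory counts discussed above.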
Rule sets like the above are great for small organisations: they are straightforward. Note that since
http_access and icp_access rules are processed in the order they appear in the file, you will need to
place the http_access and icp_access entries in the appropriate order.
For large organizations, though, things are more convenient if you can create classes of users. You
can then allow or deny classes of users in more complex relationships. Let's look at an example like
this, where we duplicate the above example with classes of users:
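A sketch of what such a class-based ruleset might look like (the acl names and address ranges here are invented purely for illustration):

acl staff src 10.0.1.0/255.255.255.0
acl students src 10.0.2.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access allow staff
http_access allow students
http_access deny all
icp_access allow staff
icp_access allow students
icp_access deny all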
Sure, it's more complex for this example. The benefits only become apparent if you have large
access lists, or when you want to integrate refresh-times (which control how long objects are kept)
and the sources of incoming requests. I am getting quite far ahead of myself, though, so let's skip
back.
We need some terminology to discuss access control lists, otherwise this could become a rather long
chapter. So: lines beginning with acl are (appropriately, I believe) acl lines. The lines that use these
acls (such as http_access and icp_access in the above example) are called acl-operators. An acl-
operator can either allow or deny a request.
So, to recap: acls are used to define classes. When Squid accepts a request it checks the list of acl-
operators specific to the type of request: an HTTP request causes the http_access lines to be
checked; an ICP request checks the icp_access lists.
Acl-operators are checked in the order that they occur in the file (ie from top to bottom). The first
acl-operator line that matches causes Squid to drop out of the acl list. Squid will not check through
all acl-operators if the first denies the request.
In the previous example, we used a src acl: this checks that the source of the request is within the
given IP range. The src acl-type accepts IP address lists in many formats, though we used the
subnet/netmask in the earlier example. CIDR (Classless Inter-Domain Routing) notation can also
be used here. Here is an example of the same address range in either notation:
CIDR: 192.168.1.0/24
Subnet/Netmask (Dot Notation): 192.168.1.0/255.255.255.0
Access control lists inherit permissions when there is no matching acl. If all acl-operators in the file
are checked and no match is found, the opposite of the last acl-operator checked determines whether
the request is allowed or denied. This can be confusing, so it's normally a good idea to place a final
catch-all acl-operator at the end of the list. The simplest way to create such an operator is to create
an acl that matches any IP address. This is done with a src acl with a netmask of all 0's. When the
netmask arithmetic is done, Squid will find that any IP address matches this acl.
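Such a catch-all pair might look like this (the acl name all is conventional, not magic):

acl all src 0.0.0.0/0.0.0.0
http_access deny all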
Your cache server may well be on a network included in the relevant allow lists on your cache; if
you were thus to run the client on the cache machine (as opposed to another machine somewhere
on your network), the above acl and http_access rules would allow you to test the cache. In many
cases, however, a program running on the cache server will end up connecting to (and from) the
address '127.0.0.1' (also known as localhost). Your cache should thus allow requests to come from
the address 127.0.0.1/255.255.255.255. In the below example we don't allow icp requests from the
localhost address, since there is no reason to run two caches on the same machine.
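A hedged sketch of the rules described here, allowing HTTP from the local machine but refusing ICP from it:

acl localhost src 127.0.0.1/255.255.255.255
http_access allow localhost
icp_access deny localhost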
The squid.conf file that comes with Squid includes acls that deny all HTTP requests. To use your
cache, you need to explicitly allow incoming requests from the appropriate range. The squid.conf
file includes text that reads:
#
# INSERT YOUR OWN RULE(S) HERE TO ALLOW ACCESS FROM YOUR CLIENTS
#
To allow your client machines access, you need to add rules similar to the below in this space. The
default access-control rules stop people exploiting your cache, so it's best to leave them in.
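The added rules might look something like this, with 192.168.1.0/24 standing in for your own client address range:

acl myNet src 192.168.1.0/255.255.255.0
http_access allow myNet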
Squid always attempts to cache pages. If you have a large Intranet system, it's a waste of cache-store
disk space to cache your Intranet. Controlling which URLs and IP ranges not to cache is covered
in detail in chapter 6, using the no_cache acl operator.
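For instance (the address range below is illustrative; substitute your Intranet's range):

acl intranet dst 10.0.0.0/255.0.0.0
no_cache deny intranet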
[edit] Communicating with other proxy servers
Squid supports the concept of a hierarchy of proxies. If your proxy does not have an object on disk,
its default action is to connect to the origin web server and retrieve the page. In a hierarchy, your
proxy can communicate with other proxies (in the hope that one of these servers will have the
relevant page). You will, obviously, only peer with servers that are 'close' to you, otherwise you
would end up slowing down access. If access to the origin server is faster than access to
neighboring cache servers it is not a good idea to get the page from the slower link!
Having the ability to treat other caches as siblings is very useful in some interactions. For example:
if you often do business with another company, and have a permanent link to their premises, you
can configure your cache to communicate with their cache. This will reduce overall latency: it's
almost certainly faster to get the page from them than from the other side of the country.
When querying more than one cache, Squid does not query each in turn, and wait for a reply from
the first before querying the second (since this would create a linear slowdown as you add more
siblings, and if the first server stops responding, you would slow down all incoming requests).
Squid thus sends all ICP queries together - without waiting for replies. Squid then puts the client's
request on hold until the first positive reply from a sibling cache is received, and will retrieve the
object from the fastest-replying cache server. Since the earliest returning reply packet is usually on
the fastest link (and from the least loaded sibling server), your server gets the page fast.
Squid will always get the page from the fastest-responding cache - be it a parent or a sibling.
The cache_peer option allows you to specify proxy servers that your server is to communicate with.
The first line of the following example configures Squid to query the cache machine
cache.myparent.example as a parent. Squid will communicate with the parent on HTTP port 3128,
and will use ICP to query the server using port 3130. Configuring Squid to query more than one
server is easy: simply add another cache_peer line. The second line configures
cache.sibling.example as a sibling, listening for HTTP requests on port 8080 and ICP queries on port
3130.
cache_peer cache.myparent.example parent 3128 3130
cache_peer cache.sibling.example sibling 8080 3130
If you do not wish to query any other caches, simply leave all cache_peer lines commented out: the
default is to talk directly to origin servers.
Cache peering and hierarchy interactions are discussed in quite some detail in this book. In some
cases hierarchy setups are the most difficult part of your cache setup process (especially in a
distributed environment like a nationwide ISP). In depth discussion of hierarchies is beyond the
scope of this chapter, so much more information is given in chapter 8. There are cases where you
need at least one hierarchy line to get Squid to work at all. This section covers the basics, just for
those setups.
You only need to read this material if one of the following scenarios applies to you:
• You have to use your Internet Service Provider's cache.
• You have a firewall.
The default option essentially tells Squid "Go through this cache for all requests. If it's down, return
an error message to the client: you cannot go direct".
The no-query option gets Squid to ignore the given ICP port (leaving the port number out will cause
an error) and never attempt to query that cache with ICP.
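Combining the two options, a line for an ISP's parent cache might read as follows (the hostname and ports here are examples):

cache_peer cache.isp.example parent 8080 3130 default no-query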
[edit] Inside
The cache is considered a trusted host, and is protected by the firewall. You will configure client
machines to use the cache server in their browser proxy settings, and when a request is made, the
cache server will pass the outgoing request to a firewall, treating the firewall as a parent proxy
server. The firewall will then connect to the destination server. If you have a large number of clients
configured to use the firewall as their proxy server, you could get the firewall to hand-off incoming
HTTP requests back into the network, to the cache server. This is less efficient though, since the
cache will then have to re-pass these requests through the firewall to get to the outside, using the
parent option to cache_peer. Since the latter involves traffic passing through the firewall twice, your
load is very likely to increase. You should also beware of loops, with the cache server parenting to
the firewall and the firewall handing-off the cache's request back to the cache!
As described in chapter 1, Squid will also send ICP queries to parents. Firewalls don't care for UDP
packets, and normally log (and then discard) such packets.
When Squid does not receive a response from a configured parent, it will mark the parent as down,
and proceed to go directly.
Whenever Squid is setup to use a parent that does not support ICP, the cache_peer line should
include the "default" and "no-query" options. These options stop Squid from attempting to go direct
when all caches are considered down, and specify that Squid is not to send ICP requests to that
parent.
Here is an example config entry:
cache_peer inside.fw.address.domain parent 3128 3130 default no-query
[edit] Outside
There are only two major reasons for you to put your cache outside the firewall:
One: Although Squid can be configured to do authentication, this can lead to the duplication of
effort (you will encounter the "add new staff to 500 servers" syndrome). If you want to continue to
authenticate users on the firewall, you will have to put your cache on the outside or on the DMZ.
The firewall will thus accept requests from clients, authenticate them, and then pass them on to the
cache server.
Two: Communicating with cache hierarchies is easy. The cache server can communicate with other
systems using any protocol. Sibling caches, for example, are difficult to contact through a proxying
firewall.
You can only place your cache outside if your firewall supports hand-offs. Browsers inside will
connect to the firewall and request a URL, and the firewall will connect to the outside cache and
request the page.
If you place your cache outside your firewall, you may find that your client PCs have problems
connecting to internal web servers (your intranet, for example, may be unreachable). The problem is
that the cache is unable to connect back through to your internal network (which is actually a good
thing: don't change that). The best thing to do here is to add exclusions to your browser settings: this
is described in Chapter 5 - you should specifically have a look at the section on browser autoconfig.
In the meantime, let's just get Squid going, and we will configure browsers once you have a cache
to talk to.
Since the cache is not protected by the firewall, it must be very carefully configured - it must only
accept requests from the firewall, and must not run any strange services. If possible, you should
disable telnet, and use something like SSH (Secure SHell) instead. The access control lists (which
you will setup shortly) must only allow the firewall, otherwise people will be able to relay their
requests through your cache, using your bandwidth.
If you place the cache outside the firewall, your client PCs will be configured to use the firewall as
their proxy server (this is probably the case already). The firewall must be configured to hand-off
client HTTP requests to the cache server. The cache must be configured to only allow HTTP
requests when from the firewall's outside IP address. If not configured this way, other Internet users
could use your cache server as a relay, using your bandwidth and hardware resources for
illegitimate (and possibly illegal) purposes.
With your cache server on the outside network, you should treat the machine as a completely
untrusted host, lest a cracker find a hole somewhere on the system. It is recommended that you
place the cache server on a dedicated firewall network card, or on a switched ethernet port. This
way, if your cache server were to be cracked, the cracker would only be able to read passing HTTP
data. Since the majority of sensitive information is sent via email, this would reduce the potential
for sensitive data loss.
Since your cache server only accepts requests from the firewall, there is no cache_peer line needed
in the squid.conf. If you have to talk to your ISP's cache you will, of course, need one: see the
section on this a bit further back.
[edit] DMZ
The best place for a cache is your DMZ.
If you are concerned with the security of your cache server, and want to be able to communicate
with outside cache servers (using ICP), you may want to put your cache on the DMZ.
With Squid in your DMZ, internal client PCs are setup to proxy to the firewall. The firewall is then
responsible for handing-off these HTTP requests to the cache server (so the firewall in fact treats the
cache server as a parent).
Since your cache server is (essentially) on the outside of the firewall, the cache doesn't need to treat
the firewall as a parent or sibling: it only accepts requests from the firewall: it never passes them to
the firewall.
If your cache is outside your firewall, you will need to configure your client PCs not to use the
firewall as a proxy server for internal hosts. This is quite easy, and is discussed in the chapter on
browser configuration.
Since the firewall is acting as a filter between your cache and the outside world, you are going to
have to open up some ports on the firewall. The cache will need to be able to connect to port 80 on
any machine on the outside world. Since some valid web servers will run on ports other than 80,
you should consider allowing connections to any port from the cache server. In short, allow
connections to:
• Port 80 (for normal HTTP requests)
• Port 443 (for HTTPS requests)
• Ports higher than 1024 (site search engines often use high-numbered ports)
If you are going to communicate with a cache server outside the firewall, you will need even more
ports opened. If you are going to communicate with ICP, you will need to allow UDP traffic from
and to your cache machine on port 3130. You may find that the cache server that you are peering
with uses different ports for reply packets. It's probably a bad idea to open all UDP traffic, though.
Each concurrent incoming request uses at least one filedescriptor. 256 filedescriptors are only
enough for a small, lightly loaded cache server; see Chapter 12 for more details. Most of the
following output is diagnostic:
1999/06/12 19:16:20| With 256 file descriptors available
1999/06/12 19:16:20| helperOpenServers: Starting 5 'dnsserver' processes
1999/06/12 19:16:20| Unlinkd pipe opened on FD 13
1999/06/12 19:16:20| Swap maxSize 10240 KB, estimated 787 objects
1999/06/12 19:16:20| Target number of buckets: 15
1999/06/12 19:16:20| Using 8192 Store buckets, replacement runs every 10 seconds
1999/06/12 19:16:20| Max Mem size: 8192 KB
1999/06/12 19:16:20| Max Swap size: 10240 KB
1999/06/12 19:16:20| Rebuilding storage in Cache Dir #0 (DIRTY)
When you connect to an ftp server without a cache, your browser chooses icons to match the files
based on their filenames. When you connect through a cache server, the browser assumes that the
page returned will be in HTML form, and that it will include tags to load any images so that the
directory listing looks normal. Squid adds these tags, and has a collection of icons that it refers
clients to. These
icons are stored in /usr/local/squid/etc/icons/. If Squid has permission problems here, you need to
make sure that these files are owned by the appropriate users (in the previous section we set
permissions on the files in this directory.)
1999/06/12 19:16:20| Loaded Icons.
The next few lines are the most important. Once you see the Ready to serve requests line, you
should be able to start using the cache server. The HTTP port is where Squid is waiting for browser
connections, and should be the same as whatever we set it to in the previous chapter. The ICP port
should be 3130, the default, and if you have included other protocols (such as HTCP) you should
see them here. If you see permission denied errors here, it's possible that you are trying to bind to a
low-numbered port (like 80) as a normal user. Try running the startup command as root, or (if you don't
have root access on the machine) choose a high-numbered port. Another common error message at
this stage is Address already in use. This occurs when another process is already listening to the
given port. This could be because Squid is already started (perhaps you are upgrading from an older
version which is being restarted by the RunCache script) or you have some other process listening
on the same port (such as a web server.)
1999/06/12 19:16:20| Accepting HTTP connections on port 3128, FD 35.
1999/06/12 19:16:20| Accepting ICP messages on port 3130, FD 36.
1999/06/12 19:16:20| Accepting HTCP messages on port 4827, FD 37.
1999/06/12 19:16:20| Ready to serve requests.
Once Squid is up-and-running, it reads the cache-store. Since we are starting Squid for the first
time, you should see only zeros for all the numbers below:
1999/06/12 19:16:20| storeRebuildFromDirectory: DIR #0 done!
1999/06/12 19:16:25| Finished rebuilding storage disk.
1999/06/12 19:16:25| 0 Entries read from previous logfile.
1999/06/12 19:16:25| 0 Entries scanned from swap files.
1999/06/12 19:16:25| 0 Invalid entries.
1999/06/12 19:16:25| 0 With invalid flags.
1999/06/12 19:16:25| 0 Objects loaded.
1999/06/12 19:16:25| 0 Objects expired.
1999/06/12 19:16:25| 0 Objects cancelled.
1999/06/12 19:16:25| 0 Duplicate URLs purged.
1999/06/12 19:16:25| 0 Swapfile clashes avoided.
1999/06/12 19:16:25| Took 5 seconds ( 0.0 objects/sec).
1999/06/12 19:16:25| Beginning Validation Procedure
1999/06/12 19:16:26| storeLateRelease: released 0 objects
1999/06/12 19:16:27| Completed Validation Procedure
1999/06/12 19:16:27| Validated 0 Entries
1999/06/12 19:16:27| store_swap_size = 21k
If your cache is running on a different machine you will have to use the -h and -p options. The
following command will connect to the machine cache.qualica.com on port 8080 and retrieve the
above web page.
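Assuming those host and port values, and www.qualica.com as the page in question, the command would look something like this:

client -h cache.qualica.com -p 8080 http://www.qualica.com/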
The client program can also be used to access web sites directly. As you may remember from
reading Chapter 2, the protocol that clients use to access pages through a cache is part of the HTTP
specification. The client program can be used to send both "normal" and "cache" HTTP requests. To
check that your cache machine can actually connect to the outside world, it's a good idea to test
access to an outside web server.
The next example will retrieve the page at http://www.qualica.com/, and send the html contents of
the page to your terminal.
If you have a firewall between you and the internet, the request may not work, since the firewall
may require authentication (or, if it's a proxy-level firewall and is not doing transparent proxying of
the data, you may explicitly have to tell client to connect to the machine.) To test requests through
the firewall, look at the next section.
A note about the syntax of the next request: you are telling client to connect directly to the remote
site, and request the page /. With a request through a cache server, you connect to the cache (as you
would expect) and request a whole url instead of just the path to a file. In essence, both normal-
HTTP and cache-HTTP requests are identical; one just happens to refer to a whole URL, the other
to a file.
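To make the distinction concrete, here are the two forms side by side (hostnames are illustrative): a direct request names the origin server and asks for a path, while a cache request names the cache and asks for a whole URL.

client -h www.qualica.com -p 80 /
client -h cache.qualica.com -p 8080 http://www.qualica.com/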
Client can also print out timing information for the download of a page. In this mode, the contents
of the page aren't printed: only the timing information is. The zero in the below example indicates
that Squid is to retrieve the page until interrupted (with Control-C or Break). If you want to retrieve
the page a limited number of times, simply replace the zero with a number.
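In versions of the client program that support it, the ping/timing flag is -g, taking the iteration count described above; this flag name is an assumption to check against your build:

client -g 0 -h cache.qualica.com -p 8080 http://www.qualica.com/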
[edit] Windows
In NT-based Windows systems, Squid NT can be installed as a native service. Simply unzip it in the
root of C: and run c:\squid\sbin\squid -i. Rename and edit the files in c:\squid\etc, make sure to
create c:\squid\var\cache, and run squid -z to create the swap directories (or you might spend a long
time trying to figure out the cryptic "abnormal program termination" message like I did! :) ). Then
run net start squid, or start Squid via services.msc.
Inverting the last decision is a simple (if not immediately obvious) solution to one of the most
common acl mistakes: not adding a final deny all to the end of your acl list.
With this new knowledge, have a look at the first example in this chapter: you will see why I said
not to use it in your configs. Given that the last operator denies the local network, local people will
not be able to access the cache. The remainder of the Internet, however, will! As discussed in
Chapter 4, the simplest way of creating a catch-all acl is to match requests when they come from
any IP address. When programs do netmask arithmetic a subnet of all zeros will match any IP
address. A corrected version of the first example dispenses with the myNet acl.
Once the cache is considered stable and is moved into production, the config would change.
http_access lines do add a very small amount of overhead, but that's not the only reason to have
simple access rulesets: the fewer rulesets, the easier your setup is to understand. The below example
includes a deny all rule although it doesn't really need one: you may know of the automatic
inversion of the last rule, but someone else working on the cache may not.
You should always end your access lists with an explicit deny. In Squid-2.1 the default config file
does this for you when you insert your HTTP acl operators in the appropriate place.
The acl tag consists of a minimum of three fields: a unique name; an acl type and a decision string.
An acl line can have more than one decision string, hence the [string2] and [string3] in the line
above.
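For example, in the following line myNet is the unique name, src is the acl type, and the two address ranges are decision strings (the values are invented for illustration):

acl myNet src 10.0.0.0/255.255.255.0 10.0.1.0/255.255.255.0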
[edit] Type
So far we have discussed only acls that check the source IP address of the connection. This isn't
sufficient for many people: it may be useful for you to allow connections at only certain times, or to
only specific domains, or by only some users (using usernames and passwords). If you really want
to, you can even combine all of the above: only allow connections from users that have the right
password, have the right destination and are going to the right domain. There are quite a few
different acl types: the next section of this chapter discusses all of the different types in detail. In the
meantime, let's finish the description of the structure of the acl line.
The above acl will match when the IP address comes from any IP address between 10.0.0.0 and
10.0.255.255. In recent years more and more people are using Classless Internet Domain Routing
(CIDR) format netmasks, like 10.0.0.0/16. Squid handles both the traditional IP/Netmask and more
recent IP/Bits notation in the src acl type. IP ranges can also be specified in a further, Squid-specific
format: an address range with a netmask, as the following lines show.
acl myNet src addr1-addr2/netmask
http_access allow myNet
Squid can also match connections by destination IP. The layout is very similar: simply replace src
with dst. Here are a couple of examples:
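A couple of hedged examples (the addresses are invented):

acl badIP dst 10.255.1.2/255.255.255.255
acl webserver dst 10.0.0.5/255.255.255.255
http_access deny badIP
http_access deny webserver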
Reverse DNS matches should not be used where security is important. A determined attacker (who
controlled the reverse DNS entries for the attacking host) would be able to manipulate these entries
so that the request comes from your domain. Squid doesn't attempt to check that reverse and
forward DNS entries match, so this option is not recommended.
Squid can also be configured to deny requests to specific domains. Many people implement these
filter lists for pornographic sites. The legal implications of this filtering are not covered here: there
are many, and the relevant law is in a constant state of flux, so advice here would likely be obsolete
in a very short period of time. I suggest that you consult a good lawyer if you want to do something
like this.
The dstdomain acl type allows one to match accesses by destination domain. This could be used to
match urls for popular adult sites, and refuse access (perhaps during specific times).
If you want to deny access to a set of sites, you will need to find out these site's IP addresses, and
deny access to these IP addresses too. If you just put the URL Domain name in, someone
determined to access a specific site could find out the IP address associated with that hostname and
access it by entering the IP address in their browser.
The above is best described with an example. Here, I assume that you want to restrict access to the
site www.adomain.example. If you use either the host or nslookup commands, you would find that
this server has the IP address 10.255.1.2. It's easiest to just have two acls: one for IPs and one for
domains. If the lists get too large, you can simply place them in a file.
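A sketch of the two acls for this example (the domain and address come from the text above; the acl names are invented):

acl adomain dstdomain www.adomain.example
acl adomainIP dst 10.255.1.2/255.255.255.255
http_access deny adomain
http_access deny adomainIP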
Day list is a list of single characters indicating the days that the acl applies to. Using the first letter
of the day would be ambiguous (since, for example, both Tuesday and Thursday start with the same
letter). When the first letter is ambiguous, the second letter is used: T stands for Tuesday, H for
Thursday. Here is a list of the days with their single-letter abbreviations:
S - Sunday M - Monday T - Tuesday W - Wednesday H - Thursday F - Friday A - Saturday
Start_hour and end_hour are times written in 24-hour ("military") time (17:00 instead of 5:00).
End_hour must always be larger than start_hour. Unfortunately, this means that you can't simply
write:
acl darkness time 17:00-6:00 # won't work
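Instead, a span crossing midnight has to be split into two acls, something like:

acl evening time 17:00-23:59
acl earlymorning time 00:00-06:00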
As you can see from the original definition of the time acl, you can specify the day of the week
(with no time), the time (with no day), or both the time and day. You can, for example, create a rule
that specifies weekends without specifying that the day starts at midnight and ends at the following
midnight. The following acl will match on either Saturday or Sunday.
acl weekends time SA
The following example is too basic for real-world use. Unfortunately, creating a good example
requires some of the more advanced features of the http_access line; these are covered (with
examples) in the next section of this chapter.
The format is pretty straightforward: a destination port of 443 or 563 is matched by the first acl,
while 80, 21, 443, etc. by the second line. The most complicated section of the examples above is
the end of the line: the text that reads "1025-65535".
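The lines under discussion closely resemble (a trimmed version of) the defaults shipped in squid.conf:

acl SSL_ports port 443 563
acl Safe_ports port 80 21 443 563 70 210 1025-65535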
The "-" character is used in Squid to specify a range. The example thus matches any port from 1025
all the way up to 65535. These ranges are inclusive, so the second line matches ports 1025 and
65535 too.
The only low-numbered ports which Squid should need to connect to are 80 (the HTTP port), 21
(the FTP port), 70 (the Gopher port), 210 (WAIS) and the appropriate SSL ports. All other low-
numbered ports (where common services like telnet run) do not fall into the 1025-65535 range, and
are thus denied.
The following http_access line denies access to URLs that are not in the correct port ranges. You
have not seen the ! http_access operator before: it inverts the decision. The line below would read
"deny access if the request does not fall in the range specified by acl Safe_ports" if it were written
in English. If the port matches one of those specified in the Safe_ports acl line, the next http_access
line is checked. More information on the format of http_access lines is given in the next section
Acl-operator lines.
http_access deny !Safe_ports
If you were connecting using SSL, the GET word would be replaced with the word CONNECT.
You can control which methods are allowed through the cache using the method acl type. The most
common use is to stop CONNECT type requests to non-SSL ports. The CONNECT method allows
data transfer in any direction at any time: if you telnet to a badly configured proxy, and enter
something like:
CONNECT www.domain.example:23 HTTP/1.1
blank-line
you might end up with a telnet connection to www.domain.example just as if you had telnetted there
from the cache server itself. This can be used get around packet-filters, firewall access lists and
passwords, which is generally considered a bad thing! Since CONNECT requests can be quite easily
exploited, the default squid.conf denies access to SSL requests to non-standard ports (as described
in the section on the port acl-operator.)
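The default configuration expresses this restriction with lines like the following:

acl CONNECT method CONNECT
acl SSL_ports port 443 563
http_access deny CONNECT !SSL_ports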
Let's assume that you want to stop your clients from POSTing to any sites (note that doing this is
not a good idea, since people using some search engines, for example, would run into problems; at
this stage this is just an illustration).
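A minimal sketch of such a (not recommended) rule:

acl POST method POST
http_access deny POST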
[edit] Username
Logs generally show the source IP address of a connection. When this address is on a multiuser
machine (let's use a Unix machine at a university as an example) you cannot pin down a request as
being from a specific user. There could be hundreds of people logged into the Unix machine, and
they could all be using the cache server. Trying to track down a misbehaver is very difficult in this
case, since you can never be sure which user is actually doing what. To solve this problem, the ident
protocol was created. When the cache server accepts a new connection, it can connect back to the
machine the connection came from (on a low-numbered port, so the reply cannot be faked) to find
out who's on the other end of the connection. This doesn't make any sense on single-user systems:
people can just load their own
ident servers (and become daffy duck for a day). If you run multi-user systems then you may want
only certain people on those machines to be able to use the cache. In this case you can use the ident
username to allow or deny access.
One of the best things about Unix is the flexibility you get. If you wanted (for example) only
students in their third year to have access to the cache servers via your Unix machines, you could
create a replacement ident server. This server could find out which user has connected to the
cache, but instead of returning the username it could return a string like "third_year" or
"postgrad". Rather than maintaining a list of which students are in which year on both the cache
server and the central Unix system, you could write simple Squid rules, and the ident server could
do all the work of checking which user is which.
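A replacement ident server along these lines could be sketched as below. This is a minimal illustration, not production code: the group lookup is a stub, and a real server would map the port pair in the query to the local user owning that connection before consulting the table.

```python
import socketserver

# Hypothetical policy table: local username -> group string returned
# to the cache instead of the real username.
GROUPS = {"alice": "third_year", "bob": "postgrad"}

def ident_reply(query: str, group: str) -> str:
    """Format an RFC 1413 reply: 'port-pair : USERID : OS : id'."""
    return f"{query.strip()} : USERID : UNIX : {group}\r\n"

class IdentHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # The client (here, the cache server) sends
        # "server-port, client-port\r\n" and expects one reply line.
        query = self.rfile.readline().decode("ascii", "replace")
        # A real implementation would map the port pair to the owning
        # local user and look that user up in GROUPS; this stub always
        # answers with a fixed group.
        self.wfile.write(ident_reply(query, "third_year").encode("ascii"))

# To run (as root, since ident uses the privileged port 113):
#     socketserver.TCPServer(("", 113), IdentHandler).serve_forever()
```

Squid then only ever sees group names, so the acl rules stay short no matter how many students there are.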
You will also need to create the appropriate password file (/usr/local/squid/etc/passwd in the
example above). This file consists of a username and password pair, one per line, where the
username and password are separated by a colon (:), just as they are in a Unix /etc/passwd file. The
password is encrypted with the same function as the passwords in /etc/passwd (or /etc/shadow on
newer systems) are. Here is an example password line:
oskar:lKdpxbNzhlo.w
Since the encrypted passwords are the same, and the ncsa_auth module understands the /etc/passwd
or /etc/shadow file format, you could simply copy the system password file periodically. If your
users do not already have passwords in Unix crypt format somewhere, you will have to use the
htpasswd program (in /usr/local/squid/bin/) to generate the appropriate user and password pairs.
Using the SMB authentication module
It is very simple:
authenticate_ip_ttl 5 minutes
auth_param basic children 5
auth_param basic realm Authentication Server
auth_param basic program /usr/lib/squid/smb_auth -W work_group -I server_name
After you have added this parameter you must edit /usr/local/squid/etc/squid_radius_auth.conf, set
the RADIUS server's hostname (or IP address), and change the shared secret key. Restart Squid for
the change to take effect.
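For reference, the parameter referred to above is the auth_param program line that points the helper at its configuration file. The helper path below is an assumption based on the install prefix used elsewhere in this guide:

```
auth_param basic program /usr/local/squid/libexec/squid_radius_auth -f /usr/local/squid/etc/squid_radius_auth.conf
```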
(Note: field-testing on old Squid 2.3 suggests the "&" must be omitted.)
Let's work through the fields from left to right. The first word is http_access, the actual acl-operator.
The allow and deny words come next. If you want to deny access to a specific class of users, you
can change the customary allow to deny in the http_access line. We have seen where a deny line is
useful before, with the final deny of all IP ranges in previous examples.
Let's say that you wanted to deny Internet access to a specific list of IP addresses during the day.
Since acls can only have one type per acl, you could not create an acl line that matches an IP
address during specific times. By combining more than one acl per acl-operator line, though, you
get the same effect. Consider the following acls:
acl dialup src 10.0.0.0/255.255.255.0
acl work time 08:00-17:00
If you could create an acl-operator that was matched when both the dialup and work acls were true,
clients in the range could only connect during the right times. This is where the aclname2 in the
above acl-operator definition comes in. When you specify more than one acl per acl-operator line,
both acls have to be matched for the acl-operator to be true. The acl-operator function AND's the
results from each acl check together to decide whether to return true or false.
You could thus deny the dialup range cache access during working hours with the following acl
rules:
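Given the two acls above, such a rule could look like this (remember that every acl on the line must match for the operator to fire):

```
# deny the dialup range during working hours; both acls must match
http_access deny dialup work
```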
You can also invert an acl's result value by using an exclamation mark (the traditional NOT
operator from many programming languages) before the appropriate acl. In the following example I
have reduced Example 6-4 to one http_access line, relying on Squid's default behavior (the opposite
of the last rule) to allow access to local clients.
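Assuming myNet describes the local client range, the combined rule discussed below would take roughly this form (a reconstruction, since the original listing is not shown here):

```
# deny everyone who is not in myNet
acl myNet src 10.0.0.0/255.255.255.0
acl all src 0.0.0.0/0.0.0.0
http_access deny all !myNet
```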
Since the above example is quite complicated, let's cover it in more detail:
In the above example an IP from the outside world will match the 'all' acl, but not the 'myNet' acl;
the IP will thus match the http_access line. Consider the binary logic for a request coming in from
the outside world, where the IP is not defined in the myNet acl.
Deny http access if ((true) & (!false))
If you consider the relevant matching of an IP in the 10.0.0.0 range, the myNet value is true, the
binary representation is as follows:
Deny http access if ((true) & (!true))
A 10.0.0.0 range IP will thus not match the only http_access line in the squid config file.
Remembering that Squid will default to the opposite of the last match in the file, accesses will be
allowed from the myNet IP range.
The first line uses a regular expression match to find URLs that have cgi-bin or ? in the path (since
we are using the urlpath_regex acl type, a site with a name like cgi-bin.qualica.com will not be
matched.) The no_cache acl-operator is then used to stop matching objects from being cached.
Changing the service's port does have implications: client programs (like any SNMP management
station software running on the machine) also use the services file to find out which port to connect
to when making outgoing requests. If you are running anything other than a simple SNMP agent on
the cache machine, you must not change the /etc/services file: if you do, you will encounter all
sorts of strange problems!
Squid doesn't use the /etc/services file, but the port to listen to is stored in the standard Squid config
file. Once the other server is listening on port 3456, we need to get Squid to listen on the standard
SNMP port and proxy requests to port 3456.
First, change the snmp_port value in squid.conf to 161. Since we are forwarding requests to another
SNMP server, we also need to set forward_snmpd_port to our other-server port, port 3456.
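In squid.conf this comes down to two lines:

```
snmp_port 161
forward_snmpd_port 3456
```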
The first line is a standard ACL: it returns true if the requested URL has the word abracadabra in it.
The -i flag is used to make the search case-insensitive.
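The acl being described might look like this (the acl name is arbitrary):

```
acl magic_words url_regex -i abracadabra
```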
The delay_pool_count variable tells Squid how many delay pools there will be. Here we have only
one pool, so this option is set to 1.
The third line creates a delay pool (delay pool number 1, the first option) of class 1 (the second
option to delay_class).
The first delay class is the simplest: the download rate of all connections in the class are added
together, and Squid keeps this aggregate value below a given maximum value.
The fourth line is the most complex, as you can see. The delay_parameters option allows you to
set speed limits on each pool. The first option is the pool to be manipulated: since we have only one
pool in this example, this is set to 1. The second option consists of two values: the restore and max
values, separated by a forward-slash (/).
If you download a short file at high speed, you create a so-called burst of traffic. Generally these
short bursts of traffic are not a problem: these are normally html or text files, which are not the real
bandwidth consumers. Since we don't want to slow everyone's access down (just the people
downloading comparatively large files), Squid allows you to configure a size at which the
download starts slowing down. If you download a short file, it arrives at full speed, but when you
hit a certain threshold the file arrives more slowly.
The restore value sets the download speed, and the max value sets the size at which files begin to
be slowed down. Restore is in bytes per second; max is in bytes.
In the above example, downloads proceed at full speed until they have transferred 16000 bytes.
This limit ensures that small files arrive reasonably fast. Once this much data has been transferred,
however, the transfer rate is slowed to 16000 bytes per second. At 8 bits per byte this means that
connections are limited to 128 kilobits per second (16000 * 8 = 128000 bits).
EXAMPLE:
acl all src 0.0.0.0/0.0.0.0
delay_pool_count 1
# class 2: an aggregate limit plus a per-host limit
delay_class 1 2
# 12500 bytes/s * 8 = 100 kbit/s aggregate; 2500 bytes/s * 8 = 20 kbit/s per host
delay_parameters 1 12500/12500 2500/2500
delay_access 1 allow all
EXAMPLE:
acl all src 0.0.0.0/0.0.0.0
delay_pool_count 1
# class 3: aggregate, per-network and per-user limits
delay_class 1 3
# 56000 * 8 sets your overall limit at 448 kbit/s
# 18750 * 8 sets your per-network limit at 150 kbit/s
# 500 * 8 sets your per-user limit at 4 kbit/s
delay_parameters 1 56000/56000 18750/18750 500/500
delay_access 1 allow all
Conclusion
Once your acl system is correctly set up, your cache should essentially be ready to become a
functional part of your infrastructure. If you are going to use some of the advanced Squid features
(transparent operation mode, for example), you will need the additional configuration covered in
later chapters.
Retrieved from "http://www.deckle.co.za/squid-users-guide/Access_Control_and_Access_Control_Operators"