Simple Squid access log reporting.

Squid is one of the biggest and most used proxies on the interwebs. And generating reports from the access logs is already a done deal, there are many commercial and OSS apps that support the squid log format. But I found my self in a situation where I wanted stats but didn’t want to install a web server on my proxy or use syslog to push my logs to a centralised server which was running such software, and also wasn’t in a position to go buy one of those off the shelf amazing wiz bang Squid reporting and graphing tools.

As a Linux geek I surfed the web to see what others have done. I came across a list provided by the Squid website. Following a couple of links, I came across a awk script called ‘proxy_stats.gawk’ written by Richard Huveneers.

I downloaded it and tried it out… unfortunately it didn’t work, looking at the code.. which he nicely commented showed that he had it set up for access logs  from version 1.* of squid. Now the squid access log format from squid 2.6+ hasn’t changed too much from version 1.1. all they have really done is add a “content type” entry at the end of each line.

So as a good Linux geek does, he upgrades the script, my changes include:

  • Support for squid 2.6+
  • Removed the use a deprecated switches that now isn’t supported in the sort command.
  • Now that there is a an actual content type “column” lets use it to improve the ‘Object type report”.
  • Add a users section, as this was an important report I required which was missing.
  • And in a further hacked version, an auto generated size of the first “name” column.

Now with the explanation out of the way, let me show you it!

For those who are new to awk, this is how I’ve been running it:

zcat <access log file> | awk -f proxy_stats.gawk > <report-filename>

NOTE: I’ve been using it for some historical analysis, so I’m running it on old rotated files, which are compressed thus the zcat.

You can pass more then one file at a time and it order doesn’t matter, as each line of an access log contains the date in epoch time:

zcat `find /var/log/squid/ -name "access.log*"` |awk -f proxy_stats.gawk

The script produces an ascii report (See end of blog entry for example), which could be generated and emailed via cron. If you want it to look nice in any email client using html the I suggest wrapping it in <pre> tags.:

<html>
<head><title>Report Title</title></head>
Report title<body>
<pre>
... Report goes here ...
</pre>
</body>
</html>

For those experienced Linux sys admins out there using cron + ‘find -mtime’ would be a very simple way of having an automated daily, weekly or even monthly report.
But like I said earlier I was working on historic data, hundreds of files in a single report, hundreds because for business reasons we have been rotating the squid logs every hour… so I did what I do best, write a quick bash script to find all the files I needed to cat into the report:

#!/bin/bash

ACCESS_LOG_DIR="/var/log/squid/access.log*"
MONTH="$1"

function getFirstLine() {
	if [ -n  "`echo $1 |grep "gz$"`" ]
	then
		zcat $1 |head -n 1
	else
		head -n 1 $1 
	fi
}

function getLastLine() {
	if [ -n  "`echo $1 |grep "gz$"`" ]
	then
		zcat $1 |tail -n 1
	else
		tail -n 1 $1 
	fi
}

for log in `ls $ACCESS_LOG_DIR`
do
	firstLine="`getFirstLine $log`"
	epochStr="`echo $firstLine |awk '{print $1}'`"
	month=`date -d @$epochStr +%m`
	
	if [ "$month" -eq "$MONTH" ]
	then
		echo $log
		continue
	fi

	
	#Check the last line
	lastLine="`getLastLine $log`"
	epochStr="`echo $lastLine |awk '{print $1}'`"
        month=`date -d @$epochStr +%m`

        if [ "$month" -eq "$MONTH" ]
        then
                echo $log
        fi
	
done

So there you go, thanks to the work of Richard Huveneers there is a script that I think generates a pretty good acsii report, which can be automated or integrated easily into any Linux/Unix work flow.

If you interested in getting hold of the most up to date version of the script you can get it from my sysadmin github repo here.

As promised earlier here is an example report:

Parsed lines  : 32960
Bad lines     : 0

First request : Mon 30 Jan 2012 12:06:43 EST
Last request  : Thu 09 Feb 2012 09:05:01 EST
Number of days: 9.9

Top 10 sites by xfers           reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
213.174.155.216                   20   0.1% 100.0%   0.0%        0.0   0.0%   0.0%       1.7       2.5
30.media.tumblr.com                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      48.3      77.4
28.media.tumblr.com                1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      87.1       1.4
26.media.tumblr.com                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
25.media.tumblr.com                2   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      49.2      47.0
24.media.tumblr.com                1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%     106.4     181.0
10.1.10.217                      198   0.6% 100.0%   0.0%       16.9   0.9%   0.0%      87.2    3332.8
3.s3.envato.com                   11   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       7.6      18.3
2.s3.envato.com                   15   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       7.5      27.1
2.media.dorkly.cvcdn.com           8   0.0% 100.0%  25.0%        3.2   0.2%   0.3%     414.1     120.5

Top 10 sites by MB              reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
zulu.tweetmeme.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       3.1     289.6
ubuntu.unix.com                    8   0.0% 100.0% 100.0%        0.1   0.0% 100.0%       7.5     320.0
static02.linkedin.com              1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      36.0     901.0
solaris.unix.com                   2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       3.8     223.6
platform.tumblr.com                2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       1.1     441.4
i.techrepublic.com.com             5   0.0%  60.0% 100.0%        0.0   0.0% 100.0%       6.8    2539.3
i4.zdnetstatic.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      15.3     886.4
i4.spstatic.com                    1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       4.7     520.2
i2.zdnetstatic.com                 2   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       7.8    2920.9
i2.trstatic.com                    9   0.0% 100.0% 100.0%        0.0   0.0% 100.0%       1.5     794.5

Top 10 neighbor report          reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
www.viddler.com                    4   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.turktrust.com.tr              16   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.trendmicro.com                 5   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.reddit.com                     2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.linkedin.com                   2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.google-analytics.com           2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.facebook.com                   2   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.dynamicdrive.com               1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
www.benq.com.au                    1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
wd-edge.sharethis.com              1   0.0% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0

Local code                      reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
TCP_CLIENT_REFRESH_MISS         2160   6.6% 100.0%   0.0%        7.2   0.4%   0.0%       3.4      12.9
TCP_HIT                          256   0.8% 100.0%  83.2%       14.0   0.8% 100.0%      56.0    1289.3
TCP_IMS_HIT                      467   1.4% 100.0% 100.0%       16.9   0.9% 100.0%      37.2    1747.4
TCP_MEM_HIT                      426   1.3% 100.0% 100.0%       96.5   5.3% 100.0%     232.0    3680.9
TCP_MISS                       27745  84.2%  97.4%   0.0%     1561.7  85.7%   0.3%      59.2      18.2
TCP_REFRESH_FAIL                  16   0.0% 100.0%   0.0%        0.2   0.0%   0.0%      10.7       0.1
TCP_REFRESH_MODIFIED             477   1.4%  99.8%   0.0%       35.0   1.9%   0.0%      75.3    1399.4
TCP_REFRESH_UNMODIFIED          1413   4.3% 100.0%   0.0%       91.0   5.0%   0.0%      66.0     183.5

Status code                     reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
000                              620   1.9% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
200                            29409  89.2% 100.0%   2.9%     1709.7  93.8%   7.7%      59.5     137.1
204                              407   1.2% 100.0%   0.0%        0.2   0.0%   0.0%       0.4       1.4
206                              489   1.5% 100.0%   0.0%      112.1   6.1%   0.0%     234.7     193.0
301                               82   0.2% 100.0%   0.0%        0.1   0.0%   0.0%       0.7       1.5
302                              356   1.1% 100.0%   0.0%        0.3   0.0%   0.0%       0.8       2.7
303                                5   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       1.5
304                              862   2.6% 100.0%  31.2%        0.4   0.0%  30.9%       0.4      34.2
400                                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
401                                1   0.0%   0.0%      -        0.0   0.0%      -         -         -
403                               47   0.1%   0.0%      -        0.0   0.0%      -         -         -
404                              273   0.8%   0.0%      -        0.0   0.0%      -         -         -
500                                2   0.0%   0.0%      -        0.0   0.0%      -         -         -
502                               12   0.0%   0.0%      -        0.0   0.0%      -         -         -
503                               50   0.2%   0.0%      -        0.0   0.0%      -         -         -
504                              344   1.0%   0.0%      -        0.0   0.0%      -         -         -

Hierarchie code                 reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
DIRECT                         31843  96.6%  97.7%   0.0%     1691.0  92.8%   0.0%      55.7      44.3
NONE                            1117   3.4% 100.0% 100.0%      131.6   7.2% 100.0%     120.7    2488.2

Method report                   reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
CONNECT                         5485  16.6%  99.2%   0.0%      132.8   7.3%   0.0%      25.0       0.3
GET                            23190  70.4%  97.7%   4.9%     1686.3  92.5%   7.8%      76.2     183.2
HEAD                            2130   6.5%  93.7%   0.0%        0.7   0.0%   0.0%       0.3       1.1
POST                            2155   6.5%  99.4%   0.0%        2.9   0.2%   0.0%       1.4       2.0

Object type report              reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
*/*                                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.6       3.2
application/cache-digest         396   1.2% 100.0%  50.0%       33.7   1.8%  50.0%      87.1    3655.1
application/gzip                   1   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      61.0      30.8
application/javascript           227   0.7% 100.0%  12.3%        2.2   0.1%   7.7%       9.9      91.9
application/json                 409   1.2% 100.0%   0.0%        1.6   0.1%   0.0%       4.1       6.0
application/ocsp-response        105   0.3% 100.0%   0.0%        0.2   0.0%   0.0%       1.9       2.0
application/octet-stream         353   1.1% 100.0%   6.8%       81.4   4.5%   9.3%     236.1     406.9
application/pdf                    5   0.0% 100.0%   0.0%       13.5   0.7%   0.0%    2763.3      75.9
application/pkix-crl              96   0.3% 100.0%  13.5%        1.0   0.1%   1.7%      10.6       7.0
application/vnd.google.sa       1146   3.5% 100.0%   0.0%        1.3   0.1%   0.0%       1.1       2.4
application/vnd.google.sa       4733  14.4% 100.0%   0.0%       18.8   1.0%   0.0%       4.1      13.4
application/x-bzip2               19   0.1% 100.0%   0.0%       78.5   4.3%   0.0%    4232.9     225.5
application/x-gzip               316   1.0% 100.0%  59.8%      133.4   7.3%  59.3%     432.4    3398.1
application/x-javascript        1036   3.1% 100.0%   5.8%        9.8   0.5%   3.4%       9.7      52.1
application/xml                   46   0.1% 100.0%  34.8%        0.2   0.0%  35.1%       3.5     219.7
application/x-msdos-progr        187   0.6% 100.0%   0.0%       24.4   1.3%   0.0%     133.7     149.6
application/x-pkcs7-crl           83   0.3% 100.0%   7.2%        1.6   0.1%   0.4%      19.8      10.8
application/x-redhat-pack         13   0.0% 100.0%   0.0%       57.6   3.2%   0.0%    4540.7     156.7
application/x-rpm                507   1.5% 100.0%   6.3%      545.7  29.9%   1.5%    1102.2     842.8
application/x-sdlc                 1   0.0% 100.0%   0.0%        0.9   0.0%   0.0%     888.3     135.9
application/x-shockwave-f        109   0.3% 100.0%  11.9%        5.4   0.3%  44.5%      50.6     524.1
application/x-tar                  9   0.0% 100.0%   0.0%        1.5   0.1%   0.0%     165.3      36.4
application/x-www-form-ur         11   0.0% 100.0%   0.0%        0.1   0.0%   0.0%       9.9      15.4
application/x-xpinstall            2   0.0% 100.0%   0.0%        2.5   0.1%   0.0%    1300.6     174.7
application/zip                 1802   5.5% 100.0%   0.0%      104.0   5.7%   0.0%      59.1       2.5
Archive                           89   0.3% 100.0%   0.0%        0.0   0.0%      -       0.0       0.0
audio/mpeg                         2   0.0% 100.0%   0.0%        5.8   0.3%   0.0%    2958.2      49.3
binary/octet-stream                2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       5.5      14.7
font/ttf                           2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      15.5      12.5
font/woff                          1   0.0% 100.0% 100.0%        0.0   0.0% 100.0%      42.5    3539.6
Graphics                         126   0.4% 100.0%   0.0%        0.1   0.0%   0.0%       0.6       2.5
HTML                              14   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.1       0.1
image/bmp                          1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.3       3.9
image/gif                       5095  15.5% 100.0%   2.4%       35.9   2.0%   0.7%       7.2       9.5
image/jpeg                      1984   6.0% 100.0%   4.3%       52.4   2.9%   0.6%      27.0      62.9
image/png                       1684   5.1% 100.0%  10.3%       28.6   1.6%   1.9%      17.4     122.2
image/vnd.microsoft.icon          10   0.0% 100.0%  30.0%        0.0   0.0%  12.8%       1.0       3.3
image/x-icon                      72   0.2% 100.0%  16.7%        0.2   0.0%   6.0%       3.2      15.0
multipart/bag                      6   0.0% 100.0%   0.0%        0.1   0.0%   0.0%      25.2      32.9
multipart/byteranges              93   0.3% 100.0%   0.0%       16.5   0.9%   0.0%     182.0     178.4
text/cache-manifest                1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       3.1
text/css                         470   1.4% 100.0%   7.9%        3.4   0.2%   5.8%       7.4      59.7
text/html                       2308   7.0%  70.7%   0.4%        9.6   0.5%   0.6%       6.0      14.7
text/javascript                 1243   3.8% 100.0%   2.7%       11.1   0.6%   5.2%       9.1      43.3
text/json                          1   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       0.5       0.7
text/plain                      1445   4.4%  99.4%   1.5%       68.8   3.8%   5.5%      49.0      41.9
text/x-cross-domain-polic         24   0.1% 100.0%   0.0%        0.0   0.0%   0.0%       0.7       1.7
text/x-js                          2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%      10.1       6.4
text/x-json                        9   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       3.0       8.5
text/xml                         309   0.9% 100.0%  12.9%       12.9   0.7%  87.5%      42.8     672.3
unknown/unknown                 6230  18.9%  99.3%   0.0%      132.9   7.3%   0.0%      22.0       0.4
video/mp4                          5   0.0% 100.0%   0.0%        3.2   0.2%   0.0%     660.8      62.7
video/x-flv                      117   0.4% 100.0%   0.0%      321.6  17.6%   0.0%    2814.9     308.3
video/x-ms-asf                     2   0.0% 100.0%   0.0%        0.0   0.0%   0.0%       1.1       4.7

Ident (User) Report             reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
-                              32960 100.0%  97.8%   3.5%     1822.6 100.0%   7.2%      57.9     129.0

Weekly report                   reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
2012/01/26                     14963  45.4%  97.6%   3.6%      959.8  52.7%   1.8%      67.3     104.5
2012/02/02                     17997  54.6%  98.0%   3.4%      862.8  47.3%  13.2%      50.1     149.4

Total report                    reqs   %all %xfers   %hit         MB   %all   %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
All requests                   32960 100.0%  97.8%   3.5%     1822.6 100.0%   7.2%      57.9     129.0

Produced by : Mollie's hacked access-flow 0.5
Running time: 2 seconds

Happy squid reporting!

3 Comments

  1. Emanuele says:

    I try to download your proxy_stats.gawk from github but give me a error:

    [root@webserver logs]# zcat access.log | awk -f proxy_stats.gawk > reportawk
    awk: proxy_stats.gawk:1:
    awk: proxy_stats.gawk:1: ^ syntax error

    gzip: access.log: not in gzip format

    where i wrong?

  2. matt says:

    The access.log your are using is not in gzip format, so use cat instead of zcat.

    Once log rotate rotates your logs they get compressed, so if the file ends in .gz then use zcat, if they just end in .log then use cat.
    So try:
    cat access.log | awk -f proxy_stats.gawk > reportawk

    When in doubt use the file command to see if the file is compressed:
    file access.log

    I hope this helps,
    Matt

  3. Edmar Martins says:

    Nice script, very nice. It works very well on squid 2.7. But I’d upgraded my squid to 3.1.20 version, and this script doesn’t work anymore.

    Could you, please, make it compatible with this newer squid version? I´m very pleasent with it, your effort will be VERY appreciated.

    Thank you in advance.

Leave a Reply