Squid is one of the biggest and most widely used proxies on the interwebs, and generating reports from its access logs is already a done deal: there are many commercial and OSS apps that support the Squid log format. But I found myself in a situation where I wanted stats, but didn’t want to install a web server on my proxy or use syslog to push my logs to a centralised server running such software. I also wasn’t in a position to go and buy one of those amazing, whizz-bang, off-the-shelf Squid reporting and graphing tools.
As a Linux geek, I surfed the web to see what others had done, and came across a list provided by the Squid website. Following a couple of links, I found an awk script called ‘proxy_stats.gawk’ written by Richard Huveneers.
I downloaded it and tried it out… unfortunately it didn’t work. Looking at the code, which he had nicely commented, showed that he had it set up for access logs from Squid version 1.*. The access log format in Squid 2.6+ hasn’t changed much from version 1.1; all they have really done is add a “content type” entry at the end of each line.
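For reference, a native-format Squid line is ten whitespace-separated fields: time, elapsed, client address, code/status, bytes, method, URL, ident, hierarchy/peer and, last, the content type. A minimal sketch of pulling those fields apart with awk, assuming the default native `squid` log format; the function name and the sample line are made up for illustration:

```shell
#!/bin/bash
# Split a native-format Squid access log line into its named fields.
# Assumes the default "squid" logformat (10 whitespace-separated fields);
# describe_squid_line and the sample line are hypothetical.
describe_squid_line() {
    awk '{
        printf "fields=%d\n", NF
        printf "time=%s elapsed=%s client=%s\n", $1, $2, $3
        printf "code/status=%s bytes=%s method=%s\n", $4, $5, $6
        printf "url=%s ident=%s hierarchy=%s\n", $7, $8, $9
        printf "type=%s\n", $10
    }'
}

sample='1328054400.123    245 10.1.10.217 TCP_MISS/200 4512 GET http://example.com/ - DIRECT/93.184.216.34 text/html'
echo "$sample" | describe_squid_line
```

A script written for the older layout that expects fewer fields per line will treat every one of these lines as “bad”, which matches the symptom described above.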
So, as any good Linux geek would, I upgraded the script. My changes include:
- Support for Squid 2.6+
- Removed the use of deprecated switches that are no longer supported by the sort command
- Now that there is an actual content type “column”, it is used to improve the ‘Object type report’
- Added a users section, as this was an important report I needed that was missing
- And, in a further hacked version, automatic sizing of the first “name” column
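The users report, the auto-sized name column and the sort fix can be illustrated in a few lines. This is a sketch of the ideas only, not an excerpt from proxy_stats.gawk: it counts requests and megabytes per ident (field 8), builds the printf format from the longest name seen, and sorts with the POSIX `-k` key syntax (the older `+POS -POS` form is obsolete). The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical per-user summary of native Squid log lines read on stdin:
# one row per ident (field 8) with request count and MB transferred
# (field 5 is bytes), busiest user first.
user_report() {
    awk '
    {
        reqs[$8]++
        bytes[$8] += $5
        # Track the longest user name to size the first column.
        if (length($8) > width) width = length($8)
    }
    END {
        # Build the format string at run time, e.g. "%-5s %8d %10.1f\n".
        fmt = "%-" width "s %8d %10.1f\n"
        for (u in reqs)
            printf fmt, u, reqs[u], bytes[u] / 1048576
    }' | sort -k2,2nr   # key = field 2 only, numeric, descending
}
```

Usage would be along the lines of `zcat -f access.log* | user_report`.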
Now with the explanation out of the way, let me show you it!
For those who are new to awk, this is how I’ve been running it:
zcat <access log file> | awk -f proxy_stats.gawk > <report-filename>
NOTE: I’ve been using it for some historical analysis, so I’m running it on old rotated files, which are compressed, hence the zcat.
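You can pass more than one file at a time and the order doesn’t matter, as each line of an access log contains its timestamp in epoch time. Using zcat -f lets the current, uncompressed access.log be read alongside the rotated .gz files:
zcat -f $(find /var/log/squid/ -name "access.log*") | awk -f proxy_stats.gawk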
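Because every line carries its own timestamp, the first and last request can be found by taking the minimum and maximum of field 1, whatever order the lines arrive in. A minimal sketch of that idea (not the script’s own code), assuming native-format logs; the function name is hypothetical:

```shell
#!/bin/bash
# Print the earliest and latest request times (field 1, epoch seconds)
# seen on stdin, independent of the order in which lines arrive.
request_range() {
    awk '
    NR == 1 { first = last = $1 }
    {
        # Compare numerically; keep the original string for printing.
        if ($1 + 0 < first + 0) first = $1
        if ($1 + 0 > last + 0)  last = $1
    }
    END {
        if (NR > 0) printf "first=%s last=%s\n", first, last
    }'
}

# Example:
#   zcat -f $(find /var/log/squid/ -name "access.log*") | request_range
```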
The script produces an ASCII report (see the end of this post for an example), which could be generated and emailed via cron. If you want it to look right in an HTML-capable email client, I suggest wrapping it in <pre> tags:
<html>
<head><title>Report Title</title></head>
<body>
Report title
<pre>
... Report goes here ...
</pre>
</body>
</html>
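When the report comes out of cron, the wrapping can be scripted as well. A minimal sketch, assuming the report arrives on stdin; `&`, `<` and `>` are escaped so that nothing in the report is interpreted as markup inside the `<pre>` block. The function name and title argument are hypothetical:

```shell
#!/bin/bash
# Wrap a plain-text report read from stdin in a minimal HTML page.
# Usage: wrap_report_html "Report title" < report.txt > report.html
wrap_report_html() {
    local title="$1"
    echo "<html>"
    echo "<head><title>$title</title></head>"
    echo "<body>"
    echo "<pre>"
    # Escape & first, so the entities added for < and > are not re-escaped.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    echo "</pre>"
    echo "</body>"
    echo "</html>"
}
```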
For the experienced Linux sysadmins out there, cron plus ‘find -mtime’ is a very simple way of producing an automated daily, weekly or even monthly report.
But, as I said earlier, I was working on historic data: hundreds of files in a single report, because for business reasons we have been rotating the Squid logs every hour. So I did what I do best and wrote a quick bash script to find all the files for a given month that I needed to cat into the report:
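A sketch of the cron approach, assuming the default /var/log/squid location; the function name, paths and schedule are illustrative. `find -mtime -N` selects files modified less than N days ago, and GNU `zcat -f` passes uncompressed files through unchanged, so rotated and current logs can be mixed:

```shell
#!/bin/bash
# List Squid access logs under DIR modified within the last DAYS days.
# Usage: recent_logs /var/log/squid 7
recent_logs() {
    local dir="$1" days="$2"
    find "$dir" -name 'access.log*' -type f -mtime "-$days" | sort
}

# Example weekly run (hypothetical paths), e.g. from a crontab entry such as
#   0 6 * * 1  /usr/local/bin/weekly_squid_report.sh
# where that script runs:
#   recent_logs /var/log/squid 7 | xargs zcat -f \
#       | awk -f /usr/local/share/proxy_stats.gawk > /tmp/squid-weekly.txt
```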
#!/bin/bash
# Print the Squid access logs (plain or gzipped) whose first or last entry
# falls in the month given as the first argument (e.g. 01 or 1).
ACCESS_LOG_DIR="/var/log/squid/access.log*"
MONTH="$1"

function getFirstLine() {
    if [[ "$1" == *.gz ]]
    then
        zcat "$1" | head -n 1
    else
        head -n 1 "$1"
    fi
}

function getLastLine() {
    if [[ "$1" == *.gz ]]
    then
        zcat "$1" | tail -n 1
    else
        tail -n 1 "$1"
    fi
}

# Field 1 of a Squid log line is the request time in epoch seconds;
# print its month (01-12), or nothing for an empty line.
function getMonth() {
    local epochStr
    epochStr="$(echo "$1" | awk '{print $1}')"
    if [ -n "$epochStr" ]
    then
        date -d "@$epochStr" +%m
    fi
}

for log in $ACCESS_LOG_DIR
do
    # Check the first line
    month="$(getMonth "$(getFirstLine "$log")")"
    if [ -n "$month" ] && [ "$month" -eq "$MONTH" ]
    then
        echo "$log"
        continue
    fi

    # Check the last line
    month="$(getMonth "$(getLastLine "$log")")"
    if [ -n "$month" ] && [ "$month" -eq "$MONTH" ]
    then
        echo "$log"
    fi
done
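One limitation worth knowing: the month script compares only the month, so logs from the same month of a different year also match. If that matters, a year-aware check is a small change. A sketch with a hypothetical helper name, comparing `date +%Y-%m` of the line’s epoch against a YYYY-MM argument (GNU date’s `-d @EPOCH` is assumed, as in the script above):

```shell
#!/bin/bash
# Succeed if a Squid log line's timestamp (field 1, epoch seconds) falls in
# the given YYYY-MM month. Uses local time unless TZ is set.
line_in_month() {
    local line="$1" want="$2" epoch
    epoch="$(echo "$line" | awk '{print $1}')"
    [ -n "$epoch" ] || return 1
    [ "$(date -d "@$epoch" +%Y-%m)" = "$want" ]
}

# Example, inside the month script's loop:
#   if line_in_month "$(getFirstLine "$log")" 2012-01; then echo "$log"; fi
```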
So there you go: thanks to the work of Richard Huveneers, there is a script that I think generates a pretty good ASCII report, and which can be automated or integrated easily into any Linux/Unix workflow.
If you’re interested in getting hold of the most up-to-date version of the script, you can get it from my sysadmin GitHub repo.
As promised earlier here is an example report:
Parsed lines  : 32960
Bad lines     : 0
First request : Mon 30 Jan 2012 12:06:43 EST
Last request  : Thu 09 Feb 2012 09:05:01 EST
Number of days: 9.9

Top 10 sites by xfers        reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
213.174.155.216                20    0.1%  100.0%    0.0%      0.0    0.0%    0.0%       1.7       2.5
30.media.tumblr.com             1    0.0%  100.0%    0.0%      0.0    0.0%    0.0%      48.3      77.4
28.media.tumblr.com             1    0.0%  100.0%    0.0%      0.1    0.0%    0.0%      87.1       1.4
26.media.tumblr.com             1    0.0%    0.0%       -      0.0    0.0%       -         -         -
25.media.tumblr.com             2    0.0%  100.0%    0.0%      0.1    0.0%    0.0%      49.2      47.0
24.media.tumblr.com             1    0.0%  100.0%    0.0%      0.1    0.0%    0.0%     106.4     181.0
10.1.10.217                   198    0.6%  100.0%    0.0%     16.9    0.9%    0.0%      87.2    3332.8
3.s3.envato.com                11    0.0%  100.0%    0.0%      0.1    0.0%    0.0%       7.6      18.3
2.s3.envato.com                15    0.0%  100.0%    0.0%      0.1    0.0%    0.0%       7.5      27.1
2.media.dorkly.cvcdn.com        8    0.0%  100.0%   25.0%      3.2    0.2%    0.3%     414.1     120.5

Top 10 sites by MB           reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
zulu.tweetmeme.com              2    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       3.1     289.6
ubuntu.unix.com                 8    0.0%  100.0%  100.0%      0.1    0.0%  100.0%       7.5     320.0
static02.linkedin.com           1    0.0%  100.0%  100.0%      0.0    0.0%  100.0%      36.0     901.0
solaris.unix.com                2    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       3.8     223.6
platform.tumblr.com             2    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       1.1     441.4
i.techrepublic.com.com          5    0.0%   60.0%  100.0%      0.0    0.0%  100.0%       6.8    2539.3
i4.zdnetstatic.com              2    0.0%  100.0%  100.0%      0.0    0.0%  100.0%      15.3     886.4
i4.spstatic.com                 1    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       4.7     520.2
i2.zdnetstatic.com              2    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       7.8    2920.9
i2.trstatic.com                 9    0.0%  100.0%  100.0%      0.0    0.0%  100.0%       1.5     794.5

Top 10 neighbor report       reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
www.viddler.com                 4    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.turktrust.com.tr           16    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.trendmicro.com              5    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.reddit.com                  2    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.linkedin.com                2    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.google-analytics.com        2    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.facebook.com                2    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.dynamicdrive.com            1    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
www.benq.com.au                 1    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
wd-edge.sharethis.com           1    0.0%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0

Local code                   reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
TCP_CLIENT_REFRESH_MISS   2160    6.6%  100.0%    0.0%      7.2    0.4%    0.0%       3.4      12.9
TCP_HIT                       256    0.8%  100.0%   83.2%     14.0    0.8%  100.0%      56.0    1289.3
TCP_IMS_HIT                   467    1.4%  100.0%  100.0%     16.9    0.9%  100.0%      37.2    1747.4
TCP_MEM_HIT                   426    1.3%  100.0%  100.0%     96.5    5.3%  100.0%     232.0    3680.9
TCP_MISS                    27745   84.2%   97.4%    0.0%   1561.7   85.7%    0.3%      59.2      18.2
TCP_REFRESH_FAIL               16    0.0%  100.0%    0.0%      0.2    0.0%    0.0%      10.7       0.1
TCP_REFRESH_MODIFIED          477    1.4%   99.8%    0.0%     35.0    1.9%    0.0%      75.3    1399.4
TCP_REFRESH_UNMODIFIED       1413    4.3%  100.0%    0.0%     91.0    5.0%    0.0%      66.0     183.5

Status code                  reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
000                           620    1.9%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
200                         29409   89.2%  100.0%    2.9%   1709.7   93.8%    7.7%      59.5     137.1
204                           407    1.2%  100.0%    0.0%      0.2    0.0%    0.0%       0.4       1.4
206                           489    1.5%  100.0%    0.0%    112.1    6.1%    0.0%     234.7     193.0
301                            82    0.2%  100.0%    0.0%      0.1    0.0%    0.0%       0.7       1.5
302                           356    1.1%  100.0%    0.0%      0.3    0.0%    0.0%       0.8       2.7
303                             5    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       0.7       1.5
304                           862    2.6%  100.0%   31.2%      0.4    0.0%   30.9%       0.4      34.2
400                             1    0.0%    0.0%       -      0.0    0.0%       -         -         -
401                             1    0.0%    0.0%       -      0.0    0.0%       -         -         -
403                            47    0.1%    0.0%       -      0.0    0.0%       -         -         -
404                           273    0.8%    0.0%       -      0.0    0.0%       -         -         -
500                             2    0.0%    0.0%       -      0.0    0.0%       -         -         -
502                            12    0.0%    0.0%       -      0.0    0.0%       -         -         -
503                            50    0.2%    0.0%       -      0.0    0.0%       -         -         -
504                           344    1.0%    0.0%       -      0.0    0.0%       -         -         -

Hierarchie code              reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
DIRECT                      31843   96.6%   97.7%    0.0%   1691.0   92.8%    0.0%      55.7      44.3
NONE                         1117    3.4%  100.0%  100.0%    131.6    7.2%  100.0%     120.7    2488.2

Method report                reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
CONNECT                      5485   16.6%   99.2%    0.0%    132.8    7.3%    0.0%      25.0       0.3
GET                         23190   70.4%   97.7%    4.9%   1686.3   92.5%    7.8%      76.2     183.2
HEAD                         2130    6.5%   93.7%    0.0%      0.7    0.0%    0.0%       0.3       1.1
POST                         2155    6.5%   99.4%    0.0%      2.9    0.2%    0.0%       1.4       2.0

Object type report           reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
*/*                             1    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       1.6       3.2
application/cache-digest      396    1.2%  100.0%   50.0%     33.7    1.8%   50.0%      87.1    3655.1
application/gzip                1    0.0%  100.0%    0.0%      0.1    0.0%    0.0%      61.0      30.8
application/javascript        227    0.7%  100.0%   12.3%      2.2    0.1%    7.7%       9.9      91.9
application/json              409    1.2%  100.0%    0.0%      1.6    0.1%    0.0%       4.1       6.0
application/ocsp-response     105    0.3%  100.0%    0.0%      0.2    0.0%    0.0%       1.9       2.0
application/octet-stream      353    1.1%  100.0%    6.8%     81.4    4.5%    9.3%     236.1     406.9
application/pdf                 5    0.0%  100.0%    0.0%     13.5    0.7%    0.0%    2763.3      75.9
application/pkix-crl           96    0.3%  100.0%   13.5%      1.0    0.1%    1.7%      10.6       7.0
application/vnd.google.sa    1146    3.5%  100.0%    0.0%      1.3    0.1%    0.0%       1.1       2.4
application/vnd.google.sa    4733   14.4%  100.0%    0.0%     18.8    1.0%    0.0%       4.1      13.4
application/x-bzip2            19    0.1%  100.0%    0.0%     78.5    4.3%    0.0%    4232.9     225.5
application/x-gzip            316    1.0%  100.0%   59.8%    133.4    7.3%   59.3%     432.4    3398.1
application/x-javascript     1036    3.1%  100.0%    5.8%      9.8    0.5%    3.4%       9.7      52.1
application/xml                46    0.1%  100.0%   34.8%      0.2    0.0%   35.1%       3.5     219.7
application/x-msdos-progr     187    0.6%  100.0%    0.0%     24.4    1.3%    0.0%     133.7     149.6
application/x-pkcs7-crl        83    0.3%  100.0%    7.2%      1.6    0.1%    0.4%      19.8      10.8
application/x-redhat-pack      13    0.0%  100.0%    0.0%     57.6    3.2%    0.0%    4540.7     156.7
application/x-rpm             507    1.5%  100.0%    6.3%    545.7   29.9%    1.5%    1102.2     842.8
application/x-sdlc              1    0.0%  100.0%    0.0%      0.9    0.0%    0.0%     888.3     135.9
application/x-shockwave-f     109    0.3%  100.0%   11.9%      5.4    0.3%   44.5%      50.6     524.1
application/x-tar               9    0.0%  100.0%    0.0%      1.5    0.1%    0.0%     165.3      36.4
application/x-www-form-ur      11    0.0%  100.0%    0.0%      0.1    0.0%    0.0%       9.9      15.4
application/x-xpinstall         2    0.0%  100.0%    0.0%      2.5    0.1%    0.0%    1300.6     174.7
application/zip              1802    5.5%  100.0%    0.0%    104.0    5.7%    0.0%      59.1       2.5
Archive                        89    0.3%  100.0%    0.0%      0.0    0.0%       -       0.0       0.0
audio/mpeg                      2    0.0%  100.0%    0.0%      5.8    0.3%    0.0%    2958.2      49.3
binary/octet-stream             2    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       5.5      14.7
font/ttf                        2    0.0%  100.0%    0.0%      0.0    0.0%    0.0%      15.5      12.5
font/woff                       1    0.0%  100.0%  100.0%      0.0    0.0%  100.0%      42.5    3539.6
Graphics                      126    0.4%  100.0%    0.0%      0.1    0.0%    0.0%       0.6       2.5
HTML                           14    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       0.1       0.1
image/bmp                       1    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       1.3       3.9
image/gif                    5095   15.5%  100.0%    2.4%     35.9    2.0%    0.7%       7.2       9.5
image/jpeg                   1984    6.0%  100.0%    4.3%     52.4    2.9%    0.6%      27.0      62.9
image/png                    1684    5.1%  100.0%   10.3%     28.6    1.6%    1.9%      17.4     122.2
image/vnd.microsoft.icon       10    0.0%  100.0%   30.0%      0.0    0.0%   12.8%       1.0       3.3
image/x-icon                   72    0.2%  100.0%   16.7%      0.2    0.0%    6.0%       3.2      15.0
multipart/bag                   6    0.0%  100.0%    0.0%      0.1    0.0%    0.0%      25.2      32.9
multipart/byteranges           93    0.3%  100.0%    0.0%     16.5    0.9%    0.0%     182.0     178.4
text/cache-manifest             1    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       0.7       3.1
text/css                      470    1.4%  100.0%    7.9%      3.4    0.2%    5.8%       7.4      59.7
text/html                    2308    7.0%   70.7%    0.4%      9.6    0.5%    0.6%       6.0      14.7
text/javascript              1243    3.8%  100.0%    2.7%     11.1    0.6%    5.2%       9.1      43.3
text/json                       1    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       0.5       0.7
text/plain                   1445    4.4%   99.4%    1.5%     68.8    3.8%    5.5%      49.0      41.9
text/x-cross-domain-polic      24    0.1%  100.0%    0.0%      0.0    0.0%    0.0%       0.7       1.7
text/x-js                       2    0.0%  100.0%    0.0%      0.0    0.0%    0.0%      10.1       6.4
text/x-json                     9    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       3.0       8.5
text/xml                      309    0.9%  100.0%   12.9%     12.9    0.7%   87.5%      42.8     672.3
unknown/unknown              6230   18.9%   99.3%    0.0%    132.9    7.3%    0.0%      22.0       0.4
video/mp4                       5    0.0%  100.0%    0.0%      3.2    0.2%    0.0%     660.8      62.7
video/x-flv                   117    0.4%  100.0%    0.0%    321.6   17.6%    0.0%    2814.9     308.3
video/x-ms-asf                  2    0.0%  100.0%    0.0%      0.0    0.0%    0.0%       1.1       4.7

Ident (User) Report          reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
-                           32960  100.0%   97.8%    3.5%   1822.6  100.0%    7.2%      57.9     129.0

Weekly report                reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
2012/01/26                  14963   45.4%   97.6%    3.6%    959.8   52.7%    1.8%      67.3     104.5
2012/02/02                  17997   54.6%   98.0%    3.4%    862.8   47.3%   13.2%      50.1     149.4

Total report                 reqs    %all  %xfers    %hit       MB    %all    %hit     kB/xf      kB/s
------------------------- ------------------------------- ------------------------ -------------------
All requests                32960  100.0%   97.8%    3.5%   1822.6  100.0%    7.2%      57.9     129.0

Produced by : Mollie's hacked access-flow 0.5
Running time: 2 seconds
Happy squid reporting!
I tried to download your proxy_stats.gawk from GitHub, but it gives me an error:
[root@webserver logs]# zcat access.log | awk -f proxy_stats.gawk > reportawk
awk: proxy_stats.gawk:1:
awk: proxy_stats.gawk:1: ^ syntax error
gzip: access.log: not in gzip format
Where did I go wrong?
The access.log you are using is not in gzip format, so use cat instead of zcat.
Once logrotate rotates your logs they get compressed, so if the file ends in .gz use zcat; if it just ends in .log use cat.
So try:
cat access.log | awk -f proxy_stats.gawk > reportawk
When in doubt use the file command to see if the file is compressed:
file access.log
I hope this helps,
Matt
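To avoid picking the wrong tool by hand, the choice can also be made per file. A minimal sketch with a hypothetical helper name: `gzip -t` succeeds only for valid gzip data, so plain logs fall through to `cat` (GNU `zcat -f` achieves the same effect in a single command):

```shell
#!/bin/bash
# Print each file's contents, decompressing it first if it is gzip data.
# Usage: catlog access.log access.log.1.gz | awk -f proxy_stats.gawk
catlog() {
    local f
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            zcat "$f"
        else
            cat "$f"
        fi
    done
}
```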
Nice script, very nice. It works very well on Squid 2.7, but I’ve upgraded my Squid to version 3.1.20 and the script doesn’t work anymore.
Could you please make it compatible with this newer Squid version? I’m very pleased with it, and your effort will be VERY much appreciated.
Thank you in advance.
Just replace “ == 9” with “ == 10” for newer Squid versions.
Very very good