To build another log analyser

2005 May 13 at 10:05

No one would want to build yet another log analyser. There are dozens of them out there. Things are not as bad as in the field of Content (Mis)Management Systems, where everyone and his dog seems to have a product listed at freshmeat, but there are still too many log parsers out there.

So why do I even think of doing my own web log analyser? I am not. I am thinking of building a data mining system for log files. Sure, you will be able to use it to figure out how many times your baby-on-the-lawn pictures were seen, as long as you don't mind waiting for a dozen queries to be executed.

The idea I have in mind is all about ratios. Webdruid and others try to show you graphs, supposedly of the paths that users took through your site. As analog's authors have taken great pains to explain, such reports are unreliable at best.

What is reliable, and what works, is ratios. Let's take an example: the success of your ecommerce site can be measured in the ratio of homepage views/product views/add to cart/checkout/finalize, or some such order.
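To make the funnel idea concrete, here is a minimal sketch in Python that counts hits per funnel stage from Apache-style log lines and then works out the step-to-step ratios. The URL patterns (`/product/`, `/cart/add`, `/checkout`, `/finalize`) and the function names are assumptions made up for illustration; a real site would need its own patterns.

```python
import re
from collections import Counter

# Hypothetical URL patterns for each funnel stage; adjust to your own site.
FUNNEL = [
    ("homepage", re.compile(r"^/$")),
    ("product", re.compile(r"^/product/")),
    ("add_to_cart", re.compile(r"^/cart/add")),
    ("checkout", re.compile(r"^/checkout")),
    ("finalize", re.compile(r"^/finalize")),
]

# Picks the path and status code out of the quoted request in a log line.
REQUEST = re.compile(r'"(?:GET|POST) (\S+) [^"]*" (\d{3})')


def count_stages(lines):
    """Count successful (status 200) hits for each funnel stage."""
    counts = Counter()
    for line in lines:
        m = REQUEST.search(line)
        if not m or m.group(2) != "200":
            continue
        path = m.group(1).split("?", 1)[0]  # ignore the query string
        for stage, pattern in FUNNEL:
            if pattern.search(path):
                counts[stage] += 1
                break
    return counts


def funnel_ratios(counts):
    """Ratio of each stage to the stage before it; None if the earlier stage has no hits."""
    ratios = {}
    for (prev, _), (cur, _) in zip(FUNNEL, FUNNEL[1:]):
        key = f"{cur}/{prev}"
        ratios[key] = counts[cur] / counts[prev] if counts[prev] else None
    return ratios
```

Calling `funnel_ratios(count_stages(open("access.log")))` on a log file would give one number per step of the funnel, which is the figure you would otherwise be reaching for a spreadsheet to get.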

Trouble is, with many web log analysers you need the help of a calculator or spreadsheet to get to these ratios. If you want to plot ratios against time or some other factor it takes a lot of work, and how do you compensate for seasonal slumps/spurts?

The answer lies in using a database. Entering the webserver log files into a database directly is woefully inefficient. Even for a site that delivers only a few thousand pages a day you might need to put the database on its own server, if you took this approach. For simple logging there is nothing like a text file - which is just what your good old apache log file is.

Traditional web log analysers are all about telling you how many pages a day were delivered, which pages were most seen, and so on. For such tasks you don't need a database. Suppose, on the other hand, that you want to generate reports with whatever data and whatever period takes your fancy, and not be constrained by the rigid daily/weekly/monthly structure of the traditional log analysers. Then you need a db.
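As a rough illustration of what "whatever period takes your fancy" could look like, the sketch below uses Python's built-in `sqlite3` module. It assumes the log has already been boiled down to per-day counts rather than loaded line by line; the table `daily_hits`, its columns, and both function names are hypothetical, chosen only for this example.

```python
import sqlite3


def make_db(rows):
    """Create an in-memory database from (day, stage, hits) tuples.

    Days are ISO strings such as '2005-05-13', so they compare correctly as text.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE daily_hits (day TEXT, stage TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO daily_hits VALUES (?, ?, ?)", rows)
    return conn


def ratio_for_period(conn, numerator, denominator, start, end):
    """Return hits(numerator) / hits(denominator) between two dates, inclusive.

    Returns None when the denominator stage has no hits in the period.
    """
    sql = """
        SELECT
            SUM(CASE WHEN stage = ? THEN hits ELSE 0 END),
            SUM(CASE WHEN stage = ? THEN hits ELSE 0 END)
        FROM daily_hits
        WHERE day BETWEEN ? AND ?
    """
    num, den = conn.execute(sql, (numerator, denominator, start, end)).fetchone()
    if not den:
        return None
    return num / den
```

With something like this, `ratio_for_period(conn, "product", "homepage", "2005-05-01", "2005-05-02")` answers a question about any two dates you pick, with no daily, weekly or monthly bucket baked in.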