Today's web hosting data centers deploy multiple web servers running heterogeneous operating systems. With ever-increasing online business, it is important to know how many customers are actually reaching your websites. Beyond the raw number of web hits, it is now imperative to understand customer trends, and that requires web analytics.
This article covers the top 10 tools that can analyze web server logs and bring visibility into website access. The tools were chosen for their popularity, functionality and ease of use, which make them "must have" utilities in every network administrator's toolbox.
A website that directly serves a business is always complex to manage, both for the business owner and for the technology support team. The business owner wants to know how many web hits are generated over a period of time and which product pages are accessed more frequently than others. This information is essential for correlating web activity, directly or indirectly, with sales and profit. The owner also wants to know customer trends; for example, whether visitors are accessing a particular set of products just because those products are on discount.

From the technology support standpoint, web administrators want to ensure the reliability and stability of their websites. If web hits are increasing, they need to know the impact on CPU and memory usage as well as on network throughput. Similarly, they need to know if and when the web server hardware needs an upgrade, or when to add more web servers to the pool. Another requirement is troubleshooting website problems by looking at the HTTP status code field: a 404 (Not Found) means that some links on the website point to pages that do not exist, which causes a bad user experience.
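To make the 404 check concrete, here is a minimal Python sketch that counts "Not Found" responses per requested URL. It assumes the Apache/NGINX common or combined log format, where the status code directly follows the quoted request line; the function name `find_broken_links` and the sample lines are ours, for illustration only.

```python
import re
from collections import Counter

# Captures the requested path and the status code that follows the quoted
# request line, e.g.  "GET /old-offer.html HTTP/1.1" 404 512
REQUEST_STATUS = re.compile(r'"[A-Z]+ (\S+) [^"]*" (\d{3})\b')

def find_broken_links(lines):
    """Return a Counter of URLs that produced HTTP 404 (Not Found)."""
    broken = Counter()
    for line in lines:
        match = REQUEST_STATUS.search(line)
        if match and match.group(2) == "404":
            broken[match.group(1)] += 1
    return broken

sample = [
    '10.0.0.5 - - [10/Oct/2012:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '10.0.0.6 - - [10/Oct/2012:13:56:01 +0000] "GET /old-offer.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '10.0.0.7 - - [10/Oct/2012:13:57:12 +0000] "GET /old-offer.html HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]
print(find_broken_links(sample).most_common())
# → [('/old-offer.html', 2)]
```

The most frequently requested missing pages are usually the broken links worth fixing first.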
Web servers create detailed, verbose logs in the form of text files. All fields are useful for analytics; however, Table 1 lists the fields that are crucial for analyzing website usage and trends.
|Important Log Fields for Web Analysis|
|---|
|Date & Time|
|Protocol (HTTP or FTP)|
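As an illustration of how an analyzer pulls these fields out of a raw line, the following Python sketch parses one entry into named fields. It assumes the Apache/NGINX "combined" log format; FTP server logs use different layouts and would need their own pattern. The `LogEntry` class and `parse_line` function are hypothetical names chosen for this example.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Combined log format:
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

@dataclass
class LogEntry:
    ip: str
    time: datetime
    method: str
    path: str
    protocol: str
    status: int
    size: int
    referer: str
    agent: str

def parse_line(line):
    """Parse one combined-format line; return None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    return LogEntry(
        ip=m["ip"],
        time=datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z"),
        method=m["method"],
        path=m["path"],
        protocol=m["protocol"],
        status=int(m["status"]),
        size=0 if m["size"] == "-" else int(m["size"]),  # "-" means no body
        referer=m["referer"],
        agent=m["agent"],
    )

entry = parse_line('192.0.2.10 - - [10/Oct/2012:13:55:36 +0000] '
                   '"GET /products/tv.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0 (Windows NT 6.1)"')
print(entry.time.isoformat(), entry.protocol, entry.status)
# → 2012-10-10T13:55:36+00:00 HTTP/1.1 200
```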
Web analyzers parse these text files and analyze their contents. For example, by sorting on the source IP address we can find out how many hits a particular web client generated, and by sorting on the requested page file names we can learn which pages are hit most and which are not. Based on the values in the browser (user agent) field, which also reveals the OS type, it is easy to count how many Windows machines run Internet Explorer or Firefox, or how many Mac users run Safari. All of this information is extremely useful for tuning the website to the user's experience, which in turn drives more traffic and better business.
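The aggregations described above can be sketched in a few lines of Python. This is a simplified illustration, assuming each hit has already been reduced to a `(client_ip, path, user_agent)` tuple; the substring checks in `classify_agent` are a deliberately rough heuristic, whereas real analyzers ship large user-agent signature databases.

```python
from collections import Counter

def classify_agent(agent):
    """Very rough (OS, browser) guess from a User-Agent string; illustrative only."""
    if "Windows" in agent:
        os_name = "Windows"
    elif "Mac OS X" in agent:
        os_name = "Mac"
    elif "Linux" in agent:
        os_name = "Linux"
    else:
        os_name = "Other"
    # Order matters: Chrome user agents also contain "Safari/", so test Chrome first.
    if "MSIE" in agent or "Trident/" in agent:
        browser = "IE"
    elif "Firefox/" in agent:
        browser = "Firefox"
    elif "Chrome/" in agent:
        browser = "Chrome"
    elif "Safari/" in agent:
        browser = "Safari"
    else:
        browser = "Other"
    return os_name, browser

def summarize(hits):
    """Count hits per client IP, per page and per (OS, browser) pair."""
    by_ip, by_page, by_platform = Counter(), Counter(), Counter()
    for ip, path, agent in hits:
        by_ip[ip] += 1
        by_page[path] += 1
        by_platform[classify_agent(agent)] += 1
    return by_ip, by_page, by_platform

ie = "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
safari = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/536.26 "
          "(KHTML, like Gecko) Version/6.0 Safari/536.26")
hits = [
    ("192.0.2.1", "/products/tv.html", ie),
    ("192.0.2.1", "/index.html", ie),
    ("192.0.2.2", "/products/tv.html", safari),
]
by_ip, by_page, by_platform = summarize(hits)
print(by_page.most_common(1))           # → [('/products/tv.html', 2)]
print(by_platform[("Windows", "IE")])   # → 2
```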
Below is our list of the top ten tools for web analysis in mid-sized to large IT web infrastructures. We selected these tools based on their popularity, deployment base and how simple they are to install, configure and put to use. The list also includes a few tools that can perform on-the-fly web analysis, which helps in troubleshooting problems in website code.

AWStats - Although this is one of the first-generation tools, it is still widely used. Written in Perl, it works well on multiple platforms. A great feature of AWStats is that it supports virtually all popular web server log formats, from Microsoft IIS and Apache to O'Reilly web servers. It can create customizable views, including bar graphs and pie charts, giving good visibility into web traffic statistics. AWStats is meant for small to medium infrastructures where the log files are not too large to process. This tool is maintained and updated at http://awstats.sourceforge.net
Webalizer - Unlike many GUI-based tools, Webalizer is a purely command-line utility, which makes it popular among Linux and Unix administrators. It has its own simple configuration syntax that controls how the tool reads and parses the log files and the fields they contain. For example, setting its IgnoreSite option to an internal IP address range removes internal web traffic so the reports focus only on external hits. Thanks to its extensive command-line switches, it can run as a scheduled job that reviews the web logs daily or automatically creates useful reports. This tool can be downloaded from www.webalizer.org
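The idea behind excluding internal traffic can be shown independently of Webalizer. The Python sketch below is not Webalizer's implementation; it simply drops log lines whose client address falls inside internal networks. The ranges in `INTERNAL_NETWORKS` are hypothetical examples to be replaced with your own, and the sketch assumes the client address is the first whitespace-separated field, as in common/combined log format.

```python
import ipaddress

# Hypothetical internal ranges; adjust to your own network.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_internal(ip, networks=INTERNAL_NETWORKS):
    """True if `ip` is an address inside one of the internal networks."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not an IP literal (e.g. a resolved hostname): keep it
    # Mixed IPv4/IPv6 membership tests simply return False.
    return any(addr in net for net in networks)

def external_only(lines):
    """Yield log lines whose first field (the client address) is not internal."""
    for line in lines:
        client = line.split(" ", 1)[0]
        if not is_internal(client):
            yield line

logs = [
    '10.1.2.3 - - [10/Oct/2012:13:55:36 +0000] "GET / HTTP/1.1" 200 100',
    '203.0.113.9 - - [10/Oct/2012:13:55:40 +0000] "GET / HTTP/1.1" 200 100',
]
for line in external_only(logs):
    print(line)
```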
Piwik - When log files contain huge amounts of information, parsing them becomes cumbersome and time-consuming. That calls for a faster log parsing tool, and Piwik addresses the problem. Beyond typical web analysis, Piwik comes with a set of plugins that extend its reporting. For example, its GeoIP plugin maps the source IP addresses in the web log files to a particular country, state or city. Piwik supports multiple platforms and has its own Python-based command-line interface for getting the most out of its reports. Today many web hosting facilities run Piwik and offer its customizable web user interface to their customers as a service. This tool can be found at http://piwik.org
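GeoIP-style mapping boils down to a range lookup: the database stores sorted IP ranges with a location for each, and every address is located with a binary search. The Python sketch below shows only that lookup technique, not Piwik's plugin. The range table is hypothetical: it uses RFC 5737 documentation ranges and placeholder country codes ("XA", "XB", "XC") instead of real data.

```python
import bisect
import ipaddress
from collections import Counter

# Hypothetical, hand-made range table: (first_ip, last_ip, country).
# Real GeoIP databases hold millions of such ranges; these are
# documentation ranges with placeholder country codes.
RANGES = sorted(
    (int(ipaddress.ip_address(start)), int(ipaddress.ip_address(end)), country)
    for start, end, country in [
        ("192.0.2.0", "192.0.2.255", "XA"),
        ("198.51.100.0", "198.51.100.255", "XB"),
        ("203.0.113.0", "203.0.113.255", "XC"),
    ]
)
STARTS = [start for start, _, _ in RANGES]

def country_of(ip):
    """Return the country code for an IPv4 address, or None if it is not covered."""
    value = int(ipaddress.ip_address(ip))
    i = bisect.bisect_right(STARTS, value) - 1  # last range starting <= value
    if i >= 0 and value <= RANGES[i][1]:
        return RANGES[i][2]
    return None

clients = ["192.0.2.7", "198.51.100.20", "192.0.2.99", "8.8.8.8"]
print(Counter(country_of(ip) or "unknown" for ip in clients).most_common())
# → [('XA', 2), ('XB', 1), ('unknown', 1)]
```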
OpenWebAnalytics - Written in PHP with MySQL as the backend, this utility comes in handy when website administrators want to process the logs of multiple websites together. OpenWebAnalytics can process very large logs and can optionally fetch them directly from a database. Unlike many professional tools, this open source tool can provide a click-stream report, which shows a user's clicks on the web pages in date and time order. This helps whoever troubleshoots the website code to see exactly what the web user did and repeat those steps to reproduce the problem. It can also create a heatmap report, which separates the website statistics into most-hit and least-hit pages and shows them as color gradients for easy understanding. This tool is available at www.openwebanalytics.com
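A click-stream report essentially groups hits by visitor and orders them in time. The Python sketch below illustrates that grouping; it is not OpenWebAnalytics' code. It assumes hits are `(client, timestamp, path)` tuples and, as a common convention rather than anything from the tool, starts a new session after 30 minutes of inactivity.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # common convention; an assumption here

def click_streams(hits, timeout=SESSION_TIMEOUT):
    """Group (client, timestamp, path) hits into per-client sessions.

    Returns {client: [[(timestamp, path), ...], ...]}, each session in
    chronological order.
    """
    per_client = defaultdict(list)
    for client, ts, path in hits:
        per_client[client].append((ts, path))

    sessions = {}
    for client, clicks in per_client.items():
        clicks.sort()
        runs = [[clicks[0]]]
        for prev, cur in zip(clicks, clicks[1:]):
            if cur[0] - prev[0] > timeout:
                runs.append([])  # long gap: start a new session
            runs[-1].append(cur)
        sessions[client] = runs
    return sessions

t = datetime(2012, 10, 10, 9, 0)
hits = [
    ("192.0.2.1", t + timedelta(minutes=2), "/cart"),
    ("192.0.2.1", t, "/index.html"),
    ("192.0.2.1", t + timedelta(minutes=1), "/products/tv.html"),
    ("192.0.2.1", t + timedelta(hours=2), "/index.html"),
]
for session in click_streams(hits)["192.0.2.1"]:
    print([path for _, path in session])
# → ['/index.html', '/products/tv.html', '/cart']
# → ['/index.html']
```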
Deep Log Analyzer - This is a typical web analysis utility; however, unlike most other tools, it is also very useful for processing FTP logs. Besides standard reporting, it can build a list of keywords along with the hits on the web pages that contain each keyword, which is especially important for SEO (Search Engine Optimization). The tool stores its data in a standard MS Access database, which can be exported to other database engines and queried with standard SQL to further customize reports. For small infrastructures, Deep Log Analyzer can be more than adequate for gaining visibility into web business. This tool can be found at http://www.deep-software.com
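Once log data sits in a relational database, custom reports become plain SQL queries. Deep Log Analyzer's actual schema is not described here, so the Python sketch below loads a few rows into a hypothetical `hits` table in an in-memory SQLite database and asks for successful hits per page whose path contains a keyword.

```python
import sqlite3

# Hypothetical schema standing in for an exported log database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (client_ip TEXT, path TEXT, status INTEGER)")
conn.executemany(
    "INSERT INTO hits VALUES (?, ?, ?)",
    [
        ("192.0.2.1", "/products/led-tv.html", 200),
        ("192.0.2.2", "/products/led-tv.html", 200),
        ("192.0.2.3", "/products/radio.html", 200),
        ("192.0.2.4", "/products/tv-stand.html", 404),
    ],
)

def hits_for_keyword(conn, keyword):
    """Successful (200) hits per page whose path contains `keyword`, busiest first.

    Note: the keyword is used inside a LIKE pattern, so % and _ act as wildcards.
    """
    return conn.execute(
        """
        SELECT path, COUNT(*) AS hits
        FROM hits
        WHERE path LIKE ? AND status = 200
        GROUP BY path
        ORDER BY hits DESC
        """,
        (f"%{keyword}%",),
    ).fetchall()

print(hits_for_keyword(conn, "tv"))
# → [('/products/led-tv.html', 2)]
```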
FireStats - This tool's features are aimed at a different audience. Today most websites are built on open source content management systems such as Drupal, Joomla or WordPress, and each of these systems has its own way of handling file names, cookies and other parameters. FireStats can interpret the cookies found in the web log files, as well as the names of the web code files a user accessed, and separate the information for each content management system in use. Its web interface works in all major browsers, and its reports can be translated into multiple languages. Unlike many other tools, FireStats can be installed on the web server itself, where it runs silently in the background and analyzes the traffic to create instant reports. This tool is available at http://firestats.cc
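Separating traffic per content management system relies on recognizing each system's characteristic file names and URL patterns. The Python sketch below shows that idea using only request paths (no cookies); it is not FireStats' logic, and the `CMS_SIGNATURES` table is a hypothetical, incomplete set of well-known path fragments.

```python
from collections import Counter

# Hypothetical path signatures; real tools use richer rules (cookies, plugins, etc.).
CMS_SIGNATURES = {
    "WordPress": ("/wp-content/", "/wp-admin/", "/wp-includes/", "/wp-login.php"),
    "Joomla": ("option=com_", "/administrator/", "/components/com_"),
    "Drupal": ("/sites/default/", "/sites/all/", "?q=node/", "/node/"),
}

def detect_cms(path):
    """Return the first CMS whose signature appears in the request path, else None."""
    for cms, markers in CMS_SIGNATURES.items():
        if any(marker in path for marker in markers):
            return cms
    return None

paths = [
    "/wp-content/uploads/logo.png",
    "/wp-login.php",
    "/index.php?option=com_content&id=3",
    "/about.html",
]
print(Counter(detect_cms(p) or "unrecognized" for p in paths).most_common())
# → [('WordPress', 2), ('Joomla', 1), ('unrecognized', 1)]
```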
GoAccess - For Linux administrators who want immediate visibility into their websites, GoAccess is arguably the best choice in the open source world. A great feature of this tool is that it works in real time: administrators can pull up a report on the fly. The report is displayed in the terminal rather than in HTML, but it gives enough information to know exactly what is happening on the web server, which files are being accessed at that moment, which errors the web server is producing, and so on. It also supports IPv6 and can parse custom log formats. This makes it a "must have" tool, especially for parsing the logs of network components and devices. It is available at http://sourceforge.net/projects/goaccess
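Real-time analysis of this kind amounts to following the log file as it grows and updating counters line by line. The Python sketch below illustrates the technique and is not how GoAccess is implemented. `follow`, `LiveStats` and `watch` are hypothetical names; log rotation is not handled, and the request-line parsing assumes common/combined log format.

```python
import time
from collections import Counter

def follow(path, poll_interval=1.0):
    """Yield complete lines appended to `path`, similar to `tail -f`.

    Simplified: log rotation and truncation are not handled.
    """
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        handle.seek(0, 2)  # jump to the current end of the file
        pending = ""
        while True:
            chunk = handle.readline()
            if not chunk:
                time.sleep(poll_interval)  # nothing new yet; wait and retry
                continue
            pending += chunk
            if pending.endswith("\n"):  # only emit fully written lines
                yield pending
                pending = ""

class LiveStats:
    """Running counters updated one log line at a time."""

    def __init__(self):
        self.requests = 0
        self.status = Counter()
        self.paths = Counter()

    def feed(self, line):
        parts = line.split('"')
        if len(parts) < 3:
            return  # not a common/combined format line
        request = parts[1].split()  # e.g. ["GET", "/a.html", "HTTP/1.1"]
        after = parts[2].split()    # e.g. ["200", "10"]
        self.requests += 1
        if len(request) >= 2:
            self.paths[request[1]] += 1
        if after:
            self.status[after[0]] += 1

def watch(path):
    """Print a running summary for every new line (runs until interrupted)."""
    stats = LiveStats()
    for line in follow(path):
        stats.feed(line)
        print(stats.requests, stats.status.most_common(3), stats.paths.most_common(3))
```

For example, `watch("/var/log/apache2/access.log")` would print a rolling summary; that path is only an example and varies by distribution.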
Web Forensik - While many analytics tools focus heavily on website usage statistics and patterns, Web Forensik focuses on website security. It is written specifically for the Apache log format; however, with proper log file conversion, any web server log file can be processed. Many web developers have no visibility into the security of their code. Web Forensik can detect commonly known web attacks such as cross-site scripting, cookie injection and SQL injection. After developing the code, the development team can subject it to such common attacks using penetration testing tools and then run Web Forensik on the resulting logs to find out which code files may contain security loopholes. It can also present its output graphically to create meaningful reports. The utility is available at http://sourceforge.net/projects/webforensik/
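Signature-based attack detection in logs works by URL-decoding each request target and matching it against known attack patterns. The Python sketch below shows that technique with a tiny, hypothetical rule set; it is not Web Forensik's rule base, and real scanners use far larger, carefully tuned rules that still produce false positives.

```python
import re
from urllib.parse import unquote_plus

# A few deliberately simple, hypothetical signatures.
SIGNATURES = {
    "xss": re.compile(r"<\s*script|javascript:|on(?:error|load)\s*=", re.I),
    "sqli": re.compile(r"'\s*or\s+'?\d|union\s+select|;\s*drop\s+table", re.I),
    "path_traversal": re.compile(r"\.\./"),
}

def suspicious(request_target):
    """Return sorted names of signatures matching the URL-decoded request target."""
    decoded = unquote_plus(request_target)  # attacks usually arrive percent-encoded
    return sorted(name for name, rx in SIGNATURES.items() if rx.search(decoded))

print(suspicious("/search?q=%3Cscript%3Ealert(1)%3C%2Fscript%3E"))  # → ['xss']
print(suspicious("/item.php?id=1%27+OR+%271%27%3D%271"))            # → ['sqli']
```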
AW Log Analyzer - Although this tool is not entirely free, there is a lite version of it which is open source. This tool focuses on search engine robots. Each search engine crawls websites using its own bots, which leave an access trail in the web log files. AW Log Analyzer has a built-in parsing mechanism that can determine whether the website was accessed by any of more than 400 different search engines. This is important from a business perspective, to understand where to channel marketing efforts. To serve this purpose better, it can list the pages that are hit most by visitors but not by search engines, and vice versa. It can also work in an offline mode, where multiple log files accumulated in the past are fed to the tool to produce historical trend reports. This tool is available at http://www.alterwind.com/loganalyzer
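Comparing visitor and crawler coverage comes down to classifying each hit by its user agent and then taking set differences of the pages each group requested. The Python sketch below illustrates that; it is not AW Log Analyzer's mechanism. `BOT_TOKENS` is a tiny sample of well-known crawler tokens, far from the hundreds of engines a commercial tool recognizes, and hits are assumed to be `(path, user_agent)` pairs.

```python
from collections import Counter

# A handful of well-known crawler User-Agent tokens (a tiny sample).
BOT_TOKENS = ("googlebot", "bingbot", "slurp", "baiduspider", "yandexbot", "duckduckbot")

def is_bot(user_agent):
    ua = user_agent.lower()
    return any(token in ua for token in BOT_TOKENS)

def compare_coverage(hits):
    """Return (visitor_only, bot_only) page lists, most-hit first."""
    human, bot = Counter(), Counter()
    for path, agent in hits:
        (bot if is_bot(agent) else human)[path] += 1
    visitor_only = sorted(set(human) - set(bot), key=lambda p: (-human[p], p))
    bot_only = sorted(set(bot) - set(human), key=lambda p: (-bot[p], p))
    return visitor_only, bot_only

hits = [
    ("/offers.html", "Mozilla/5.0 (Windows NT 6.1)"),
    ("/offers.html", "Mozilla/5.0 (Macintosh)"),
    ("/index.html", "Mozilla/5.0 (Windows NT 6.1)"),
    ("/index.html", "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"),
    ("/old-sitemap.html", "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"),
]
print(compare_coverage(hits))
# → (['/offers.html'], ['/old-sitemap.html'])
```

Pages in the first list are popular with visitors but unseen by crawlers, which may point to indexing problems worth investigating.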
WebLog Expert - This is again a semi-commercial tool, with a paid enterprise version and a free lite version. It is a traditional tool for users who want to perform basic log analysis on their Windows desktop, and the log files it parses can be in either IIS or Apache format. A distinctive feature of this tool is that it can perform reverse DNS lookups to find the domain names of the source IP addresses found in the logs. It also contains a built-in database that maps IP addresses to countries. This tool is available at www.weblogexpert.com
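Reverse DNS resolution of log addresses can be sketched with Python's standard `socket.gethostbyaddr`. This is a generic illustration, not WebLog Expert's implementation. Results are cached because the same addresses repeat many times in a log and each lookup can be slow; the lookup function is injectable (a choice made here so it can be tested without network access), and `make_resolver` is a hypothetical name.

```python
import socket
from functools import lru_cache

def make_resolver(lookup=socket.gethostbyaddr):
    """Return a cached IP -> hostname function that falls back to the IP itself."""
    @lru_cache(maxsize=None)
    def resolve(ip):
        try:
            hostname, _aliases, _addresses = lookup(ip)
            return hostname
        except OSError:  # socket.herror and socket.gaierror are OSError subclasses
            return ip    # no PTR record or lookup failed: keep the raw address
    return resolve

# Example (performs real DNS queries; results depend on your resolver):
#   resolve = make_resolver()
#   for ip in ("192.0.2.10", "198.51.100.7"):
#       print(ip, resolve(ip))
```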
A web log analyzer is an essential tool for web administrators from both the technology and the business standpoint. When selecting an analyzer, focus on ease of use and on the quality and level of detail of its graphical reports. A powerful web log analyzer provides great visibility into the customers accessing the website and their mindset, which makes it an essential tool for business decision making.
The tools mentioned in this article are listed purely to give readers clarity about the web analytics domain. The order in which they appear reflects our own perspective and is not intended to undermine any tool's ratings or features.