Chris Nauroth

Log Analyzer

Download the source code.

LogAnalyzer is a command-line application that analyzes hypothetical web server log files from an online shopping site to derive statistical patterns. The log file format is invented for this project, but the concepts explored in the code can apply to similar real-world applications.

LogAnalyzer accepts a file name as a command-line argument. The file represents a simplified log of web server page requests. For example:

2006-02-25 23:24:36,S1,C3,Internet Explorer 6.0,Books,200

This is one entry in a log file. The comma-delimited fields are:

  1. Date/time of page request.
  2. Session ID string.
  3. Customer ID string.
  4. Browser.
  5. Category of requested page.
  6. HTTP response code returned to the client (e.g. 200 OK).
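
As a rough illustration of the format above (not the code from the download), a line can be split on commas into a small record. The LogEntry field names and the splitFields and parseLogEntry helpers here are hypothetical names chosen for this sketch, and it assumes no field contains an embedded comma:

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form of one log line; field names are assumptions.
struct LogEntry {
    std::string timestamp;   // kept as text, e.g. "2006-02-25 23:24:36"
    std::string sessionId;
    std::string customerId;
    std::string browser;
    std::string category;
    int responseCode;
};

// Splits a line on commas. Assumes fields never contain a comma themselves.
inline std::vector<std::string> splitFields(const std::string& line) {
    std::vector<std::string> fields;
    std::istringstream in(line);
    std::string field;
    while (std::getline(in, field, ',')) {
        fields.push_back(field);
    }
    return fields;
}

// Fills 'out' and returns true on success; returns false if the line does
// not have exactly six fields or the response code is not a whole number.
inline bool parseLogEntry(const std::string& line, LogEntry& out) {
    const std::vector<std::string> f = splitFields(line);
    if (f.size() != 6) {
        return false;
    }
    try {
        std::size_t used = 0;
        const int code = std::stoi(f[5], &used);
        if (used != f[5].size()) {
            return false;  // trailing junk after the number
        }
        out.timestamp = f[0];
        out.sessionId = f[1];
        out.customerId = f[2];
        out.browser = f[3];
        out.category = f[4];
        out.responseCode = code;
        return true;
    } catch (const std::exception&) {
        return false;  // std::stoi throws on non-numeric or out-of-range input
    }
}
```

Returning a bool rather than throwing lets a caller skip malformed lines and keep reading the rest of the file.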

LogAnalyzer aggregates data across all log entries in a file and reports statistics on the results, such as how often each value of a field occurs.

Implementation

I've seen the problem of analyzing log files come up in various ways during job interviews. I was installing and configuring Cygwin on our laptop recently, and I decided to implement a log parser in a C++ project running under my Cygwin environment. One of my goals was to establish a basic project that could serve as a good skeleton for creating future C++ projects in the *nix style, complete with Makefile and Doxygen documentation. I wanted to create a good example of using Doxygen documentation comments with a wide variety of C++ constructs, particularly abstract base classes, template classes, and template functions.

The application primarily makes use of the Observer pattern. I defined an abstract LogObserver class. Instances of LogObserver can be attached to an instance of LogProcessor to receive notifications as log entries are read from a file. Subclasses of LogObserver define specific actions to take in response to notifications from the LogProcessor.
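
The relationship between the two classes might look roughly like the sketch below. Only the names LogObserver, LogProcessor, and LogEntry come from the project; the onEntry, attach, and process member names, the trimmed LogEntry, and the ErrorCountObserver subclass are assumptions made for illustration, and the Doxygen documentation describes the actual interfaces:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Trimmed stand-in for a parsed log line (assumed layout for this sketch).
struct LogEntry {
    std::string category;
    int responseCode;
};

// Abstract observer: subclasses decide what to do with each entry.
class LogObserver {
public:
    virtual ~LogObserver() {}
    virtual void onEntry(const LogEntry& entry) = 0;  // hypothetical name
};

// Subject: forwards every entry to each attached observer.
class LogProcessor {
public:
    // The processor does not own its observers; callers keep them alive.
    void attach(LogObserver* observer) {
        assert(observer != nullptr);
        observers_.push_back(observer);
    }

    // Stands in for reading entries from a file one at a time.
    void process(const std::vector<LogEntry>& entries) {
        for (std::size_t i = 0; i < entries.size(); ++i) {
            for (std::size_t j = 0; j < observers_.size(); ++j) {
                observers_[j]->onEntry(entries[i]);
            }
        }
    }

private:
    std::vector<LogObserver*> observers_;
};

// Example concrete observer: counts responses with a 4xx or 5xx code.
class ErrorCountObserver : public LogObserver {
public:
    ErrorCountObserver() : errors_(0) {}
    void onEntry(const LogEntry& entry) override {
        if (entry.responseCode >= 400) {
            ++errors_;
        }
    }
    int errors() const { return errors_; }

private:
    int errors_;
};
```

With this shape, adding a new statistic means writing another LogObserver subclass and attaching it; the reading loop in LogProcessor does not change.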

One requirement I wanted to implement was reporting, for a specific field in the log entries, how many times each value occurs and what percentage of the total that represents. The data type of each field can vary, but the algorithm for calculating the count and the percentage is independent of the field type. I chose to implement this with a template CountLogObserver class that can be instantiated for any field type. The template parameter is a functor class that defines the return type and a function for retrieving the field from a LogEntry object. (In practice, this function is small enough that the compiler should be able to inline it for efficiency.)
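
A minimal sketch of that idea follows, under stated assumptions: the functor names (CategoryField, ResponseCodeField), the result_type typedef, the onEntry, count, and percentage members, and the trimmed LogEntry and LogObserver are all hypothetical stand-ins, not the code in the download.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Trimmed stand-ins so the sketch compiles on its own (assumed layout).
struct LogEntry {
    std::string browser;
    std::string category;
    int responseCode;
};

class LogObserver {
public:
    virtual ~LogObserver() {}
    virtual void onEntry(const LogEntry& entry) = 0;
};

// Each functor names the field's type and knows how to fetch the field.
struct CategoryField {
    typedef std::string result_type;
    const result_type& operator()(const LogEntry& e) const { return e.category; }
};

struct ResponseCodeField {
    typedef int result_type;
    result_type operator()(const LogEntry& e) const { return e.responseCode; }
};

// Counts occurrences of each distinct value of one field.
template <typename FieldGetter>
class CountLogObserver : public LogObserver {
public:
    typedef typename FieldGetter::result_type value_type;

    CountLogObserver() : total_(0) {}

    void onEntry(const LogEntry& entry) override {
        ++counts_[getter_(entry)];  // map inserts a zero count on first sight
        ++total_;
    }

    std::size_t count(const value_type& value) const {
        typename std::map<value_type, std::size_t>::const_iterator it =
            counts_.find(value);
        return it == counts_.end() ? 0 : it->second;
    }

    // Share of all entries seen so far, as a percentage; 0 if none seen.
    double percentage(const value_type& value) const {
        return total_ == 0 ? 0.0 : 100.0 * count(value) / total_;
    }

private:
    FieldGetter getter_;
    std::map<value_type, std::size_t> counts_;
    std::size_t total_;
};
```

Instantiating CountLogObserver<CategoryField> and CountLogObserver<ResponseCodeField> yields two independent counters, one keyed by string and one by int, from the same counting code.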

Plenty of additional details are available in the Doxygen documentation, which is included in the downloadable zip file along with the source code and the compiled executable for Cygwin.