wusage 2.5

usage statistics for WWW servers

What's New in version 2.5!

Support for the new common log file format. Those of you who have just upgraded to NCSA httpd 1.2 will notice that wusage has stopped working. This is because the log file format has changed. The first line of your wusage.conf file now most likely reads either NCSA_HTTPD or CERN_HTTPD; change it to COMMON if you have the latest version of either server. Note that once you do this, data in the old format can't be read and will be ignored. If you've been running wusage all along, this is not a problem, since you already have reports for all previous weeks; you may need to fudge your results a bit for the one week during which you switched servers.

Please see "Configuring wusage for your server" below.

What's New in version 2.4!

An incredibly dumb mistake on my part while creating 2.3, which led to problems even worse than those in 2.2, is now fixed. It's amazing how much trouble one line of code can cause. THIS VERSION WORKS, at least on the systems available to me for testing.

What's New in version 2.3!

One-line but extremely important bug fix in wusage.c! I deleted a critical line between versions 2.1 and 2.2. Mea culpa.

The bullet is no longer inside the anchor in the list of weeks, owing to a problem with at least one client that won't accept this (although I believe it's valid HTML).

What's New in version 2.2!

Bug fixes in several places. Sites with less than one week of data should now work fine (of course, the graph is rather dull with no data points; wait a week or two). Meaningful error messages should now appear for missing directories.

The only US link in my short collection of pbmplus sites pointed to an out-of-date version of pbmplus that was inadequate to support wusage. That reference has been removed and replaced with one to a site that has the real thing.

Compatibility fixes for yet more compilers.

As always, if you have problems, contact me and I'll do my best to get you (and others with similar setups) up and running.

Upgrade Notes: ppmfig has not changed since version 2.0, but both wusage.c and usagegraph.c have changed in this version, so if you already have a working ppmfig you can forgo rebuilding it. If you're having problems, though, be sure to try rebuilding ppmfig against an up-to-date pbmplus version.

What's New in version 2.1!

Version 2.1 now includes support for the CERN httpd as well as the NCSA httpd. You should insert the line

NCSA_HTTPD

or

CERN_HTTPD

at the beginning of the wusage.conf file, as appropriate. (For backwards compatibility, wusage will assume NCSA if this line is absent.)

Version 2.1 corrects a bug in the use of wild cards: wild cards at the beginning of an entry in one of the exclusion lists now work properly (so entries such as "*.gif" are now correctly processed).

Version 2.1 now ignores white space at the end of entries in the exclusion lists; this was not strictly a bug in 2.0, but the change saves a lot of grief.

What's New in version 2.0!

First, and most important: version 2.0 is now compatible with all, or nearly all, versions of Unix! Version 1.0 relied on certain time-handling routines that did not exist in non-Sun versions of Unix. These have been replaced.

Second, version 2.0 supports the exclusion of unwanted accesses, such as .gif files, personal files, and other material that, in the opinion of the operator, distorts the statistical picture of the server. This mechanism is entirely under the control of the operator; no code changes are needed.

Third, a major bug that produced incorrect top-ten lists when wusage caught up on several unprocessed weeks in one pass has been fixed.

Credits and license terms

wusage 2.5 is copyright 1993, 1994, Quest Protein Database Center, Cold Spring Harbor Labs. Permission granted to copy and distribute this work provided that this notice remains intact. Modified versions should be cleared through Quest first; if this is not done, any modified version of the program must be clearly labeled as such.

The Quest Protein Database Center is funded under Grant P41-RR02188 by the National Institutes of Health.

Written by Thomas Boutell, 11/93 - 1/94.

What is wusage?

wusage maintains usage statistics for a WWW server. Specifically, it updates the following information, week by week:

What else do I need to use wusage?

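To use wusage, you will need the NCSA or CERN httpd server (see below if you use a different server), the pbmplus source code, and an ANSI C compiler such as gcc.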

What if I don't use the NCSA or CERN server?

wusage is intended for use with the NCSA or CERN httpd servers, and can produce useful reports from their log files. If you use a different server with a different access log file format, it will be necessary to patch the wusage.c source code appropriately, which should not be overly difficult. I will be glad to assist as best I can.
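For reference, here is a minimal C sketch of splitting one common-logfile-format line (the format the COMMON setting expects) into its fields; a patch for another server would need to extract the same pieces from that server's own format. The struct access type and the parse_common_line and copy_span functions are hypothetical names chosen for illustration, not wusage's own code.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record for one access; not wusage's own structure. */
struct access {
    char host[256];
    char date[64];      /* e.g. 10/Jan/1994:13:55:36 -0500 */
    char request[1024]; /* e.g. GET /index.html HTTP/1.0 */
    int status;         /* HTTP status code, e.g. 200 */
    long bytes;         /* bytes sent, or -1 when the log shows "-" */
};

/* Copy the characters in [start, end) into dst, truncating to fit. */
static void copy_span(char *dst, size_t n, const char *start, const char *end)
{
    size_t len = (size_t)(end - start);

    if (len >= n)
        len = n - 1;
    memcpy(dst, start, len);
    dst[len] = '\0';
}

/* Parse one line of the form
     host ident authuser [date] "request" status bytes
   Returns 1 on success, 0 if the line is not in that format. */
int parse_common_line(const char *line, struct access *a)
{
    const char *p, *q;
    char bytes[32];

    /* The host name runs up to the first space. */
    q = strchr(line, ' ');
    if (q == NULL)
        return 0;
    copy_span(a->host, sizeof a->host, line, q);

    /* ident and authuser are skipped; the date sits between [ and ]. */
    p = strchr(q, '[');
    if (p == NULL || (q = strchr(p, ']')) == NULL)
        return 0;
    copy_span(a->date, sizeof a->date, p + 1, q);

    /* The request sits between the next pair of double quotes. */
    p = strchr(q, '"');
    if (p == NULL || (q = strchr(p + 1, '"')) == NULL)
        return 0;
    copy_span(a->request, sizeof a->request, p + 1, q);

    /* The status code and byte count follow the closing quote. */
    if (sscanf(q + 1, "%d %31s", &a->status, bytes) != 2)
        return 0;
    a->bytes = strcmp(bytes, "-") == 0 ? -1 : strtol(bytes, NULL, 10);
    return 1;
}
```

A patched reader would call something like this once per log line and skip any line for which it returns 0.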

Where can I get pbmplus source code?

pbmplus source code is required because wusage takes advantage of one "extra" pbmplus utility that doesn't officially exist! Included in the wusage package is an additional pbmplus program called "ppmfig" which draws the usage graphs and usage icons.

pbmplus is available from many FTP sites. Here are a few links turned up by archie. These links fetch the December 10th, 1991 release of pbmplus, which is the latest I am aware of. If you do not already have pbmplus, save it as pbmplus10dec91.tar.Z, then uncompress and untar it.

My apologies to those who ran afoul of the Ohio link in old versions of this documentation, which led to an outdated version of pbmplus that did not work with wusage.

NOTE: Unfortunately, not all browsers get along with all FTP sites. If you have problems using these links, FTP directly to the sites specified, set binary mode, and get the specified file (you may have to cd to that location step by step; not all FTP sites are alike).

How do I get wusage?

Now that you have the rest of the pieces together, you can fetch wusage as a tar file, or FTP it directly from isis.cshl.org, in the subdirectory pub/wusage.

How do I build wusage?

In order to build wusage, first untar the wusage2.5.tar file with the following command:

tar -xf wusage2.5.tar
This will create the directory "wusage2.5" beneath the current directory.

cd to this directory and examine the Makefile, which you may need to change slightly. Specifically, if you are using a different ANSI C compiler, such as Sun's acc, then change:

CC=gcc
to read
CC=acc
Or to another appropriate compiler.

Now, to build the package, just type "make all". If all goes well, two programs, wusage and usagegraph, will be compiled without incident.

It is still necessary, however, to build ppmfig, which must be added to your pbmplus source tree and built there.

Assuming that your pbmplus distribution is in the directory pbmplus, copy the file ppmfig.c to the directory pbmplus/ppm. Also copy the file font.c to this directory.

Now, assuming that you have already built the rest of the pbmplus utilities, edit the Makefile in the directory pbmplus/ppm.

In the Makefile you will find a list containing the following:

PORTBINARIES = giftoppm gouldtoppm ilbmtoppm imgtoppm mtvtoppm \
  pcxtoppm pgmtoppm pi1toppm picttoppm \
  pjtoppm ppmdither ppmhist ppmmake ppmquant \ ...
or a similar list of binaries. Add "ppmfig" to this list.

(NOTE: if you have not yet built the pbmplus utilities, then you can add ppmfig to the Imakefile instead, in a similar manner, and it will be built with the rest.)

Now, to compile ppmfig, enter the command "make ppmfig" in the directory pbmplus/ppm. If all goes well, a binary for ppmfig will be produced.

Copy this binary to the same directory in which the rest of the pbmplus utilities are to be found, since usagegraph will expect to find it there.

You have now built wusage. All that remains is to configure it for use with your server.

Configuring wusage for your server

There are several parameters which must be set in order for wusage to properly interact with your server. These are set in the file wusage.conf. A sample wusage.conf file is included in the tarfile, and you can use this file as a starting point. You will definitely need to edit this file to configure wusage properly for your server unless it is identical to ours.

Here is the sample wusage.conf file. Note that lines beginning with "#" are comments and are ignored. Note also that blank lines are NOT considered comments and should be avoided.

#Type of server log: COMMON (all new servers), NCSA_HTTPD or CERN_HTTPD.
#The latter two are for older versions of those servers; newer versions
# use the COMMON log file format.
COMMON
#Name of your server as it should be presented
Quest
#Directory where html pages generated by usage program should be located
/home/www/web/usage
#Directory where wusage and usagegraph are installed
/home/www
#URL to which locations of html pages should be appended for usage reports
#(the same directory as the html pages above, but in web space, not filesystem space)
/usage
#URL of server home page (a local URL is adequate)
/index.html
#Path of httpd access log file
/home/www/ncsa/logs/access_log
#Path where pbmplus utilities are installed
/usr/local/bin
#Hidden items
{
}
#Ignored items
{
}
#Ignored sites
{
}
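The reading rules for this file (lines starting with "#" are comments, blank lines are not, and settings come in a fixed order) can be sketched in C as follows. next_setting, enum log_type and log_type_from_setting are hypothetical names for illustration; wusage's actual parser may differ, and this sketch leaves the NCSA_HTTPD default for a missing first line to the caller.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Read the next setting from a wusage.conf-style file into buf.
   Lines beginning with '#' are skipped as comments; blank lines are
   NOT skipped, since they are not comments. The newline and any
   trailing white space are removed. Returns 1 if a setting was read,
   0 at end of file. */
int next_setting(FILE *f, char *buf, size_t n)
{
    while (fgets(buf, (int)n, f) != NULL) {
        size_t len;

        if (buf[0] == '#')
            continue;
        len = strlen(buf);
        while (len > 0 && isspace((unsigned char)buf[len - 1]))
            buf[--len] = '\0';
        return 1;
    }
    return 0;
}

/* Hypothetical names for the legal values of the first setting. */
enum log_type { LOG_UNKNOWN, LOG_COMMON, LOG_NCSA, LOG_CERN };

/* Map the first setting to a log type. The comparison is
   case-sensitive because the values must be given in uppercase. */
enum log_type log_type_from_setting(const char *s)
{
    if (strcmp(s, "COMMON") == 0)
        return LOG_COMMON;
    if (strcmp(s, "NCSA_HTTPD") == 0)
        return LOG_NCSA;
    if (strcmp(s, "CERN_HTTPD") == 0)
        return LOG_CERN;
    return LOG_UNKNOWN;
}
```

A caller would call next_setting once per setting, in the order shown in the sample file; an empty result means a blank line has been taken as a setting, which is why blank lines should be avoided.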

The first non-comment line should read:

COMMON

or

NCSA_HTTPD

or

CERN_HTTPD
as appropriate to your server's log file format. Note that the latest versions of BOTH servers produce the COMMON log file format, and setting this line to a different value won't work for those versions! (For backwards compatibility, wusage will tolerate the absence of this line, in which case NCSA_HTTPD is assumed. But don't rely on this; it will go away eventually. Please set this line to one of the options above.) The value must be in UPPERCASE.

Note to those upgrading: once you switch to the COMMON log file setting, wusage can't read any data in the old format that may be lying around, but it can skip over it tactfully. The upshot of this is that if you've been running wusage all along, you'll simply be able to start using it again and will only need to adjust the results for the one week during which you made the changeover to a common-logfile-format server version.

For those using wusage for the first time, this is a thornier problem. I encourage server authors (and anyone else for that matter!) to write a conversion filter to translate old-style log file formats to the new style. It shouldn't be very difficult. At worst, you'll have statistics only from the point at which you switched to a common-logfile-format server.

The second non-comment line should contain the name of your server as you would like it to be referred to in the usage page.

The third line contains the directory in your file system in which html pages generated by wusage should reside. This will usually be a subdirectory of your server root directory called "usage". (In our case, SERVER_ROOT is /home/www/web.)

IMPORTANT: this directory should not be shared with other information! Please give wusage a subdirectory to itself, since it creates and deletes files fairly freely and assumes its directory is a safe place in which to do so.

The fourth line is the directory in which the wusage and usagegraph binaries are installed. Install them where you would prefer they be kept and set this line appropriately. We keep them in /home/www, but /usr/local/bin is likely to be a more common choice.

The fifth line is the "base URL" for html pages generated by wusage. This is similar to the third line, but is the location in web space, not in filesystem space. Thus, if SERVER_ROOT is /home/www/web and you set the third line to /home/www/web/usage, the fifth line should be set to just /usage.
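Put another way, the base URL is the report directory with SERVER_ROOT removed from the front. A small C sketch of that relationship follows; dir_to_url is a hypothetical helper for checking your settings, not part of wusage, and it assumes the report directory really lies beneath SERVER_ROOT.

```c
#include <string.h>

/* Hypothetical helper: given the server's document root and the
   directory where usage reports are written, return the matching
   web-space path, or NULL if the directory is not under the root.
   The result points into dir, so no allocation is needed. */
const char *dir_to_url(const char *server_root, const char *dir)
{
    size_t len = strlen(server_root);

    /* Ignore trailing slashes on the root, e.g. "/home/www/web/". */
    while (len > 0 && server_root[len - 1] == '/')
        len--;
    if (strncmp(dir, server_root, len) != 0)
        return NULL;
    /* Require a path boundary so /home/www/webfoo does not match. */
    if (dir[len] != '/')
        return NULL;
    return dir + len;
}
```

With our settings, dir_to_url("/home/www/web", "/home/www/web/usage") yields "/usage".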

The sixth line should be the location of your server's home page, to which a link "up" will be created on the main usage page. Again, this is its location in web space; the most common setting for this line is /index.html.

The seventh line is the location of your server's access log file, which wusage needs to be able to read in order to compute statistics. For the NCSA server, this is the access_log file in .../ncsa/logs, where ... is the location at which you installed the server. In our case it is installed beneath /home/www.

The eighth line is the location of the pbmplus binaries including ppmfig on your system. In our case, this is the directory /usr/local/bin.

Excluding unwanted accesses

The lines above are followed by three lists of items, enclosed by { and } characters. By default, these lists are empty. The absence of the lists is tolerated for backwards compatibility with wusage 1.0.

The first is a list of items which should be "hidden". This means that they will still register in the total number of accesses, but they will never be in the top ten for any week.

The second is a list of items which should be "ignored". These items never appear in the total number of accesses OR in the top ten-- they are completely ignored.

The third is a list of sites to be ignored. This is useful if many of the accesses to your server are made by you personally and you are more interested in counting accesses made by other sites.

For instance, if you want to keep .gif files (frequently inline) out of the top ten, completely ignore files coming from users' personal directories, and ignore accesses from your own site "here.com", the three lists would look as follows. (Note that asterisks are acceptable as wild cards, just as they are in shell file names; question marks are also acceptable and match any single character.)

#Hidden items
{
*.gif
}
#Ignored items
{
/~*
}
#Ignored sites
{
here.com
}

This mechanism makes it much easier to arrive at a meaningful top-ten list.
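One way to implement the matching described above is sketched below in C: "*" matches any run of characters (including "/", so "/~*" covers everything under a user's directory) and "?" matches exactly one. The classify function shows how the three lists might combine. wild_match, classify, struct exclusions and enum verdict are hypothetical names, not wusage's own code. The sketch also assumes site entries are matched against the whole host name, so "here.com" does not cover "www.here.com" unless you write "*here.com".

```c
/* Return 1 if text matches pat, where '*' matches any run of
   characters (including '/') and '?' matches exactly one character.
   Simple recursion; adequate for short exclusion entries. */
int wild_match(const char *pat, const char *text)
{
    if (*pat == '\0')
        return *text == '\0';
    if (*pat == '*') {
        while (*pat == '*')          /* collapse runs of '*' */
            pat++;
        if (*pat == '\0')
            return 1;                /* trailing '*' matches the rest */
        for (; *text != '\0'; text++)
            if (wild_match(pat, text))
                return 1;
        return 0;
    }
    if (*text == '\0')
        return 0;
    if (*pat == '?' || *pat == *text)
        return wild_match(pat + 1, text + 1);
    return 0;
}

/* Hypothetical container for the three lists in wusage.conf. */
struct exclusions {
    const char *const *hidden;  int nhidden;
    const char *const *ignored; int nignored;
    const char *const *sites;   int nsites;
};

enum verdict { COUNT_NORMAL, COUNT_HIDDEN, COUNT_IGNORED };

static int any_match(const char *const *pats, int n, const char *s)
{
    int i;

    for (i = 0; i < n; i++)
        if (wild_match(pats[i], s))
            return 1;
    return 0;
}

/* Ignored sites and ignored items are dropped entirely; hidden items
   still count toward the total but are kept out of the top ten. */
enum verdict classify(const struct exclusions *ex,
                      const char *site, const char *item)
{
    if (any_match(ex->sites, ex->nsites, site) ||
        any_match(ex->ignored, ex->nignored, item))
        return COUNT_IGNORED;
    if (any_match(ex->hidden, ex->nhidden, item))
        return COUNT_HIDDEN;
    return COUNT_NORMAL;
}
```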

Installing wusage as a cron job

wusage needs to be run on a weekly basis in order to keep useful statistics. Specifically, it should be run as soon after midnight on Sunday as possible. For the purposes of creating an html report, wusage should always be run with these options:

-c (location of wusage.conf file)
which specifies the location of the configuration file, and
-h
which specifies that an html report should be built. There are other arguments which are used internally in various situations in which wusage and usagegraph invoke each other.

So, in order to install wusage as a regularly-scheduled automatically-run program, you need to add it to your crontab file and submit it to the program "crontab". Our crontab file looks like this:

1 0 * * 0 /home/www/wusage -h -c /home/www/wusage.conf
... other jobs, if any ...
The crontab file is submitted to the Unix system with the following command, assuming it is called "crontab.txt":
crontab crontab.txt

Of course, if you run the www server as root, you no doubt already have a crontab file for root; add this line to it, then reinstall it using crontab. (We created a separate www account to facilitate this sort of thing; I recommend this strategy to other server administrators.)
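The crontab line above runs wusage at one minute past midnight every Sunday, which suggests that a week runs from Sunday midnight to Sunday midnight. Under that assumption, the start of the week containing a given moment can be computed in C with mktime, which normalizes an out-of-range day of the month. week_start is a hypothetical helper for illustration, not part of wusage.

```c
#include <time.h>

/* Return the start of the week containing t: the preceding (or same)
   Sunday at 00:00:00 local time. Assumes weeks begin at Sunday
   midnight, as the weekly cron schedule suggests. */
time_t week_start(time_t t)
{
    struct tm tm = *localtime(&t);   /* sketch: no NULL check */

    tm.tm_mday -= tm.tm_wday;        /* tm_wday is 0 on Sunday */
    tm.tm_hour = 0;
    tm.tm_min = 0;
    tm.tm_sec = 0;
    tm.tm_isdst = -1;                /* let mktime work out daylight saving */
    return mktime(&tm);              /* mktime normalizes a day of 0 or less */
}
```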

Hooking up wusage

Everything else is taken care of; all that remains is to run wusage for the first time (to make sure the various html and .gif files actually exist) and to link the usage report to your home page. Run wusage by hand using the following command:

wusage -h -c /home/www/wusage.conf
(Substitute the directory where wusage.conf resides on your system for /home/www in the above.)

Now, if all has gone well, edit your home page to include a link to the usage report. Here is the relevant excerpt from our home page:

<P>Usage of the Quest WWW server is kept track of through
<A HREF="http:/usage/index.html">
<IMG ALIGN=TOP SRC="http:/usage/usage.graph.small.gif"></A>
 <A HREF="http:/usage/index.html">usage statistics</A>.</P>
In addition to obvious name changes, you may need to change the directory linked to if you did not use /usage in your configuration file.

Note that in addition to a normal text link, a small usage graph is provided as an icon. This graph is genuine: it is updated at the same time as the larger graph on the main usage page!

Purging access_log (how and why)

Your access_log file will grow tremendously over time, particularly if your server is heavily used. It is desirable to purge this file periodically, and this can be done provided you follow these directions.

Take note of the most recent week for which wusage has generated a complete report. Determine the date on which this week ended (the usage report displays the date the week began).

Now edit your access_log file and find the first entry that falls AFTER the completion of that week. It is safe to delete all entries BEFORE that line in the access_log file.
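If your access_log is large, the search for that line can be done mechanically. The C sketch below computes the end of a week from the start date shown in the usage report, then copies only the entries stamped at or after that moment into a new file, leaving the original untouched. week_end, log_line_time and copy_from_cutoff are hypothetical helpers, not part of wusage; the sketch reads timestamps as local time and ignores the zone offset, which is only safe if your server logs in local time.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* The report shows the date a week began; it ends 7 days later at midnight. */
time_t week_end(int year, int month, int mday)
{
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    tm.tm_year = year - 1900;
    tm.tm_mon = month - 1;   /* struct tm months run 0-11 */
    tm.tm_mday = mday + 7;   /* mktime carries overflow into the next month */
    tm.tm_isdst = -1;
    return mktime(&tm);
}

/* Parse the bracketed timestamp of a common-logfile-format line, e.g.
   [10/Jan/1994:13:55:36 -0500], as local time (zone offset ignored).
   Returns (time_t)-1 if the line has no parsable timestamp. */
time_t log_line_time(const char *line)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    const char *p = strchr(line, '[');
    char mon[4];
    struct tm tm;
    int i;

    if (p == NULL)
        return (time_t)-1;
    memset(&tm, 0, sizeof tm);
    if (sscanf(p + 1, "%d/%3[^/]/%d:%d:%d:%d", &tm.tm_mday, mon,
               &tm.tm_year, &tm.tm_hour, &tm.tm_min, &tm.tm_sec) != 6)
        return (time_t)-1;
    for (i = 0; i < 12; i++)
        if (strncmp(mon, months + 3 * i, 3) == 0)
            break;
    if (i == 12)
        return (time_t)-1;
    tm.tm_mon = i;
    tm.tm_year -= 1900;      /* struct tm counts years from 1900 */
    tm.tm_isdst = -1;
    return mktime(&tm);
}

/* Copy to out every line of in from the first one stamped at or after
   cutoff onward; earlier lines are dropped. Returns the number dropped. */
long copy_from_cutoff(FILE *in, FILE *out, time_t cutoff)
{
    char line[4096];
    long dropped = 0;
    int keeping = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (!keeping) {
            time_t t = log_line_time(line);
            if (t != (time_t)-1 && t >= cutoff)
                keeping = 1;
        }
        if (keeping)
            fputs(line, out);
        else
            dropped++;
    }
    return dropped;
}
```

For a week the report lists as beginning January 2, 1994, the cutoff is week_end(1994, 1, 2), midnight on January 9. Check the new file before replacing access_log with it.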

Important note: if you do purge your access_log file, be sure to back up the directory in which wusage keeps its html pages. This directory contains important summary information for previous weeks, which wusage needs in order to graph weeks that are no longer in the access_log file.

If you have problems

If you have any difficulties with wusage, feel free to contact the author, Thomas Boutell.