Datareign

At the heart of the AWStats suite is the script awstats.pl.

The following description, based on version 1.910 (dated 2008/04/21 21:13:28), is intended to assist in carrying out changes to awstats.pl. You need to be very careful when making changes, as almost everything is handled via anonymous arrays or hashes, and it is important to be quite clear which value(s) you are affecting. In this context, an understanding of Read_History_With_TmpUpdate() and DefinePerlParsingFormat() is a good place to start, as they provide much of the data mapping used by the script.
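As a reminder of why this care is needed, here is a minimal Perl sketch showing that copying a hash reference does not copy the underlying data, so a change made through either name affects both. The variable names are hypothetical and are not taken from awstats.pl.

```perl
use strict;
use warnings;

# Hypothetical counters, not names from awstats.pl.
my $hits  = { 'index.html' => 10 };   # anonymous hash
my $alias = $hits;                    # copies the reference, not the data
my %copy  = %{$hits};                 # copies the data itself

$alias->{'index.html'}++;             # also changes what $hits points to
$copy{'index.html'} = 0;              # leaves $hits untouched

printf "hits=%d copy=%d\n", $hits->{'index.html'}, $copy{'index.html'};
```

Before changing a value, it is worth checking whether the name in hand is a reference to shared data or a private copy.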

The script starts, in the conventional manner, with the list of included libraries. At the moment, only the 'strict', 'Time::Local', 'Socket' and 'vars' packages are invoked.

Then comes the definition of variables and arrays to be used later. There are several hundred of these.

The subroutines are defined next. Apart from the packages mentioned above, all functionality is contained within the script, so it should be possible to resolve all queries by sufficiently determined perusal.

Major Subroutines

These are the routines that are of special interest in understanding the functionality of the script.

html_head()

Builds the heading of an HTML page. Calls any plug-ins that have been defined for the page.

html_end()

Writes the tail of an HTML page. Once again, invokes any plug-ins defined for the job.

error()

Script-specific error handling. Its behaviour depends on whether the script is running as CLI or CGI.

debug()

Writes debugging statements to the file debug.log, which resides, by default, in the current run directory. Relies on the value of $DEBUGFORCED to decide whether or not to emit a message.

OptimizeArray()

Basically an array packer that removes unwanted space from the array passed to it.

Read_Config()

Reads the configuration file defined by the session parameters and passes the handle of the file to Parse_Config() for unpacking.

Parse_Config()

Given the handle of a configuration file, reads the file from start to end, assigning the values retrieved to the appropriate global variables.
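The general idea can be sketched as follows. This is a simplified illustration, not the code from awstats.pl: the helper name parse_config_sketch is hypothetical, it returns a hash instead of assigning to global variables, and it assumes simple Name=value lines with '#' comments and optional double quotes around the value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash of settings rather than setting globals.
sub parse_config_sketch {
    my ($fh) = @_;
    my %conf;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*#/;              # skip comment lines
        next if $line =~ /^\s*$/;              # skip blank lines
        next unless $line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/;
        my ($name, $value) = ($1, $2);
        $value =~ s/^"(.*)"$/$1/;              # strip surrounding double quotes
        $conf{$name} = $value;
    }
    return \%conf;
}

# Sample configuration text read through an in-memory file handle.
my $text = qq{# sample configuration\nLogFile="/var/log/access.log"\nLogFormat=1\n\nSiteDomain = "example.com"\n};
open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";
my $conf = parse_config_sketch($fh);
close($fh);

print "$_ = $conf->{$_}\n" for sort keys %$conf;
```

Returning a hash keeps the sketch self-contained; the real routine, as described above, assigns to global variables instead.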

Read_Ref_Data()

Loads the data entries in the /lib/*.pm files under the default run directory.

Read_Language_Data()

Loads the entries in the specified language file.

Substitute_Tags()

Reformats dates according to a set of hard-coded rules.

Check_Config()

Validates all the configuration values (defined as globals) and sets missing entries to default values.

Read_Plugins()

Loads extension modules requested by the user. These are usually stored in {run directory}/plugins.

Read_History_With_TmpUpdate()

This is an enormous routine that reads and writes the statistics file. It is pretty much at the heart of the AWStats processing. It does the following jobs…

  1. Works out which sections have to be loaded from an existing file.
  2. Decides which sections are to be written to $filetowrite.
  3. Reads each line of the input file into @field.
  4. Parses $field[0] to decide which value it has found, using a detailed parse tree that searches for specified strings (so you can match it to the actual history file).
  5. Loads $field[1] into the appropriate global variable.
  6. Sorts all the data referenced by $SectionsToSave.
  7. Invokes Save_History().
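The read side of the steps above can be sketched roughly as follows. The history content, the BEGIN_/END_ section markers as used here, and the %want and %general names are simplified assumptions for illustration; the real routine handles many more sections and also writes the temporary file.

```perl
use strict;
use warnings;

# Simplified, hypothetical history content; the real file layout is richer.
my $history = <<'EOT';
BEGIN_GENERAL 2
LastLine 20080421211328
TotalVisits 42
END_GENERAL
BEGIN_DOMAIN 1
uk 120
END_DOMAIN
EOT

my %want = (GENERAL => 1);      # sections chosen for loading (step 1)
my (%general, $section);

open(my $in, '<', \$history) or die "cannot open in-memory file: $!";
while (my $line = <$in>) {
    chomp $line;
    my @field = split(/\s+/, $line);                 # step 3
    next unless @field;
    if ($field[0] =~ /^BEGIN_(\w+)$/) { $section = $1; next; }   # step 4
    if ($field[0] =~ /^END_/)         { undef $section; next; }
    next unless defined $section && $want{$section};
    $general{$field[0]} = $field[1];                 # step 5
}
close($in);

# Step 6, in miniature: emit the loaded data in sorted order.
print "$_ => $general{$_}\n" for sort keys %general;
```

Only the requested section is loaded; lines from other sections are read and discarded, which mirrors the selection made in step 1.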

Save_History()

Creates or updates a history file referenced by the global file handle HISTORYTMP. If the file is new, the routine writes a standard header at the top; otherwise, it writes the requested section.

Rename_All_Tmp_History()

Works out the names for newly written files and renames them appropriately.

DefinePerlParsingFormat()

Maps the position of the contents of a log line to a set of pre-defined variables. These variables are used as the indices into the @field array and are conveniently listed at the start of the subroutine.
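A minimal sketch of this kind of position mapping is shown below. The format string, the tag names and the %pos hash are assumptions made for illustration; the real routine sets individual variables such as $pos_host and $pos_code and does considerably more work.

```perl
use strict;
use warnings;

# Hypothetical format string and tag names, for illustration only.
my $format = '%host %logname %time1 %methodurl %code %bytesd';

my %pos;                                   # tag name => index into @field
my $i = 0;
for my $tag (split(/\s+/, $format)) {
    (my $name = $tag) =~ s/^%//;           # drop the leading '%'
    $pos{$name} = $i++;
}

# A pre-split log line, standing in for the @field array.
my @field = ('192.0.2.1', '-', '21/Apr/2008:21:13:28', 'GET /index.html', 200, 512);

printf "host=%s code=%s\n", $field[$pos{host}], $field[$pos{code}];
```

Once the positions are known, the rest of the code can refer to fields by name instead of by a hard-coded index, which is what makes different log formats interchangeable.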

Main Script

In the discussion that follows, each step begins with a fragment of code that can be searched for to go to the relevant section within the script. Note that the search string does not necessarily mark the start of the relevant section.

A comment has been added to the file, 'Main script starts here…' to make it easier to find the starting point.

Initialisation

  1. $starttime=time(); Split the Perl command line to determine the invoked directory, script name and extension, then get, reformat and validate both the current datetime and that of the following day.
  2. if ($ENV{'AWSTATS_DEL_GATEWAY_INTERFACE'}) Assess whether the script is being run from the command line or via a browser, testing the appropriate CGI or CLI parameters in either case.
  3. if ($ENV{'AWSTATS_FORCE_CONFIG'}) Decide whether to perform a setup process and show a detailed warning page if in CGI mode and the configuration fails.
  4. &Read_Config($DirConfig); Read and check the configuration file via the Read_Config() subroutine. Once this has completed, carry out general housekeeping, such as setting the language, validating the parameters from the specified file via Check_Config() and presetting the default HTML page layout.
  5. if ($MigrateStats) Update a previously existing history file to the current format.
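The CLI-versus-CGI decision in step 2 can be illustrated with a short sketch. It relies on the standard CGI convention that the web server sets the GATEWAY_INTERFACE environment variable; the helper names run_mode and format_error are hypothetical, and the exact tests made by awstats.pl are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical helper: under CGI the web server sets GATEWAY_INTERFACE.
sub run_mode {
    my ($env) = @_;
    return $env->{'GATEWAY_INTERFACE'} ? 'CGI' : 'CLI';
}

# Hypothetical error reporter whose output depends on the run mode.
sub format_error {
    my ($mode, $message) = @_;
    return $mode eq 'CGI'
        ? "Content-type: text/html\n\n<html><body>Error: $message</body></html>\n"
        : "Error: $message\n";
}

my $mode = run_mode(\%ENV);
print format_error($mode, 'sample message');
```

Passing the environment in as a hash reference, rather than reading %ENV inside the helper, makes both modes easy to exercise from the command line.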

Update Log Process

  1. open(LOG,"$LogFile") Opens the specified log file, setting the handle to binary mode in case unexpected characters have been inserted.
  2. while ($line=<LOG>) Read the log file inside a loop. The script parses one line at a time, based on the format it was told to use by the configuration file.
  3. if ($LogFormat eq '2' && $line =~ /^#Fields:/) If a new format line is found, map the new field into the fixField array, thus adding the additional field to the preset line format.
  4. if ($ShowCorrupted) If the corruption flag is set, show any suspect lines.
  5. if ($pos_vh>=0 && $field[$pos_vh] !~ /^$SiteDomain$/i) Remove any unwanted references for virtual domains, methods, etc. An unwanted line is dropped simply by skipping it: the matching test invokes a 'next' and nothing further is done with the line. To allow such lines to be processed, remove the relevant pattern from any test that invokes a 'next'. A test that does not invoke a 'next' has no effect on filtering and may be treated as commentary.
  6. $field[$pos_date] =~ tr/,-\/ \t/:::::/s; Check for a date field in the expected position on the line and parse it if found. If the date evaluates as corrupted, increment the error count and go to the next line, otherwise, look to see if this is a line that the script has seen in a previous run and, if so, ignore it. If the line survives all these tests, we've found the first new line in the file (i.e. the first line that the script has not previously seen) and we carry on from here.
  7. if(@SkipHosts && (&SkipHost($field[$pos_host]) Test for any disallowed lines, using the @SkipHosts, @SkipFiles, etc. arrays as references. Any matching lines are dropped and the next line is fetched.
  8. At this point, we assume that the line is valid.
  9. if ($daterecord > $lastprocesseddate) Test the line to see if it starts a new section (i.e. the month or year has changed). If so, reset the section accumulators.
  10. if ($URLNotCaseSensitive) Validate the URL, checking for unwanted parameters and ensuring that the URL makes sense.
  11. $extension=($LevelForFileTypesDetection>=2 Increment any appropriate counters for the URL, going to the end of the loop if a counter is incremented.
  12. && $urlwithnoquery =~ /$regfavico/o) Counts favicon traffic, if appropriate.
  13. if ($LevelForWormsDetection) counts hits from worms.
  14. if ($field[$pos_code] == 304) Decides how to count error codes.
  15. elsif ($LogType eq 'M' Processes mail records.
  16. elsif ($LogType eq 'F') An unused marker for code to count FTP traffic.
  17. if ($DecodeUA) Processes Robot entries.
  18. if ($LevelForFileTypesDetection) Counts compressed file entries.
  19. if ($PageBool) Count the streams served.
  20. if ($pos_logname>=0 Count the login entries.
  21. if ($DNSLookup) Looks in various locations for a host entry that matches the value being checked.
  22. my $Domain='ip'; Tries to resolve the top level domain for the host (theoretically country).
  23. my $timehostl=$_host_l{$HostResolved}; Count the visits from the given host.
  24. if ($LevelForBrowsersDetection) All the browser counters.
  25. my $uaos=$TmpOS{$UserAgent} The operating system counters.
  26. if ($ShowDirectOrigin) Count the referer entries, such as search engines, etc. This does a lot of work to establish how the page was invoked.
  27. if ($pos_emails>=0 && $field[$pos_emails]) Counts the emails sent and received.
  28. if ($pos_cluster>=0) Increments the page count and byte counts for cluster hits.
  29. foreach my $extranum (1..@ExtraName-1) Counts for user added sections.
  30. if (++$counterforflushtest >= 20000) Decide whether to write the analysis out to disk.
  31. seek(LOG,$lastlineoffset,0); Indicates we have reached the end of the current line and are updating the log. The remainder of this section is concerned with housekeeping.
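The date handling in step 6 can be sketched as follows. Only the tr/// expression is taken from the text above; the sample dates, the YYYYMMDDHHMMSS form of $lastprocesseddate, the corruption test and the comparison used to skip previously seen lines are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample values; only the tr/// expression comes from the script.
my @dates = ('2008-04-21 21:13:27', '2008-04-21 21:13:28', '2008-04-21 21:13:29');
my $lastprocesseddate = 20080421211328;   # assumed YYYYMMDDHHMMSS form
my ($corrupted, @new) = (0);

for my $raw (@dates) {
    # The range ,-\/ covers ',', '-', '.' and '/'; with space and tab, all
    # become ':' and runs of them are squeezed by the /s modifier.
    (my $date = $raw) =~ tr/,-\/ \t/:::::/s;
    my @part = split(/:/, $date);
    if (@part != 6 || grep { !/^\d+$/ } @part) { $corrupted++; next; }
    my $daterecord = sprintf('%04d%02d%02d%02d%02d%02d', @part);
    next if $daterecord <= $lastprocesseddate;   # seen in a previous run
    push @new, $daterecord;
}

print "new records: @new, corrupted: $corrupted\n";
```

Reducing every separator to a single ':' lets one split() handle several date layouts, after which the record date can be compared numerically with the last date processed.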

Report Creation

  1. my $max_p; my $max_h; my $max_k; my $max_v; Indicates the start of the section.
Last modified: 2009/01/03 19:16