2.2. lars.apache - Reading Apache Logs
This module provides a wrapper for Apache log files, typically in common or combined format (but technically any Apache format which can be unambiguously parsed with regexes).
The ApacheSource class is the major element that this module exports; this is the class which wraps a file-like object containing a common, combined, or otherwise Apache formatted log file and yields rows from it as tuples.
2.2.1. Classes
- class lars.apache.ApacheSource(source, log_format=COMMON)
Wraps a stream containing an Apache formatted log file.
This wrapper converts a stream containing an Apache log file into an iterable which yields tuples. Each tuple has fieldnames derived from the following mapping of Apache format strings (which occur in the optional log_format parameter):
==============  ==================
Format String   Field Name
==============  ==================
%a              remote_ip
%A              local_ip
%B              size
%b              size
%{Foobar}C      cookie_Foobar (1)
%D              time_taken_ms
%{FOOBAR}e      env_FOOBAR (1)
%f              filename
%h              remote_host
%H              protocol
%{Foobar}i      req_Foobar (1)
%k              keepalive
%l              ident
%m              method
%{Foobar}n      note_Foobar (1)
%{Foobar}o      resp_Foobar (1)
%p              port
%{canonical}p   port
%{local}p       local_port
%{remote}p      remote_port
%P              pid
%{pid}P         pid
%{tid}P         tid
%{hextid}P      hextid
%q              url_query
%r              request
%R              handler
%s              status
%t              time
%{format}t      time
%T              time_taken
%u              remote_user
%U              url_stem
%v              server_name
%V              canonical_name
%X              connection_status
%I              bytes_received
%O              bytes_sent
==============  ==================

Notes:
(1) Any characters in the field name which are invalid in a Python identifier are converted to underscores, e.g. %{foo-bar}C becomes "cookie_foo_bar".
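The renaming rule in note (1) can be sketched in a few lines. This is only an illustration of the rule as described, not the code lars uses internally; the helper name field_name is hypothetical.

```python
import re

def field_name(prefix, name):
    """Build a field name such as cookie_foo_bar (hypothetical helper)."""
    # Replace every character that is not a letter, digit, or underscore
    # with an underscore, then join it to the prefix from the table.
    return prefix + '_' + re.sub(r'[^A-Za-z0-9_]', '_', name)

print(field_name('cookie', 'foo-bar'))
print(field_name('req', 'User-Agent'))
```

The first call reproduces the example from the note; the second shows what the same rule would do to a typical request header name.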
Warning
The wrapper will only operate on log_format specifications that can be unambiguously parsed with a regular expression. In particular, this means that if a field can contain whitespace, it must be surrounded by characters that it cannot legitimately contain (or cannot contain unescaped versions of). Typically double-quotes are used, as Apache (from version 2.0.46) escapes double-quotes within %r, %i, and %o. See Apache’s Custom Log Formats documentation for full details.
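To see why the delimiters matter, here is a hand-written regular expression for the common log format. It is a sketch for illustration only, and an assumption about how such a pattern might look rather than the pattern lars generates; the name COMMON_RE and the sample line are made up. The request field contains spaces, so it can only be matched reliably because it sits between double-quotes, with escaped quotes allowed inside.

```python
import re

# Illustrative pattern for '%h %l %u %t "%r" %>s %b' (not lars's own regex)
COMMON_RE = re.compile(
    r'(?P<remote_host>\S+) '
    r'(?P<ident>\S+) '
    r'(?P<remote_user>\S+) '
    r'\[(?P<time>[^\]]+)\] '               # %t is delimited by [ and ]
    r'"(?P<request>(?:[^"\\]|\\.)*)" '     # %r may hold spaces and \" escapes
    r'(?P<status>\d{3}) '
    r'(?P<size>\d+|-)'                     # %b logs "-" when no bytes are sent
)

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326')
match = COMMON_RE.match(line)
print(match.group('request'))
print(match.group('status'))
```

If the quotes around %r were dropped from the format, the spaces inside the request could no longer be told apart from the spaces that separate fields.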
Parameters:
- source
The file-like object that the source reads rows from
- count
Returns the number of rows successfully read from the source
- log_format
The Apache LogFormat string that the class will use to decode rows
2.2.2. Data
- lars.apache.COMMON
This string contains the Apache LogFormat string for the common log format (sometimes called the CLF). This is the default format for the ApacheSource class.
- lars.apache.COMMON_VHOST
This string contains the Apache LogFormat string for the common log format with an additional virtual-host specification at the beginning of the string. This is a typical configuration used by several distributions of Apache which are configured with virtualhosts by default.
- lars.apache.COMBINED
This string contains the Apache LogFormat string for the NCSA combined/extended log format. This is a popular variant that many server administrators use as it combines the COMMON format with REFERER and USER_AGENT formats.
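For orientation, the format strings that the Apache documentation gives for these three layouts are reproduced below under local, hypothetical names. It is an assumption that the lars constants hold equivalent values, so inspect lars.apache.COMMON and its siblings directly if the exact strings matter.

```python
# LogFormat strings as given in the Apache HTTP Server documentation.
# The variable names are local to this sketch, not part of lars.
APACHE_COMMON = '%h %l %u %t "%r" %>s %b'
APACHE_COMMON_VHOST = '%v ' + APACHE_COMMON
APACHE_COMBINED = APACHE_COMMON + ' "%{Referer}i" "%{User-agent}i"'

for name, value in [('common', APACHE_COMMON),
                    ('common + vhost', APACHE_COMMON_VHOST),
                    ('combined', APACHE_COMBINED)]:
    print('%-15s %s' % (name, value))
```

Written this way, the relationship described above is visible: the virtual-host variant prefixes %v, and the combined format appends the two quoted request headers.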
2.2.3. Exceptions
- class lars.apache.ApacheError(message, line_number=None, line=None)
Base class for ApacheSource errors.
Exceptions of this class take the optional arguments line_number and line for specifying the index and content of the line that caused the error respectively. If specified, the __str__() method is overridden to include the line number in the error message.
Parameters:
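The behaviour described above, an exception that carries the line number and content and mentions the line number in its message, can be sketched with a stand-in class. LineError is a hypothetical name and the "Line N:" prefix is an assumed message layout; neither is taken from the lars source.

```python
class LineError(Exception):
    """Hypothetical stand-in for an error that remembers the offending line."""

    def __init__(self, message, line_number=None, line=None):
        super().__init__(message)
        self.line_number = line_number  # index of the line that failed
        self.line = line                # raw content of that line

    def __str__(self):
        text = super().__str__()
        # Only mention the line number when one was supplied
        if self.line_number is not None:
            return 'Line %d: %s' % (self.line_number, text)
        return text

try:
    raise LineError('unable to parse row', line_number=3, line='garbage')
except LineError as exc:
    print(exc)
    print(repr(exc.line))
```

Keeping the raw line on the exception lets a caller log or skip the bad row without reopening the file.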
2.2.4. Examples
A typical usage of this class is as follows:
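import io
from lars import apache, csv

# Read an Apache access log (common format by default) and write it out as CSV
with io.open('/var/log/apache2/access.log', 'rb') as infile:
    with io.open('access.csv', 'wb') as outfile:
        with apache.ApacheSource(infile) as source:
            with csv.CSVTarget(outfile) as target:
                # Each row is a tuple of the fields parsed from one log line
                for row in source:
                    target.write(row)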