3.4. lars.sql - Direct Database Output

This module provides a target wrapper for SQL-based databases, which can provide a powerful means of analyzing log data.

The SQLTarget class accepts row objects in its write() method and automatically generates the required SQL INSERT statements to append records to the specified target table.

The implementation has been tested with SQLite3 (built into Python), and PostgreSQL, but should work with any PEP-249 (Python DB API 2.0) compatible database cursor. A list of available Python database drives is maintained on the Python wiki DatabaseInterfaces page.

3.4.1. Classes

class lars.sql.SQLTarget(db_module, connection, table, insert=1, commit=1000, create_table=False, drop_table=False, ignore_drop_errors=True, str_type='VARCHAR(1000)', int_type='INTEGER', fixed_type='DOUBLE', bool_type='SMALLINT', date_type='DATE', time_type='TIME', datetime_type='TIMESTAMP', ip_type='VARCHAR(53)', hostname_type='VARCHAR(255)', path_type='VARCHAR(260)')[source]

Wraps a database connection to insert row tuples into an SQL database table.

This wrapper provides a simple write() method which can be used to insert row tuples into a specified table, which can optionally by created automatically by the wrapper before insertion of the first row. The wrapper must be passed a database connection object that conforms to the Python DB-API (version 2.0) as defined by PEP-249.

The db_module parameter must be passed the module that defines the database interface (this odd requirement is so that the wrapper can look up the parameter style that the interface uses, and the exceptions that it declares).

The connection parameter must be given an active database connection object (presumably belonging to the module passed to db_module).

The table parameter is the final mandatory parameter which names the table that values are to be inserted into. If the table name requires quoting in the target SQL dialect, you should include such quoting in the table value (this class does not try and discern what database engine you are connecting to and thus has no idea about non-standard quoting styles like `MySQL` or [MS-SQL]).

The insert parameter controls how many rows are inserted in a single INSERT statement. If this is set to a value greater than 1 (the default), then the write() method will buffer rows until the count is reached and attempt to insert all rows at once.

New in version 0.2.

Warning

This is a relatively risky option. If an error occurs while inserting one of the rows in a multi-row insert, then normally all rows in the buffer will fail to be inserted, but you will not be able to determine (in your script) which row caused the failure, or which rows should be re-attempted.

In other words, only use this if you are certain that failures cannot occur during insertion (e.g. if the target table has no constraints, no primary/unique keys, and no triggers which might signal failure).

The commit parameter controls how often a COMMIT statement is executed when inserting rows. By default, this is 1000 which is usually sufficient to provide decent performance but may (in certain database engines with fixed size transaction logs) cause errors, in which case you may wish to specify a lower value. This parameter must be a multiple of the value of the insert parameter (otherwise, the COMMIT statement will not be run reliably).

If the create_table parameter is set to True (it defaults to False), when the write() method is first called, the class will determine column names and types from the row passed in and will attempt to generate and execute a CREATE TABLE statement to set up the target table automatically. The database types that are used in the CREATE TABLE statement are controlled by other optional parameters and are documented in the table below:

Parameter Default Value (SQL)
str_type VARCHAR(1000) - typically used for URL fields.
int_type INTEGER - used for fields like status and size. If your server is serving large binaries you may wish to use a 64-bit type like BIGINT here instead.
fixed_type DOUBLE - used for fields like time_taken. Some users may wish to change this an appropriate NUMERIC or DECIMAL specification for precision.
bool_type SMALLINT - used for any boolean values in the input (0 for False, 1 for True)
date_type DATE
time_type TIME
datetime_type TIMESTAMP - MS-SQL users will likely wish to change this to DATETIME or SMALLDATETIME. MySQL users may wish to change this to DATETIME, although TIMESTAMP is technically also supported (albeit with functional differences).
ip_type VARCHAR(53) - this is sufficient for storing all possible IP address and port combinations up and including an IPv6 v4-mapped address. If you are certain you will only need IPv4 support you may wish to use a length of 21 (with port) or 15 (no port). PostgreSQL users may wish to use the special inet type instead as this is much more efficient but cannot store port information.
hostname_type VARCHAR(255)
path_type VARCHAR(260)

If the drop_table parameter is set to True (it defaults to False), the wrapper will first attempt to use DROP TABLE to destroy any existing table before attempting CREATE TABLE. If ignore_drop_errors is True (which it is by default) then any errors encountered during the drop operation (e.g. if the table does not exist) will be ignored.

commit

The number of rows which the class will attempt to write before performing a COMMIT. It is strongly recommended to set this to a reasonably large number (e.g. 1000) to ensure decent INSERT performance

insert

The number of rows which the class will attempt to insert with each INSERT statement. The commit parameter must be a multiple of this value.

New in version 0.2.

count

Returns the number of rows successfully written to the database so far

create_table

If True, the class will attempt to create the target table during the first call to the write() method

drop_table

If True, the class will attempt to unconditionally drop any existing target table during the first call to the write() method

ignore_drop_errors

If True, and drop_table is True, any errors encountered during the DROP TABLE operation will be ignored (typically useful when you are not sure the target table exists or not)

table

The name of the target table in the database, including any required escaping or quotation

close()[source]

Close the SQL target. This flushes any remaining rows from the internal buffer and the cursor against the provided connection. Note that it does not close the connection (as this instance didn’t open the connection).

write(row)[source]

Write row (a tuple of values) to the table specified in the constructor. If this is the first row written, and create_table was set to True in the constructor, this operation will also attempt to create the table (optionally dropping any existing table, again depending on constructor values).

class lars.sql.OracleTarget(db_module, connection, table, insert=1, commit=1000, create_table=False, drop_table=False, ignore_drop_errors=True, str_type='VARCHAR2(1000)', int_type='NUMBER(10)', fixed_type='NUMBER', bool_type='NUMBER(1)', date_type='DATE', time_type='DATE', datetime_type='DATE', ip_type='VARCHAR2(53)', hostname_type='VARCHAR2(255)', path_type='VARCHAR2(260)')[source]

The Oracle database is sufficiently peculiar (particularly in its non-standard syntax for multi-row INSERTs, and odd datatypes) to require its own sub-class of SQLTarget. This sub-class takes all the same parameters as SQLTarget, but customizes them specifically for Oracle, and overrides the SQL generation methods to cope with Oracle’s strange syntax.

New in version 0.2.

3.4.2. Exceptions

exception lars.sql.SQLError(message, row=None)[source]

Base class for all fatal errors generated by classes in the sql module.

Exceptions of this class take the optional argument row for specifying the row (if any) that was being inserted (or retrieved) when the error occurred. If specified, the __str__() method is overridden to include the row in the error message.

Parameters:
  • message (str) – The error message
  • row – The row being processed when the error occurred
exception lars.sql.SQLWarning[source]

Raised when a non-fatal condition occurs while inserting data into a database.

3.4.3. Examples

A typical example of working with the class is shown below:

import io
import sqlite3
from lars import apache, sql

connection = sqlite3.connect('apache.db',
                             detect_types=sqlite3.PARSE_DECLTYPES)

with io.open('/var/log/apache2/access.log', 'rb') as infile:
    with io.open('apache.csv', 'wb') as outfile:
        with apache.ApacheSource(infile) as source:
            with sql.SQLTarget(sqlite3, connection, 'log_entries',
                               create_table=True) as target:
                for row in source:
                    target.write(row)