3.4. lars.sql - Direct Database Output¶
This module provides a target wrapper for SQL-based databases, which can provide a powerful means of analyzing log data.
The SQLTarget
class accepts row objects in its
write()
method and automatically generates the required SQL
INSERT
statements to append records to the specified target table.
The implementation has been tested with SQLite3 (built into Python), and PostgreSQL, but should work with any PEP-249 (Python DB API 2.0) compatible database cursor. A list of available Python database drives is maintained on the Python wiki DatabaseInterfaces page.
3.4.1. Classes¶
-
class
lars.sql.
SQLTarget
(db_module, connection, table, insert=1, commit=1000, create_table=False, drop_table=False, ignore_drop_errors=True, str_type='VARCHAR(1000)', int_type='INTEGER', fixed_type='DOUBLE', bool_type='SMALLINT', date_type='DATE', time_type='TIME', datetime_type='TIMESTAMP', ip_type='VARCHAR(53)', hostname_type='VARCHAR(255)', path_type='VARCHAR(260)')[source]¶ Wraps a database connection to insert row tuples into an SQL database table.
This wrapper provides a simple
write()
method which can be used to insert row tuples into a specified table, which can optionally by created automatically by the wrapper before insertion of the first row. The wrapper must be passed a database connection object that conforms to the Python DB-API (version 2.0) as defined by PEP-249.The db_module parameter must be passed the module that defines the database interface (this odd requirement is so that the wrapper can look up the parameter style that the interface uses, and the exceptions that it declares).
The connection parameter must be given an active database connection object (presumably belonging to the module passed to db_module).
The table parameter is the final mandatory parameter which names the table that values are to be inserted into. If the table name requires quoting in the target SQL dialect, you should include such quoting in the table value (this class does not try and discern what database engine you are connecting to and thus has no idea about non-standard quoting styles like
`MySQL`
or[MS-SQL]
).The insert parameter controls how many rows are inserted in a single
INSERT
statement. If this is set to a value greater than 1 (the default), then thewrite()
method will buffer rows until the count is reached and attempt to insert all rows at once.New in version 0.2.
Warning
This is a relatively risky option. If an error occurs while inserting one of the rows in a multi-row insert, then normally all rows in the buffer will fail to be inserted, but you will not be able to determine (in your script) which row caused the failure, or which rows should be re-attempted.
In other words, only use this if you are certain that failures cannot occur during insertion (e.g. if the target table has no constraints, no primary/unique keys, and no triggers which might signal failure).
The commit parameter controls how often a
COMMIT
statement is executed when inserting rows. By default, this is 1000 which is usually sufficient to provide decent performance but may (in certain database engines with fixed size transaction logs) cause errors, in which case you may wish to specify a lower value. This parameter must be a multiple of the value of the insert parameter (otherwise, theCOMMIT
statement will not be run reliably).If the create_table parameter is set to True (it defaults to False), when the
write()
method is first called, the class will determine column names and types from the row passed in and will attempt to generate and execute aCREATE TABLE
statement to set up the target table automatically. The database types that are used in theCREATE TABLE
statement are controlled by other optional parameters and are documented in the table below:Parameter Default Value (SQL) str_type VARCHAR(1000)
- typically used for URL fields.int_type INTEGER
- used for fields like status and size. If your server is serving large binaries you may wish to use a 64-bit type likeBIGINT
here instead.fixed_type DOUBLE
- used for fields like time_taken. Some users may wish to change this an appropriateNUMERIC
orDECIMAL
specification for precision.bool_type SMALLINT
- used for any boolean values in the input (0 for False, 1 for True)date_type DATE
time_type TIME
datetime_type TIMESTAMP
- MS-SQL users will likely wish to change this toDATETIME
orSMALLDATETIME
. MySQL users may wish to change this toDATETIME
, althoughTIMESTAMP
is technically also supported (albeit with functional differences).ip_type VARCHAR(53)
- this is sufficient for storing all possible IP address and port combinations up and including an IPv6 v4-mapped address. If you are certain you will only need IPv4 support you may wish to use a length of 21 (with port) or 15 (no port). PostgreSQL users may wish to use the specialinet
type instead as this is much more efficient but cannot store port information.hostname_type VARCHAR(255)
path_type VARCHAR(260)
If the drop_table parameter is set to True (it defaults to False), the wrapper will first attempt to use
DROP TABLE
to destroy any existing table before attemptingCREATE TABLE
. If ignore_drop_errors is True (which it is by default) then any errors encountered during the drop operation (e.g. if the table does not exist) will be ignored.-
commit
¶ The number of rows which the class will attempt to write before performing a COMMIT. It is strongly recommended to set this to a reasonably large number (e.g. 1000) to ensure decent INSERT performance
-
insert
¶ The number of rows which the class will attempt to insert with each INSERT statement. The
commit
parameter must be a multiple of this value.New in version 0.2.
-
count
¶ Returns the number of rows successfully written to the database so far
-
create_table
¶ If True, the class will attempt to create the target table during the first call to the
write()
method
-
drop_table
¶ If True, the class will attempt to unconditionally drop any existing target table during the first call to the
write()
method
-
ignore_drop_errors
¶ If True, and
drop_table
is True, any errors encountered during theDROP TABLE
operation will be ignored (typically useful when you are not sure the target table exists or not)
-
table
¶ The name of the target table in the database, including any required escaping or quotation
-
close
()[source]¶ Close the SQL target. This flushes any remaining rows from the internal buffer and the cursor against the provided connection. Note that it does not close the connection (as this instance didn’t open the connection).
-
write
(row)[source]¶ Write row (a tuple of values) to the table specified in the constructor. If this is the first row written, and create_table was set to
True
in the constructor, this operation will also attempt to create the table (optionally dropping any existing table, again depending on constructor values).
-
-
class
lars.sql.
OracleTarget
(db_module, connection, table, insert=1, commit=1000, create_table=False, drop_table=False, ignore_drop_errors=True, str_type='VARCHAR2(1000)', int_type='NUMBER(10)', fixed_type='NUMBER', bool_type='NUMBER(1)', date_type='DATE', time_type='DATE', datetime_type='DATE', ip_type='VARCHAR2(53)', hostname_type='VARCHAR2(255)', path_type='VARCHAR2(260)')[source]¶ The Oracle database is sufficiently peculiar (particularly in its non-standard syntax for multi-row INSERTs, and odd datatypes) to require its own sub-class of
SQLTarget
. This sub-class takes all the same parameters asSQLTarget
, but customizes them specifically for Oracle, and overrides the SQL generation methods to cope with Oracle’s strange syntax.New in version 0.2.
3.4.2. Exceptions¶
-
exception
lars.sql.
SQLError
(message, row=None)[source]¶ Base class for all fatal errors generated by classes in the sql module.
Exceptions of this class take the optional argument row for specifying the row (if any) that was being inserted (or retrieved) when the error occurred. If specified, the
__str__()
method is overridden to include the row in the error message.Parameters: - message (str) – The error message
- row – The row being processed when the error occurred
3.4.3. Examples¶
A typical example of working with the class is shown below:
import io
import sqlite3
from lars import apache, sql
connection = sqlite3.connect('apache.db',
detect_types=sqlite3.PARSE_DECLTYPES)
with io.open('/var/log/apache2/access.log', 'rb') as infile:
with io.open('apache.csv', 'wb') as outfile:
with apache.ApacheSource(infile) as source:
with sql.SQLTarget(sqlite3, connection, 'log_entries',
create_table=True) as target:
for row in source:
target.write(row)