3.3. lars.csv - Writing CSV Files

This module provides a target wrapper for CSV (Comma Separated Values) formatted text files, which are typically used as a generic source format for bulk loading databases.

The CSVTarget class is the major element that this module provides; it is a standard target class (a context manager with a write() method that accepts row tuples).

3.3.1. Classes

class lars.csv.CSVTarget(fileobj, header=False, dialect=CSV_DIALECT, encoding='utf-8', **kwargs)

Wraps a stream to format rows as CSV (Comma Separated Values).

This wrapper provides a simple write() method which can be used to format row tuples as comma separated values in a variety of common dialects. The dialect defaults to CSV_DIALECT which produces a typical CSV file compatible with the vast majority of products.

If you need a different output format, you can either pass a different value for the dialect parameter or, if you only need a small modification of a dialect, override individual attributes with keyword arguments. For example:

CSVTarget(outfile, dialect=CSV_DIALECT, lineterminator='\n')

The encoding parameter controls the character set used in the output. This defaults to UTF-8, which is a sensible default for most modern systems, but it is a multi-byte encoding which some legacy systems (notably mainframes) may have trouble with. In this case you can select a single-byte encoding such as ISO-8859-1, or even EBCDIC. See Python standard encodings for a full list of supported encodings.
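The difference between multi-byte and single-byte encodings can be seen with plain Python strings. This is a minimal sketch that does not use lars at all; the sample text is made up, and cp500 is just one of several EBCDIC codecs in the Python standard library (which one a given mainframe expects is an assumption you would need to check).

```python
# Sample text containing two non-ASCII characters (hypothetical data)
text = 'caf\u00e9,na\u00efve'

utf8_bytes = text.encode('utf-8')         # multi-byte: accented letters take 2 bytes each
latin1_bytes = text.encode('iso-8859-1')  # single-byte: one byte per character
ebcdic_bytes = text.encode('cp500')       # single-byte EBCDIC codec from the stdlib

print(len(text), len(utf8_bytes), len(latin1_bytes), len(ebcdic_bytes))

# The EBCDIC bytes differ from the ISO-8859-1 bytes but round-trip to the same text
print(ebcdic_bytes != latin1_bytes, ebcdic_bytes.decode('cp500') == text)
```

Note that a single-byte encoding can only represent the characters in its own repertoire; values outside it cannot be encoded.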

Warning

The file that you wrap with CSVTarget must be opened in binary mode ('wb') partly because the dialect dictates the line terminator that is used, and partly because the class handles its own character encoding.

close()

Closes the CSV output. Further calls to write() are not permitted after calling this method.

write(row)

Write the specified row (a tuple of values) to the wrapped output. All provided rows must have the same number of elements. There is no need to convert elements of the tuple to str; this will be handled implicitly.
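To make the "target" protocol described above concrete (a context manager with write() and close(), wrapped around a binary stream), here is a rough sketch built on Python's standard csv module. SketchTarget is a hypothetical stand-in written for illustration; it is not the lars implementation and it ignores the header parameter entirely.

```python
import csv
import io


class SketchTarget:
    """Hypothetical stand-in for a CSV target; NOT the lars implementation."""

    def __init__(self, fileobj, encoding='utf-8', **kwargs):
        # The wrapped stream is binary: this class does the character encoding
        # itself, and newline='' leaves line endings to the csv dialect.
        self._text = io.TextIOWrapper(fileobj, encoding=encoding, newline='')
        self._writer = csv.writer(self._text, **kwargs)
        self._closed = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False

    def write(self, row):
        if self._closed:
            raise ValueError('write() called after close()')
        # csv.writer converts non-string values with str(); None becomes ''
        self._writer.writerow(row)

    def close(self):
        if not self._closed:
            self._text.flush()
            self._text.detach()  # leave the wrapped stream open for the caller
            self._closed = True


raw = io.BytesIO()  # stands in for a file opened with 'wb'
with SketchTarget(raw, lineterminator='\n') as target:
    target.write(('GET', 200, 1.5))
    target.write(('caf\u00e9', None, 'a,b'))
output = raw.getvalue()
print(output)
```

The rows are passed as tuples of mixed types and no manual str conversion is needed, which mirrors the behaviour that write() documents.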

class lars.csv.CSV_DIALECT

This is the default dialect used by the CSVTarget class. It has the following attributes:

Attribute       Value
delimiter       ',' (comma)
quotechar       '"' (double-quote)
quoting         QUOTE_MINIMAL
lineterminator  '\r\n' (DOS line breaks)
doublequote     True
escapechar      None

This dialect is compatible with Microsoft Excel and the vast majority of other products which accept CSV as an input format. However, please note that some UNIX-based database products require UNIX-style line endings ('\n'), in which case you may wish to override the lineterminator attribute (see CSVTarget for more information).
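The effect of these attributes, and of overriding lineterminator, can be previewed with Python's standard csv module. This sketch assumes only that the standard library's csv.excel dialect carries the same attribute values as those listed for CSV_DIALECT; it does not import lars, and the render helper and sample rows are hypothetical.

```python
import csv
import io


def render(rows, **overrides):
    """Format rows with the stdlib 'excel' dialect, applying any overrides."""
    buf = io.StringIO(newline='')  # newline='' keeps the dialect's line endings
    writer = csv.writer(buf, dialect=csv.excel, **overrides)
    writer.writerows(rows)
    return buf.getvalue()


rows = [('id', 'name'), (1, 'Smith, J.')]

dos_text = render(rows)                        # default '\r\n' line endings
unix_text = render(rows, lineterminator='\n')  # overridden for UNIX consumers

print(repr(dos_text))
print(repr(unix_text))
```

Only the second field of the second row is quoted, because it is the only value containing the delimiter.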

class lars.csv.TSV_DIALECT

This is a dialect which produces tab-delimited files, another common data exchange format also supported by Microsoft Excel and numerous database products. This dialect has the following properties:

Attribute       Value
delimiter       '\t' (tab)
quotechar       '"' (double-quote)
quoting         QUOTE_MINIMAL
lineterminator  '\r\n' (DOS line breaks)
doublequote     True
escapechar      None
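As a rough illustration of tab-delimited output, the standard library's csv.excel_tab dialect has the same attribute values as those listed above, so it is used here as a stand-in. This is an assumption-based sketch that does not import lars; the sample rows are made up.

```python
import csv
import io

buf = io.StringIO(newline='')
writer = csv.writer(buf, dialect=csv.excel_tab)

writer.writerow(('x,y', 'z'))   # commas are ordinary characters in this dialect
writer.writerow(('a\tb', 'c'))  # an embedded tab forces the field to be quoted

tsv_text = buf.getvalue()
print(repr(tsv_text))
```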

3.3.2. Data

lars.csv.QUOTE_NONE

This value indicates that no values should ever be quoted, even if they contain the delimiter character. In this case, any delimiter characters appearing in the data will be preceded by the dialect’s escapechar, which should be set to an appropriate value. If escapechar is not set (None), an exception will be raised if any characters that require quoting are encountered.
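The standard library's csv module has a QUOTE_NONE constant with the same documented behaviour, so the two cases can be tried there. The sketch below assumes lars.csv.QUOTE_NONE behaves like its stdlib namesake; the render helper and the backslash escape character are illustrative choices.

```python
import csv
import io


def render(row, **fmt):
    """Write one row with QUOTE_NONE and return the formatted text."""
    buf = io.StringIO(newline='')
    csv.writer(buf, quoting=csv.QUOTE_NONE, **fmt).writerow(row)
    return buf.getvalue()


# With an escapechar, the embedded delimiter is escaped instead of quoted
escaped = render(('a,b', 'c'), escapechar='\\')
print(repr(escaped))

# Without an escapechar, the same row cannot be written
try:
    render(('a,b', 'c'))
    failed = False
except csv.Error as exc:
    failed = True
    print('csv.Error:', exc)
```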

lars.csv.QUOTE_MINIMAL

This is the default quoting mode. In this mode the writer will only quote those values that contain the delimiter or quotechar characters, or any of the characters in lineterminator.
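A quick way to see which values this mode quotes is to run a mixed row through the standard library's csv.writer, whose QUOTE_MINIMAL is assumed here to behave the same way. The row contents are made up for illustration.

```python
import csv
import io

buf = io.StringIO(newline='')
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator='\r\n')

# One plain value, then values containing the delimiter, the quotechar,
# and a character from the line terminator
writer.writerow(('plain', 'a,b', 'say "hi"', 'line\nbreak'))

minimal_text = buf.getvalue()
print(repr(minimal_text))
```

The plain value is left bare, while the other three are quoted; the embedded double-quotes are doubled because doublequote is True by default.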

lars.csv.QUOTE_NONNUMERIC

This value tells the writer to quote all non-numeric values (that is, everything other than int and float values).

lars.csv.QUOTE_ALL

This value simply tells the writer to quote all values written.
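The last two modes can be compared side by side using the constants of the same name in Python's standard csv module. This sketch assumes the lars constants behave like the stdlib ones; the render helper and sample row are hypothetical.

```python
import csv
import io


def render(row, quoting):
    """Write one row with the given quoting mode and return the text."""
    buf = io.StringIO(newline='')
    csv.writer(buf, quoting=quoting).writerow(row)
    return buf.getvalue()


row = ('text', 42, 3.5)

nonnumeric_text = render(row, csv.QUOTE_NONNUMERIC)  # only the string is quoted
all_text = render(row, csv.QUOTE_ALL)                # every value is quoted

print(repr(nonnumeric_text))
print(repr(all_text))
```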

3.3.3. Examples

A typical example of working with the class is shown below:

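import io
from lars import apache, csv

# Both files are opened in binary mode; CSVTarget handles its own
# character encoding and line terminators
with io.open('/var/log/apache2/access.log', 'rb') as infile:
    with io.open('apache.csv', 'wb') as outfile:
        with apache.ApacheSource(infile) as source:
            # Override the dialect's line terminator to get UNIX line endings
            with csv.CSVTarget(outfile, lineterminator='\n') as target:
                for row in source:
                    target.write(row)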