3.3. lars.csv - Writing CSV Files
This module provides a target wrapper for CSV (Comma Separated Values) formatted text files, which are typically used as a generic source format for bulk loading databases.
The CSVTarget class is the principal element that this module provides. It is a standard target class: a context manager with a write() method that accepts row tuples.
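Here is a minimal sketch of a class with the same shape as a target: a context manager whose write() accepts one row tuple at a time. It is hypothetical. HypotheticalCSVTarget is not part of lars and is not how lars implements CSVTarget. The sketch assumes Python 3. It uses the standard library's csv module for formatting and encodes each row itself before writing to a binary stream.

```python
import csv
import io


class HypotheticalCSVTarget:
    """Hypothetical stand-in for a lars-style target (not lars's own code).

    Wraps a binary stream, formats each row tuple with the standard
    library's csv module, and performs the character encoding itself.
    """

    def __init__(self, fileobj, encoding="utf-8", **fmtparams):
        self._fileobj = fileobj
        self._encoding = encoding
        self._fmtparams = fmtparams

    def __enter__(self):
        # Rows are formatted into an in-memory text buffer first.
        self._buffer = io.StringIO()
        self._writer = csv.writer(self._buffer, **self._fmtparams)
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # The caller owns fileobj, so only internal state is dropped here.
        self._writer = None
        self._buffer = None
        return False

    def write(self, row):
        # Reset the buffer, format one row, then encode and emit its bytes.
        self._buffer.seek(0)
        self._buffer.truncate()
        self._writer.writerow(row)
        self._fileobj.write(self._buffer.getvalue().encode(self._encoding))


out = io.BytesIO()
with HypotheticalCSVTarget(out) as target:
    target.write(("alice", 42))
    target.write(("bob, jr.", 7))
print(out.getvalue())  # b'alice,42\r\n"bob, jr.",7\r\n'
```

Keeping formatting and encoding inside the target is why the wrapped stream can be a plain binary file.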
3.3.1. Classes
class lars.csv.CSVTarget(fileobj, header=False, dialect=CSV_DIALECT, encoding='utf-8', **kwargs)
Wraps a stream to format rows as CSV (Comma Separated Values).
This wrapper provides a simple write() method which formats row tuples as comma separated values in a variety of common dialects. The dialect defaults to CSV_DIALECT, which produces a typical CSV file compatible with the vast majority of products. If you want a different output format, you can either specify a different value for the dialect parameter or, if you only need a minor modification of a dialect, override its attributes with keyword arguments. For example:
CSVTarget(outfile, dialect=CSV_DIALECT, lineterminator='\n')
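Python's standard csv module follows the same pattern: csv.writer accepts a dialect plus individual formatting parameters that override it. The sketch below uses csv.writer and the built-in csv.excel dialect, not lars. Treating CSVTarget's keyword arguments as working the same way is an assumption.

```python
import csv
import io

buf = io.StringIO()
# csv.excel is the standard library's Excel dialect; the keyword argument
# overrides only its lineterminator for this one writer.
writer = csv.writer(buf, dialect=csv.excel, lineterminator="\n")
writer.writerow(["a", "b"])
print(repr(buf.getvalue()))  # 'a,b\n'
```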
The encoding parameter controls the character set used in the output. It defaults to UTF-8, which is a sensible choice for most modern systems. However, UTF-8 is a multi-byte encoding, and some legacy systems (notably mainframes) may have trouble with it. In that case you can select a single-byte encoding such as ISO-8859-1, or even an EBCDIC code page. See Python standard encodings for a full list of supported encodings.
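The difference between multi-byte and single-byte encodings shows up as soon as the data leaves ASCII. This sketch encodes one string three ways. The sample text is arbitrary. cp500 is one of the EBCDIC code pages in Python's standard codec list; cp037 and cp1140 are others.

```python
text = "café"

# UTF-8 is variable-width: the "é" takes two bytes.
utf8 = text.encode("utf-8")
# ISO-8859-1 (Latin-1) is single-byte: one byte per character.
latin1 = text.encode("iso-8859-1")
# cp500 is an EBCDIC code page: also single-byte, but with different byte values.
ebcdic = text.encode("cp500")

print(len(utf8), len(latin1), len(ebcdic))  # 5 4 4
```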
Warning
The file that you wrap with CSVTarget must be opened in binary mode ('wb'). This is partly because the dialect dictates the line terminator that is used, and partly because the class handles its own character encoding.
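Text mode can corrupt line endings. The sketch below simulates a Windows-style text-mode file with io.TextIOWrapper(newline='\r\n'), which rewrites every '\n' it receives. A row that the dialect has already terminated with '\r\n' ends up with a doubled carriage return. This is an illustration using an in-memory buffer; it does not reproduce lars's internals.

```python
import io

raw = io.BytesIO()
# newline="\r\n" makes this wrapper behave like a Windows text-mode file:
# every "\n" written through it is translated to "\r\n".
text_stream = io.TextIOWrapper(raw, encoding="utf-8", newline="\r\n")
text_stream.write("a,b\r\n")  # a row already terminated CSV-style
text_stream.flush()
print(raw.getvalue())  # b'a,b\r\r\n'
```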
class lars.csv.CSV_DIALECT
This is the default dialect used by the CSVTarget class. It has the following attributes:

Attribute       Value
delimiter       ',' (comma)
quotechar       '"' (double-quote)
quoting         QUOTE_MINIMAL
lineterminator  '\r\n' (DOS line breaks)
doublequote     True
escapechar      None

This dialect is compatible with Microsoft Excel and the vast majority of other products which accept CSV as an input format. However, note that some UNIX-based database products require UNIX-style line endings ('\n'). In that case you may wish to override the lineterminator attribute (see CSVTarget for more information).
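For comparison, the CSV_DIALECT attribute list can be reproduced as a standard-library csv.Dialect subclass. CSVDialectSketch is a hypothetical name. For the attributes listed, these values also match Python's built-in csv.excel dialect. The sample row shows QUOTE_MINIMAL and doublequote handling a comma and an embedded quote.

```python
import csv
import io


class CSVDialectSketch(csv.Dialect):
    # Values copied from the CSV_DIALECT attribute list (hypothetical class).
    delimiter = ","
    quotechar = '"'
    quoting = csv.QUOTE_MINIMAL
    lineterminator = "\r\n"
    doublequote = True
    escapechar = None
    skipinitialspace = False  # required by csv.Dialect; not in the list


buf = io.StringIO()
writer = csv.writer(buf, dialect=CSVDialectSketch)
writer.writerow(["a,b", 'say "hi"', 3])
print(repr(buf.getvalue()))  # '"a,b","say ""hi""",3\r\n'
```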
class lars.csv.TSV_DIALECT
This is a dialect which produces tab-delimited files, another common data exchange format also supported by Microsoft Excel and numerous database products. This dialect has the following properties:

Attribute       Value
delimiter       '\t' (tab)
quotechar       '"' (double-quote)
quoting         QUOTE_MINIMAL
lineterminator  '\r\n' (DOS line breaks)
doublequote     True
escapechar      None
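Python's standard library ships a csv.excel_tab dialect, which is csv.excel with a tab delimiter. Its attributes match the TSV_DIALECT list. The sketch below uses that dialect, not lars. It shows that under QUOTE_MINIMAL only the field containing a tab is quoted; a comma needs no protection in a tab-delimited file.

```python
import csv
import io

buf = io.StringIO()
# csv.excel_tab: the standard library's tab-delimited Excel dialect.
writer = csv.writer(buf, dialect=csv.excel_tab)
writer.writerow(["tab\there", "comma,here"])
print(repr(buf.getvalue()))  # '"tab\there"\tcomma,here\r\n'
```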
3.3.2. Data
lars.csv.QUOTE_NONE
This value indicates that no values should ever be quoted, even if they contain the delimiter character. Instead, any delimiter characters appearing in the data are preceded by the dialect’s escapechar, which should be set to an appropriate value. If escapechar is not set (None), an exception is raised whenever a character that requires quoting is encountered.
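The escaping rule can be sketched with the standard library's csv.QUOTE_NONE constant. This assumes that lars's constant of the same name has the same effect. The backslash used as escapechar is an arbitrary choice.

```python
import csv
import io

# With an escapechar set, a delimiter inside a value is escaped rather than quoted.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\").writerow(["a,b", "c"])
print(repr(buf.getvalue()))  # 'a\\,b,c\r\n'

# Without an escapechar, the writer refuses to emit the ambiguous value.
error_raised = False
try:
    csv.writer(io.StringIO(), quoting=csv.QUOTE_NONE).writerow(["a,b"])
except csv.Error as exc:
    error_raised = True
    print("csv.Error:", exc)
```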
lars.csv.QUOTE_MINIMAL
This is the default quoting mode. In this mode the writer only quotes values that contain the delimiter or quotechar characters, or any of the characters in lineterminator.
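A quick check with the standard library's csv.QUOTE_MINIMAL, assumed here to mirror lars's constant. Only the values containing a delimiter or a line-break character are quoted.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL)  # also csv.excel's default
writer.writerow(["plain", "has,comma", "two\nlines"])
print(repr(buf.getvalue()))  # 'plain,"has,comma","two\nlines"\r\n'
```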
lars.csv.QUOTE_NONNUMERIC
This value tells the writer to quote all non-numeric values; numeric values such as int and float are written unquoted.
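With the standard library's csv.QUOTE_NONNUMERIC (assumed equivalent to the lars constant), the string is quoted while the int and float are left bare.

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_NONNUMERIC).writerow(["abc", 1, 2.5])
print(repr(buf.getvalue()))  # '"abc",1,2.5\r\n'
```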
lars.csv.QUOTE_ALL
This value simply tells the writer to quote all values written.
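For contrast with QUOTE_NONNUMERIC, the standard library's csv.QUOTE_ALL (again assumed to match lars's constant) quotes the number too.

```python
import csv
import io

buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL).writerow(["abc", 1])
print(repr(buf.getvalue()))  # '"abc","1"\r\n'
```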
3.3.3. Examples
A typical example of working with the class is shown below:
import io
from lars import apache, csv

# The log is read in binary mode; CSVTarget requires a binary output file too
with io.open('/var/log/apache2/access.log', 'rb') as infile:
    with io.open('apache.csv', 'wb') as outfile:
        with apache.ApacheSource(infile) as source:
            # Keep CSV_DIALECT but switch to UNIX line endings
            with csv.CSVTarget(outfile, lineterminator='\n') as target:
                for row in source:
                    target.write(row)