CSV source section
==================
A CSV source pipeline section lets you create pipeline items from CSV files. The CSV source section blueprint name is ``collective.transmogrifier.sections.csvsource``.

A CSV source section loads the CSV file named in the ``filename`` option, or the CSV file named in an item key given by the ``key`` option, and yields an item for each line in the CSV file. It uses the first line of the CSV file to determine what keys to use, or you can supply a ``fieldnames`` option to set the key names yourself.

The ``filename`` option may be an absolute path or a package reference, e.g. ``my.package:foo/bar.csv``.
By default the CSV file is assumed to use the Excel CSV dialect, but you can use any dialect supported by the Python csv module by naming it in the ``dialect`` option. You can also set `fmtparams`_ using options that start with ``fmtparam-``.

.. _fmtparams: http://docs.python.org/library/csv.html#dialects-and-formatting-parameters
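For reference, `fmtparams`_ are the formatting parameters of the stdlib csv readers. A quick illustration with the plain csv module (not the section itself), showing what a ``delimiter`` fmtparam controls:

>>> import csv, StringIO
>>> for row in csv.reader(StringIO.StringIO("first|second\n1|2"),
...                       delimiter='|'):
...     print row
['first', 'second']
['1', '2']

The section itself is exercised below, first with an absolute ``filename``: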
>>> import os
>>> from collective.transmogrifier import tests
>>> csvsource = """
... [transmogrifier]
... pipeline =
...     csvsource
...     logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = {}/csvsource.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """.format(os.path.dirname(tests.__file__))
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.file',
... csvsource)
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file')
>>> print handler
logger INFO
{'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
logger INFO
{'bar': 'second-bar', 'baz': 'second-baz', 'foo': 'second-foo'}
The CSV column field names can also be specified explicitly; the file's first row is then read as data rather than as a header:
>>> handler.clear()
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.file',
... csvsource=dict(fieldnames='monty spam eggs'))
>>> print handler
logger INFO
{'eggs': 'baz', 'monty': 'foo', 'spam': 'bar'}
logger INFO
{'eggs': 'first-baz', 'monty': 'first-foo', 'spam': 'first-bar'}
logger INFO
{'eggs': 'second-baz', 'monty': 'second-foo', 'spam': 'second-bar'}
Here is the same example, loading a file from a package instead:
>>> csvsource = """
... [transmogrifier]
... pipeline =
...     csvsource
...     logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = collective.transmogrifier.tests:sample.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.package',
... csvsource)
>>> handler.clear()
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.package')
>>> print handler
logger INFO
{'bar': 'first-bar', 'baz': 'first-baz', 'foo': 'first-foo'}
logger INFO
{'_csvsource_rest': ['corge', 'grault'],
'bar': 'second-bar',
'baz': 'second-baz',
'foo': 'second-foo'}
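The second row of ``sample.csv`` has more values than there are field names; the extras end up in a list under the ``_csvsource_rest`` key. That matches the ``restkey`` behaviour of the stdlib ``csv.DictReader``, demonstrated here with the plain csv module (the section's own ``restkey`` option appears further down):

>>> import csv, StringIO
>>> for row in csv.DictReader(
...         StringIO.StringIO("foo,bar\nfirst-foo,first-bar,extra"),
...         restkey='_csvsource_rest'):
...     print sorted(row.items())
[('_csvsource_rest', ['extra']), ('bar', 'first-bar'), ('foo', 'first-foo')]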
We can also load a file from a GenericSetup import context:
>>> from collective.transmogrifier.transmogrifier import Transmogrifier
>>> from collective.transmogrifier.genericsetup import IMPORT_CONTEXT
>>> from zope.annotation.interfaces import IAnnotations
>>> class FakeImportContext(object):
...     def __init__(self, subdir, filename, contents):
...         self.filename = filename
...         self.subdir = subdir
...         self.contents = contents
...     def readDataFile(self, filename, subdir=None):
...         if subdir is None and self.subdir is not None:
...             return None
...         if filename != self.filename:
...             return None
...         return self.contents
>>> csvsource = """
... [transmogrifier]
... pipeline =
...     csvsource
...     logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = importcontext:sub/dir/somefile.csv
...
... [logger]
... blueprint = collective.transmogrifier.sections.logger
... name = logger
... level = INFO
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.gs',
... csvsource)
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext('sub/dir/', 'somefile.csv',
... """animal,name
... cow,daisy
... pig,george
... duck,archibald
... """)
>>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
>>> print handler
logger INFO
{'animal': 'cow', 'name': 'daisy'}
logger INFO
{'animal': 'pig', 'name': 'george'}
logger INFO
{'animal': 'duck', 'name': 'archibald'}
Import contexts can also be chunked, providing an open file via ``openDataFile`` instead of the whole string, and that's okay:
>>> import StringIO
>>> class FakeChunkedImportContext(object):
...     def __init__(self, subdir, filename, contents):
...         self.subdir = subdir
...         self.filename = filename
...         self.contents = contents
...     def openDataFile(self, filename, subdir=None):
...         if subdir is None and self.subdir is not None:
...             return None
...         if filename != self.filename:
...             return None
...         return StringIO.StringIO(self.contents)
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeChunkedImportContext(None, 'somefile.csv',
... """animal,name
... fish,wanda
... """)
>>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
>>> print handler
logger INFO
{'animal': 'fish', 'name': 'wanda'}
Attempting to load a nonexistent file won't do anything:
>>> handler.clear()
>>> t = Transmogrifier({})
>>> IAnnotations(t)[IMPORT_CONTEXT] = FakeImportContext(None, 'someotherfile.csv',
... """animal,name
... cow,daisy
... pig,george
... duck,archibald
... """)
>>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
>>> print handler
When no import context is around, nothing is found either:
>>> handler.clear()
>>> t = Transmogrifier({})
>>> t(u'collective.transmogrifier.sections.tests.csvsource.gs')
>>> print handler
The CSV file can also be taken from a key of the items in a source pipeline. A ``restkey`` option can also be specified to name the key that collects row values in excess of the field names.
>>> csvsource = """
... [transmogrifier]
... include = collective.transmogrifier.sections.tests.csvsource.package
... pipeline =
...     csvsource
...     filename
...     item-csvsource
...     logger
...
... [csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... filename = collective.transmogrifier.tests:keysource.csv
...
... [filename]
... blueprint = collective.transmogrifier.sections.inserter
... key = string:_item-csvsource
... condition = exists:item/_item-csvsource
... value = python:modules['os.path'].join(modules['os.path'].dirname(
...     modules['collective.transmogrifier.tests'].__file__),
...     item['_item-csvsource'])
...
... [item-csvsource]
... blueprint = collective.transmogrifier.sections.csvsource
... restkey = _args
... row-key = string:_csvsource
... """
>>> registerConfig(u'collective.transmogrifier.sections.tests.csvsource.key',
... csvsource)
>>> handler.clear()
>>> transmogrifier(u'collective.transmogrifier.sections.tests.csvsource.key')
>>> print handler
logger INFO
{'_item-csvsource': '.../collective/transmogrifier/tests/sample.csv'}
logger INFO
{'_csvsource': '.../collective/transmogrifier/tests/sample.csv',
'bar': 'first-bar',
'baz': 'first-baz',
'foo': 'first-foo'}
logger INFO
{'_args': ['corge', 'grault'],
'_csvsource': '.../collective/transmogrifier/tests/sample.csv',
'bar': 'second-bar',
'baz': 'second-baz',
'foo': 'second-foo'}
The ``fmtparam-`` expressions have access to the following:

``key``
    the `fmtparams`_ attribute being set
``transmogrifier``
    the transmogrifier
``name``
    the name of the section
``options``
    the section options
``modules``
    sys.modules
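For example, a ``fmtparam-`` option can use a Python expression. This hypothetical fragment (the package and filename are assumptions, not part of this package) would read a tab-separated file::

    [csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    filename = my.package:data/tab-separated.csv
    fmtparam-delimiter = python:chr(9)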
The ``row-key`` and ``row-value`` expressions have access to the following:

``item``
    the pipeline item to be yielded from this CSV row
``source_item``
    the pipeline item the CSV filename was taken from
``transmogrifier``
    the transmogrifier
``name``
    the name of the section
``options``
    the section options
``modules``
    sys.modules
``key``
    (only for the value and condition expressions) the key being inserted
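Building on the earlier ``item-csvsource`` section, a ``row-value`` expression computes the value stored under the ``row-key`` (as that example suggests, the default is the CSV filename). A hypothetical sketch, assuming the source item carries the filename under ``_item-csvsource`` as above::

    [item-csvsource]
    blueprint = collective.transmogrifier.sections.csvsource
    restkey = _args
    row-key = string:_origin
    row-value = python:source_item['_item-csvsource']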