Querying¶
Description
How to programmatically search and query content from a Plone site.
- Introduction
- Accessing the portal_catalog tool
- Querying portal_catalog
- Brain result id
- Brain result path
- Brain object schema
- Sorting and limiting the number of results
- Text format
- Accessing indexed data
- Dumping portal catalog content
- Bypassing query security check
- Bypassing language check
- Bypassing Expired content check
- None as query parameter
- Query by path
- Query multiple values
- Querying by interface
- Query by content type
- Query published items
- Getting a random item
- Querying FieldIndexes by Range
- Querying by date
- Query by language
- Boolean queries (AdvancedQuery)
- Setting Up A New Style Query
- Accessing metadata
- Fuzzy search
- Unique values
- Performance
- Batching
- Walking through all content
- Other notes
Introduction¶
Querying is the act of retrieving data from search indexes. In Plone this usually means querying content items with the portal_catalog tool, which Plone uses for most content-related queries. Special catalogs, such as reference_catalog, exist for specialized and optimized queries.
Accessing the portal_catalog tool¶
Plone queries are performed using the portal_catalog tool, a persistent object available at the site root.
Example:
# portal_catalog is defined in the site root
portal_catalog = site.portal_catalog
You can also use the ITools helper (the plone_tools view) to get access to portal_catalog if you do not have the Plone site object directly available:
from Acquisition import aq_inner
from zope.component import getMultiAdapter

context = aq_inner(self.context)
tools = getMultiAdapter((context, self.request), name=u'plone_tools')
portal_catalog = tools.catalog()
There is also a third way, using acquisition. This is discouraged, as it adds extra processing overhead:
# Use magical Zope acquisition mechanism
portal_catalog = context.portal_catalog
... and the same in TAL template:
<div tal:define="portal_catalog context/portal_catalog" />
A safer method is to use the getToolByName helper function:
from Products.CMFCore.utils import getToolByName
catalog = getToolByName(context, 'portal_catalog')
Querying portal_catalog¶
To search for something and get the resulting brains, write:
results = catalog.searchResults(**kwargs)
Note
The catalog returns "brains". A brain is a lightweight proxy for a found object, which has attributes corresponding to the metadata defined for the catalog.
Here kwargs is a dictionary of index names and their associated query values. Only the indexes that you care about need to be included. This is useful when the search criteria vary, for example when they come from a form where users choose which fields to search. For example:
results = catalog.searchResults({'portal_type': 'Event', 'review_state': 'pending'})
Note that the indexes you include are combined with a logical AND, not OR. In other words, the query above finds all items that are both of type Event AND in the pending review state.
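The variable-criteria case described above can be sketched as follows. This is only an illustration: build_query and the allowed tuple are hypothetical names, not part of Plone, and form stands for any dictionary-like object such as a request form.

```python
def build_query(form, allowed=("portal_type", "review_state", "Subject")):
    """Return catalog keyword arguments for the filled-in form fields only.

    ``build_query`` and ``allowed`` are hypothetical helpers for this sketch.
    """
    query = {}
    for name in allowed:
        value = form.get(name)
        # Skip fields the user left empty so they do not become criteria.
        if value in (None, "", [], ()):
            continue
        query[name] = value
    return query


# Only portal_type survives: review_state is empty, Title is not allowed.
query = build_query({"portal_type": "Event", "review_state": "", "Title": "x"})
# In a Plone site you would then run: results = catalog.searchResults(**query)
```

Every key left in the dictionary becomes one more AND condition, so restricting the keys to a known list keeps user input from querying arbitrary indexes.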
Additionally, you can call the catalog tool directly, which is equivalent to calling catalog.searchResults():
results = catalog(portal_type='Event')
If you call portal_catalog() without arguments, it returns all indexed content objects:
# Print all content on the site
all_brains = catalog()
for brain in all_brains:
    print "Name:" + brain["Title"] + " URL:" + brain.getURL()
The catalog tool queries return an iterable of catalog brain objects.
As mentioned previously, brains contain a subset of the actual content object information. The available subset is defined by the metadata columns in portal_catalog. You can see available metadata columns on the portal_catalog "Metadata" tab in ZMI. For more information, see indexing.
Available indexes¶
To see the full list of available indexes in your catalog, open the ZMI (which usually means navigating to http://yoursiteURL/manage), look for the portal_catalog tool in the root of your Plone site and check the Indexes tab. Note that there are different types of indexes; each one accepts different types of search parameters and behaves differently. For example, FieldIndex and KeywordIndex support sorting, but ZCTextIndex doesn't. To learn more about indexes, see The Zope Book, Searching and Categorizing Content.
Some of the most commonly used ones are:
- Title: the title of the content object.
- Description: the description field of the content.
- Subject: the keywords used to categorize the content. Example:
  catalog.searchResults(Subject=('cats', 'dogs'))
- portal_type: as its name suggests, searches for content of the given portal type. For example:
  catalog.searchResults(portal_type='News Item')
  You can also specify several types using a list or tuple:
  catalog.searchResults(portal_type=('News Item', 'Event'))
- review_state: the current workflow review state of the content. For example:
  catalog.searchResults(review_state='pending')
- object_provides: from Plone 3 onwards, you can search by the interface provided by the content. Example:
  from Products.MyProduct.path.to import IIsCauseForCelebration
  catalog(object_provides=IIsCauseForCelebration.__identifier__)
Searching for interfaces can have some benefits. Suppose you have several types in your portal, for example event types like Birthday, Wedding and Graduation, which implement the same interface (for example, IIsCauseForCelebration). Suppose you want to get items of these types from the catalog by their interface. This is more exact than naming the types explicitly (like portal_type=['Birthday', 'Wedding', 'Graduation']), because you don't really care what the types' names are: all you care about is the interface. This has the additional advantage that if products added or modified later add types which implement the interface, these new types will also show up in your query.
Brain result id¶
Result ID (RID) is given with the brain object and you can use this ID to query further info about the object from the catalog.
Example:
(Pdb) brain.getRID()
872272330
Brain result path¶
The brain result path can be extracted as a string using the getPath() method:
print r.getPath()
/site/sisalto/ajankohtaista
Brain object schema¶
To see what metadata columns a brain object contains, you can read its __record_schema__ attribute, which is a dict.
Example:
for i in brain.__record_schema__.items(): print i
('startDate', 32)
('endDate', 33)
('Title', 8)
('color', 31)
('data_record_score_', 35)
('exclude_from_nav', 13)
('Type', 9)
('id', 19)
('cmf_uid', 29)
Todo
What do those numbers represent?
Getting the underlying object, its path, and its URL from a brain¶
As mentioned earlier, searching the catalog returns catalog brains, not the objects themselves. If you want to get the object associated with a brain, do:
brain.getObject()
To get the path of the object without fetching it:
brain.getPath()
which returns the path as a string, corresponding to
'/'.join(obj.getPhysicalPath())
And finally, to get the URL of the underlying object, usually to provide a link to it:
brain.getURL()
which is equivalent to
obj.absolute_url()
.
Note
Calling getObject() has performance implications. Waking up each object needs a separate query to the database.
getObject() and unrestrictedSearchResults() permission checks¶
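You cannot call getObject() on a brain whose object the current user is not allowed to access, even in trusted code.
Instead, traverse to the object without security checks (site is the Plone site object):
obj = site.unrestrictedTraverse(brain.getPath())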
Todo
How to call
unrestrictedTraverse
Counting values of a specific index¶
The efficient way to count how many objects hold each value of an index is to work directly on the index. For example, suppose we want to count the number of objects of each portal_type. Querying via search results is a performance bottleneck for this: iterating over all brains loads them into the ZODB cache, which is also a memory bottleneck. A better way is:
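# Count the objects stored under each value of the portal_type index
stats = {}
catalog = getToolByName(context, 'portal_catalog')
index = catalog._catalog.indexes['portal_type']
for key in index.uniqueValues():
    t = index._index.get(key)
    if not isinstance(t, int):
        # Several objects share this value: t is a set of record ids
        stats[str(key)] = len(t)
    else:
        # A single object has this value: t is its record id
        stats[str(key)] = 1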
Sorting and limiting the number of results¶
To sort the results, use the sort_on and sort_order arguments. The sort_on argument accepts any available index, even if you're not searching by it. The sort_order can be either 'ascending' or 'descending', where 'ascending' means from A to Z for a text field. 'reverse' is an alias equivalent to 'descending'. For example:
results = catalog.searchResults(Description='Plone documentation',
                                sort_on='sortable_title', sort_order='ascending')
The catalog.searchResults() returns a list-like object, so to limit the number of results you can just use Python's slicing. For example, to get only the first 3 items:
results = catalog.searchResults(Description='Plone documentation')[:3]
In addition, ZCatalogs allow a sort_limit argument. The sort_limit is only a hint for the search algorithms and can potentially return a few more items, so it's preferable to use both sort_limit and slicing simultaneously:
limit = 50
results = catalog.searchResults(Description='Plone documentation',
                                sort_limit=limit)[:limit]
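The combination of sort_limit and slicing can be wrapped in a small helper. This is a minimal sketch: search_limited is a hypothetical name, and fake_catalog is a stand-in that only imitates a catalog returning a few extra items, so that the snippet runs outside Plone.

```python
def search_limited(catalog, limit, **query):
    """Query ``catalog`` and return at most ``limit`` results.

    ``search_limited`` is a hypothetical helper for this sketch.
    """
    # sort_limit is a hint that lets the catalog stop sorting early;
    # the slice enforces the exact maximum.
    results = catalog(sort_limit=limit, **query)
    return results[:limit]


def fake_catalog(**query):
    """Stand-in for portal_catalog: returns three items more than asked."""
    return list(range(query["sort_limit"] + 3))


first_five = search_limited(fake_catalog, 5, portal_type="Document")
```

With a real site you would pass the portal_catalog tool instead of fake_catalog.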
The sort_on argument of a portal_catalog query names the index used for sorting, and sort_order defines the sort direction; it can be the string "reverse".
Sorting is supported only on FieldIndexes. Due to the nature of searchable text indexes (they index split text, not strings) they cannot be used for sorting. For example, to sort by title, use the index called sortable_title.
Example of how to sort by id:
results = context.portal_catalog.searchResults(sort_on="id",
                                               portal_type="Document",
                                               sort_order="reverse")
Text format¶
Since most indexes use Archetypes accessors to index the field value, the returned text is UTF-8 encoded. This is a limitation inherited from the early ages of Plone.
To get a unicode value, for example for the title, do the following:
title = brain["Title"]
title = title.decode("utf-8")
if title[0] == u"å":
    # Unicode text matching etc. functions work correctly now
    pass
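If the same code has to cope with values that may or may not already be decoded, a small guard avoids decoding twice. The helper below is a sketch with a hypothetical name, to_text; Plone itself provides a similar utility, safe_unicode, in Products.CMFPlone.utils.

```python
def to_text(value, encoding="utf-8"):
    """Return ``value`` as text, decoding byte strings with ``encoding``.

    ``to_text`` is a hypothetical helper for this sketch.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Already text (or not a string at all): return unchanged.
    return value


# b"\xc3\xa5" is the UTF-8 encoding of the letter "å".
title = to_text(b"\xc3\xa5land")
```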
Accessing indexed data¶
Normally brains do not carry a copy of the indexed data, only metadata. If you know what you are doing, you can still access the raw indexed data by using the RID of the brain object.
Example:
(Pdb) data = self.context.portal_catalog.getIndexDataForRID(872272330)
(Pdb) for i in data.items(): print i
('Title', ['ulkomuseon', 'tarinaopastukset'])
('effectiveRange', (21305115, 278752140))
('object_provides', ['Products.CMFCore.interfaces._content.IDublinCore', 'Products.ATContentTypes.interface.interfaces.IHistoryAware', 'AccessControl.interfaces.IOwned', 'OFS.interfaces.ITraversable', 'plone.portlets.interfaces.ILocalPortletAssignable', 'Products.Archetypes.interfaces._base.IBaseObject', 'zope.annotation.interfaces.IAttributeAnnotatable', 'vs.event.interfaces.IVSEvent', 'Products.CMFCore.interfaces._content.IMutableMinimalDublinCore', 'OFS.interfaces.IPropertyManager', 'OFS.interfaces.IZopeObject', 'AccessControl.interfaces.IRoleManager', 'zope.annotation.interfaces.IAnnotatable', 'Acquisition.interfaces.IAcquirer', 'Products.ATContentTypes.interface.event.IATEvent', 'OFS.interfaces.ICopySource', 'Products.LinguaPlone.interfaces.ITranslatable', 'Products.ATContentTypes.interface.interfaces.ICalendarSupport', 'Products.ATContentTypes.interface.interfaces.IATContentType', 'plone.app.iterate.interfaces.IIterateAware', 'Products.Archetypes.interfaces._base.IBaseContent', 'Products.CMFCore.interfaces._content.ICatalogableDublinCore', 'Products.CMFDynamicViewFTI.interface._base.IBrowserDefault', 'Products.Archetypes.interfaces._referenceable.IReferenceable', 'plone.locking.interfaces.ITTWLockable', 'plone.app.imaging.interfaces.IBaseObject', 'persistent.interfaces.IPersistent', 'webdav.interfaces.IDAVResource', 'AccessControl.interfaces.IPermissionMappingSupport', 'OFS.interfaces.ISimpleItem', 'plone.app.kss.interfaces.IPortalObject', 'plone.app.kss.interfaces.IContentish', 'archetypes.schemaextender.interfaces.IExtensible', 'App.interfaces.IUndoSupport', 'OFS.interfaces.IManageable', 'App.interfaces.IPersistentExtra', 'Products.CMFCore.interfaces._content.IMutableDublinCore', 'Products.Archetypes.interfaces._athistoryaware.IATHistoryAware', 'dateable.kalends.IRecurringEvent', 'OFS.interfaces.IItem', 'zope.interface.Interface', 'OFS.interfaces.IFTPAccess', 'Products.CMFDynamicViewFTI.interface._base.ISelectableBrowserDefault', 
'webdav.interfaces.IWriteLock', 'Products.CMFCore.interfaces._content.IMinimalDublinCore', 'Products.CMFCore.interfaces._content.IDynamicType', 'Products.CMFCore.interfaces._content.IContentish'])
('Type', u'VSEvent')
('id', 'ulkomuseon-tarinaopastukset')
('cmf_uid', 2)
('recurrence_days', [733960, 733981, 733974, 733967])
('end', 1077028380)
('Description', ['saamelaismuseon', 'ulkomuseossa', ...
('is_folderish', False)
('getId', 'ulkomuseon-tarinaopastukset')
('start', 1077028380)
('is_default_page', False)
('Date', 1077036795)
('review_state', 'published')
('Language', <LanguageIndex.IndexEntry id 872272330 language fi, cid 8b9a08c216b8e086f3446775ad71a748>)
('portal_type', 'VSEvent')
('expires', 1339244460)
('allowedRolesAndUsers', ['Anonymous'])
('getObjPositionInParent', 10)
('path', '/siida/sisalto/8-vuodenaikaa/ulkomuseon-tarinaopastukset')
('in_reply_to', '')
('UID', '8b9a08c216b8e086f3446775ad71a748')
('Creator', 'admin')
('effective', 1077036795)
('getRawRelatedItems', [])
('getEventType', [])
('created', 1077036792)
('modified', 1077048720)
('SearchableText', ['ulkomuseon', 'tarinaopastukset', ...
('sortable_title', 'ulkomuseon tarinaopastukset')
('meta_type', 'VSEvent')
('Subject', [])
You can also directly access a single index:
# Get event brain result id
rid = event.getRID()
# Get list of recurrence_days indexed value.
# ZCatalog holds internal Catalog object which we can directly poke in evil way
# This call goes to Products.PluginIndexes.UnIndex.Unindex class and we
# read the persistent value from there what it has stored in our index
# recurrence_days
indexed_days = portal_catalog._catalog.getIndex("recurrence_days").getEntryForObject(rid, default=[])
Dumping portal catalog content¶
The following is useful in unit test debugging:
# Print all objects visible to the currently logged in user
for i in portal_catalog(): print i.getURL()
Bypassing query security check¶
Note
Security: All portal_catalog queries are limited to the current user permissions by default.
If you want to bypass this restriction, use the unrestrictedSearchResults() method.
Example:
# Print absolute content of portal_catalog
for i in portal_catalog.unrestrictedSearchResults(): print i.getURL()
With unrestrictedSearchResults() you also need a special way to get access to the objects without triggering a security exception:
obj = brain._unrestrictedGetObject()
Bypassing language check¶
Note
All portal_catalog() queries are limited to the selected language of current user. You need to explicitly bypass the language check if you want to do multilingual queries.
Example of how to bypass language check:
all_content_brains = portal_catalog(Language="")
Some older LinguaPlone versions, which still use LanguageIndex to keep language information in portal_catalog, may require:
all_content_brains = portal_catalog(Language="all")
Bypassing Expired content check¶
Plone and portal_catalog have a mechanism to list only active (non-expired) content by default.
Below is an example of how the expired content check is made:
mtool = context.portal_membership
show_inactive = mtool.checkPermission('Access inactive portal content', context)
contents = context.portal_catalog.queryCatalog(show_inactive=show_inactive)
See also: Listing (/content/listing)
None as query parameter¶
Warning
Usually if you pass in None as the query value, it will match all the objects instead of zero objects.
Note
Querying for None values is possible with AdvancedQuery (see below).
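One defensive pattern is to refuse to run a query when a criterion turns out to be None, instead of silently matching everything. The sketch below uses a hypothetical helper name, search_or_empty, and a stand-in catalog so that it runs outside Plone.

```python
def search_or_empty(catalog, **query):
    """Run the query, or return no results if any criterion is None.

    ``search_or_empty`` is a hypothetical helper for this sketch.
    """
    if any(value is None for value in query.values()):
        # A None criterion would not narrow the search, so bail out early.
        return []
    return catalog(**query)


def fake_catalog(**query):
    """Stand-in for portal_catalog that reports which keys were queried."""
    return sorted(query)


nothing = search_or_empty(fake_catalog, portal_type="Event", Subject=None)
something = search_or_empty(fake_catalog, portal_type="Event")
```

Whether an empty result or an exception is the better reaction depends on the caller; the point is only that the None value never reaches the catalog.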
Query by path¶
ExtendedPathIndex is the index used for content object paths. The path index stores the physical path of the objects.
Warning
If you ever rename your Plone site instance, the path index needs to be completely rebuilt.
Example:
portal_catalog(path={ "query": "/myploneinstance/myfolder" }) # return myfolder and all child content
Searching for content within a folder¶
Use the 'path' argument to specify the physical path of the folder you want to search in.
By default, this matches objects in the specified folder and in all of its sub-folders. To change this behaviour, pass a dictionary with the keys 'query' and 'depth' to the 'path' argument, where
- 'query' is the physical path, and
- 'depth' can be either 0, which will return only the brain for the path queried against, or some number greater, which will query all items down to that depth (eg, 1 means searching just inside the specified folder, or 2, which means searching inside the folder, and inside all child folders, etc).
The most common use case is listing the contents of an existing folder, which we'll assume to be the context object in this example:
folder_path = '/'.join(context.getPhysicalPath())
results = catalog(path={'query': folder_path, 'depth': 1})
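The 'query' and 'depth' keys can be assembled by a small helper. This is only a sketch: path_query is a hypothetical name, and the tuple passed to it imitates what getPhysicalPath() returns (a tuple of path parts whose first element is an empty string).

```python
def path_query(physical_path, depth=None):
    """Build the ``path`` criterion from a tuple of path parts.

    ``path_query`` is a hypothetical helper for this sketch.
    """
    criterion = {"query": "/".join(physical_path)}
    if depth is not None:
        # 0 = only the object at the path, 1 = direct children, and so on.
        criterion["depth"] = depth
    return {"path": criterion}


children_only = path_query(("", "myploneinstance", "myfolder"), depth=1)
whole_subtree = path_query(("", "myploneinstance", "myfolder"))
# In a Plone site: results = catalog(**children_only)
```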
Query multiple values¶
The KeywordIndex index type indexes lists of values. It is used, for example, by Plone's categories (Subject) feature and by the object_provides index of provided interfaces.
You can query for
- a single value in the list
- many values in the list (all must be present)
- any value in the list
The index of the catalog to query is either the name of the keyword argument, a key in a mapping, or an attribute of a record object.
Attributes of record objects
- query -- either a sequence of objects or a single value to be passed as query to the index (mandatory)
- operator -- specifies the combination of search results when query is a sequence of values (optional, default: 'or'). Allowed values: 'and', 'or'
Below is an example of matching any of multiple values given as a Python list in a KeywordIndex. It queries all event types, and the recurrence_days KeywordIndex must match any of the given dates:
# Query all events on the site
# Note that there is no separate list for recurrent events
# so if you want to speed up you can hardcode
# recurrent event type list here.
matched_recurrence_events = self.context.portal_catalog(
    portal_type=supported_event_types,
    recurrence_days={
        "query": recurrence_days_in_this_month,
        "operator": "or"
    })
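The difference between the 'and' and 'or' operators can be illustrated with plain Python sets. The function below is not how ZCatalog implements matching; keyword_matches is a hypothetical name used only to show which objects each operator would select.

```python
def keyword_matches(indexed_values, query, operator="or"):
    """Tell whether an object with ``indexed_values`` matches ``query``.

    Illustrates KeywordIndex semantics only; not ZCatalog's implementation.
    """
    if isinstance(query, str):
        query = [query]  # a single value behaves like a one-item sequence
    wanted = set(query)
    indexed = set(indexed_values)
    if operator == "and":
        return wanted <= indexed       # every queried value must be present
    if operator == "or":
        return bool(wanted & indexed)  # one shared value is enough
    raise ValueError("operator must be 'and' or 'or'")


subjects = ["cats", "dogs"]
any_match = keyword_matches(subjects, ["cats", "birds"], operator="or")
all_match = keyword_matches(subjects, ["cats", "birds"], operator="and")
```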
Querying by interface¶
Suppose you have several content types (for example, event types like 'Birthday', 'Wedding', 'Graduation') in your portal which implement the same interface (for example, IIsCauseForCelebration). Suppose you want to get items of these types from the catalog by their interface. This is more exact than naming the types explicitly (like portal_type=['Birthday', 'Wedding', 'Graduation']), because you don't really care what the types' names are: all you care about is the interface.
This has the additional advantage that if products added or modified later add types which implement the interface, these new types will also show up in your query.
Import the interface:
from Products.MyProduct.interfaces import IIsCauseForCelebration
catalog(object_provides=IIsCauseForCelebration.__identifier__)
In a script, where you can't import the interface due to restricted Python, you might do this:
object_provides='Products.MyProduct.interfaces.IIsCauseForCelebration'
The advantage of using .__identifier__ instead of a dotted name string is that you will get errors at startup time if the interface cannot be found. This will catch typos and missing imports.
Caveats¶
- object_provides is a KeywordIndex which indexes absolute Python class names. String matching is performed on the dotted name. Thus, you will get zero results for this:
  catalog(object_provides="Products.ATContentTypes.interface.IATDocument")
  because Products.ATContentTypes.interface imports everything from document.py. But this will work:
  catalog(object_provides="Products.ATContentTypes.interface.document.IATDocument")  # the document module declares the interface
- As with all catalog queries, if you pass an empty value for a search parameter, it will return all results. So if the interface you defined evaluates to None, the search returns objects for all values of object_provides.
(Originally from this tutorial.)
Note
Querying by Products.CMFCore.interfaces._content.IFolderish does not seem to work in Plone 4.1, as this implementation information is apparently not populated in portal_catalog.
Query by content type¶
To get all catalog brains of certain content type on the whole site:
campaign_brains = self.context.portal_catalog(portal_type="News Item")
To see the available type names, visit the portal_types tool in the ZMI.
Query published items¶
By default, the portal_catalog query does not care about the workflow state. You might want to limit the query to published items.
Example:
campaign_brains = self.context.portal_catalog(portal_type="News Item", review_state="published")
review_state is a portal_catalog index which reads the portal_workflow variable "review_state". For more information, see the Contents tab of the portal_workflow tool in the ZMI.
Getting a random item¶
The following view snippet allows you to get one random item on the site:
import random

def getRandomCampaign(self):
    """
    Return a random published campaign page, or the current context.
    """
    campaign_brains = self.context.portal_catalog(portal_type="CampaignPage", review_state="published")
    # Filter out the current item which we have
    bad_ids = [ "you", "might", "want to black list some ids here" ]
    items = [ brain for brain in campaign_brains if brain["getId"] not in bad_ids ]
    # Check that we have items left after filtering
    if len(items) >= 1:
        # Pick one
        chosen = random.choice(items)
        return chosen.getObject()
    else:
        # Fallback to the current content item if no random options available
        return self.context
Querying FieldIndexes by Range¶
The following examples demonstrate range-based queries, which are useful for finding items at or beyond a "minimum" or "maximum" value. The examples assume that there is an index called 'getPrice'.
Get items whose value is greater than or equal to 2:
items = portal_catalog({'getPrice': {'query': 2, 'range': 'min'}})
Get items whose value is less than or equal to 40:
items = portal_catalog({'getPrice': {'query': 40, 'range': 'max'}})
Get items whose value falls between 2 and 1000:
items = portal_catalog({'getPrice': {'query': [2, 1000], 'range': 'min:max'}})
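The three forms above differ only in which bounds are given, so they can be produced by one function. This is a sketch with a hypothetical name, range_query; the 'getPrice' index is the same assumption as in the examples above.

```python
def range_query(minimum=None, maximum=None):
    """Build a range criterion from an optional lower and upper bound.

    ``range_query`` is a hypothetical helper for this sketch.
    """
    if minimum is not None and maximum is not None:
        return {"query": [minimum, maximum], "range": "min:max"}
    if minimum is not None:
        return {"query": minimum, "range": "min"}
    if maximum is not None:
        return {"query": maximum, "range": "max"}
    raise ValueError("give at least one bound")


between = range_query(2, 1000)
at_least = range_query(minimum=2)
# In a Plone site: items = portal_catalog({'getPrice': between})
```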
Querying by date¶
See DateIndex.
Example:
from DateTime import DateTime

# The standard Plone date index for the effective date is named 'effective'
items = portal_catalog(effective={'query': (DateTime('2002-05-08 15:16:17'),
                                            DateTime('2062-05-08 15:16:17')),
                                  'range': 'min:max'})
Note that effectiveRange may be a lot more efficient. The following returns only objects whose effective date is in the past and whose expiration date has not yet passed, i.e. objects that are not unpublished:
items = portal_catalog(effectiveRange=DateTime())
Example 2: how to get FeedFeederItem items that are at most one day old:
import DateTime

count = 10  # maximum number of items to return (example value)
# DateTime deltas are days as floating points
end = DateTime.DateTime() + 0.1  # If we have some clock skew peek a little to the future
start = DateTime.DateTime() - 1
date_range_query = {'query': (start, end), 'range': 'min:max'}
items = portal_catalog.queryCatalog({"portal_type": "FeedFeederItem",
                                     "created": date_range_query,
                                     "sort_on": "positive_ratings",
                                     "sort_order": "reverse",
                                     "sort_limit": count,
                                     "review_state": "published"})
Example 3: how to get news items for a particular year in the template code
<div metal:fill-slot="main" id="content-news"
     tal:define="boundLanguages here/portal_languages/getLanguageBindings;
                 prefLang python:boundLanguages[0];
                 DateTime python:modules['DateTime'].DateTime;
                 start_year request/year| python: 2004;
                 end_year request/year| python: 2099;
                 start_year python: int(start_year);
                 end_year python: int(end_year);
                 results python:container.portal_catalog(
                     portal_type='News Item',
                     sort_on='Date',
                     sort_order='reverse',
                     review_state='published',
                     id=prefLang,
                     created={ 'query' : [DateTime(start_year,1,1), DateTime(end_year,12,31)], 'range':'min:max'}
                 );
                 results python:[r for r in results if r.getObject()];
                 Batch python:modules['Products.CMFPlone'].Batch;
                 b_start python:request.get('b_start',0);
                 portal_discussion nocall:here/portal_discussion;
                 isDiscussionAllowedFor nocall:portal_discussion/isDiscussionAllowedFor;
                 getDiscussionFor nocall:portal_discussion/getDiscussionFor;
                 home_url python: mtool.getHomeUrl;
                 localized_time python: modules['Products.CMFPlone.PloneUtilities'].localized_time;">
...
</div>
Example 4 - how to get upcoming events of next two months:
import DateTime

def formatDate(self, event):
    """
    Format the start date of an event brain as dd.mm.yyyy.
    """
    dt = event["start"]
    return dt.strftime("%d.%m.%Y")

def update(self):
    portal_catalog = self.context.portal_catalog
    start = DateTime.DateTime() - 1  # yesterday
    end = DateTime.DateTime() + 60  # Two months future
    date_range_query = {'query': (start, end), 'range': 'min:max'}
    count = 5
    self.events = portal_catalog.queryCatalog({"portal_type": "Event",
                                               "start": date_range_query,
                                               "sort_on": "start",
                                               "sort_order": "reverse",
                                               "sort_limit": count,
                                               "review_state": "published"})
Query by language¶
You can query by language:
portal_catalog({"Language":"en"})
Note
Products.LinguaPlone must be installed.
Boolean queries (AdvancedQuery)¶
AdvancedQuery is an add-on product for Zope's ZCatalog providing queries using boolean logic. It is a developer-level product that provides a Python interface for constructing boolean queries.
AdvancedQuery monkey-patches portal_catalog to provide a new method, portal_catalog.evalAdvancedQuery().
Example:
from Products import AdvancedQuery

portal_catalog = self.portal_catalog  # Acquire portal_catalog from higher hierarchy level
path = self.getPhysicalPath()  # Limit the search to the current folder and its children
# object.getPhysicalPath() returns the path as a tuple of path parts
# Convert path to string
path = "/".join(path)
# Limit search to path in the current context object and
# match all children matching either of two index values
# AdvancedQuery operations can be combined using Python expressions & | and ~
# or AdvancedQuery objects
query = AdvancedQuery.Eq("path", path) & (AdvancedQuery.Eq("getMyIndexGetter1", "foo") | AdvancedQuery.Eq("getMyIndexGetter2", "bar"))
# The following result variable contains an iterable of CatalogBrain objects
results = portal_catalog.evalAdvancedQuery(query)
# Convert the catalog brains to a Python list containing tuples of object unique ID and Title
pairs = []
for nc in results:
    pairs.append((nc["UID"], nc["Title"]))

# A second example: combine a path query with a full-text query
query = AdvancedQuery.Eq("path", diagnose_path) & AdvancedQuery.Eq("SearchableText", text_query_target)
results = self.context.portal_catalog.evalAdvancedQuery(query)
Note
Plone 3 ships with AdvancedQuery but it is not part of Plone. Always declare AdvancedQuery dependency in your egg's setup.py install_requires.
Warning
AdvancedQuery does not necessarily apply the same automatic limitations which normal portal_catalog() queries do, like language and expiration date. Always check your query code against these limitations.
Setting Up A New Style Query¶
With Plone 4.2, collections use so-called new-style queries by default. These are, technically speaking, canned queries, and they appear to have the following advantages over old-style collection criteria:
- They are not complicated sub-objects of collections, but comparably simple sub-objects that can be set using simple Python expressions.
- They are apparently much faster to execute.
- They are much easier to understand.
- They are content-type agnostic in the sense that they are no longer tied to Archetypes.
The easiest way to get into these queries is to grab a debug shell alongside an instance, then fire up a browser pointing to that instance, then manipulate the queries and watch the changes on the debug shell, if you want to experiment. I've constructed a dummy collection for demonstration purposes, named testquery. I've formatted the output a little, for readability.
Discovering the query:
>>> site.invokeFactory('Collection', id='testquery') # actually with my browser
>>> tq = site['testquery']
>>> tq.getRawQuery()
[
{'i': 'created', 'o': 'plone.app.querystring.operation.date.today'},
{'i': 'Description', 'o': 'plone.app.querystring.operation.string.contains', 'v': 'my querystring'},
{'i': 'portal_type', 'o': 'plone.app.querystring.operation.selection.is', 'v': ['Document']},
{'i': 'Subject', 'o': 'plone.app.querystring.operation.selection.is', 'v': ['some_tag']}
]
>>> tq.getSort_on()
'effective'
>>> tq.getSort_reversed()
True
>>> tq.getLimit()
1000
>>> tq.selectedViewFields()
[
('Title', u'Title'),
('Creator', 'Creator'),
('Type', u'Item Type'),
('ModificationDate', u'Modification Date'),
('ExpirationDate', u'Expiration Date'),
('getId', u'Short Name'),
('getObjSize', u'Size')
]
This output should be fairly self-explanatory: the query finds objects that were created today, which have "my querystring" in their description, are of type "Document" (i.e. "Page"), and have "some_tag" in their tag set (you'll find that under "Classification"). The results are sorted in reverse order of the Effective Date (i.e. the publishing date). We get at most 1000 results, which is the default cut-off.
You can set the query expression (individual parts are evaluated as logical AND) using
>>> tq.setQuery( your query expression, see above )
The three parts of an individual query term are
- 'i': which index to query
- 'o': which operator to use (see plone.app.querystring for a list)
- 'v': the possible value of an argument to said operator - eg. the query string.
Other parameters can be manipulated the same way:
>>> tq.setSort_reversed(True)
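A query expression is just a list of dictionaries with the 'i', 'o' and 'v' keys, so it can be built in plain Python before being handed to setQuery(). The sketch below uses a hypothetical helper, term; the operator names are the ones shown in the getRawQuery() output above.

```python
OPERATION_PREFIX = "plone.app.querystring.operation."


def term(index, operation, value=None):
    """Build one query term; ``term`` is a hypothetical helper."""
    row = {"i": index, "o": OPERATION_PREFIX + operation}
    if value is not None:
        # Operators such as date.today take no argument, so 'v' is optional.
        row["v"] = value
    return row


query = [
    term("portal_type", "selection.is", ["Document"]),
    term("Description", "string.contains", "my querystring"),
    term("created", "date.today"),
]
# In a debug shell with the collection from above: tq.setQuery(query)
```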
Accessing metadata¶
Metadata is collected from the object during cataloging and is copied to the brain object for faster access (there is no need to wake up the actual object from the database).
ZCatalog brain objects use a Python dictionary-like API to access metadata. Below is a fail-safe example of metadata access:
def getImageTag(self, brain):
    """
    Get lead image for ZCatalog brain in folder listing.
    (Based on collective.contentleadimage add-on product)
    @param brain: Products.ZCatalog.Catalog.mybrains object
    @return: HTML source code for content lead <img>
    """
    # First check if the index exist
    if not brain.has_key("hasContentLeadImage"):
        return None
    # Index can have indexed value None or
    # custom value Missing.Value if the indexer
    # for brain's object failed to run or returned Missing.
    # Both of these values evaluate to False in Python
    has_image = brain["hasContentLeadImage"]
    # The value was missing, None or False
    if not has_image:
        return None
    context = brain.getObject()
    # AT inspection API
    field = context.getField(IMAGE_FIELD_NAME)
    if not field:
        return None
    # ImageField.tag() API
    if field.get_size(context) != 0:
        scale = "tile"  # 64x64
        return field.tag(context, scale=scale)
Note
This is for example purposes only - the code above is working, but not optimal, and can be written up without waking up the object.
Unique values¶
ZCatalog has a uniqueValuesFor() method to retrieve all unique values for a certain index. It is intended to work on FieldIndexes only.
Example:
# getArea() is Archetype accessor for area field
# which is a string and tells the content area.
# Custom getArea FieldIndex indexes these values
# to portal catalog.
# The following line gives all area values
# inputted on the site.
areas = portal_catalog.uniqueValuesFor("getArea")
Performance¶
The following community mailing list and blog posts are very insightful about the performance characteristics of Plone search and indexing:
Batching¶
Todo
Complete writeup
Example:
results = Batch(contents, self.b_size, self.b_start, orphan=0)
- orphan - the next page will be combined with the current page if it does not contain more than orphan elements
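The effect of the orphan parameter can be illustrated with plain arithmetic. The function below is not Plone's Batch implementation; batch_slice is a hypothetical name that only mirrors the rule stated above.

```python
def batch_slice(total, size, start, orphan=0):
    """Return the (start, end) slice bounds of one page.

    Illustrates the orphan rule only; not Plone's Batch implementation.
    """
    end = min(start + size, total)
    # If the items left after this page would not exceed ``orphan``,
    # pull them into this page instead of showing a tiny last page.
    if total - end <= orphan:
        end = total
    return start, end


# 11 items, 5 per page: the single leftover item joins the second page.
merged = batch_slice(total=11, size=5, start=5, orphan=1)
separate = batch_slice(total=11, size=5, start=5, orphan=0)
```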
Walking through all content¶
Calling portal_catalog() without search parameters returns all indexed site objects.
Here is an example of how to crawl through Plone content to search for HTML snippets. This is done by rendering every content object and checking whether certain substrings exist in the output HTML. The snippet can be executed through-the-web in the Zope Management Interface.
This kind of scripting is especially useful if you need to find old links or migrate some text or HTML snippets in the content itself. There might be artifacts which only appear on the resulting pages (portlets, footer texts, etc.) and are thus invisible to the normal full-text search.
Example:
# Find arbitrary HTML snippets on Plone content pages
# Collect script output as text/html, so that you can
# call this script conveniently by just typing its URL to a web browser
buffer = ""
# We need to walk through all the content, as the
# links might not be indexed in any search catalog
for brain in context.portal_catalog():  # This queries cataloged brain of every content object
    try:
        obj = brain.getObject()
        # Call to the content object will render its default view and return it as text
        # Note: this will be slow - it equals to load every page from your Plone site
        rendered = obj()
        if "yourtextmatch" in rendered:
            # found old link in the rendered output
            buffer += "Found old links on <a href='%s'>%s</a><br>\n" % (obj.absolute_url(), obj.Title())
    except Exception:
        pass  # Something may fail here if the content object is broken
return buffer