Caching proxies

It is common to place a so-called caching reverse proxy in front of Zope when hosting large Plone sites. On Unix, a popular option is Varnish, although Squid is also a good choice. On Windows, you can use Squid or the (commercial, but better) Enfold Proxy.

It is important to realise that whilst plone.app.caching provides some functionality for controlling how Plone interacts with a caching proxy, the proxy itself must be configured separately.

Some operations in plone.app.caching can set response headers that instruct the caching proxy how best to cache content. For example, it is normally a good idea to cache static resources (such as images and stylesheets) and "downloadables" (such as Plone content of the types File or Image) in the proxy. This content will then be served to most users straight from the proxy, which is much faster than Zope.

The downside of this approach is that an old version of a content item may returned to a user, because the cache has not been updated since the item was modified. There are three general strategies for dealing with this:

  • Since resources are cached in the proxy based on their URL, you can "invalidate" the cached copy by changing an item's URL when it is updated. This is the approach taken by Plone's ResourceRegistries (portal_css, portal_javascript & co): in production mode, the links that are inserted into Plone's content pages for resource managed by ResourceRegistries contain a time-based token, which changes when the ResourceRegistries are updated. This approach has the benefit of also being able to "invalidate" content stored in a user's browser cache.
  • All caching proxies support setting timeouts. This means that content may be stale, but typically only up to a few minutes. This is sometimes an acceptable policy for high-volume sites where most users do not log in.
  • Most caching proxies support receiving PURGE requests for paths that should be purged. For example, if the proxy has cached a resource at /logo.jpg, and that object is modified, a PURGE request could be sent to the proxy (originating from Zope, not the client) with the same path to force the proxy to fetch a new version the next time the item is requested.

The final option, of course is to avoid caching content in the proxy altogether. The default policies will not allow standard content pages to be cached in the proxy, because it is too difficult to invalidate cached instances. For example, if you change a content item's title, that may require invalidation of a number of pages where that title appears in the navigation tree, folder listings, Collections, portlets, and so on. Tracking all these dependencies and purging in an efficient manner is impossible unless the caching proxy configuration is highly customised for the site.

Purging a caching proxy

Synchronous and asynchronous purging is enabled via plone.cachepurging. In the control panel, you can configure the use of a proxy via various options, such as:

  • Whether or not to enable purging globally.
  • The address of the caching server to which PURGE requests should be sent.
  • Whether or not virtual host rewriting takes place before the caching proxy receives a URL or not. This has implications for how the PURGE path is constructed.
  • Any domain aliases for your site, to enable correct purging of content served via e.g. http://example.com and http://www.example.com.

The default purging policy is geared mainly towards purging file and image resources, not content pages, although basic purging of content pages is included. The actual paths to purge are constructed from a number of components providing the IPurgePaths interface. See plone.cachepurging for details on how this works, especially if you need to write your own.

The default purge paths include:

  • ${object_path}, -- the object's canonical path
  • ${object_path}/ -- in case the object is a folder
  • ${object_path}/view -- the view method alias
  • ${object_path}/${default-view} -- in case a default view template is used
  • The download URLs for any Archetypes object fields, in the case of Archetypes content. This includes support for the standard File and Image types.

Files and images created (or customised) in the ZMI are purged automatically when modified. Files managed through the ResourceRegistries do not need purging, since they have "stable" URLs. To purge Plone content when modified (or removed), you must select the content types in the control panel. By default, only the File and Image types are purged.

You should not enable purging for types that are not likely to be cached in the proxy. Although purging happens asynchronously at the end of the request, it may still place unnecessary load on your server.

Finally, you can use the Purge tab in the control panel to manually purge one or more URLs. This is a useful way to debug cache purging, as well as a quick solution for the awkward situation where your boss walks in and wonders why the "about us" page is still showing that old picture of him, before he had a new haircut.

Installing and configuring a caching proxy

The plone.app.caching package includes some example buildout configurations in the proxy-configs directory. Two versions are included: one demonstrating a Squid-behind-Apache proxy setup and another demonstrating a Varnish-behind-Apache proxy setup. Both examples also demonstrate how to properly configure split-view caching.

These configurations are provided for instructional purposes but with a little modification they can also be used in production. To use in a real production instance, you will need to adjust the configuration to match your setup. For a simple standard setup, you might only need to change the hostname value in the buildout.cfg. Read the README.txt files in each example for more instructions.

There are also some alternative buildout recipes for building and configuring proxy configs: plone.recipe.squid and plone.recipe.varnish. The examples in this package do not use these recipes in favor of using a more explicit, and hopefully more educational, template-based approach. Even if you decide to use one of the automated recipes, it will probably be worth your while to study the examples included in this package to get a few pointers.

Running Plone behind Apache 2.2 with mod_cache

Apache 2.2 has a known bug around its handling of the HTTP response header CacheControl with value max-age=0 or headers Expires with a date in the past. In these scenarios mod_cache will not cache the response no matter what value of s-maxage is set.

https://issues.apache.org/bugzilla/show_bug.cgi?id=35247

One possible workaround for this is to use mod_headers directives in your Apache configuration to set max-age=1 if s-maxage is positive and max-age is 0 and also to drop the Expires header

Header edit Cache-Control max-age=0(.*s-maxage=[1-9].*) max-age=1$1 Header unset Expires

Dropping the Expires header has the disadvantage that HTTP 1.0 clients and proxies may not cache your responses as you wish.