Backing up your Plone deployment

Description

Strategies for backing up operating Plone installations.

A guide to determining what to back up and how to back it up and restore it safely.

Introduction

The key rules of backing up a working system are probably:

  • Back up everything
  • Maintain multiple generations of backup
  • Test restoring your backups

Warning

This guide assumes that you are already doing this for your system as a whole, and will only cover the considerations specific to Plone. When we say we are assuming you're already doing this for the system as a whole, what we mean is that your system backup mechanisms - rsync, bakula, whatever - are already backing up the directories into which you've installed Plone.

So, your buildout and buildout caches are already backed up, and you've tested the restore process. So, your remaining consideration is making sure that Plone's database files are adequately backed up and recoverable.

Objects in motion

Objects in motion tend to remain in motion. Objects that are in motion are difficult or impossible to back up accurately.

Translation: Plone is a long-lived process that is constantly changing its content database. The largest of these files, the Data.fs filestorage which contains everything except Binary Large OBjects (BLOBs), is always open for writing. The BLOB storage, a potentially very complex file hierarchy, is constantly changing and must be referentially synchronized to the filestorage.

This means that most system backup schemes are incapable of making useful backups of the content database while it's in use. We assume you don't want to stop your Plone site just to backup, so you need to add procedures to make sure you have useful backups of Plone's data. (We assume that you know that the same thing is true of your relational database storage. If not, get to studying!)

Where's my data?

Your Plone instance installation will contain a ./var directory (in the same directory as buildout.cfg) that contains the frequently changing data files for the instance. Much of what's in ./var, though, is not your actual content database. Rather, it's log, process id, and socket files.

The directories that actually contain content data are:

./var/filestorage

This is where Zope Object Database filestorage is maintained. Unless you've multiple storages or have changed the name, the key file is Data.fs. It's typically a large file and contains everything except BLOBS.

The other files in filestorage, with extensions like .index, .lock, .old, .tmp are ephemeral, and will be recreated by Zope if they're absent.

./var/blobstorage

This directory contains a very deeply nested directory hierarchy that, in turn, contains the BLOBs of your database: PDFs, image files, office automation files and such.

The key thing to know about filestorage and blobstorage is that they are maintained synchronously. The filestorage has references to BLOBs in the blobstorage. If the two storages are not perfectly synchronized, you'll get errors.

collective.recipe.backup

collective.recipe.backup is a well-maintained and well-supported recipe for solving the "objects in motion" problem for a live Plone database. It makes it easy to both back up and restore the object database. The recipe is basically a sophisticated wrapper around repozo, a Zope database backup tool, and rsync, the common file synchronization tool.

Note

Big thanks to Reinout van Rees, Maurits van Rees and community helpers for creating and maintaining collective.recipe.backup. We all owe them drinks of their choice.

If you're using any of Plone's installation kits, collective.recipe.backup is included in your install. If not, you may add it to your buildout by adding a backup part:

[buildout]
parts =
    ...
    backup
    ...

[backup]
recipe = collective.recipe.backup

There are several useful option settings for the recipe, all set by adding configuration information. All are documented on the PyPI page. Perhaps the most useful is the location option, which sets the destination for backup files:

[backup]
recipe = collective.recipe.backup
location = /path/to/reliably/attached/storage/filestorage
blobbackuplocation =  /path/to/reliably/attached/storage/blobstorage

If this is unspecified, the backup destination is the buildout var directory. The backup destination, though, may be any reliably attached location - including another partition, drive or network storage.

Operation

Once you've run buildout, you'll have bin/backup and bin/restore scripts in your buildout. Since all options are set via buildout, there are few command-line options, and operation is generally as simple as using the bare commands. bin/restore will accept a date-time argument if you're keeping multiple backups. See the docs for details.

Backup operations may be run without stopping Plone. Restore operations require that you stop Plone, then restart after the restore is complete.

bin/backup is commonly included in a cron table for regular operation. Make sure you test backup/restore before relying on it.

Incremental backups

collective.recipe.backup offers both incremental and full backup and will maintain multiple generations of backups. Tune these to meet your needs.

When incremental backup is enabled, doing a database packing operation will automatically cause the next backup to be a full backup.

If your backup continuity needs are extreme, your incremental backup may be equally extreme. There are Plone installations where incremental backups are run every few minutes.