Before going through some examples, it helps to understand what storeBackup does internally. Here are some important aspects of how storeBackup works. (The following explains the principal mechanisms; for performance reasons the implementation differs slightly. There are several waiting queues, parallelism and a tiny scheduler inside, which are not described here.)

storeBackup uses at least two internal flat files in each generated backup:

.md5CheckSums.info: general information about the backup
.md5CheckSums[.bz2]: information about every file (dir, etc.) saved

When storeBackup.pl starts, it basically does the following (among other things):

  1. read the contents of the previous .md5CheckSums[.bz2] file and store them in two dbm databases:
    dbm(md5sum) and dbm(filename) (dbm(md5sum) means that md5sum is the key). By default, these databases are kept in memory.
  2. read the contents of other .md5CheckSums[.bz2] files (otherBackupDirs) and store them in dbm(md5sum). If two different files (e.g. from different backup series) are identical, always store the last copied file in the dbm file. This ensures that multiple versions of the same file in different backups are unified in future backups.
  • This item describes how storeBackup.pl works without sharing files from another backup series (simple backup).
    In a loop over all files to back up, it will:

    1. look into dbm(filename), which contains all files from the previous backup, to see whether the exact same file exists and has not changed. In this case, the needed information is taken from the values of dbm(filename).
      If it existed in the previous backup(s), make a hard link
    2. calculate the md5 sum of the file to back up
      look into dbm(md5sum) for that md5 sum
      if it exists there, make a hard link
      if it doesn’t exist, copy or compress the file
    3. write the information about the new file to the corresponding .md5CheckSums[.bz2] file
  • This item describes how storeBackup works with sharing of files from another backup series.
    In a loop over all files to back up, it will:

    1. look into dbm(filename), which contains all files from the previous backup, to see whether the exact same file exists and has not changed. In this case, the needed information is taken from the values of dbm(filename).
      (Because there are independent backups, it is possible that a file with the same contents exists in another backup series. So storeBackup.pl has to look into dbm(md5sum) to ensure that all backup series link to the same file.)
    2. calculate the md5 sum of the file to back up if it is not known from step 1
      look into dbm(md5sum) for that md5 sum
      if it exists there, make a hard link
      if it doesn’t exist, copy or compress the file
    3. write the information about the new file to the corresponding .md5CheckSums[.bz2] file
  • This item describes the usage of option lateLinks.
    If you save your backup via NFS to a server, most of the time will be spent setting hard links. Setting a single hard link is very fast, but if you have many thousands of them it takes some time. You can avoid waiting for hard linking if you use the option lateLinks:

    1. make a backup with storeBackup and set --lateLinks (or set lateLinks = yes in the configuration file). storeBackup will then not generate any hard links; it only writes a file recording what has to be linked.
    2. In a separate step, call storeBackupUpdateBackup.pl to set all the required hard links and turn these incomplete backups into full backups. Please also see the section on using option lateLinks for a more detailed explanation.
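
The per-file loop and the lateLinks idea described above can be sketched in a few lines of Python. This is only an illustration of the principle, not storeBackup's actual code (which is written in Perl and works differently for performance reasons). All names here (`backup`, `resolve_links`, `by_name`, `by_md5`) are made up for this sketch; plain dictionaries stand in for dbm(filename) and dbm(md5sum), an unchanged file is assumed to be one with the same size and modification time, and files are only copied, never compressed.

```python
import hashlib
import os
import shutil


def md5_of(path):
    """Return the hex md5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup(source_dir, target_dir, by_name, by_md5, late_links=False):
    """Back up source_dir into target_dir (hypothetical sketch).

    by_name: relative path -> (size, mtime_ns, md5) from the previous backup
    by_md5:  md5 -> path of an already stored file with that content
    Returns (records, pending). records stands in for the per-file entries
    of the checksum file; pending lists (existing, new) hard links that
    still have to be made when late_links is set.
    """
    records, pending = [], []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(target_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            info = os.stat(src)

            # Step 1: unchanged since the previous backup? Reuse its md5 sum.
            old = by_name.get(rel)
            if old is not None and old[:2] == (info.st_size, info.st_mtime_ns):
                md5 = old[2]
            else:
                # Step 2: md5 sum not known from step 1, so calculate it.
                md5 = md5_of(src)

            existing = by_md5.get(md5)
            if existing is None:
                shutil.copy2(src, dst)           # new content: copy it
                by_md5[md5] = dst
            elif late_links:
                pending.append((existing, dst))  # only note the link for later
            else:
                os.link(existing, dst)           # known content: hard link

            # Step 3: record the information about the new file.
            records.append((rel, info.st_size, info.st_mtime_ns, md5))
    return records, pending


def resolve_links(pending):
    """Create the hard links that were postponed by late_links."""
    for existing, new in pending:
        os.link(existing, new)
```

In this sketch, a first run with empty dictionaries copies every distinct file once and hard links duplicates. A later run that is given the records of the first run (as `by_name`) and the same `by_md5` dictionary creates only hard links. With `late_links=True` nothing is linked during the run; the backup stays incomplete until `resolve_links` is called, which mirrors the separate update step described above.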

Conclusions:

  1. Do not delete a backup for which the hard links have not yet been generated. Use storeBackupUpdateBackup.pl to set the hard links and check consistency. It’s a good idea to only use storeBackup.pl or storeBackupDel.pl for the deletion of old backups.
  2. All sharing of data in the backups is done via hard links. This means:
    • A backup series cannot be split over different partitions.
    • If you want to share data between different backup series, all backups must reside on the same partition.
  3. All information about a backup in .md5CheckSums is stored with relative paths. It does not matter if you change the absolute path to the backup or run the backup from a different machine (e.g. the server makes the backup from the client via NFS, or the client makes the backup to the server via NFS).
    Unresolved hard links to other backup series (via option lateLinks) are also stored with relative paths. This means: You can move backupDir around as you like, but you should never change the relative paths between backup series before resolving all the links with storeBackupUpdateBackup.pl.
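
Conclusions 2 and 3 can be illustrated with a small Python sketch. It is not part of storeBackup; the function names and the example paths are invented for illustration. The first function compares the device numbers of two existing directories, which is a quick plausibility check (not a guarantee) that hard links between them are possible. The second shows why relative paths between backup series survive moving the whole backup directory.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device (st_dev).

    Hard links cannot cross file systems, so a False result means that
    data cannot be shared between the two directories via hard links.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def relative_reference(existing_file, new_dir):
    """Express the path of existing_file relative to new_dir."""
    return os.path.relpath(existing_file, start=new_dir)


# Hypothetical layout: two backup series below one backup directory.
before = relative_reference("/backups/seriesA/2024/file", "/backups/seriesB/2024")
# The same layout after moving the whole backup directory elsewhere.
after = relative_reference("/mnt/new/seriesA/2024/file", "/mnt/new/seriesB/2024")
```

On a POSIX system both `before` and `after` are `../../seriesA/2024/file`: the reference stays valid as long as the two series keep their positions relative to each other.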

If you have additional ideas or any questions, feel free to contact me (hjclaes(at)web.de).

It is a good idea to use a configuration file instead of command line options. Simply call:

# storeBackup.pl --generate <configFile>

Edit the configuration file and call storeBackup in the following way:

# storeBackup.pl -f <configFile>

You can override settings in the configuration file on the command line