Title: Backup software: borg vs restic
       Author: Solène
       Date: 21 May 2021
       Tags: backup openbsd unix
       Description: 
       
       # Introduction
       
       Backups are important: a lot of our life now depends on digital data,
       and it's important to take care of it because computers are
       unreliable, can be stolen, and mistakes happen.  I really like two
       programs, restic and borg.  They have nearly the same features, which
       makes it hard to choose between them; this is an attempt to understand
       the differences for my use case.
       
       # Restic
       
       Restic is a backup program written in Go with a "push" workflow.  It
       supports encryption, data deduplication within a repository, and
       multiple systems sharing the same repository.
       
       Restic can back up to a remote sftp server, and also to many network
       storage services like S3/Minio, and even more when used with the
       program rclone (which can turn any backend it supports into a
       restic-compatible backend).  Restic seems to be compatible with
       Windows (I didn't try).
       
  HTML restic website
       
       # Borg
       
       Borg is a backup program written in Python with a "push" workflow.  It
       supports encryption, data deduplication within a repository and
       compression.  You can back up to a remote server using ssh, but the
       remote server requires borg to be installed.
       
       It's a very good and reliable backup program.  It has a companion app
       named "borgmatic" to automate the backup process and snapshot
       management (daily/hourly/monthly ... and integrity checking).
       
       *BSD specific note: borg can honor the "nodump" flag in the filesystem
       and skip saving the files that have it set.
       
  HTML borgbackup website
  HTML borgmatic website
       
       # Experiment
       
       I've been making a backup of my /home/ partition (minus some
       directories that were excluded in both cases) using borg and
       restic.  I always performed the restic backup first and then the borg
       backup, measuring bandwidth and execution time for each.
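
For the execution time part, a minimal sketch of such a measurement in Python could look like this.  It only covers time, not bandwidth, it is not necessarily how the numbers in this article were taken, and the repository locations shown in the comments are hypothetical.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run a command and return (exit_code, elapsed_seconds)."""
    start = time.monotonic()
    completed = subprocess.run(argv, check=False)
    elapsed = time.monotonic() - start
    return completed.returncode, elapsed


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # real backup invocation, for example (hypothetical repository paths):
    #   ["restic", "-r", "sftp:backup@server:/srv/restic", "backup", "/home"]
    #   ["borg", "create", "backup@server:/srv/borg::{now}", "/home"]
    code, seconds = time_command([sys.executable, "-c", "pass"])
    print(f"exit code {code}, took {seconds:.2f} s")
```

Using a monotonic clock avoids wrong durations if the system clock is adjusted while a long backup is running.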
       
       There are five steps.  The first is the init, the first backup of a
       lot of data.  The second and third are small changes: basically
       opening firefox, browsing a few pages, closing it, refreshing my
       emails in claws-mail (this changes a lot of small files) and using
       the computer for an hour.  The fourth step is a massive change: I
       found a few game installers that I unzipped, producing a lot of small
       files instead of one big file.  Finally, there are 24h of normal use
       between the fourth and last step, which is a good representation of
       a daily backup.
       
       
       ## Data
       
       ```
                                         restic    borg
       Data transmitted (MB)
       ---------------------
       Backup 1 (init)                    62860   53730
       Backup 2 (little changes)             15      26
       Backup 3 (little changes)            168     171
       Backup 4 (massive changes)          4820    3910
       Backup 5 (typical day of use)         66      44
       
       Local cache size (MB)
       ---------------------
       Backup 1 (init)                      161      45
       Backup 2 (little changes)            163      45
       Backup 3 (little changes)            207      46
       Backup 4 (massive changes)           211      47
       Backup 5 (typical day of use)        216      47
       
       Backup time (seconds)
       ---------------------
       Backup 1 (init)                     2139    2999
       Backup 2 (little changes)             38     131
       Backup 3 (little changes)             43     114
       Backup 4 (massive changes)           201     355
       Backup 5 (typical day of use)         50     110
       
       Repository size (GB)                  65      56
       ```
       
       ## Analysis
       
       Borg was a lot slower than restic, but in my experiment the remote ssh
       server is a dual core Atom system and borg runs a process on the
       remote end to manage the data, so maybe that CPU was slowing down the
       backup process.  Nevertheless, in my real use case, borg is indeed
       slower.
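
To put a number on "a lot slower", here is a small Python sketch that divides the borg time by the restic time for each step.  The figures are copied from the table above; the variable and function names are only for illustration.

```python
# Backup times in seconds, copied from the table in the Data section.
restic_seconds = [2139, 38, 43, 201, 50]
borg_seconds = [2999, 131, 114, 355, 110]
labels = ["init", "little changes", "little changes",
          "massive changes", "typical day of use"]


def slowdown(borg, restic):
    """Return how many times longer borg took than restic."""
    return borg / restic


ratios = [slowdown(b, r) for b, r in zip(borg_seconds, restic_seconds)]

for number, (label, ratio) in enumerate(zip(labels, ratios), start=1):
    print(f"Backup {number} ({label}): borg took {ratio:.1f}x as long as restic")
```

With these numbers the ratio goes from roughly 1.4 for the initial backup up to roughly 3.4 for the small incremental ones.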
       
       Most of the time, borg was more bandwidth efficient than restic: it
       saved 15% of bandwidth for the first backup and 19% after some big
       changes, but in some cases it used a bit more bandwidth.  I have no
       explanation for this.  I guess it depends on how file chunks are
       calculated: if a big database file changes, one program may be able
       to save only the difference and not the whole file.  Borg also
       compresses the data (using lz4 by default), which may explain the
       bandwidth saving, although compression doesn't help much with binary
       data.
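
The bandwidth comparison can be recomputed from the table with a short Python sketch.  The saving is expressed as a share of what restic transmitted, which is an assumption about how to read the percentages; a negative value means borg sent more data than restic.

```python
# Data transmitted in MB, copied from the table in the Data section.
restic_mb = [62860, 15, 168, 4820, 66]
borg_mb = [53730, 26, 171, 3910, 44]


def saving_percent(restic, borg):
    """Share of restic's traffic that borg saved (negative: borg sent more)."""
    return (restic - borg) / restic * 100


savings = [saving_percent(r, b) for r, b in zip(restic_mb, borg_mb)]

for number, saving in enumerate(savings, start=1):
    print(f"Backup {number}: borg saved {saving:+.1f}% compared to restic")
```

Backups 2 and 3 come out negative, which matches the cases where borg used more bandwidth than restic.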
       
       The local cache (typically in /root/.cache/) was a lot bigger for
       restic than for borg, and grew slightly with each new backup, while
       the borg cache barely changed.
       
       Finally, the whole repository holding all the snapshots has a
       different size for restic and borg, respectively 65 GB and 56 GB.
       That is a 14% difference, which may be due to the compression done
       by borg.
       
       
       # Other backup software
       
       I tested Restic and Borg because they are both good programs using the
       "push" workflow (the local computer sends the data) and making a full
       snapshot at every backup, but there are many other backup solutions
       available.
       
       - duplicity: fully scriptable, works over many remote protocols, but
       requires a full snapshot followed by incremental snapshots.  When
       you need to make a new full snapshot it takes a lot of space, which
       is not always convenient.  Supports GPG encrypted backups stored over
       FTP, which is useful with dedicated server offers that come with
       100GB of free FTP storage.
       - burp: not very well known, the setup uses TLS certificates for
       encryption and requires a burp server and a burp client.
       - rsnapshot: based on rsync, automates the rotation of backups and
       uses hard links to avoid duplicating files that didn't change
       between two backups.  It pulls data from servers to a central backup
       system.
       - backuppc: a perl app that pulls data from servers to its
       repository, not really easy to use.
       - bacula: enterprise grade solution that I never got to work because
       it's really complicated, but it supports many things, even saving to
       tapes.
       
       # Conclusion
       
       In this benchmark, borg is clearly slower but was the more storage and
       bandwidth efficient of the two.  On the other hand, restic is easier
       to deploy (static binary) and works with a simple sftp server, while
       borg requires borg installed on both sides.
       
       The biggest difference between restic and borg is that restic supports
       backing up multiple systems to the same repository, allowing a massive
       data deduplication gain across machines, while a borg repository is
       meant for a single system (it could work with multiple systems, but
       they should not back up at the same time and they would have to
       rebuild the local cache every time, which is slow).
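
To illustrate why a shared repository can save so much space across machines, here is a toy Python model of chunk deduplication.  It is only a sketch under simplifying assumptions: it cuts data into fixed-size chunks and stores them unencrypted in a dictionary, whereas restic and borg use variable-size, content-defined chunks and encrypt everything.  The `Repository` class and its `backup` method are hypothetical names, not part of either tool.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example stays readable


def split_chunks(data, size=CHUNK_SIZE):
    """Cut bytes into fixed-size chunks (a simplification, see above)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Repository:
    """Toy content-addressed store: each distinct chunk is kept once."""

    def __init__(self):
        self.chunks = {}

    def backup(self, data):
        """Store data and return the number of bytes actually added."""
        added = 0
        for chunk in split_chunks(data):
            key = hashlib.sha256(chunk).hexdigest()
            if key not in self.chunks:
                self.chunks[key] = chunk
                added += len(chunk)
        return added


shared = Repository()
laptop = b"AAAABBBBCCCCDDDD"
desktop = b"AAAABBBBCCCCEEEE"  # same content except the last chunk

print(shared.backup(laptop))   # 16: every chunk is new
print(shared.backup(desktop))  # 4: only the differing chunk is stored
```

The second machine only adds the chunk the first one did not already store, which is the gain a repository shared by several systems can provide.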
       
       I'll stick with borg because the backup time isn't a real issue, given
       it's not dramatically slower than restic, and because I really enjoy
       using borgmatic to automatically manage the backups.
       
       For backups to a remote server over the Internet, bandwidth
       efficiency would be my main concern among all the differences, and
       borg seems a clear winner here.