URI: 
       Title: Securing backups using S3 storage
       Author: Solène
       Date: 19 October 2024
       Tags: security network backup
        Description: In this guide, you will learn how S3 storage can help
        you secure your backups
       
       # Introduction
       
        In this blog post, you will learn how to make secure backups using
        Restic and an S3-compatible object storage.
       
        Backups are incredibly important: you may lose files that only
        existed on your computer, or lose access to some encrypted accounts
        or drives.  When you need backups, you need them to be reliable and
        secure.
       
       There are two methods to handle backups:
       
        * pull backups: a central server connects to each system and pulls
        the data to store it locally; this is how rsnapshot, BackupPC or
        Bacula work
        * push backups: each system runs the backup software locally and
        stores the result on the backup repository (either local or remote);
        this is how most backup tools work
       
        Both workflows have pros and cons.  Pull backups are usually not
        encrypted, and a single central server has access to everything,
        which is rather bad from a security point of view.  Push backups
        handle encryption and access on the system where they run, but an
        attacker with those credentials could destroy the backup using the
        backup tool itself.
       
       I will explain how to leverage S3 features to protect your backups from
       an attacker.
       
       # Quick intro to object storage
       
        S3 is the name of an AWS service used for object storage.  Basically,
        it is a huge key-value store in which you can put data and retrieve
        it; very little metadata is associated with an object.  Objects are
        all stored in a "bucket" and addressed by a path; directory-like
        prefixes in the path let you organize the bucket as if it had
        directories and subdirectories.
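        For instance, with the official AWS CLI (the bucket and file names
        below are made up), a "directory" is nothing more than a shared key
        prefix:

        ```
        # Upload an object under a directory-like prefix
        # (my-bucket and the file names are hypothetical):
        aws s3 cp notes.txt s3://my-bucket/documents/2024/notes.txt

        # "documents/2024/" is not a real directory, only part of the
        # object key; listing simply filters on the prefix:
        aws s3api list-objects-v2 --bucket my-bucket --prefix documents/2024/
        ```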
       
        Buckets can be encrypted, which is an important feature if you do not
        want your S3 provider to be able to access your data.  However, most
        backup tools already encrypt their repository, so adding encryption
        to the bucket is not really useful.  I will not explain how to use
        bucket encryption in this guide, although you can enable it if you
        want.  Encryption requires storing more secrets outside of the backup
        system if you want to restore, and it does not provide real benefits
        because the repository is already encrypted.
       
        S3 was designed to be highly efficient at storing and retrieving
        data, but it is not a competitor to POSIX file systems.  A bucket can
        be public or private; you can host your website in a public bucket
        (and it is rather common!).  A bucket has permissions associated with
        it: you certainly do not want to allow random people to upload or
        list files in your public bucket, while you still need to be able to
        do so yourself.
       
       The protocol designed around S3 was reused for what we call
       "S3-compatible" services on which you can directly plug any
       "S3-compatible" client, so you are not stuck with AWS.
       
        This blog post exists because I wanted to share a cool S3 feature
        (not strictly S3-specific, as almost every implementation offers it)
        that goes well with backups: a bucket can be versioned, so every
        change happening in the bucket can be reverted.  Now, think about an
        attacker escalating to root privileges: they can access the backup
        repository, delete all the files there, then destroy the server.
        With a backup on a versioned S3 storage, you could revert the bucket
        to just before the deletion happened and recover your backup.  To
        defeat this protection, the attacker would also need the S3 storage
        administration credentials, which are different from the credentials
        used to write to the bucket.
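        As a sketch of what recovery can look like, using the AWS CLI against
        a versioned bucket (the bucket name and prefix below are
        hypothetical; S3-compatible providers expose the same calls):

        ```
        # List every version of the objects, including the "delete
        # markers" left behind by a deletion:
        aws s3api list-object-versions --bucket my-backup-bucket --prefix restic/

        # Deleting a delete marker "undeletes" the object underneath it:
        aws s3api delete-object --bucket my-backup-bucket \
            --key restic/config --version-id <delete-marker-version-id>
        ```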
       
       Finally, restic supports S3 as a backend, and this is what we want.
       
       ## Open source S3-compatible storage implementations
       
        Here is a list of open source and free S3-compatible storage
        implementations.  I played with them all; they have different goals
        and purposes, and they all worked well enough for me:
       
        * Seaweedfs GitHub project page
        * Garage official project page
        * Minio official project page
       
       A quick note about those:
       
        * I consider seaweedfs to be the Swiss army knife of storage: you can
        mix multiple storage backends and expose them over different
        protocols (like S3, HTTP, WebDAV), and it can also replicate data
        over remote instances.  You can do tiering (based on last access
        time or speed) as well.
        * Garage is a relatively new project; it is quite bare bones in
        terms of features, but it works fine and supports high availability
        with multiple instances.  It only offers S3.
        * Minio is the big player; it has a paid version (which is extremely
        expensive), although the free version should be good enough for most
        users.
       
       # Configure your S3
       
        You need to pick an S3 provider; you can self-host it or use a paid
        service, it is up to you.  I like Backblaze as it is super cheap, at
        $6/TB/month, but I also have a local minio instance for some needs.
       
       Create a bucket, enable the versioning on it and define the data
       retention, for the current scenario I think a few days is enough.
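        With the AWS CLI this could look like the following (the bucket name
        is made up; most providers also expose these settings in their web
        console):

        ```
        # Turn versioning on for the bucket:
        aws s3api put-bucket-versioning --bucket my-backup-bucket \
            --versioning-configuration Status=Enabled

        # Expire non-current versions after a few days of retention:
        aws s3api put-bucket-lifecycle-configuration --bucket my-backup-bucket \
            --lifecycle-configuration '{
              "Rules": [{
                "ID": "purge-old-versions",
                "Status": "Enabled",
                "Filter": {},
                "NoncurrentVersionExpiration": { "NoncurrentDays": 7 }
              }]
            }'
        ```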
       
        Create an application key for your restic client with the following
        permissions: "GetObject", "PutObject", "DeleteObject",
        "GetBucketLocation", "ListBucket".  The names can change between
        providers, but the key needs to be able to put/delete/list data in
        the bucket (and only this bucket!).  Once this is done, you will get
        a pair of values: an identifier and a secret key.
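        If your provider accepts AWS-style policies, a minimal policy scoped
        to a single bucket might look like this sketch (the bucket name is an
        assumption; providers like Backblaze instead offer equivalent
        checkboxes in their key-creation UI):

        ```
        # Write a minimal AWS-style policy to a file; my-backup-bucket
        # is a hypothetical name, adjust it to yours:
        policy='{
          "Version": "2012-10-17",
          "Statement": [
            {
              "Effect": "Allow",
              "Action": ["s3:GetBucketLocation", "s3:ListBucket"],
              "Resource": "arn:aws:s3:::my-backup-bucket"
            },
            {
              "Effect": "Allow",
              "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
              "Resource": "arn:aws:s3:::my-backup-bucket/*"
            }
          ]
        }'
        printf '%s\n' "$policy" > restic-key-policy.json
        ```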
       
       Now, you will have to provide the following environment variables to
       restic when it runs:
       
       * `AWS_DEFAULT_REGION` which contains the region of the S3 storage,
       this information is given when you configure the bucket.
        * `AWS_ACCESS_KEY_ID` which contains the access key generated when
        you created the application key.
       * `AWS_SECRET_ACCESS_KEY` which contains the secret key generated when
       you created the application key.
       * `RESTIC_REPOSITORY` which will look like
       `s3:https://$ENDPOINT/$BUCKET` with $ENDPOINT being the bucket endpoint
       address and $BUCKET the bucket name.
       * `RESTIC_PASSWORD` which contains your backup repository passphrase to
       encrypt it, make sure to write it down somewhere else because you need
       it to recover the backup.
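        Put together, the environment could look like this sketch (every
        value below is a placeholder to replace with the ones from your
        provider):

        ```
        # All values below are hypothetical placeholders:
        export AWS_DEFAULT_REGION="eu-central-1"
        export AWS_ACCESS_KEY_ID="K000exampleid"
        export AWS_SECRET_ACCESS_KEY="examplesecret"
        ENDPOINT="s3.eu-central-1.example.com"
        BUCKET="my-backup-bucket"
        export RESTIC_REPOSITORY="s3:https://${ENDPOINT}/${BUCKET}"
        export RESTIC_PASSWORD="a long passphrase written down elsewhere"
        ```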
       
        If you want a simple script to back up some directories and remove
        old data after a retention of 5 hourly, 2 daily, 2 weekly and 2
        monthly backups:
       
       ```
       restic backup -x /home /etc /root /var
        restic forget --prune --keep-hourly 5 --keep-daily 2 --keep-weekly 2 --keep-monthly 2
       ```
       
       Do not forget to run `restic init` the first time, to initialize the
       restic repository.
       
       # Conclusion
       
        I really like this backup system: it is cheap, very efficient, and
        it provides a fallback in case of a problem with the repository
        (mistakes happen, you do not always need an attacker to lose data
        ^_^').
       
        If you do not want to use S3 backends, note that Borg backup and
        Restic both support an "append-only" mode, which prevents an
        attacker from damaging or even reading the backup, but I always
        found it hard to use, and you need another system to do the
        prune/cleanup on a regular basis.
       
       # Going further
       
        This approach could work on any backend supporting snapshots, like
        BTRFS or ZFS.  If you can recover the backup repository to a
        previous point in time, you will get back a working backup
        repository.
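        As a sketch, assuming the repository lives on a hypothetical ZFS
        dataset named tank/backups and snapshots are taken regularly by cron:

        ```
        # Take a dated snapshot of the dataset holding the backup
        # repository:
        zfs snapshot tank/backups@2024-10-19

        # After an attacker (or a mistake) damages the repository, roll
        # back to the last good snapshot:
        zfs rollback tank/backups@2024-10-19
        ```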
       
       You could also do a backup of the backup repository, on the backend
       side, but you would waste a lot of disk space.