Created Async Backup (markdown)

Chris Lu 2021-02-28 18:51:53 -08:00
parent 0a7edf8688
commit 335230f4f3

82
Async-Backup.md Normal file

@ -0,0 +1,82 @@
Cloud storage options, such as Amazon S3, Google Cloud Storage, Azure, Backblaze B2, etc, are ideal for backup purpose.
For example, for Amazon S3, the upload is free. You only pay for the storage.
So you have the benefit of:
* Extremely fast access to local SeaweedFS Filer
* Near-Real-Time Backup to Amazon S3 with zero-cost upload network traffic.
Of course, you can also backup to local disks on another machine.
# Architecture
```
Filer --> Metadata Change Logs --> `weed filer.backup` --> AWS S3
|
+-----> GCP
|
+-----> Azure
|
+-----> Backblaze B2
|
+-----> Local Disk
```
All file meta data changes in Filer are saved in the logs and can be subscribed. See [[Filer Change Data Capture]].
A "weed filer.backup" process will subscribe to this topic, and then read the actual file content, and send the update to the cloud sink or local disk sinks.
* Sinks can be: AWS S3, Google Cloud Storage, Microsoft Azure, Backblaze B2, or Local Disk.
# Configuration
This command replaced the previous `weed filer.replicate`, which requires an external message queue.
But for configuration, use the same `weed scaffold -config=replication` to generate a `replication.toml` file. Just need to keep the linkes of the sinks that you want to use.
```
[sink.s3]
# read credentials doc at https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/sessions.html
# default loads credentials from the shared credentials file (~/.aws/credentials).
enabled = false
aws_access_key_id = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials).
region = "us-east-2"
bucket = "backupbucket" # an existing bucket
directory = "/" # destination directory
endpoint = "http://localhost:8334"
is_incremental = false
```
# Running Backup
1. Make sure the `replication.toml` is in place.
1. Start the backup by running `weed filer.backup`.
# Incremental Mode
If `is_incremental = true`, all the files are backed up under the `YYYY-MM-DD` directories, which the timestamps are based on modified time.
So
* Each date directory contains all new and updated files.
* The deleted files in the source filer will not be deleted on the backup.
So if in this folder, on `2021-03-01`, these files are created in the source:
```
/dir1/file1
/dir1/file2
/dir1/file3
```
and on `2021-03-02`, these files are created, modified, deleted in the source:
```
/dir1/file1 // modified
/dir1/file2 // not changed
/dir1/file3 // deleted
/dir1/file4 // created
```
The backup destination will have the following directory structure.
```
/2021-03-01/dir1/file1
/2021-03-01/dir1/file2
/2021-03-01/dir1/file3
/2021-03-02/dir1/file1
/2021-03-02/dir1/file4
```