From 335230f4f343724d6a5816c2e8489450960289bd Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Sun, 28 Feb 2021 18:51:53 -0800
Subject: [PATCH] Created Async Backup (markdown)

---
 Async-Backup.md | 82 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 82 insertions(+)
 create mode 100644 Async-Backup.md

diff --git a/Async-Backup.md b/Async-Backup.md
new file mode 100644
index 0000000..4394eb9
--- /dev/null
+++ b/Async-Backup.md
@@ -0,0 +1,82 @@

Cloud storage options, such as Amazon S3, Google Cloud Storage, Azure, Backblaze B2, etc., are ideal for backup purposes.

For example, uploads to Amazon S3 are free; you only pay for the storage.
So you get the benefits of:
* Extremely fast access to the local SeaweedFS Filer
* Near-real-time backup to Amazon S3 with zero-cost upload network traffic

Of course, you can also back up to local disks on another machine.

# Architecture

```
 Filer --> Metadata Change Logs --> `weed filer.backup` --> AWS S3
                                                    |
                                                    +-----> GCP
                                                    |
                                                    +-----> Azure
                                                    |
                                                    +-----> Backblaze B2
                                                    |
                                                    +-----> Local Disk
```

All file metadata changes in the Filer are saved in change logs and can be subscribed to. See [[Filer Change Data Capture]].
A `weed filer.backup` process subscribes to these changes, reads the actual file content, and sends the updates to the cloud or local disk sinks.

* Sinks can be: AWS S3, Google Cloud Storage, Microsoft Azure, Backblaze B2, or Local Disk.


# Configuration

This command replaces the previous `weed filer.replicate`, which required an external message queue.
The configuration is the same: run `weed scaffold -config=replication` to generate a `replication.toml` file, then keep only the lines for the sinks you want to use.

```
[sink.s3]
# read credentials doc at https://docs.aws.amazon.com/sdk-for-go/v1/developer-guide/sessions.html
# default loads credentials from the shared credentials file (~/.aws/credentials).
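# The scaffolded replication.toml contains one section per sink; keep only the
# sections you use. For illustration, a local-disk sink section looks roughly
# like the following (commented out here; the exact keys are assumptions, so
# check the scaffolded file):
#
# [sink.local]
# enabled = false
# directory = "/backup"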
enabled = false
aws_access_key_id = ""     # if empty, loads from the shared credentials file (~/.aws/credentials)
aws_secret_access_key = "" # if empty, loads from the shared credentials file (~/.aws/credentials)
region = "us-east-2"
bucket = "backupbucket"    # an existing bucket
directory = "/"            # destination directory
endpoint = "http://localhost:8334"
is_incremental = false
```

# Running Backup
1. Make sure the `replication.toml` is in place.
1. Start the backup by running `weed filer.backup`.

# Incremental Mode
If `is_incremental = true`, all files are backed up under `YYYY-MM-DD` directories, where the date is derived from each file's modification time.
As a result:
* Each date directory contains all files created or updated on that day.
* Files deleted from the source filer are not deleted from the backup.

For example, if on `2021-03-01` these files are created in the source:
```
 /dir1/file1
 /dir1/file2
 /dir1/file3
```
and on `2021-03-02` these files are created, modified, or deleted in the source:
```
 /dir1/file1 // modified
 /dir1/file2 // not changed
 /dir1/file3 // deleted
 /dir1/file4 // created
```

The backup destination will have the following directory structure:
```
 /2021-03-01/dir1/file1
 /2021-03-01/dir1/file2
 /2021-03-01/dir1/file3
 /2021-03-02/dir1/file1
 /2021-03-02/dir1/file4
```
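The date-directory mapping above can be sketched in a few lines. This is only an illustration of the incremental layout, not SeaweedFS code, and the function name is hypothetical:

```python
from datetime import datetime, timezone

def incremental_backup_path(source_path: str, mtime: float) -> str:
    """Prefix a source file path with the YYYY-MM-DD directory
    derived from its modification time (illustration only)."""
    day = datetime.fromtimestamp(mtime, tz=timezone.utc).strftime("%Y-%m-%d")
    return "/" + day + source_path

# A file modified on 2021-03-02 lands under /2021-03-02/...
mtime = datetime(2021, 3, 2, 12, 0, tzinfo=timezone.utc).timestamp()
print(incremental_backup_path("/dir1/file1", mtime))  # → /2021-03-02/dir1/file1
```

Note that because the destination path always includes the date, a file modified on two different days appears under both date directories, which is why deletions in the source never remove anything from the backup.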