## Motivation
Cloud storage is an ideal place to back up warm data: it is scalable, and its cost is usually lower than on-premise storage servers. Uploading to the cloud is usually free; however, accessing the data there usually costs money and is slow.

SeaweedFS is fast, but its capacity is limited by the number of available volume servers.

A good approach is to combine SeaweedFS with cloud storage.

Assume hot data is 20% and warm data is 80%. We can move the warm data to cloud storage. Access to the warm data will be slower, but this frees up 80% of the servers, or lets them be repurposed for faster local access, instead of merely storing warm data that is rarely read. The integration is completely transparent to SeaweedFS users.

This transparent cloud integration effectively gives SeaweedFS unlimited capacity in addition to its fast speed: just add more local SeaweedFS volume servers to increase throughput.

## Design
When a volume is tiered to the cloud:
* The volume is marked as read-only.
* The index file stays local.
* The `.dat` file is moved to the cloud.
* The same O(1) disk read is applied to the remote file: when a file entry is requested, a single range request retrieves the entry's content (see the range-read sketch at the end of this page).

## Usage
1. Use `weed scaffold -conf=master` to generate `master.toml`, tweak it, and start the master server with the `master.toml`.
1. Use `volume.tier` in `weed shell` to move volumes to the cloud.

An end-to-end example session is sketched at the end of this page.

## Configuring the Storage Backend
(Currently only S3 is supported; more backends are coming soon.)
```
[storage.backend]
  [storage.backend.s3.default]
    enabled = true
    aws_access_key_id = ""        # if empty, loads from the shared credentials file (~/.aws/credentials).
    aws_secret_access_key = ""    # if empty, loads from the shared credentials file (~/.aws/credentials).
    region = "us-west-1"
    bucket = "one_bucket"         # an existing bucket
```

After this is configured, you can run the following commands in `weed shell`:

```
// move volume 37's .dat file to the s3 cloud
volume.tier -dest=s3 -collection=benchmark -volumeId=37
// or, naming the backend explicitly
volume.tier -dest=s3.default -collection=benchmark -volumeId=37
```

## Data Layout
The `.dat` file on the cloud is laid out following cloud storage best practices. In particular, its name is a randomized UUID so that the `.dat` files are spread out evenly.
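
## Example Session
Putting the usage and configuration steps above together, a typical session could look like the sketch below. The collection name, volume id, and the `s3.default` backend name mirror the examples above; other details (for example, whether `weed scaffold` prints to stdout, the volume server flags, and the exact `weed shell` prompt) are assumptions and may differ between SeaweedFS versions.

```
# generate master.toml, then add the [storage.backend] section shown above
weed scaffold -conf=master > master.toml

# start the master so it picks up master.toml, plus volume servers as usual
weed master
weed volume -dir=/data -mserver=localhost:9333

# in weed shell, tier volume 37 of the "benchmark" collection to the s3.default backend
weed shell
> volume.tier -dest=s3.default -collection=benchmark -volumeId=37
```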
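
## Range Read Sketch
To illustrate the single range request described in the Design section, here is a minimal sketch in Go using the AWS SDK. This is not the actual SeaweedFS code path: the bucket name, object key, offset, and size below are made-up placeholders, whereas in the real system the offset and size come from the local index file and the object key is the volume's randomized UUID name.

```go
// Illustrative sketch only, not the SeaweedFS implementation: fetch one file
// entry from a tiered .dat file with a single S3 range request.
package main

import (
	"fmt"
	"io"
	"log"

	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/aws/session"
	"github.com/aws/aws-sdk-go/service/s3"
)

func main() {
	// In the real system these come from the local index file.
	var offset int64 = 1048576 // byte offset of the entry in the .dat file (placeholder)
	var size int64 = 4096      // size of the entry in bytes (placeholder)

	sess := session.Must(session.NewSession(&aws.Config{Region: aws.String("us-west-1")}))
	svc := s3.New(sess)

	// One HTTP range request retrieves exactly the entry's bytes from the remote .dat file.
	out, err := svc.GetObject(&s3.GetObjectInput{
		Bucket: aws.String("one_bucket"),
		Key:    aws.String("3f8e2a9c-1d4b-4c7e-9f20-abcdef123456.dat"), // placeholder for the randomized UUID name
		Range:  aws.String(fmt.Sprintf("bytes=%d-%d", offset, offset+size-1)),
	})
	if err != nil {
		log.Fatal(err)
	}
	defer out.Body.Close()

	data, err := io.ReadAll(out.Body)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("read %d bytes for the requested entry\n", len(data))
}
```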