Created Cloud Tier (markdown)

Chris Lu 2019-12-03 23:18:12 -08:00
parent aa5069fd17
commit 013d3d89ad

Cloud-Tier.md

## Motivation
Cloud storage is an ideal place to back up warm data. It is scalable, and its cost is usually low compared to on-premises storage servers. Uploading to the cloud is usually free. However, accessing data in cloud storage is usually slow and not free.
SeaweedFS is fast. However, its capacity is limited by the number of available volume servers.
A good approach is to combine SeaweedFS with cloud storage.
Assume hot data is 20% and warm data is 80% of the total. We can move the warm data to cloud storage. Access to the warm data will be slower, but this frees up 80% of the servers, or lets them be repurposed for faster local access instead of holding rarely accessed warm data. This integration is completely transparent to SeaweedFS users.
This transparent cloud integration gives SeaweedFS practically unlimited capacity, in addition to its speed. To increase throughput, just add more local SeaweedFS volume servers.
## Design
When a volume is tiered to the cloud:
* The volume is marked as read-only.
* The index file stays local.
* The `.dat` file is moved to the cloud.
* The same O(1) disk read applies to the remote file: when a file entry is requested, a single range request retrieves the entry's content.
## Usage
1. Run `weed scaffold -conf=master` to generate `master.toml`, edit it as needed, and start the master server with that `master.toml`.
1. Use `volume.tier` in `weed shell` to move volumes to the cloud.
## Configuring the Storage Backend
(Currently only S3 is supported. More backends are coming soon.)
```toml
[storage.backend]
[storage.backend.s3.default]
enabled = true
aws_access_key_id = ""     # if empty, loaded from the shared credentials file (~/.aws/credentials)
aws_secret_access_key = "" # if empty, loaded from the shared credentials file (~/.aws/credentials)
region = "us-west-1"
bucket = "one-bucket"      # must be an existing bucket
```
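The credential fallback described in the comments can be sketched as follows. The `s3BackendConfig` struct and `credentialSource` helper are hypothetical and only mirror the TOML keys above; the rule that both keys must be non-empty to bypass the shared credentials file is an assumption, not documented SeaweedFS behavior.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// s3BackendConfig mirrors the keys of [storage.backend.s3.default].
// Hypothetical Go representation, not SeaweedFS's actual type.
type s3BackendConfig struct {
	Enabled            bool
	AwsAccessKeyID     string
	AwsSecretAccessKey string
	Region             string
	Bucket             string
}

// credentialSource reports where credentials would come from: the explicit
// keys when both are set (assumed rule), otherwise the shared credentials file.
func credentialSource(c s3BackendConfig) (string, error) {
	if c.AwsAccessKeyID != "" && c.AwsSecretAccessKey != "" {
		return "static keys from master.toml", nil
	}
	home, err := os.UserHomeDir()
	if err != nil {
		return "", err
	}
	return filepath.Join(home, ".aws", "credentials"), nil
}

func main() {
	// Empty keys, as in the example config, so the shared file is used.
	cfg := s3BackendConfig{Enabled: true, Region: "us-west-1"}
	src, err := credentialSource(cfg)
	if err != nil {
		panic(err)
	}
	fmt.Println("credentials from:", src)
}
```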
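Once the backend is configured, run `volume.tier` in `weed shell` to move a volume to the cloud. For example, either of the following commands moves the `.dat` file of volume 37 in the `benchmark` collection to S3; `-dest=s3` and `-dest=s3.default` refer to the same `[storage.backend.s3.default]` backend:
```
volume.tier -dest=s3 -collection=benchmark -volumeId=37
volume.tier -dest=s3.default -collection=benchmark -volumeId=37
```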
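How a `-dest` value could map onto the configured backends is sketched below. The rule that a bare type such as `s3` means the `default` instance is inferred from the two equivalent commands above; `resolveBackend` and the map of enabled backends are hypothetical, not SeaweedFS's API.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBackend maps a -dest value to a configured backend name of the form
// "<type>.<id>". A bare type such as "s3" is assumed to mean "<type>.default",
// matching the [storage.backend.s3.default] section of master.toml.
func resolveBackend(dest string, configured map[string]bool) (string, error) {
	name := dest
	if !strings.Contains(dest, ".") {
		name = dest + ".default"
	}
	if !configured[name] {
		return "", fmt.Errorf("backend %q is not configured or not enabled", name)
	}
	return name, nil
}

func main() {
	// Enabled backends keyed by "<type>.<id>" (hypothetical representation).
	configured := map[string]bool{"s3.default": true}
	for _, dest := range []string{"s3", "s3.default", "s3.archive"} {
		name, err := resolveBackend(dest, configured)
		fmt.Println(dest, "->", name, err)
	}
}
```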
## Data Layout
The `.dat` file in the cloud is laid out following cloud storage best practices. In particular, its object name is a randomized UUID so that `.dat` files are spread evenly across the storage backend.
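A randomized object name of this kind can be produced with Go's standard library alone. The sketch below uses a hypothetical `newObjectKey` helper that builds a version-4 UUID from `crypto/rand`; it illustrates the idea and is not necessarily how SeaweedFS generates its names.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newObjectKey returns a random version-4 UUID string to use as the object
// name of a tiered .dat file. Random names avoid sequential prefixes such as
// "1.dat", "2.dat", ... and spread objects across the key space.
func newObjectKey() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set version 4 in the high nibble
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	key, err := newObjectKey()
	if err != nil {
		panic(err)
	}
	fmt.Println(key) // a different 36-character UUID on each run
}
```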