From 99f2775b9bf2fe867ee9293f805d6828c3d2d97a Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Tue, 17 Aug 2021 00:19:36 -0700
Subject: [PATCH] Created Cloud Cache Quick Setup (markdown)

---
 Cloud-Cache-Quick-Setup.md | 167 +++++++++++++++++++++++++++++++++++++
 1 file changed, 167 insertions(+)
 create mode 100644 Cloud-Cache-Quick-Setup.md

diff --git a/Cloud-Cache-Quick-Setup.md b/Cloud-Cache-Quick-Setup.md
new file mode 100644
index 0000000..9962862
--- /dev/null
+++ b/Cloud-Cache-Quick-Setup.md
@@ -0,0 +1,167 @@

To users not familiar with SeaweedFS, there may seem to be many things to learn.
But setting up SeaweedFS Cloud Cache is easy.

# Set up a simple SeaweedFS cluster

Because the cluster acts as a cache, its high-availability requirements are modest. You can start with a simple SeaweedFS cluster.

Since you will very likely want to use S3, the following setup enables S3.

Just run this to start a SeaweedFS cluster:
```
$ weed server -s3
```

## Set up S3 credentials
Start a `weed shell`. Pass `-apply` to `s3.configure` so the configuration is saved; without it, the command only prints the resulting configuration.
```
$ weed shell
master: localhost:9333 filer: localhost:8888
> s3.configure -h
Usage of s3.configure:
  -access_key string
        specify the access key
  -actions string
        comma separated actions names: Read,Write,List,Tagging,Admin
  -apply
        update and apply s3 configuration
  -buckets string
        bucket name
  -delete
        delete users, actions or access keys
  -secret_key string
        specify the secret key
  -user string
        user name
> s3.configure -user me -access_key=any -secret_key=any -buckets=bucket1 -actions=Read,Write,List,Tagging,Admin -apply
{
  "identities": [
    {
      "name": "me",
      "credentials": [
        {
          "accessKey": "any",
          "secretKey": "any"
        }
      ],
      "actions": [
        "Read:bucket1",
        "Write:bucket1",
        "List:bucket1",
        "Tagging:bucket1",
        "Admin:bucket1"
      ]
    }
  ]
}
```

# Configure Remote Storage

This step configures a remote storage and how to access it.
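
The remote storage in this demo reuses the S3 identity created with `s3.configure`. To make the shape of that identity explicit, here is a minimal Python sketch that rebuilds the JSON printed in the example from the same flag values. `build_identity` is a hypothetical helper, not part of SeaweedFS. It only mirrors the output shown, and the action ordering when several buckets are given is an assumption.

```python
import json
from typing import Dict, List


def build_identity(user: str, access_key: str, secret_key: str,
                   buckets: List[str], actions: List[str]) -> Dict:
    """Hypothetical helper: rebuild one identity in the shape printed by s3.configure.

    Each action is scoped to a bucket as "Action:bucket". With several buckets,
    the ordering used here (bucket by bucket) is an assumption.
    """
    return {
        "name": user,
        "credentials": [{"accessKey": access_key, "secretKey": secret_key}],
        "actions": [f"{action}:{bucket}" for bucket in buckets for action in actions],
    }


# Same values as: s3.configure -user me -access_key=any -secret_key=any
#                 -buckets=bucket1 -actions=Read,Write,List,Tagging,Admin
config = {
    "identities": [
        build_identity("me", "any", "any", ["bucket1"],
                       "Read,Write,List,Tagging,Admin".split(","))
    ]
}
print(json.dumps(config, indent=2))
```

Granting five actions on one bucket therefore yields the five `Action:bucket1` entries listed under `"actions"`.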

For this demo, the following command creates a remote storage named "s5" that uses the credentials just created locally. In other words, this remote storage is simply a loopback to the local S3 endpoint, accessed through that local account.

In `weed shell`:
```
> remote.configure -h
Usage of remote.configure:
  -delete
        delete one remote storage by its name
  -name string
        a short name to identify the remote storage
  -s3.access_key string
        s3 access key
  -s3.endpoint string
        endpoint for s3-compatible local object store
  -s3.region string
        s3 region (default "us-east-2")
  -s3.secret_key string
        s3 secret key
  -type string
        storage type, currently only support s3 (default "s3")

> remote.configure -name=s5 -type=s3 -s3.access_key=any -s3.secret_key=any -s3.endpoint=http://localhost:8333

> remote.configure
{
  "type": "s3",
  "name": "s5",
  "s3AccessKey": "any",
  "s3Region": "us-east-2",
  "s3Endpoint": "http://localhost:8333"
}

```

# Mount Remote Storage

The remote storage can be mounted to any directory. Here is an example:
```
> remote.mount -dir=/buckets/b2 -remote=s5/bucket1 -nonempty

```

# Test the setup

In this example, the remote source folder is empty.
In practice, your remote folder will usually already contain some files.

You can now try reading from and writing to the folder `/buckets/b2`.

# Set up write-back

This step is only needed if you want local changes to be written back to the remote storage.

For this example, just start one process like this:
```
$ weed filer.remote.sync -dir=/buckets/b2
```

This command continuously writes changes in the mounted directory back to the cloud storage.

It is designed to run as a background process. It can be paused with `Ctrl+C` and started again later. If it gets disconnected from the filer, it tries to reconnect.

# Set up cache and uncache processes

Since only metadata is pulled and file content is not cached, reading remote files is somewhat slow.
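
Because only metadata is pulled, an uncached file under `/buckets/b2` can be thought of as a pointer to an object at the same relative path in `bucket1` of the remote storage `s5`. The Python sketch below illustrates that correspondence as a mental model, not as SeaweedFS code. `to_remote_location` is a hypothetical helper, and it assumes `-remote` has exactly the `<storage>/<bucket>` form used in this example.

```python
from pathlib import PurePosixPath
from typing import Tuple


def to_remote_location(mount_dir: str, remote: str, local_path: str) -> Tuple[str, str, str]:
    """Hypothetical helper: map a file under a mounted directory to
    (remote storage name, bucket, object key).

    Assumes `remote` is exactly "<storage>/<bucket>", as in -remote=s5/bucket1,
    and that the object key is the file's path relative to the mount directory.
    """
    storage, sep, bucket = remote.partition("/")
    if not sep or not storage or not bucket:
        raise ValueError(f"expected <storage>/<bucket>, got {remote!r}")
    # relative_to() raises ValueError if local_path is not under mount_dir.
    key = PurePosixPath(local_path).relative_to(mount_dir)
    return storage, bucket, key.as_posix()


print(to_remote_location("/buckets/b2", "s5/bucket1", "/buckets/b2/a/b/c/data.parquet"))
# → ('s5', 'bucket1', 'a/b/c/data.parquet')
```

Under this model, reading `/buckets/b2/a/b/c/data.parquet` before it is cached means fetching `a/b/c/data.parquet` from `bucket1` in `s5`, which is the latency that pre-caching avoids.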

You may want to cache a group of files to make sure the first read is always fast.

You may want to uncache a group of files to save local storage.

These cache and uncache jobs can vary widely. Here are some examples:

```
# cache a whole folder
> remote.cache -dir=/buckets/b2/a/b/c
# cache all parquet files
> remote.cache -dir=/buckets/b2 -include=*.parquet
# cache files between 1024 and 10240 bytes, inclusive
> remote.cache -dir=/buckets/b2 -minSize=1024 -maxSize=10240

# uncache files older than 3600 seconds
> remote.uncache -dir=/buckets/b2 -maxAge=3600
# uncache files of 10240 bytes or more
> remote.uncache -dir=/buckets/b2 -minSize=10240

```

These jobs can also be set up as scheduled cron jobs.

# Detect Cloud Data Updates

If other processes write to the cloud storage, the mounted folder needs to learn about the new files.

You will very likely want to set up a cron job that runs `remote.meta.sync` regularly.

```
> remote.meta.sync -h
Usage of remote.meta.sync:
  -dir string
        a directory in filer
> remote.meta.sync -dir=/buckets/b2

```
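
Before scheduling cache or uncache jobs, it can help to reason about which files a filter selects. The following Python sketch models the `-include`, `-minSize`, and `-maxSize` filters from the cache and uncache examples on an in-memory list, following their comments: a shell-style pattern such as `*.parquet` and inclusive size bounds in bytes. `RemoteFile` and `matches` are hypothetical names. Matching the pattern against the file name (rather than the full path) is an assumption, and `-maxAge` is not modeled.

```python
import fnmatch
from dataclasses import dataclass
from typing import Optional


@dataclass
class RemoteFile:
    """Hypothetical stand-in for a file entry under the mounted directory."""
    name: str
    size: int  # bytes


def matches(f: RemoteFile, include: Optional[str] = None,
            min_size: Optional[int] = None, max_size: Optional[int] = None) -> bool:
    """Return True if `f` passes every filter that is set.

    `include` is a shell-style pattern matched against the file name;
    `min_size` and `max_size` are inclusive bounds in bytes.
    """
    if include is not None and not fnmatch.fnmatchcase(f.name, include):
        return False
    if min_size is not None and f.size < min_size:
        return False
    if max_size is not None and f.size > max_size:
        return False
    return True


files = [
    RemoteFile("a.parquet", 512),
    RemoteFile("b.csv", 1024),
    RemoteFile("c.parquet", 10240),
    RemoteFile("d.parquet", 20000),
]

# Models: remote.cache -dir=/buckets/b2 -minSize=1024 -maxSize=10240
print([f.name for f in files if matches(f, min_size=1024, max_size=10240)])
# → ['b.csv', 'c.parquet']

# Models: remote.cache -dir=/buckets/b2 -include=*.parquet
print([f.name for f in files if matches(f, include="*.parquet")])
# → ['a.parquet', 'c.parquet', 'd.parquet']
```

Because the bounds are inclusive, `c.parquet` (exactly 10240 bytes) is picked by the size-range example and would also be picked by `-minSize=10240` in the uncache example.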