For users not familiar with SeaweedFS, there may seem to be a lot to learn.
But for SeaweedFS Cloud Cache, the setup is easy.
# Set up a simple SeaweedFS cluster
To act as a cache, the high-availability requirements are not strict. You can start with a simple SeaweedFS cluster.
Since you will very likely want to use S3, the following includes the S3 setup.
Just run this to start a SeaweedFS cluster:
```
$ weed server -s3
```
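This starts a master, a volume server, a filer, and an S3 gateway in one process. As a quick sanity check (a sketch, assuming the default ports 9333 for the master and 8888 for the filer), you can query them over HTTP:
```
# check the master (should return the cluster status as JSON)
$ curl http://localhost:9333/cluster/status

# check the filer (should return a listing of the root directory)
$ curl -H "Accept: application/json" http://localhost:8888/
```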
## Set up S3 credentials
Start a `weed shell`:
```
$ weed shell
master: localhost:9333 filer: localhost:8888
> s3.configure -h
Usage of s3.configure:
-access_key string
specify the access key
-actions string
comma separated actions names: Read,Write,List,Tagging,Admin
-apply
update and apply s3 configuration
-buckets string
bucket name
-delete
delete users, actions or access keys
-secret_key string
specify the secret key
-user string
user name
> s3.configure -user me -access_key=any -secret_key=any -buckets=bucket1 -actions=Read,Write,List,Tagging,Admin
{
"identities": [
{
"name": "me",
"credentials": [
{
"accessKey": "any",
"secretKey": "any"
}
],
"actions": [
"Read:bucket1",
"Write:bucket1",
"List:bucket1",
"Tagging:bucket1",
"Admin:bucket1"
]
}
]
}
```
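If you have the AWS CLI installed, you can sanity-check the new credentials against the S3 gateway. A minimal sketch, assuming the default S3 port 8333 and the `any`/`any` keys configured above (the region value is just a placeholder to satisfy the CLI):
```
$ export AWS_ACCESS_KEY_ID=any
$ export AWS_SECRET_ACCESS_KEY=any
$ export AWS_DEFAULT_REGION=us-east-1

# create the bucket this user was granted access to, then list buckets
$ aws --endpoint-url http://localhost:8333 s3 mb s3://bucket1
$ aws --endpoint-url http://localhost:8333 s3 ls
```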
# Configure Remote Storage
This step configures a remote storage and how to access it.
For this particular demo, the following command creates a remote storage named "s5", which uses the credentials we just created locally. So this remote storage is actually just a loopback to the local S3 account.
In `weed shell`:
```
> remote.configure -h
Usage of remote.configure:
-delete
delete one remote storage by its name
-name string
a short name to identify the remote storage
-s3.access_key string
s3 access key
-s3.endpoint string
endpoint for s3-compatible local object store
-s3.region string
s3 region (default "us-east-2")
-s3.secret_key string
s3 secret key
-type string
storage type, currently only support s3 (default "s3")
> remote.configure -name=s5 -type=s3 -s3.access_key=any -s3.secret_key=any -s3.endpoint=http://localhost:8333
> remote.configure
{
"type": "s3",
"name": "s5",
"s3AccessKey": "any",
"s3Region": "us-east-2",
"s3Endpoint": "http://localhost:8333"
}
```
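The same command works against a real cloud endpoint. Here is a hypothetical sketch with placeholder credentials, using only the flags from the help output above (omitting `-s3.endpoint` should target AWS S3 itself; the name `cloud1` is just an example):
```
> remote.configure -name=cloud1 -type=s3 -s3.access_key=AKIAEXAMPLE -s3.secret_key=EXAMPLESECRET -s3.region=us-east-2
```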
# Mount Remote Storage
The remote storage can be mounted to any directory. Here is an example:
```
> remote.mount -dir=/buckets/b2 -remote=s5/bucket1 -nonempty
```
# Test the setup
In this example, the remote source folder is empty.
In reality, your remote folder will likely have some files already.
Right now you can already try to read from or write to the folder `/buckets/b2`.
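For example, you can write and read through the filer's HTTP API. A minimal sketch, assuming the default filer port 8888:
```
# upload a local file into the mounted folder
$ echo "hello" > test.txt
$ curl -F file=@test.txt "http://localhost:8888/buckets/b2/"

# read it back
$ curl "http://localhost:8888/buckets/b2/test.txt"
```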
# Set up write-back
This step is only needed if you want local changes to go back to the remote storage.
For this example, just start one process like this:
```
$ weed filer.remote.sync -dir=/buckets/b2
```
This command continuously writes back changes in the mounted directory to the cloud storage.
It is designed to run as a background process. It can be paused with `Ctrl+C`, and it will try to reconnect to the filer if disconnected.
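One way to keep it running in the background (a sketch; the log path is just an example):
```
$ nohup weed filer.remote.sync -dir=/buckets/b2 > /tmp/weed-remote-sync.log 2>&1 &
```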
# Set up cache and uncache processes
Since only metadata is pulled and file content is not cached yet, reading remote files is somewhat slow.
You may want to cache a group of files, to make sure the first read is always fast.
You may want to uncache a group of files, to save some local storage.
These cache or uncache jobs can vary widely. Here are some examples:
```
# cache a whole folder
> remote.cache -dir=/buckets/b2/a/b/c
# cache all parquet files
> remote.cache -dir=/buckets/b2 -include=*.parquet
# cache files with size between 1024 and 10240 bytes inclusive
> remote.cache -dir=/buckets/b2 -minSize=1024 -maxSize=10240
# uncache files older than 3600 seconds
> remote.uncache -dir=/buckets/b2 -maxAge=3600
# uncache files larger than 10240 bytes
> remote.uncache -dir=/buckets/b2 -minSize=10240
```
These jobs can also be set up as scheduled cron jobs; see the sketch below.
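Since `weed shell` reads commands from standard input, a crontab entry along these lines could warm the cache on a schedule (a sketch; the hourly schedule and the filter are just examples):
```
# cache new parquet files every hour
0 * * * * echo "remote.cache -dir=/buckets/b2 -include=*.parquet" | weed shell
```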
# Detect Cloud Data Updates
If other processes write to the cloud storage, the mounted folder needs to learn about the new files.
Very likely you will want to set up cron jobs to run `remote.meta.sync` regularly.
```
> remote.meta.sync -h
Usage of remote.meta.sync:
-dir string
a directory in filer
> remote.meta.sync -dir=/buckets/b2
```
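For example, a crontab entry like this (a sketch; the 5-minute interval is just an example) keeps the local metadata fresh:
```
# pull remote metadata changes every 5 minutes
*/5 * * * * echo "remote.meta.sync -dir=/buckets/b2" | weed shell
```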