mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2024-01-19 02:48:24 +00:00
Created Cloud Cache Benefits (markdown)
parent
dd766e91c6
commit
235fb7c73b
91
Cloud-Cache-Benefits.md
Normal file
@ -0,0 +1,91 @@
# Context

The current trend is to move to cloud storage, since "everybody is doing it".

## Cloud is not for everyone

After actually using cloud storage, however, many users find:

* The cloud cost is too high. On [[AWS S3|https://aws.amazon.com/s3/pricing/]], storage looks relatively cheap (but not really) at around $0.023 per GB per month. Other costs add up quickly:
  * API cost for PUT, POST, LIST requests is $0.005 per 1000 requests.
  * Transfer-out cost is $0.09 per GB.
* The network latency is high.
* The response latency is not consistent.
* Any code change may increase your total cost.
* Watching the cost limits engineers' creativity and development speed.
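
To see how these charges add up, here is a rough monthly estimate that uses only the three prices quoted above. The workload numbers (10 TB stored, 50 million PUT/POST/LIST requests, 30 TB transferred out) are made up for illustration, and a real bill has other line items that are not modeled here.

```python
# Prices quoted in the list above, in USD. Check the AWS pricing page for current values.
STORAGE_PER_GB_MONTH = 0.023
REQUEST_PER_1000 = 0.005       # PUT, POST, LIST requests
TRANSFER_OUT_PER_GB = 0.09


def monthly_cost(stored_gb, requests, transfer_out_gb):
    """Return (storage, api, transfer, total) in USD for one month."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    api = requests / 1000 * REQUEST_PER_1000
    transfer = transfer_out_gb * TRANSFER_OUT_PER_GB
    return storage, api, transfer, storage + api + transfer


# Hypothetical workload: 10 TB stored, 50 million requests,
# and the whole dataset read out of the cloud three times.
storage, api, transfer, total = monthly_cost(10_000, 50_000_000, 30_000)
print(f"storage  ${storage:,.2f}")
print(f"api      ${api:,.2f}")
print(f"transfer ${transfer:,.2f}")
print(f"total    ${total:,.2f}")
```

In this hypothetical case storage is the smallest part of the bill: about $230, against about $250 for requests and about $2,700 for transfer out.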

## SeaweedFS can be a good choice

SeaweedFS can be a good choice because:

* Freedom to read your own data, any time you want!
* Freedom to develop new features with a fixed budget.
* Faster high-capacity storage hardware is also getting cheaper.
* Local access latency.
* No noisy-neighbor problem.
* Cross-data-center replication gives high data redundancy and availability.

However, how can SeaweedFS work with data that is already in the cloud?

# Design

![SeaweedFS Remote Storage](https://raw.githubusercontent.com/chrislusf/seaweedfs/master/note/SeaweedFS_RemoteMount.png)

# Benefits

* Cached Locally
  * Fast metadata operations.
  * Fast read and write at local network latency and throughput.
  * Faster and cheaper hardware.
  * Avoid noisy neighbors.
  * Minimum cost. Download data once.
* Scalable Capacity
  * Just pre-cache everything. No more delay on the first uncached read.
  * No need to work hard to find the best caching strategy for different data access patterns.
* Easy To Manage
  * Warm up the cache by folder, file name pattern, file size, file age, etc.
  * Uncache by folder, file name pattern, file size, file age, etc.
  * Optionally write data back to cloud storage.
* Flexible
  * Can write data back to work with existing cloud ecosystems.
  * Can transparently switch to different cloud storage vendors.
  * Can detach from cloud storage if you decide to move off the cloud.
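
The "by folder, file name pattern, file size, file age" selection mentioned under Easy To Manage can be pictured as a simple filter over file entries. The sketch below is only an illustration of that idea. `Entry` and `select` are hypothetical names, and this is not SeaweedFS's actual interface or implementation.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
import posixpath


@dataclass
class Entry:
    """Hypothetical file record; not a SeaweedFS type."""
    path: str     # full path, e.g. "/data/train/a.jpg"
    size: int     # bytes
    mtime: float  # last-modified time, seconds since the epoch


def select(entries, folder, now, pattern="*",
           min_size=0, max_size=None, min_age=0, max_age=None):
    """Pick entries under `folder` that match the name, size, and age limits."""
    prefix = folder.rstrip("/") + "/"
    chosen = []
    for e in entries:
        age = now - e.mtime  # seconds since last modification
        if not e.path.startswith(prefix):
            continue
        if not fnmatchcase(posixpath.basename(e.path), pattern):
            continue
        if e.size < min_size or (max_size is not None and e.size > max_size):
            continue
        if age < min_age or (max_age is not None and age > max_age):
            continue
        chosen.append(e)
    return chosen


DAY = 86_400
now = 1_700_000_000.0  # fixed timestamp so the example is repeatable
entries = [
    Entry("/data/train/a.jpg", 2_000_000, now - 1 * DAY),
    Entry("/data/train/b.jpg", 500, now - 40 * DAY),
    Entry("/data/train/notes.txt", 10_000, now - 2 * DAY),
    Entry("/data/other/c.jpg", 3_000_000, now - 1 * DAY),
]

# Candidates to warm up: JPEGs of at least 1 KiB under /data/train.
warm = select(entries, "/data/train", now, pattern="*.jpg", min_size=1024)
# Candidates to uncache: anything under /data/train older than 30 days.
stale = select(entries, "/data/train", now, min_age=30 * DAY)
print([e.path for e in warm])
print([e.path for e in stale])
```

The same predicate serves both directions: one call picks what to warm up, another picks what to uncache.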

# Possible Use Cases

* Machine Learning
  * Problem
    * Training jobs need to repeatedly visit a large set of files.
    * The randomized access pattern is hard to cache.
  * With SeaweedFS Cloud Cache
    * Users can explicitly ask SeaweedFS Cloud Cache to cache one whole folder.
    * Training is faster, and API and network costs are lower.
    * Users can access data through FUSE-mounted folders.
* Data Hoarding
  * Problem
    * With cloud capacity and storage tiering, saving data files there may be a good idea.
    * Recently uploaded files are very likely to be accessed again.
  * With SeaweedFS Cloud Cache
    * Users can explicitly ask SeaweedFS Cloud Cache to uncache by file age.
    * Users can also choose to never uncache, basically treating the cloud copy as a backup.
* Big Data
  * Problem
    * Run MapReduce, Spark, and Flink jobs on mounted folders for faster computation.
  * With SeaweedFS Cloud Cache
    * Avoid slow cloud storage metadata access.
    * Large amounts of data access do not increase cost.
    * Write data back to work with cloud ecosystems.
* Cloud Storage Vendor Agnostic
  * Problem
    * Different datasets may need to be on different vendors, based on access pattern, latency, cost, etc.
    * You may need to switch transparently from one vendor to another.
* Move Off Cloud
  * Problem
    * Cloud storage is costly!
  * With SeaweedFS Cloud Cache
    * Helps the transition from on-cloud to off-cloud.
    * When you are happy with it, just stop the write-back process (and cancel the monthly payment to the cloud vendor!).
* Support Multiple Access Methods
  * Problem
    * You may need to access cloud data by HDFS, HTTP, S3 API, WebDAV, or FUSE mount.
  * With SeaweedFS Cloud Cache
    * Multiple ways to access remote storage.
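
For the machine learning case, the effect of "download data once" on network cost can be estimated with the $0.09 per GB transfer-out price quoted earlier on this page. This is a rough sketch under stated assumptions: training runs outside the cloud provider, every epoch reads the whole dataset, and the dataset size and epoch count are invented for illustration.

```python
TRANSFER_OUT_PER_GB = 0.09  # USD per GB, the transfer-out price quoted on this page


def transfer_cost(dataset_gb, epochs, cached):
    """Transfer-out cost in USD for `epochs` full passes over the dataset."""
    # With a local cache the data leaves the cloud once;
    # without one it leaves the cloud on every epoch.
    downloads = 1 if cached else epochs
    return dataset_gb * downloads * TRANSFER_OUT_PER_GB


# Hypothetical job: a 2 TB dataset read for 50 epochs.
direct = transfer_cost(2_000, 50, cached=False)
with_cache = transfer_cost(2_000, 50, cached=True)
print(f"direct from cloud: ${direct:,.2f}")
print(f"with local cache:  ${with_cache:,.2f}")
```

Under these assumptions the transfer cost no longer grows with the number of epochs once the folder is cached.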