Created Cloud Cache Benefits (markdown)

2024-01-19 02:48:24 +00:00 · 2021-08-16 01:14:40 -07:00 · 2021-08-16 01:14:40 -07:00 · 235fb7c73b
parent dd766e91c6
commit 235fb7c73b
1 changed files with 91 additions and 0 deletions
--- a/Cloud-Cache-Benefits.md
+++ b/Cloud-Cache-Benefits.md
@@ -0,0 +1,91 @@
+# Context
+Nowadays, the trend is to go to cloud storage, since "everybody is doing it".
+
+## Cloud is not for everyone
+But after actually using cloud storage, many users find:
+
+* The cloud cost is too high. On [[AWS S3|https://aws.amazon.com/s3/pricing/]], storage looks relatively cheap at around $0.023 per GB per month, but other costs add up quickly:
+  * API cost for PUT, POST, LIST requests is $0.005 per 1000 requests
+  * Transfer out cost is $0.09 per GB.
+* The network latency is high.
+* The response latency is not consistent.
+* Any code change may increase your total cost.
+* Watching the cost limits engineers' creativity and development speed.
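
To make the "add up quickly" point concrete, here is a rough sketch that plugs the three rates quoted above into a monthly bill. The workload numbers are hypothetical, and real pricing varies by region, storage class, and request type, so treat this as back-of-the-envelope arithmetic rather than a pricing calculator.

```python
# Rates quoted in the list above (USD). Assumed flat; actual pricing is tiered.
STORAGE_PER_GB_MONTH = 0.023
WRITE_LIST_PER_1000_REQUESTS = 0.005
TRANSFER_OUT_PER_GB = 0.09


def monthly_cost(stored_gb: float, write_list_requests: int, transfer_out_gb: float) -> dict:
    """Split one month's bill into storage, request, and transfer-out charges."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    requests = write_list_requests / 1000 * WRITE_LIST_PER_1000_REQUESTS
    transfer = transfer_out_gb * TRANSFER_OUT_PER_GB
    return {
        "storage": storage,
        "requests": requests,
        "transfer": transfer,
        "total": storage + requests + transfer,
    }


if __name__ == "__main__":
    # Hypothetical workload: 10 TB stored, the full dataset read out three
    # times in the month, and 50 million PUT/POST/LIST requests.
    bill = monthly_cost(stored_gb=10_000, write_list_requests=50_000_000, transfer_out_gb=30_000)
    for item, dollars in bill.items():
        print(f"{item:>9}: ${dollars:,.2f}")
```

For this made-up workload, storage comes to about $230, requests to about $250, and transfer out to about $2,700, so the "cheap" storage line is well under a tenth of the total.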
+
+## SeaweedFS can be a good choice
+
+SeaweedFS can be good because:
+
+* Freedom to read your own data, any time you want!
+* Freedom to develop new features with a fixed budget.
+* Faster high-capacity storage hardware is also getting cheaper.
+* Local access latency.
+* No noisy-neighbor problem.
+* Cross data center replication gives high data redundancy and availability.
+
+However, how can SeaweedFS work with data that is already in the cloud?
+
+# Design
+
+![SeaweedFS Remote Storage](https://raw.githubusercontent.com/chrislusf/seaweedfs/master/note/SeaweedFS_RemoteMount.png)
+
+# Benefits
+
+* Cached Locally
+  * Fast metadata operations.
+  * Fast read and write at local network latency and throughput.
+  * Faster and cheaper hardware.
+  * Avoid noisy neighbors.
+  * Minimum cost. Download data once.
+* Scalable Capacity
+  * Just pre-cache everything. No more delay on first uncached read.
+  * No need to work out the best caching strategy for each data access pattern.
+* Easy To Manage
+  * Warm up the cache by folder, file name pattern, file size, file age, etc.
+  * Uncache by folder, file name pattern, file size, file age, etc.
+  * Optionally write data back to cloud storage.
+* Flexible
+  * Can write data back to work with existing cloud ecosystems.
+  * Can transparently switch to different cloud storage vendors.
+  * Can detach from the cloud storage if you decide to move off the cloud.
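
The "warm up" and "uncache" bullets above select files by folder, file name pattern, file size, and file age. The sketch below shows one way such a selection rule could be expressed. `FileEntry` and `Selector` are hypothetical names made up for illustration; this is not SeaweedFS code and says nothing about how SeaweedFS implements the feature.

```python
import fnmatch
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileEntry:
    """Hypothetical view of one file's metadata."""
    path: str     # absolute path, e.g. "/datasets/train/a.jpg"
    size: int     # bytes
    mtime: float  # last modification time, seconds since the epoch


@dataclass
class Selector:
    """Hypothetical rule combining folder, name pattern, size, and age."""
    folder: str = "/"
    name_pattern: str = "*"
    min_size: int = 0
    max_size: Optional[int] = None
    min_age_seconds: float = 0.0
    max_age_seconds: Optional[float] = None

    def matches(self, entry: FileEntry, now: Optional[float] = None) -> bool:
        now = time.time() if now is None else now
        # Folder check: the entry must live somewhere under the folder.
        prefix = self.folder.rstrip("/") + "/"
        if not entry.path.startswith(prefix):
            return False
        # Name check: shell-style pattern against the base name only.
        name = entry.path.rsplit("/", 1)[-1]
        if not fnmatch.fnmatchcase(name, self.name_pattern):
            return False
        # Size check.
        if entry.size < self.min_size:
            return False
        if self.max_size is not None and entry.size > self.max_size:
            return False
        # Age check.
        age = now - entry.mtime
        if age < self.min_age_seconds:
            return False
        if self.max_age_seconds is not None and age > self.max_age_seconds:
            return False
        return True
```

For example, `Selector(folder="/datasets/train", name_pattern="*.jpg", min_age_seconds=7 * 86400)` would pick JPEG files under that folder that have not changed for at least a week, which is the kind of rule an "uncache by file age" policy needs.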
+
+# Possible Use Cases
+
+* Machine learning
+  * Problem
+    * Training jobs need to repeatedly read a large set of files.
+    * The randomized access pattern is hard to cache effectively.
+  * With SeaweedFS Cloud Cache
+    * Users can explicitly ask SeaweedFS Cloud Cache to cache one whole folder.
+    * Increases training speed and reduces API and network costs.
+    * Users can access data with FUSE mounted folders.
+* Data Hoarding
+  * Problem
+    * With cloud capacity and storage tiering, saving data files there may be a good idea. 
+    * Recently uploaded files very likely need to be accessed again.
+  * With SeaweedFS Cloud Cache
+    * Users can explicitly ask SeaweedFS Cloud Cache to uncache by file age.
+    * Users can also choose to never uncache, basically treating cloud copy as a backup.
+* Big Data
+  * Problem
+    * Run MapReduce, Spark, and Flink jobs on mounted folders for faster computation.
+  * With SeaweedFS Cloud Cache
+    * Avoid slow cloud storage metadata access.
+    * Reading large amounts of data does not increase cost.
+    * Write back data to work with cloud ecosystems.
+* Cloud Storage Vendor Agnostic
+  * Problem
+    * Different datasets may need to be on different vendors, based on access pattern, latency, cost, etc.
+  * With SeaweedFS Cloud Cache
+    * Transparently switch from one vendor to another.
+* Move Off Cloud
+  * Problem
+    * Cloud storage is costly!
+  * With SeaweedFS Cloud Cache
+    * Helps you transition from on-cloud to off-cloud.
+    * When you are happy with it, just stop the write back process (and cancel the monthly payment to the cloud vendor!).
+* Multiple Access Methods
+  * Problem
+    * You may need to access cloud data by HDFS, HTTP, S3 API, WebDAV, or FUSE mount.
+  * With SeaweedFS Cloud Cache
+    * Multiple ways to access remote storage.
+