diff --git a/Remote-Storage-Architecture.md b/Remote-Storage-Architecture.md
index 4d84a40..77cfb91 100644
--- a/Remote-Storage-Architecture.md
+++ b/Remote-Storage-Architecture.md
@@ -1,29 +1,3 @@
-Nowadays, the trend is to go to cloud storage, since "everybody is doing it".
-
-## Cloud is not for everyone
-But after really using cloud storage, many users will find:
-
-* The cloud cost is too high. On [[AWS S3|https://aws.amazon.com/s3/pricing/]], the storage cost is relatively cheap (but not really) around $0.023 per GB per month. But there are other costs which can add up quickly:
-  * API cost for PUT, POST, LIST requests is $0.005 per 1000 requests
-  * Transfer out cost is $0.09 per GB.
-* The network latency is high.
-* The response latency is not consistent.
-* Any code changes may increase your total cost.
-* It limits engineers' creativity and development speed in order to watch for cost.
-
-## SeaweedFS can be a good choice
-
-SeaweedFS can be good because:
-
-* Freedom to read your own data! Any times that you want!
-* Freedom to develop new features with a fixed budget.
-* Faster high-capacity storage hardware is also getting cheaper.
-* Local access latency.
-* Avoid noisy neighbor problem.
-* Cross data center replication gives high data redundancy and availability.
-
-However, how to make SeaweedFS work with data already on cloud?
-
 # SeaweedFS Remote Storage Cache
 
 With this feature, SeaweedFS can cache data that is on cloud. It can cache metadata and file content. Given SeaweedFS unlimited scalability, the cache size is actually unlimited. Any local changes can be write back to the cloud asynchronously.
@@ -74,15 +48,9 @@ Local changes are write back by the `weed filer.remote.sync` process, which is a
 
 If not starting `weed filer.remote.sync`, the data changes will not be propagated back to the cloud.
 
-# Possible Use Cases
-
-* Machine learning training jobs need to repeatedly visit a large set of files. Increase training speed and reduce API cost and network cost.
-* Saving data files. With cloud capacity and storage tiering, saving data files there may be a good idea. This feature can save the programming effort.
-* Run Spark/Flink jobs on mounted folders for faster computation.
-* Multiple access methods, HDFS/HTTP/S3/WebDav/Mount, to access remote storage. No need to use one specific way to access remote storage.
-* If you plan to move off cloud, you can start with SeaweedFS Remote Storage Cache. When you are happy with it, just stop the write back process (and cancel the monthly payment to the cloud vendor!).
 
 # Continue to read
+ * [[Cloud Cache Benefits]]
 * [[Configure Remote Storage]]
 * [[Mount Remote Storage]]
 * [[Cache Remote Storage]]