Updated Filer as a Key Large Value Store (markdown)

Chris Lu 2020-12-29 02:53:41 -08:00
parent a90b04cceb
commit fc1f5e68aa

@ -4,19 +4,17 @@ Usually key-value stores are limited by the storage capacity. As the data size g
Instead, we can move the large value data outside of KV stores, and only store the reference in KV stores.
With SeaweedFS, the large value can be stored as a chunk on volume servers and can be referenced by chunk id, which is a small string. In addition, the smaller values can be stored directly in KV stores, saving one network hop. So you can get a key value store with almost unlimited size, while still have the same access speed for small values and reasonable speed for larger values.
With SeaweedFS, a large value can be stored as a chunk on volume servers and can be referenced by a chunk id, which is a small string. In addition, the smaller values can be stored directly in KV stores, saving one network hop. So you can get an efficient KV store with almost unlimited size, while still have the same access speed for small values and reasonable speed for larger values.
## How to do it?
There are two tricks:
* Configure to store small values in filer store. E.g., `weed filer -cacheToFilerLimit=1024` will store values less than 1KB to filer store itself. Accessing these entries are basically no difference than accessing the underlying key-value stores directly.
* Configure for [[Super Large Directories]]. It will avoid the additional work to maintain the file list in the large folder.
With the larger values offloaded, the underlying key-value store will also perform better with less bytes to move around.
* Optionally configure to store small values in filer store. E.g., `weed filer -cacheToFilerLimit=1024` will store values less than 1KB to filer store itself. Accessing these entries are basically no difference than accessing the underlying key-value stores directly.
## AWS S3 is a poor key-value store
Using AWS S3 as a key-value store is tempting with its unlimited capacity. However, AWS S3 has no SLA for its access latency since there are a lot of noisy neighbors.
Using AWS S3 as a key-value store is tempting due to its unlimited capacity. However, AWS S3 has no SLA for its access latency since there are a lot of noisy neighbors.
What is more, the API cost is a big concern. AWS S3 seems cheap for storage, but for small objects which requires frequent access, the API cost can quickly add up at $0.005 for 1 thousand PUT/DELETE requests.
@ -33,7 +31,7 @@ So not only it is fairly slow, but also it is expensive to use S3 as a key-value
### SeaweedFS Benefits
The databases usually is much more expensive than SeaweedFS for the same capacity. It would be nice to to quickly store and access files with unlimited disk space, and only store a file name in the main database.
SeaweedFS has no additional costs for API access, as long as you setup the SeaweedFS cluster. For example, it is a fixed monthly cost if running on EC2 instances. You do not need to worry about unmanageable API costs.
SeaweedFS has no additional costs for API access. For example, it is a fixed monthly cost if running on EC2 instances. You do not need to worry about unmanageable API costs.
And SeaweedFS can offload less-accessed data to S3 with [[Cloud Tier]], so it literally has unlimited storage space.