Updated Filer as a Key Large Value Store (markdown)

Chris Lu 2020-12-29 22:41:02 -08:00
parent 1e4d08dee2
commit 837e5225f8

@@ -1,8 +1,12 @@
# Key-Large-Value Store
Usually key-value stores are limited by storage capacity. As the data size grows, KV stores usually get slower because of higher IO cost. For example, Cassandra needs to compact its log-structured-merge tree periodically, during which all key and value data are sorted, merged, and persisted to disk multiple times. A larger data size can only slow these operations down.
A distributed file system is usually considered slow but high in capacity, while a distributed key-value store is not recommended for storing large binary objects.
Instead, we can move the large value data outside of KV stores, and only store the reference in KV stores.
What about a key-value system that can store both small and large objects with almost unlimited capacity?
It is true that key-value stores are limited by storage capacity. As the data size grows, KV stores usually get slower because of higher IO cost. For example, Cassandra needs to compact its log-structured-merge tree periodically, during which all key and value data are sorted, merged, and persisted to disk multiple times. A larger data size can only slow these operations down.
We can store the large value data outside of KV stores, and only store the reference in KV stores.
With SeaweedFS, a large value can be stored as a chunk on volume servers and referenced by a 16-byte chunk id. In addition, smaller values can be stored directly in the KV store, saving one network hop. The result is an efficient KV store with almost unlimited capacity that keeps the same access speed for small values and offers reasonable speed for larger values.
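
The inline-or-reference idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not SeaweedFS code: `InMemoryBlobStore` is a hypothetical stand-in for volume servers, the `INLINE_LIMIT` threshold and the one-byte record tag are invented for the example, and the 16-byte id is simply a random UUID rather than the real chunk id format.

```python
import uuid

# Hypothetical cut-off: values up to this many bytes are kept inline in the KV store.
INLINE_LIMIT = 1024


class InMemoryBlobStore:
    """Stand-in for volume servers (assumption: a real deployment would use the network)."""

    def __init__(self):
        self._chunks = {}

    def put(self, data: bytes) -> bytes:
        # A random UUID is used here only because it is 16 bytes long;
        # it is not the actual chunk id format.
        chunk_id = uuid.uuid4().bytes
        self._chunks[chunk_id] = data
        return chunk_id

    def get(self, chunk_id: bytes) -> bytes:
        return self._chunks[chunk_id]


class KeyLargeValueStore:
    """Keeps small values inline and stores only a reference for large values."""

    INLINE = b"\x00"  # record tag: payload is the value itself
    REF = b"\x01"     # record tag: payload is a 16-byte chunk id

    def __init__(self, blobs, inline_limit: int = INLINE_LIMIT):
        self._kv = {}  # stand-in for the underlying KV store
        self._blobs = blobs
        self._inline_limit = inline_limit

    def put(self, key: bytes, value: bytes) -> None:
        if len(value) <= self._inline_limit:
            # Small value: a later read needs no extra hop.
            self._kv[key] = self.INLINE + value
        else:
            # Large value: the KV store only holds the tag plus the chunk id.
            self._kv[key] = self.REF + self._blobs.put(value)

    def get(self, key: bytes) -> bytes:
        record = self._kv[key]
        tag, payload = record[:1], record[1:]
        if tag == self.INLINE:
            return payload
        return self._blobs.get(payload)

    def stored_size(self, key: bytes) -> int:
        """Number of bytes the KV store itself holds for this key."""
        return len(self._kv[key])
```

With this layout, the KV store holds a fixed 17 bytes (tag plus id) for any large value regardless of its size, so compaction work in the KV store grows with the number of keys rather than with the total amount of value data.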