Updated Filer as a Key Large Value Store (markdown)

Chris Lu 2020-12-27 15:32:53 -08:00
parent e7101f406f
commit 076ac22e67

@ -12,16 +12,11 @@ This way, values less than 1KB is basically the same as the underlying key-value
With the larger values offloaded, the underlying key-value store will also perform better, with fewer bytes to move around.
# Benchmark with YCSB as a Key-Value store
## AWS S3 is a poor key-value store
[Yahoo! Cloud Serving Benchmark (YCSB)](https://github.com/brianfrankcooper/YCSB) is a framework for evaluating the performance of different “key-value” and “cloud” serving stores.
Using AWS S3 as a key-value store is tempting with its unlimited capacity. However, AWS S3 has no SLA for its access latency since there are a lot of noisy neighbors.
## S3 is a poor key-value store
It is a bit strange to evaluate a file system as a key-value store. However, I noticed AWS S3 is one of the stores supported. So I decided to try it out.
But after digging deeper into it, I found that the S3 binding is poorly implemented and not maintained. It is single-threaded only, the serialization/deserialization seems wrong, and the library is about 5 years old.
What is more, the API cost is a big concern. S3 seems cheap for storage, but for small objects that require frequent access, the API cost can quickly add up at $0.005 per 1,000 PUT/DELETE requests.
What is more, the API cost is a big concern. AWS S3 seems cheap for storage, but for small objects that require frequent access, the API cost can quickly add up at $0.005 per 1,000 PUT/DELETE requests.
If we need to test with 1 million objects:
* 1 million write operations cost $5, or about $150/month at roughly 12 operations/second in production.
@ -29,7 +24,9 @@ If we need to test with 1 million objects:
So using S3 as a key-value store is not only fairly slow but also expensive.
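The write-cost figures quoted above can be checked with a few lines of arithmetic. This is only a rough sketch: the $0.005 per 1,000 PUT/DELETE requests price comes from the text, while the 30-day month and the function names `put_cost` and `monthly_put_cost` are assumptions made for illustration.

```python
# Price quoted in the text: $0.005 per 1,000 PUT/DELETE requests.
PUT_PRICE_PER_1000 = 0.005  # USD

SECONDS_PER_DAY = 86_400


def put_cost(requests: float, price_per_1000: float = PUT_PRICE_PER_1000) -> float:
    """Return the API cost in USD for a number of PUT/DELETE requests."""
    return requests / 1000 * price_per_1000


def monthly_put_cost(ops_per_second: float, days: int = 30) -> float:
    """Return the API cost in USD for a sustained write rate.

    Assumes a 30-day month by default (an assumption, not from the text).
    """
    return put_cost(ops_per_second * SECONDS_PER_DAY * days)


if __name__ == "__main__":
    # Cost of writing 1 million objects once.
    print(f"1 million writes: ${put_cost(1_000_000):.2f}")
    # Cost of sustaining 12 writes/second for a 30-day month.
    print(f"12 writes/second for a month: ${monthly_put_cost(12):.2f}")
```

At 12 writes/second this comes to a little over $150 for a 30-day month, in line with the rounded figure in the list.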
## SeaweedFS with YCSB
# Benchmark with YCSB as a Key-Value store
[Yahoo! Cloud Serving Benchmark (YCSB)](https://github.com/brianfrankcooper/YCSB) is a framework for evaluating the performance of different “key-value” and “cloud” serving stores.
### SeaweedFS Benefits
Databases are usually much more expensive than SeaweedFS for the same capacity. It would be nice to quickly store and access files with unlimited disk space, and only store a file name in the main database.
@ -40,7 +37,7 @@ And SeaweedFS can offload less-accessed data to S3 with [[Cloud Tier]], so it li
### SeaweedFS on YCSB
I forked YCSB to https://github.com/chrislusf/YCSB and added SeaweedFS.
SeaweedFS has been added to YCSB https://github.com/brianfrankcooper/YCSB/tree/master/seaweedfs
To run the SeaweedFS benchmark with YCSB, just check out the repo and run
```