From 483dbaf0740524047e40ac827fe1b24464d23ec7 Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Fri, 16 Sep 2022 03:15:44 -0700
Subject: [PATCH] Updated Words from SeaweedFS Users (markdown)

---
 Words-from-SeaweedFS-Users.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Words-from-SeaweedFS-Users.md b/Words-from-SeaweedFS-Users.md
index d6386f8..c15b74a 100644
--- a/Words-from-SeaweedFS-Users.md
+++ b/Words-from-SeaweedFS-Users.md
@@ -1,5 +1,6 @@
 | Use cases | Details | Comments |
 | ---- | -- | -- |
+| Filecoin storage. | Cluster/Server configuration we're going with is: Masters on servers 1-3, Filers on servers 4-6, Redis Cluster on servers 1-6, 42 Volume services per server on servers 1-15. We write all data in using replication 002 and will allow the EC transition to happen in the background. It would be nice if we could go straight to EC and if the K+M values were configurable without recompiling, but this should be OK for now. Since each server has 168 drives, we've opted to go with four 18TB drives in a RAID0 on each server. We've decided on this because we have seen some instances where single-threaded transfers were throughput-limited by going to a single disk, and we would like to avoid that going forward. This cluster will be used to process high volumes of randomly sized files into standard 32GiB files that we then process into our Filecoin sealing and long-term storage environments. | A large SeaweedFS cluster using 15 servers, each with 168x 18TB drives attached, which works out to 45.3 PB of raw capacity. |
 | Machine learning training at UCSD | Lots of small random reads via S3. | Spun up 75 GPUs and it's ticking along happily. I cannot get more than about 20 GPUs going in parallel on Ceph with these IO-intensive jobs. ![](https://pbs.twimg.com/media/FYJQOXxUsAAuK64?format=png&name=4096x4096) |
 | Using SeaweedFS as part of the startup [OroraTech](https://ororatech.com/), where we are processing large amounts of infrared and visual satellite data to detect wildfires worldwide. Along with the actual hotspot detections, we are generating large amounts of prerendered tile data (Slippy Maps) from the near-real-time satellite images. | Right now running two identical dedicated servers with 140TB storage each for the staging and prod deployments of the tile storage. On these servers SeaweedFS is deployed through docker-compose with a reverse proxy in front of it. This setup should be sufficient for quite some time, but the option of scaling to a distributed deployment in the future is helpful. | We were using AWS S3 for this purpose but wanted to find a scalable solution to handle our rapidly increasing S3 costs ($400/day mainly caused by the number of PUT operations, 40TB in 1 billion objects in the end). |
 | [Source Code](https://github.com/EVERYGO111/OStoreBench), [Paper](https://github.com/EVERYGO111/OStoreBench/blob/master/research%20paper-OStoreBench.pdf) from Chinese Academy of Sciences, ByteDance | OStoreBench: Open source Benchmarking Distributed Object Storage Systems Using Real-world Application Scenarios. Benchmarks SeaweedFS against Ceph and Swift. | OStoreBench: The performance of SeaweedFS is the best in three typical scenarios compared to Ceph and Swift. |