Updated Words from SeaweedFS Users (markdown)

Chris Lu 2021-10-14 21:40:44 -07:00
parent f950bd79ce
commit 51755f94af

@ -1,7 +1,7 @@
| Use cases | Details | Comments |
| ---- | -- | -- |
| Using SeaweedFS as part of the startup [OroraTech](https://ororatech.com/), where we process large amounts of infrared and visual satellite data to detect wildfires worldwide. Along with the actual hotspot detections, we generate large amounts of prerendered tile data (Slippy Maps) from the near-real-time satellite images. | Right now we run two identical dedicated servers with 140TB of storage each for the staging and prod deployments of the tile storage. On these servers SeaweedFS is deployed through docker-compose with a reverse proxy in front of it. This setup should be sufficient for quite some time, but the option of scaling to a distributed deployment in the future is helpful. | We were using AWS S3 for this purpose but wanted a scalable solution to handle our rapidly increasing S3 costs ($400/day, mainly caused by the number of PUT operations; 40TB in 1 billion objects in the end). |
| [Source Code](https://github.com/EVERYGO111/OStoreBench), [Paper](https://github.com/EVERYGO111/OStoreBench/blob/master/research%20paper-OStoreBench.pdf) from Chinese Academy of Sciences, ByteDance | OStoreBench: Open source Benchmarking Distributed Object Storage Systems Using Real-world Application Scenarios. Benchmarks SeaweedFS against Ceph and Swift. | OStoreBench: The performance of SeaweedFS is the best in three typical scenarios compared to Ceph and Swift. |
| Replaced Ceph with SeaweedFS under the Docker registry in production. | Half a million files under the registry. Not big, but with intensive exchange. | The killer feature of SeaweedFS is that it is designed like S3 in Yandex, can work in k8s, and can spread between data centers. Ceph has a bad design for a huge number of small files: over 10 million, cluster recovery takes several days. The next step is to use it instead of GlusterFS, which is now barely alive and buckles under 10 million files. |
| We use SeaweedFS embedded in our AI products that are deployed on client sites (usually air-gapped because of the sensitivity of the data). | Clusters ranging from 3-10 servers (and now getting bigger and bigger), usually retaining 7-14 days of video and 30-60 days of thumbnails. | We compared Ceph and MinIO. We checked the deployment procedure and maintenance, and especially write performance, single-server performance, and easy scale-out, and found that SeaweedFS always won. Our workload is mainly write-intensive and rarely reads (usually reading as soon as written, so no real disk access), and 95% of the data is not mission critical, so the ease of use of SeaweedFS and its amazing performance (all writes are as sequential as possible) were the deciding factors. |
| [Holding lots of files](https://hypixel.net/threads/dev-blog-5-storing-your-skyblock-island.2190753/) | We've had to develop our own backup script and monitoring, interfacing with SeaweedFS. Backups of the whole dataset are done twice a day and stored in S3 for a few weeks. We run SeaweedFS across 3 volume servers, which all use very low resources, always replicating volumes on the 3 servers for availability and peace of mind. The SeaweedFS FIDs are stored in Mongo. | It is basically Amazon S3, but self-hosted. |