mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2024-01-19 02:48:24 +00:00
Page:
Words from SeaweedFS Users
Pages
AWS CLI with SeaweedFS
AWS IAM CLI
Actual Users
Amazon IAM API
Amazon S3 API
Applications
Async Backup
Async Filer Metadata Backup
Async Replication to Cloud
Async Replication to another Filer
Benchmark SeaweedFS as a GlusterFS replacement
Benchmarks from jinleileiking
Benchmarks
Cache Remote Storage
Choosing a Filer Store
Client Libraries
Cloud Drive Architecture
Cloud Drive Benefits
Cloud Drive Quick Setup
Cloud Monitoring
Cloud Tier
Components
Configure Remote Storage
Customize Filer Store
Data Backup
Data Structure for Large Files
Deployment to Kubernetes and Minikube
Directories and Files
Docker Image Registry with SeaweedFS
Environment Variables
Erasure Coding for warm storage
FAQ
FIO benchmark
FUSE Mount
Failover Master Server
Filer Active Active cross cluster continuous synchronization
Filer Cassandra Setup
Filer Change Data Capture
Filer Commands and Operations
Filer Data Encryption
Filer JWT Use
Filer Metadata Events
Filer Redis Setup
Filer Server API
Filer Setup
Filer Store Replication
Filer Stores
Filer as a Key Large Value Store
Gateway to Remote Object Storage
Getting Started
HDFS via S3 connector
Hadoop Benchmark
Hadoop Compatible File System
Hardware
Home
Independent Benchmarks
Kubernetes Backups and Recovery with K8up
Large File Handling
Load Command Line Options from a file
Master Server API
Migrate to Filer Store
Mount Remote Storage
Optimization
Path Specific Configuration
Path Specific Filer Store
Production Setup
Replication
Run Blob Storage on Public Internet
Run Presto on SeaweedFS
S3 API Audit log
S3 API Benchmark
S3 API FAQ
S3 Bucket Quota
S3 Nginx Proxy
SRV Service Discovery
SeaweedFS Java Client
SeaweedFS in Docker Swarm
Security Configuration
Security Overview
Server Startup Setup
Store file with a Time To Live
Super Large Directories
System Metrics
TensorFlow with SeaweedFS
Tiered Storage
UrBackup with SeaweedFS
Use Cases
Volume Management
Volume Server API
WebDAV
Words from SeaweedFS Users
nodejs with Seaweed S3
rclone with SeaweedFS
restic with SeaweedFS
run HBase on SeaweedFS
run Spark on SeaweedFS
s3cmd with SeaweedFS
weed shell
This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Use cases | Details | Comments |
---|---|---|
FileCoin storage. | Cluster/Server configuration we're going with is: Masters on servers 1-3, Filers on servers 4-6, Redis Cluster on servers 1-6, 42 Volume services per server on servers 1-15. Write all data in using replication 002 and will allow EC transition in the background. It would be nice if we could go straight to EC and if the K+M values were configurable without recompiling but should be ok for now. Since each server has 168 drives we've opted to go with four 18TB drives in a RAID0 on each server. We've decided on this because we have seen some instances were single threaded transfers were throughput limited by going to a single disk and would like to avoid that going forward. This cluster will be used to process high volumes of randomly sized files into standard 32GiB files that we then process into our Filecoin sealing and long-term storage environments. | A large SeaweedFS cluster using 15 servers, each with 168x 18TB drives attached. Which works out to 45.3 PB of raw capacity. |
Machine learning training in UCSD | Lots of small random reads via S3. | Spun up 75 GPUs and it's ticking along happily. I can not get more than about 20 GPUS going in parallel on Ceph with these IO intensive jobs. |
Using SeaweedFS as part of the startup OroraTech where we are processing large amounts of infrared and visual satellite data to detect wildfires worldwide. Along with the actual hotspot detections we are generating large amounts of prerendered tile data (Slippy Maps) from the near realtime satellite images. | Right now running two identical dedicated servers with 140TB storage each for the staging and prod deployments of the tile storage. On these servers SeaweedFS is deployed through docker-compose and with a reverse proxy in front of it. This setup should be sufficient for quite some time, but the option of scaling to a distributed deployment in the future is helpful. | We were using AWS S3 for this purpose but wanted to find a scalable solution to handle our rapidly increasing S3 costs ($400/day mainly caused by the number of PUT operations, 40TB in 1 billion objects in the end). |
Source Code, Paper from Chinese Academy of Science, ByteDance | OStoreBench: Open source Benchmarking Distributed Object Storage Systems Using Real-word Application Scenarios, Benchmark SeaweedFS with CEPH, Swift | OStoreBench: The performance of SeaweedFS is the best in three typical scenarios compared to Ceph and Swift. |
replaced ceph with a seaweedfs under the docker registry in production | Under the registry half a million files. Not big but have intensive exchange. | Killer feature of seaweedfs is that it disign like S3 in yandex and can work in k8s and spread between data centers. Ceph has a bad design in the case of using a huge number of small files over 10 million, cluster recovery takes several days. The next step is to use instead of Glusterfs, which is now barely alive and is bent from 10 million files. |
we use seaweedfs embedded in our AI products that are deployed on client site (usually AirGapped because of the sensitivity of the data) | clusters ranging from 3-10 servers (and now starting to get bigger and bigger), usually retaining 7-14 days video and 30-60 days of thumbnails | we comared CEPH & Minio, we checked deployment procedure & maintenance and especially performance of writes and especially single server performance and easy scale out. we went and found that seaweedfs always won. we mainly write intensive and rarely read (usually reading as soon as write, so no real disk access) and 95% of the data is not missing critical, so the easiness of seaweedfs and the amazing performance (all writes are sequential as possible) |
Holding lots of files | We've had to develop our own backup script and monitoring, interfacing with SeaweedFS. Backups of the whole dataset are done twice a day and stored in S3 for a few weeks. We run SeaweedFS across 3 volume servers which all use very low resources, always replicating volumes on the 3 servers for availability and peace of mind. The Seaweed FID are stored in Mongo. | It is basically Amazon S3, but self-hosted. |
InternetArchive compares SeaweedFS with Minio | SeaweedFS 200M object upload via Python script sucessfully in about 6 days, memory usage was at a moderate 400M (~10% of RAM). Relatively constant performance at about 400 PutObject requests/s (over 5 threads, each thread was around 80 requests/s; then testing with 4 threads, each thread got to around 100 requests/s) | Problem: minio inserts slowed down after inserting 80M or more objects. |
Key-Value Store | Internet Archive built scholar citation graph with SeaweedFS as a key-value store accessible with an S3 API. Crawler Readme | we use it as component in our infrastructure for https://scholar.archive.org/ (serving e.g. thumbnails, cached compute results, etc; we've just recently published a tech report on a sub-project: https://arxiv.org/pdf/2110.06595.pdf. minio was used initially, but did not scale well in number of files. |
Store images | Evercam has used Seaweed for a few years. We've 1344TB of mostly jpegs and use the filer for folder structure. It's worked well for us, especially with low cost Hetzner SX boxes. | Question: What about your recovery times when a server fails on 1Gbs port? Answer: Also in almost 5 years we only had one server crash which was due to file-system corruption, and we overcome that as well, it was a few leveldb files which got corrupt due to which the whole XFS file-system was went down, but we recovered it. Just one drawback was: We never used the same filer for saving files, and Get speed was also quite slow on that one, but with time, the volume compaction and vacuum, everything works fine on GET requests. |
We've been running SeaweedFS in production serving images and other small files. | We're not using Filer functionality just the underlying volume storage. We wrote our own asynchronous replication on top of the volume servers since we couldn't rely on synchronous replication across datacenters. | The maintainer is super responsive and is quick to review our PRs. |
It is archiving and serving more than 40,000 images on a webapp I built for the small team I work with. | I am not a large user whatsoever but I've been using SeaweedFS for a few years now. I run SeaweedFS on two machines and it serves all images I host. | It has been simple, reliable, and robust. I really like it and hope if one of my side projects ever take off at some point, I get to test it with a much bigger load. |
We are serving and storing mostly user-uploaded images. | We are running SeaweedFS successfully in production for a few years. around 100TB. we scale regularly, though we usually only add nodes. We are slowly approaching 100 seaweed nodes. We are running in k8s on local SSD storage, managing failures is easy this way. | It works surprisingly stable and the maintainer is usually responsive when we encounter issues. We're running across multiple nodes. Removing and adding volume servers is pretty simple. You can manually fix replication via a cli command after adding/removing a node. |
京东登月平台基础架构 将图片和识别结果保存下来,用作训练数据 | SeaweedFS的设计思想源于Facebook的Haystack论文,架构和原理都很简单,性能极好,部署和维护也很方便。SeaweedFS对外提供REST接口,结合它的filer服务可实现目录管理,我们在此基础上实现了批量上传和下载功能。SeaweedFS具有rack-aware和datacenter-aware功能,可根据集群的拓扑结构(节点在机架和数据中心的分布情况)实现更可靠的数据冗余策略。目前登月平台上很多图像服务已经接入SeaweedFS,每天存储的图片数量达到600万张,存储量以每天30G的速度增长。 | Glusterfs虽然性能很好,却不适合存储海量小文件,因为它只在宏观上对数据分布作了优化,却没在微观上对文件IO作优化。登月平台上大多数前向服务都是图像识别应用,需要将图片和识别结果保存下来,用作训练数据,进行算法的迭代优化。我们在调研之后采用了SeaweedFS作为小文件存储系统。 |
My use case is pretty small-scale and it is basically used as a storage backend of a few of my personal self-hosted services. | I have 6 nodes in a cluster, most of which are cheap storage VPSes I got on sale during the holiday season, and thanks to replication and erasure coding I don't need to worry too much about one or two of them breaking or stopping service. In terms of traffic, I think most traffic is handled via a CDN frontend so I don't think a lot of requests actually hit the cluster. | I migrated to seaweedfs as a replacement of S3-compatible storage services (previously I used Wasabi) due to their cost (and some, like Scaleway Object Storage, perform badly when there are a lot of small objects, which is my case). |
Introduction
API
Configuration
- Replication
- Store file with a Time To Live
- Failover Master Server
- Erasure coding for warm storage
- Server Startup Setup
- Environment Variables
Filer
- Filer Setup
- Directories and Files
- Data Structure for Large Files
- Filer Data Encryption
- Filer Commands and Operations
- Filer JWT Use
Filer Stores
- Filer Cassandra Setup
- Filer Redis Setup
- Super Large Directories
- Path-Specific Filer Store
- Choosing a Filer Store
- Customize Filer Store
Advanced Filer Configurations
- Migrate to Filer Store
- Add New Filer Store
- Filer Store Replication
- Filer Active Active cross cluster continuous synchronization
- Filer as a Key-Large-Value Store
- Path Specific Configuration
- Filer Change Data Capture
FUSE Mount
WebDAV
Cloud Drive
- Cloud Drive Benefits
- Cloud Drive Architecture
- Configure Remote Storage
- Mount Remote Storage
- Cache Remote Storage
- Cloud Drive Quick Setup
- Gateway to Remote Object Storage
AWS S3 API
- Amazon S3 API
- AWS CLI with SeaweedFS
- s3cmd with SeaweedFS
- rclone with SeaweedFS
- restic with SeaweedFS
- nodejs with Seaweed S3
- S3 API Benchmark
- S3 API FAQ
- S3 Bucket Quota
- S3 API Audit log
- S3 Nginx Proxy
AWS IAM
Machine Learning
HDFS
- Hadoop Compatible File System
- run Spark on SeaweedFS
- run HBase on SeaweedFS
- run Presto on SeaweedFS
- Hadoop Benchmark
- HDFS via S3 connector
Replication and Backup
- Async Replication to another Filer [Deprecated]
- Async Backup
- Async Filer Metadata Backup
- Async Replication to Cloud [Deprecated]
- Kubernetes Backups and Recovery with K8up
Messaging
Use Cases
Operations
Advanced
- Large File Handling
- Optimization
- Volume Management
- Tiered Storage
- Cloud Tier
- Cloud Monitoring
- Load Command Line Options from a file
- SRV Service Discovery