Merge pull request #3060 from natmaka/master

This commit is contained in:
Chris Lu 2022-05-16 07:42:58 -07:00 committed by GitHub
commit 72e7dcde51
No known key found for this signature in database
GPG key ID: 4AEE18F83AFDEB23
5 changed files with 22 additions and 24 deletions

View file

@ -31,13 +31,11 @@ Your support will be really appreciated by me and other supporters!
</p>
-->
### Gold Sponsors
- [![nodion](https://www.nodion.com/img/logo.svg)](https://www.nodion.com)
---
- [Download Binaries for different platforms](https://github.com/chrislusf/seaweedfs/releases/latest)
- [SeaweedFS on Slack](https://join.slack.com/t/seaweedfs/shared_invite/enQtMzI4MTMwMjU2MzA3LTEyYzZmZWYzOGQ3MDJlZWMzYmI0OTE4OTJiZjJjODBmMzUxNmYwODg0YjY3MTNlMjBmZDQ1NzQ5NDJhZWI2ZmY)
- [SeaweedFS on Twitter](https://twitter.com/SeaweedFS)
@ -61,7 +59,7 @@ Table of Contents
* [Additional Features](#additional-features)
* [Filer Features](#filer-features)
* [Example: Using Seaweed Object Store](#example-Using-Seaweed-Object-Store)
* [Architecture](#architecture)
* [Architecture](#Object-Store-Architecture)
* [Compared to Other File Systems](#compared-to-other-file-systems)
* [Compared to HDFS](#compared-to-hdfs)
* [Compared to GlusterFS, Ceph](#compared-to-glusterfs-ceph)
@ -127,7 +125,7 @@ Faster and Cheaper than direct cloud storage!
## Additional Features ##
* Can choose no replication or different replication levels, rack and data center aware.
* Automatic master servers failover - no single point of failure (SPOF).
* Automatic Gzip compression depending on file mime type.
* Automatic Gzip compression depending on file MIME type.
* Automatic compaction to reclaim disk space after deletion or update.
* [Automatic entry TTL expiration][VolumeServerTTL].
* Any server with some disk spaces can add to the total storage space.
@ -206,7 +204,7 @@ SeaweedFS uses HTTP REST operations to read, write, and delete. The responses ar
### Write File ###
To upload a file: first, send a HTTP POST, PUT, or GET request to `/dir/assign` to get an `fid` and a volume server url:
To upload a file: first, send a HTTP POST, PUT, or GET request to `/dir/assign` to get an `fid` and a volume server URL:
```
> curl http://localhost:9333/dir/assign
@ -255,7 +253,7 @@ First look up the volume server's URLs by the file's volumeId:
Since (usually) there are not too many volume servers, and volumes don't move often, you can cache the results most of the time. Depending on the replication type, one volume can have multiple replica locations. Just randomly pick one location to read.
Now you can take the public url, render the url or directly read from the volume server via url:
Now you can take the public URL, render the URL or directly read from the volume server via URL:
```
http://localhost:8080/3,01637037d6.jpg
@ -356,9 +354,9 @@ On each write request, the master server also generates a file key, which is a g
### Write and Read files ###
When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node url) for the file. The client then contacts the volume node and POSTs the file content.
When a client sends a write request, the master server returns (volume id, file key, file cookie, volume node URL) for the file. The client then contacts the volume node and POSTs the file content.
When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node url, volume node public url), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.
When a client needs to read a file based on (volume id, file key, file cookie), it asks the master server by the volume id for the (volume node URL, volume node public URL), or retrieves this from a cache. Then the client can GET the content, or just render the URL on web pages and let browsers fetch the content.
Please see the example for details on the write-read process.
@ -412,7 +410,7 @@ The architectures are mostly the same. SeaweedFS aims to store and read files fa
* SeaweedFS optimizes for small files, ensuring O(1) disk seek operation, and can also handle large files.
* SeaweedFS statically assigns a volume id for a file. Locating file content becomes just a lookup of the volume id, which can be easily cached.
* SeaweedFS Filer metadata store can be any well-known and proven data stores, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customized.
* SeaweedFS Filer metadata store can be any well-known and proven data store, e.g., Redis, Cassandra, HBase, Mongodb, Elastic Search, MySql, Postgres, Sqlite, MemSql, TiDB, CockroachDB, Etcd, YDB etc, and is easy to customize.
* SeaweedFS Volume server also communicates directly with clients via HTTP, supporting range queries, direct uploads, etc.
| System | File Metadata | File Content Read| POSIX | REST API | Optimized for large number of small files |
@ -448,9 +446,9 @@ Ceph can be setup similar to SeaweedFS as a key->blob store. It is much more com
SeaweedFS has a centralized master group to look up free volumes, while Ceph uses hashing and metadata servers to locate its objects. Having a centralized master makes it easy to code and manage.
Same as SeaweedFS, Ceph is also based on the object store RADOS. Ceph is rather complicated with mixed reviews.
Ceph, like SeaweedFS, is based on the object store RADOS. Ceph is rather complicated with mixed reviews.
Ceph uses CRUSH hashing to automatically manage the data placement, which is efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning them to any writable volumes. If writes to one volume failed, just pick another volume to write. Adding more volumes are also as simple as it can be.
Ceph uses CRUSH hashing to automatically manage data placement, which is efficient to locate the data. But the data has to be placed according to the CRUSH algorithm. Any wrong configuration would cause data loss. Topology changes, such as adding new servers to increase capacity, will cause data migration with high IO cost to fit the CRUSH algorithm. SeaweedFS places data by assigning them to any writable volumes. If writes to one volume failed, just pick another volume to write. Adding more volumes is also as simple as it can be.
SeaweedFS is optimized for small files. Small files are stored as one continuous block of content, with at most 8 unused bytes between files. Small file access is O(1) disk read.
@ -499,7 +497,7 @@ Step 1: install go on your machine and setup the environment by following the in
https://golang.org/doc/install
make sure you set up your $GOPATH
make sure to define your $GOPATH
Step 2: checkout this repo:
@ -536,7 +534,7 @@ Write 1 million 1KB file:
```
Concurrency Level: 16
Time taken for tests: 66.753 seconds
Complete requests: 1048576
Completed requests: 1048576
Failed requests: 0
Total transferred: 1106789009 bytes
Requests per second: 15708.23 [#/sec]
@ -562,7 +560,7 @@ Randomly read 1 million files:
```
Concurrency Level: 16
Time taken for tests: 22.301 seconds
Complete requests: 1048576
Completed requests: 1048576
Failed requests: 0
Total transferred: 1106812873 bytes
Requests per second: 47019.38 [#/sec]

View file

@ -8,9 +8,9 @@ mkdir tmp
docker stop s3test-instance || echo "already stopped"
ulimit -n 10000
../../../weed/weed server -filer -s3 -volume.max 0 -master.volumeSizeLimitMB 5 -dir "$(pwd)/tmp" 1>&2>weed.log &
../../../weed/weed server -filer -s3 -volume.max 0 -master.volumeSizeLimitMB 5 -dir "$(pwd)/tmp" 1>&2>weed.log &
until $(curl --output /dev/null --silent --head --fail http://127.0.0.1:9333); do
until curl --output /dev/null --silent --head --fail http://127.0.0.1:9333; do
printf '.'
sleep 5
done
@ -18,7 +18,7 @@ sleep 3
rm -Rf logs-full.txt logs-summary.txt
# docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v `pwd`/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py:test_get_obj_tagging -v -a 'resource=object,!bucket-policy,!versioning,!encryption'
docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v `pwd`/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py -v -a 'resource=object,!bucket-policy,!versioning,!encryption' | sed -n -e '/botocore.hooks/!p;//q' | tee logs-summary.txt
docker run --name s3test-instance --rm -e S3TEST_CONF=s3tests.conf -v "$(pwd)"/s3tests.conf:/s3-tests/s3tests.conf -it s3tests ./virtualenv/bin/nosetests s3tests_boto3/functional/test_s3.py -v -a 'resource=object,!bucket-policy,!versioning,!encryption' | sed -n -e '/botocore.hooks/!p;//q' | tee logs-summary.txt
docker stop s3test-instance || echo "already stopped"
killall -9 weed

View file

@ -24,7 +24,7 @@ var (
/*
This is to resolve an one-time issue that caused inconsistency with .dat and .idx files.
In this case, the .dat file contains all data, but some of deletion caused incorrect offset.
In this case, the .dat file contains all data, but some deletion caused incorrect offset.
The .idx has all correct offsets.
1. fix the .dat file, a new .dat_fixed file will be generated.

View file

@ -41,7 +41,7 @@ func AutocompleteMain(commands []*Command) bool {
func installAutoCompletion() bool {
if runtime.GOOS == "windows" {
fmt.Println("windows is not supported")
fmt.Println("Windows is not supported")
return false
}
@ -56,7 +56,7 @@ func installAutoCompletion() bool {
func uninstallAutoCompletion() bool {
if runtime.GOOS == "windows" {
fmt.Println("windows is not supported")
fmt.Println("Windows is not supported")
return false
}
@ -65,7 +65,7 @@ func uninstallAutoCompletion() bool {
fmt.Printf("uninstall failed! %s\n", err)
return false
}
fmt.Printf("autocompletion is disable. Please restart your shell.\n")
fmt.Printf("autocompletion is disabled. Please restart your shell.\n")
return true
}

View file

@ -74,14 +74,14 @@ func init() {
var cmdBenchmark = &Command{
UsageLine: "benchmark -master=localhost:9333 -c=10 -n=100000",
Short: "benchmark on writing millions of files and read out",
Short: "benchmark by writing millions of files and reading them out",
Long: `benchmark on an empty SeaweedFS file system.
Two tests during benchmark:
1) write lots of small files to the system
2) read the files out
The file content is mostly zero, but no compression is done.
The file content is mostly zeros, but no compression is done.
You can choose to only benchmark read or write.
During write, the list of uploaded file ids is stored in "-list" specified file.
@ -468,7 +468,7 @@ func (s *stats) printStats() {
timeTaken := float64(int64(s.end.Sub(s.start))) / 1000000000
fmt.Printf("\nConcurrency Level: %d\n", *b.concurrency)
fmt.Printf("Time taken for tests: %.3f seconds\n", timeTaken)
fmt.Printf("Complete requests: %d\n", completed)
fmt.Printf("Completed requests: %d\n", completed)
fmt.Printf("Failed requests: %d\n", failed)
fmt.Printf("Total transferred: %d bytes\n", transferred)
fmt.Printf("Requests per second: %.2f [#/sec]\n", float64(completed)/timeTaken)