misc doc changes

This commit is contained in:
chrislusf 2014-12-21 23:06:42 -08:00
parent 986797c1c8
commit 33602a03a4
4 changed files with 33 additions and 36 deletions

View file

@ -1,7 +1,10 @@
Benchmarks
======================
Do we really need a benchmark? People always use benchmarks to compare systems.
But benchmarks are misleading. The resources, e.g., CPU, disk, memory, and network,
all matter a lot. And with Seaweed File System, single node vs multiple nodes,
and benchmarking on one machine vs several machines, also matter a lot.
Here are the steps to run the benchmark, if you really need some numbers.
@ -25,9 +28,13 @@ For more realistic tests, please start them on different machines.
What does the test do?
#############################
By default, the benchmark command starts by writing 1 million files, each 1KB in size, uncompressed.
For each file, one request is sent to assign a file key, and a second request is sent to post the file to the volume server.
The written file keys are stored in a temp file.
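For reference, this is roughly what those two requests per file look like when issued by hand with curl. This is a sketch against a master on the default port 9333; the assigned fid and volume server address will differ on each run:

.. code-block:: bash

  # step 1: ask the master to assign a file key (fid) and a volume server location
  curl "http://localhost:9333/dir/assign"
  # {"fid":"3,01637037d6","url":"127.0.0.1:8080","publicUrl":"127.0.0.1:8080","count":1}

  # step 2: post the file content to the returned volume server under that fid
  curl -F "file=@/tmp/test.txt" "http://127.0.0.1:8080/3,01637037d6"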
Then the benchmark command reads the list of file keys and randomly reads 1 million files.
For each volume, the volume id is cached, so there are only a few requests to look up volume ids;
all the remaining requests fetch the actual file content.
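Done by hand, the read side looks roughly like this; the volume id is the part of the fid before the comma, and the lookup result is what the benchmark caches per volume:

.. code-block:: bash

  # look up the volume server(s) holding volume 3, once per volume
  curl "http://localhost:9333/dir/lookup?volumeId=3"
  # {"locations":[{"url":"127.0.0.1:8080","publicUrl":"127.0.0.1:8080"}]}

  # every remaining request reads a file directly from a volume server
  curl "http://127.0.0.1:8080/3,01637037d6"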
Many options are configurable. Please check the help content:
@ -35,25 +42,11 @@ Many options are options are configurable. Please check the help content:
./weed benchmark -h
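For example, a smaller run that overrides the defaults might look like the following; treat the exact flag names, e.g. "-n" for the number of files, as whatever "-h" reports on your build:

.. code-block:: bash

  # write and then randomly read 10,000 files at concurrency 16
  ./weed benchmark -server=localhost:9333 -n=10000 -c=16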
Common Problems
###############################

The most common problem is the "too many open files" error. This is because the test itself starts too many network connections on one single machine. On my local MacBook, if I ran the random read right after the write, this error always occurred. I had to run "weed benchmark -write=false" to run the read test alone. Lowering the concurrency level, e.g., "-c=16", also helps.
The default "weed benchmark" uses 1 million 1KB file. This is to stress the number of files per second.
Increasing the file size to 100KB or more can show much larger number of IO throughput in KB/second.
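A sketch of such a throughput-oriented run, assuming a "-size" flag in bytes as reported by "weed benchmark -h":

.. code-block:: bash

  # 10,000 files of 100KB each, to measure KB/second rather than files/second
  ./weed benchmark -server=localhost:9333 -n=10000 -size=102400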
My own unscientific single machine results
###################################################
@ -171,4 +164,6 @@ Create benchmark volumes directly
99% 9.4 ms
100% 256.9 ms
How can replication 001 write faster than no replication?
I could not tell. Very likely, the computer was in turbo mode.
I cannot reproduce it consistently either. The numbers are posted here just to illustrate that numbers lie.
Don't quote the exact numbers; just use them to get a rough idea of the performance.

View file

@ -5,20 +5,20 @@ Clients
###################################
+---------------------------------------------------------------------------------+--------------+-----------+
| Name | Author | Language |
+=================================================================================+==============+===========+
| `WeedPHP <https://github.com/micjohnson/weed-php/>`_ | Mic Johnson | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Seaweed-FS Symfony bundle <https://github.com/micjohnson/weed-php-bundle>`_ | Mic Johnson | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Seaweed-FS Node.js client <https://github.com/cruzrr/node-weedfs>`_ | Aaron Blakely| Javascript|
+---------------------------------------------------------------------------------+--------------+-----------+
| `Amazon S3 API for Seaweed-FS <https://github.com/tgulacsi/s3weed>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `File store upload test <https://github.com/tgulacsi/filestore-upload-test>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Java Seaweed-FS client <https://github.com/simplebread/WeedFSClient>`_ | Xu Zhang | Java |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Java Seaweed-FS client 2 <https://github.com/zenria/Weed-FS-Java-Client>`_ | Zenria | Java |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Python-weed <https://github.com/darkdarkfruit/python-weed>`_ | Darkdarkfruit| Python |
+---------------------------------------------------------------------------------+--------------+-----------+
@ -26,7 +26,7 @@ Clients
+---------------------------------------------------------------------------------+--------------+-----------+
| `Camlistore blobserver Storage <https://github.com/tgulacsi/camli-weed>`_ | Tamás Gulácsi| Go |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Scala Seaweed-FS client <https://github.com/chiradip/WeedFsScalaClient>`_ | Chiradip | Scala |
+---------------------------------------------------------------------------------+--------------+-----------+
| `Module for kohana <https://github.com/bububa/kohanaphp-weedfs>`_ | Bububa | PHP |
+---------------------------------------------------------------------------------+--------------+-----------+

View file

@ -39,7 +39,7 @@ A common file system would use inode to store meta data for each folder and file
Seaweed-FS wants to make as few disk accesses as possible, yet still be able to store a lot of file metadata. So we need to think very differently.
We can take the following steps to map a full file path to the actual data block:
.. code-block:: bash
@ -48,7 +48,7 @@ From a full file path to get to the file content, there are several steps:
file_id => data_block
Because default Seaweed-FS only provides the file_id=>data_block mapping, only the first 2 steps need to be implemented.
There are several data features I noticed:
@ -72,7 +72,7 @@ I believe these are reasonable assumptions:
Data structure
#################
These assumed differences between directories and files lead to a design where the metadata for directories and files uses different data structures.
* Store directories in memory
@ -100,16 +100,18 @@ For file renaming, it's just trivially delete and then add a row in leveldb.
Details
########################
In the current first version, the path_to_file=>file_id mapping is stored in an efficient embedded leveldb. Being embedded, it runs on a single machine, so it is not linearly scalable yet. However, it can handle LOTS AND LOTS of files on Seaweed-FS on other master/volume servers.
Switching from the embedded leveldb to an external distributed database is very feasible. Your contribution is welcome!
The in-memory directory structure can improve memory efficiency. The current simple in-memory map works when the number of directories is less than 1 million, which uses about 500MB of memory (roughly 500 bytes per directory). But I would expect a common use case to have only a few directories, rarely more than 100.
Use Cases
#########################
Clients can access one "weed filer" via HTTP: list files under a directory, create files via HTTP POST, and read files directly via HTTP GET.
Although one "weed filer" can only sits in one machine, you can start multiple "weed filer" on several machines, each "weed filer" instance running in its own collection, having its own namespace, but sharing the same weed-fs.
Although one "weed filer" can only sits in one machine, you can start multiple "weed filer" on several machines, each "weed filer" instance running in its own collection, having its own namespace, but sharing the same Seaweed-FS storage.
Future
###################
@ -127,6 +129,6 @@ Later, FUSE or HCFS plugins will be created, to really integrate Seaweed-FS to e
Help Wanted
########################
This is a big step towards more interesting Seaweed-FS usage and integration with existing systems.
If you can help refactor or implement other directory metadata or file metadata storage, please do so.

View file

@ -29,8 +29,8 @@ Contents:
replication
ttl
failover
directories
usecases
optimization
benchmarks