Updated Optimization (markdown)

Stuart P. Bentley 2015-03-04 14:18:43 -08:00
parent 5ce13bd5b9
commit 0914619e42

@ -1,30 +1,26 @@
Optimization Strategy
## Introduction
Here are some strategies for optimizing SeaweedFS.
## Increase concurrent writes
By default, SeaweedFS grows the volumes automatically. For example, for no-replication volumes, 7 writable volumes are allocated concurrently.
If you want to distribute writes across more volumes, you can instruct the SeaweedFS master via this URL (quote it, otherwise the shell treats the `&` as a background operator):
```bash
curl "http://localhost:9333/vol/grow?count=12&replication=001"
```
This assigns 12 volumes with 001 replication. Since 001 replication keeps 2 copies of the same data, this actually consumes 24 physical volumes.
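The 12 → 24 arithmetic above generalizes to any replication code. The minimal sketch below assumes the usual SeaweedFS reading of the three digits "xyz" (extra copies in other data centers, on other racks, and on other servers in the same rack), so each logical volume has 1 + x + y + z copies. `physical_volumes` is a hypothetical helper, not part of SeaweedFS.

```bash
# Hypothetical helper: how many physical volumes "/vol/grow" would consume.
# Assumes the 3-digit replication code "xyz" means:
#   x = extra copies in other data centers
#   y = extra copies on other racks in the same data center
#   z = extra copies on other servers in the same rack
# so each logical volume is stored 1 + x + y + z times.
physical_volumes() {
  local count=$1 replication=$2
  # Reject anything that is not exactly three digits.
  if [[ ! $replication =~ ^[0-9]{3}$ ]]; then
    echo "invalid replication: $replication" >&2
    return 1
  fi
  local copies=$(( 1 + ${replication:0:1} + ${replication:1:1} + ${replication:2:1} ))
  echo $(( count * copies ))
}
```

For example, `physical_volumes 12 001` prints 24, which matches the figure given above.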
## Increase concurrent reads
As above, more volumes increase read concurrency.
In addition, increasing the replication level also helps: storing the same data on multiple servers lets more of them serve reads concurrently.
## Add more hard drives
More hard drives will give you better write/read throughput.
## Gzip content
@ -32,7 +28,7 @@ SeaweedFS determines the file can be gzipped based on the file name extension. S
You can also gzip content manually before submission. If you do so, make sure the submitted file name ends with ".gz". For example, "my.css" can be gzipped to "my.css.gz" and sent to SeaweedFS. When the content is retrieved, the gzipped content is sent back if the HTTP client supports "gzip" encoding; otherwise, the unzipped content is sent back.
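The manual gzip step can be scripted. The sketch below only prepares the ".gz" file and keeps the original; the upload itself is left out because it needs a running volume server. `gzip_for_upload` is a hypothetical helper name.

```bash
# Hypothetical helper: gzip a file before submitting it to SeaweedFS.
# The compressed copy keeps the original name plus ".gz"
# (e.g. "my.css" -> "my.css.gz"), so SeaweedFS can tell it is pre-gzipped.
gzip_for_upload() {
  local src=$1
  # -c writes to stdout, so the source file is kept; -9 = best compression.
  gzip -c -9 -- "$src" > "$src.gz"
  # Print the path of the file that should be uploaded.
  echo "$src.gz"
}
```

Running `gzip_for_upload my.css` leaves `my.css` in place and prints `my.css.gz`, the name to submit.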
## Memory consumption
For volume servers, memory consumption is tightly related to the number of files. For example, one 32GB volume can easily hold 1.5 million files if each file is only 20KB. To keep the metadata for those 1.5 million entries in memory, SeaweedFS currently consumes 36MB, about 24 bytes per entry. So if you allocate 64 volumes (2TB), you would need 2~3GB of memory. However, if the average file size is larger, say 200KB, only 200~300MB of memory is needed.
@ -44,15 +40,15 @@ The file id generation is actually pretty trivial and you could use your own way
A file key has 3 parts:
- volume id: a volume with free space
- file id: a monotonically increasing and unique number
- file cookie: a random number; you can customize it in whichever way you want
You can directly ask the master server to assign a file key, and replace the file id part with your own unique id, e.g., a user id.
You can also get each volume's free space from the server status:
```bash
curl "http://localhost:9333/dir/status?pretty=y"
```
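Replacing the file id part of an assigned key with your own id can be sketched in a few lines. The fid layout used here is `<volume id>,<key in hex><8-hex-digit cookie>`, with leading zero bytes of the key dropped, as in fids like `3,01637037d6`. That layout comes from SeaweedFS's general documentation rather than this page, so treat it as an assumption and check it against the fids your master actually returns. `make_fid` and `replace_fid_key` are hypothetical helpers.

```bash
# Hypothetical helper: build a fid from volume id, key and 32-bit cookie.
# ASSUMED layout: "<vid>,<key hex without leading zero bytes><cookie as 8 hex digits>".
make_fid() {
  local vid=$1 key=$2 cookie=$3
  local key_hex
  key_hex=$(printf '%016x' "$key")
  # Drop leading zero bytes (pairs of "00") from the 64-bit key.
  while [ "${key_hex:0:2}" = "00" ]; do
    key_hex=${key_hex:2}
  done
  printf '%d,%s%08x\n' "$vid" "$key_hex" "$cookie"
}

# Hypothetical helper: keep the volume id and cookie of a fid assigned by the
# master, but substitute your own unique key (e.g. a user id).
replace_fid_key() {
  local fid=$1 new_key=$2
  local vid=${fid%%,*} rest=${fid#*,}
  # Under the assumed layout, the last 8 hex digits are the cookie.
  local cookie_hex=${rest: -8}
  make_fid "$vid" "$new_key" "0x$cookie_hex"
}
```

For example, `replace_fid_key 3,01637037d6 42` keeps volume 3 and cookie `637037d6` and prints `3,2a637037d6`.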
@ -66,11 +62,11 @@ Customizing the file id and/or file cookie is an acceptable behavior. "strict mo
If files are large and the network is slow, the volume server needs more time to read the uploaded file. Increase the volume server's "-readTimeout=3" setting: the server cuts off the connection if an upload takes longer than this limit.
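To choose a larger value, you can estimate the transfer time. The sketch below assumes the "-readTimeout" value is in seconds, and it applies an arbitrary 2x safety margin with a floor of 3, the value shown above. Neither the margin nor the floor is a SeaweedFS recommendation. `suggest_read_timeout` is a hypothetical helper.

```bash
# Hypothetical helper: suggest a volume server "-readTimeout" (assumed seconds)
# for uploads of size_mb over a link of throughput_mb_per_s.
# The 2x margin and the floor of 3 are illustrative choices only.
suggest_read_timeout() {
  local size_mb=$1 throughput_mb_per_s=$2
  # Seconds needed for the transfer, rounded up.
  local seconds=$(( (size_mb + throughput_mb_per_s - 1) / throughput_mb_per_s ))
  local timeout=$(( seconds * 2 ))
  # Never suggest less than the value shown above.
  (( timeout < 3 )) && timeout=3
  echo "$timeout"
}
```

For a 1024MB upload over a 10MB/s link, `suggest_read_timeout 1024 10` prints 206, which you would then pass as `-readTimeout=206` when starting the volume server.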
### Upload large files with Auto Split/Merge
If the file is large, it's better to upload it this way:
```bash
weed upload -maxMB=64 the_file_name
```
@ -78,16 +74,17 @@ This will split the file into data chunks of 64MB each, and upload them separate
When downloading the file, just run:
```bash
weed download the_meta_chunk_file_id
```
The meta chunk contains the list of file ids, one file id per line. So if you want to process the chunks in parallel, you can download the meta chunk and handle each data chunk directly.
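The parallel approach can be sketched with `xargs -P`. This sketch assumes you have already saved the meta chunk to a local file with one file id per line, as described above. `FETCH` is a hypothetical hook that defaults to `weed download <fid>`; substitute whatever per-chunk processing you need.

```bash
# Hypothetical helper: fetch every data chunk listed in a local copy of the
# meta chunk (one file id per line), running up to $2 fetches at a time.
# FETCH is a hypothetical hook; by default each fid is passed to "weed download".
process_chunks_in_parallel() {
  local meta_file=$1 jobs=${2:-4}
  # Skip blank lines, then hand one fid at a time to the fetch command.
  # ${FETCH:-...} is left unquoted on purpose so "weed download" splits
  # into a command and its first argument.
  grep -v '^[[:space:]]*$' "$meta_file" | xargs -n 1 -P "$jobs" ${FETCH:-weed download}
}
```

For a dry run, `FETCH=echo process_chunks_in_parallel meta.txt 8` prints each file id instead of downloading it. Output order is not guaranteed because the fetches run concurrently.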
### Collection as a Simple Name Space
When assigning file ids,
```bash
curl "http://master:9333/dir/assign?collection=pictures"
curl "http://master:9333/dir/assign?collection=documents"
```
@ -102,7 +99,7 @@ In case you need to delete them later, you can go to the volume servers and dele
When going to production, you will want to collect the logs. SeaweedFS uses glog. Here are some examples:
```bash
weed -v=2 master
weed -log_dir=. volume
```