Updated Optimization (markdown)

Chris Lu 2020-02-27 12:21:10 -08:00
parent 0fab1c9dbb
commit 243e186c1f

@ -35,6 +35,16 @@ curl http://localhost:9333/vol/grow?count=12&collection=benchmark
curl http://localhost:9333/vol/grow?count=12&dataCenter=dc1
``` ```
Another way to change the volume growth strategy is to use the `master.toml` generated by `weed scaffold -config=master`. Adjust the following section:
```
[master.volume_growth]
count_1 = 7 # create 1 x 7 = 7 actual volumes
count_2 = 6 # create 2 x 6 = 12 actual volumes
count_3 = 3 # create 3 x 3 = 9 actual volumes
count_other = 1 # create n x 1 = n actual volumes
```
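To make the arithmetic in those comments concrete, here is a small Python sketch. It assumes, as the comments suggest, that each `count_N` is the number of logical volumes created in one growth step when a volume has N copies, so the number of actual volumes is N times that count. The `actual_volumes` helper and the dictionary are illustrative only and are not part of SeaweedFS.

```python
# Values copied from the [master.volume_growth] section above.
# Key: number of copies per logical volume; value: logical volumes per growth step.
VOLUME_GROWTH = {1: 7, 2: 6, 3: 3}
COUNT_OTHER = 1  # used for any other number of copies


def actual_volumes(copies: int) -> int:
    """Return the actual volumes created in one growth step (assumed: copies x count)."""
    count = VOLUME_GROWTH.get(copies, COUNT_OTHER)
    return copies * count


if __name__ == "__main__":
    for copies in (1, 2, 3, 5):
        print(f"{copies} copies -> {actual_volumes(copies)} actual volumes")
```

Lowering a count trades write concurrency for fewer volumes allocated per growth step.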
## Increase concurrent reads
Same as above, more volumes will increase read concurrency.
@ -53,27 +63,14 @@ The SeaweedFS usually only open a few actual disk files. But the network file re
For volume servers, memory consumption is closely tied to the number of files. For example, one 32GB volume can easily hold 1.5 million files if each file is only 20KB. To keep the 1.5 million metadata entries in memory, SeaweedFS currently consumes 36MB, about 24 bytes per entry. So if you allocate 64 volumes (2TB), you would need 2~3GB of memory. However, if the average file size is larger, say 200KB, only 200~300MB of memory is needed.
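The estimate above can be reproduced with a short Python sketch. It assumes the in-memory index costs a flat 24 bytes per file entry, as quoted above, and that volumes are completely filled with files of the average size. The function name `index_memory_bytes` and the decimal unit constants are illustrative choices, not SeaweedFS code.

```python
GB = 10**9  # decimal units, for a rough estimate only
MB = 10**6
KB = 10**3


def index_memory_bytes(volume_count, volume_size_bytes, avg_file_size_bytes,
                       bytes_per_entry=24):
    """Estimate in-memory index size: one fixed-size entry per stored file."""
    files_per_volume = volume_size_bytes // avg_file_size_bytes
    return volume_count * files_per_volume * bytes_per_entry


if __name__ == "__main__":
    small_files = index_memory_bytes(64, 32 * GB, 20 * KB)
    large_files = index_memory_bytes(64, 32 * GB, 200 * KB)
    print(f"20KB files:  {small_files / GB:.2f} GB")
    print(f"200KB files: {large_files / MB:.0f} MB")
```

With 64 volumes of 32GB each, this gives roughly 2.5GB for 20KB files and roughly 250MB for 200KB files, in line with the ranges quoted above.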
SeaweedFS also has leveldb, boltdb, and btree mode support, which reduces memory consumption even more.
SeaweedFS also has leveldb mode support, which reduces memory consumption even more.
To use it, run "weed server -volume.index=[memory|leveldb|boltdb|btree]" or "weed volume -index=[memory|leveldb|boltdb|btree]". You can switch between the 4 modes at any time, as often as you like. If the files for leveldb or boltdb are outdated or missing, they will be re-generated as needed.
To use it, run "weed server -volume.index=[memory|leveldb|leveldbMedium|leveldbLarge]" or "weed volume -index=[memory|leveldb|leveldbMedium|leveldbLarge]". You can switch between the 4 modes at any time, as often as you like. If the files for leveldb are outdated or missing, they will be re-generated as needed.
boltdb is fairly slow to write, taking about 6 minutes to recreate the index for 1,553,934 files. boltdb loads 1,553,934 x 16 = 24,862,944 bytes from disk and generates a boltdb file as large as 134,217,728 bytes in 6 minutes.
To compare, leveldb recreates an index as large as 27,188,148 bytes in 8 seconds.
To test the memory consumption, the leveldb or boltdb indexes were created. There are 7 volumes in the benchmark collection, each with about 1553K files. The server was restarted, then I started the benchmark tool to read lots of files.
To test the memory consumption, the leveldb indexes were created. There are 7 volumes in the benchmark collection, each with about 1553K files. The server was restarted, then I started the benchmark tool to read lots of files.
For leveldb, server memory starts at 142,884KB, and stays at 179,340KB.
For boltdb, server memory starts at 73,756KB, and stays at 144,564KB.
For in-memory, server memory starts at 368,152KB, and stays at 448,032KB.
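To put the leveldb and in-memory measurements side by side, the relative saving can be computed directly from the steady-state figures. This is only a sketch over the two numbers reported above for this one benchmark; the dictionary and the `saving_percent` helper are illustrative names, and results on other hardware or data sets may differ.

```python
# Steady-state server memory in KB, taken from the measurements above.
STEADY_KB = {
    "leveldb": 179_340,
    "in-memory": 448_032,
}


def saving_percent(mode: str, baseline: str = "in-memory") -> float:
    """Return the memory saved by `mode` relative to `baseline`, as a percentage."""
    return 100.0 * (1 - STEADY_KB[mode] / STEADY_KB[baseline])


if __name__ == "__main__":
    print(f"leveldb saves about {saving_percent('leveldb'):.0f}% versus in-memory")
```

In this test, leveldb settles at roughly 40% of the in-memory footprint.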
To test the write speed, I used the benchmark tool with default parameters.
For boltdb, the write speed is about 4.1MB/s, 4.1K files/s.
For leveldb, the write speed is about 10.4MB/s, 10.4K files/s.
For in-memory, it is a tiny bit faster, but the difference is not statistically significant. I am using an SSD, and the OS buffer cache also affects the numbers, so your results may be different.
Btree mode was added in v0.75 to optimize memory for out-of-order customized file keys. Btree mode can cost more memory for normal file keys assigned by the SeaweedFS master, but is usually more efficient for customized file keys. Please test for your cases.
Note: BoltDB limits the max db size to 256MB on 32-bit systems.
## Insert with your own keys
The file id generation is actually pretty trivial and you could use your own way to generate the file keys.