Updated Erasure Coding for warm storage (markdown)

Chris Lu 2019-06-10 21:52:05 -07:00
parent 1d6a228d05
commit f5f11a63df

@ -1,13 +1,14 @@
Warm data are accessed less frequently. To store them more efficiently, you can enable erasure coding.
## Benefit
* Storage Efficiency: SeaweedFS implements RS(10,4), which tolerates the loss of 4 shards while storing 1.4x the data size. Compared to replicating data 5 times to achieve the same robustness, it saves disk space equal to 3.6x the data size.
* Fast read speed: SeaweedFS uses continuous 1GB block layout with 1MB block sizes for edge cases, optimized for both small file reads and storage efficiency.
* High availability: If up to 4 shards are down, the data is still accessible with reasonable speed.
* Efficient memory usage and faster startup time. The volume server does not load index data into memory.
* Rack-Aware data placement to minimize impact of volume server and rack failures.
* No requirement for a large number of servers. SeaweedFS manages erasure coding data via volumes. If the number of servers is less than 4, this can protect against hard drive failures. If the number of servers is greater than 4, this can protect against server failures. If the number of racks is greater than 4, this can protect against rack failures.
* Optimized for small files: there is no file size requirement for EC to be effective.
* **Storage Efficiency**: SeaweedFS implements RS(10,4), which tolerates the loss of 4 shards while storing 1.4x the data size. Compared to replicating data 5 times to achieve the same robustness, it saves disk space equal to 3.6x the data size.
* **Fast Read Speed**: SeaweedFS uses continuous 1GB block layout with 1MB block sizes for edge cases, optimized for both small file reads and storage efficiency.
* **Optimized for Small Files**: There is no file size requirement for EC to be effective.
* **High Availability**: If up to 4 shards are down, the data is still accessible with reasonable speed.
* **Memory Efficiency**: Minimal memory usage. The volume server does not load index data into memory.
* **Fast Startup**: Startup time is much shorter because index data is not loaded into memory.
* **Rack-Aware**: Data placement minimizes the impact of volume server and rack failures.
* **No Minimum Server Limit**: There is no requirement for a large number of servers. SeaweedFS manages erasure coding data via volumes. If the number of servers is less than 4, this can protect against hard drive failures. If the number of servers is greater than 4, this can protect against server failures. If the number of racks is greater than 4, this can protect against rack failures.
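
The storage figures quoted for RS(10,4) follow from simple arithmetic, which the sketch below works through. It is only an illustration: the function names `ec_overhead` and `replication_copies` are made up for this example and are not part of SeaweedFS.

```python
def ec_overhead(data_shards: int, parity_shards: int) -> float:
    """Stored bytes per byte of user data for Reed-Solomon(data, parity)."""
    return (data_shards + parity_shards) / data_shards


def replication_copies(tolerated_losses: int) -> int:
    """Full copies needed so that data survives the given number of lost copies."""
    return tolerated_losses + 1


# RS(10,4): 10 data shards plus 4 parity shards, any 4 of which may be lost.
DATA_SHARDS, PARITY_SHARDS = 10, 4

ec = ec_overhead(DATA_SHARDS, PARITY_SHARDS)       # 14 / 10
copies = replication_copies(PARITY_SHARDS)         # 4 losses tolerated -> 5 copies
saved = copies - ec                                # space saved, in multiples of the data size

print(f"erasure coding stores {ec:.1f}x the data size")
print(f"replication stores {copies}x the data size")
print(f"difference: {saved:.1f}x the data size")
```

Erasure coding stores 14 shards for every 10 shards of data, while plain replication needs 5 full copies to survive 4 losses. The difference between 5x and 1.4x is the 3.6x saving mentioned in the list.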
## How to enable it?
Run `weed scaffold -conf=master` to generate a `master.toml` file, and put it in the current directory, `~/.seaweedfs/`, or `/etc/seaweedfs/`.