diff --git a/Filer-Store-Replication.md b/Filer-Store-Replication.md
index 3cdfa25..dd19e32 100644
--- a/Filer-Store-Replication.md
+++ b/Filer-Store-Replication.md
@@ -1,12 +1,27 @@
-Here we talk about using `weed filer -peers=...`, which achieves two purposes:
+# Parallel filers with embedded filer store
+
+If one filer is not enough, you can add more filers. This seems easy with shared filer stores, such as Redis, MySQL, Postgres, Cassandra, HBase, etc.
+
+But did you notice that this also works for embedded filer stores, such as LevelDB, RocksDB, SQLite, etc.?
+
+How is that possible?
+
+# Automatic Peer Discovery
+
+When a filer starts up, it reports itself to the master, so the master knows all the filers. The master keeps each filer updated about its peers (since version 2.77).
+
+# Metadata synchronization
+
+Knowing all its peers, each filer keeps its own metadata updated:
+
 1. Aggregate filer meta data changes from peers
-2. Replay filer meta data changes to local filer store
+2. Replay filer meta data changes to the local filer store, if it is an embedded store.
 
-# FUSE mount with multiple filers
+## Aggregate metadata updates
 
-The first point is tightly related to FUSE Mount, which streams filer meta data changes from one filer.
+This is tightly related to FUSE Mount, which streams filer meta data changes from one filer. When using multiple filers without peer metadata updates, a FUSE mount can only see the changes applied to the filer it is connected to.
 
-So when using multiple filers, this `-peers=xxx` option is needed. If not, a FUSE mount can only see the changes applied to the connected filer. This is required when the filers are using either shared or dedicated filer stores.
+So aggregating metadata updates from its peers is required whether the filers are using shared or dedicated filer stores.
 
 ```
 FUSE mount <----> filer1 -- filer2
@@ -15,39 +30,11 @@ So when using multiple filers, this `-peers=xxx` option is needed. If not, a FUS
 filer3
 ```
 
-# File Store Replication
+# Persist metadata changes to local embedded store
 
-The second point is about metadata replication.
-
-This `-peers=...` can synchronize the meta data in the filer stores. If filers are using shared filer stores, this is optional.
-
-It can also enables Active-Active or one-directional replication.
-
-## Use Cases
-
-For filer stores using shared filer stores, such as shared Mysql/Postgres/Cassandra/Redis/Sqlite/ElasticSearch/etc in [[Filer-Stores]], this is not really needed, since all filers are stateless, and there are no need to replicate the meta data back to the same filer store.
-
-But if each filer has its own filer store, usually with the default local Leveldb, or even with a dedicated Mysql/Postgres/Cassandra/Redis/Sqlite/etc store, this would be very useful.
-
-Sometimes you may want to replicate the existing store to a new filer store, or move to a new filer store, this would also be useful.
-
-### One-Directional Replication
-
-When starting a filer, set the `-peers` option, to receive updates from the peers.
-
-Assuming there is a separate filer.toml for each filer, and a filer is already running at `localhost:8888`, this command will replicate metadata in `localhost:8888` to `localhost:8889`.
-
-```
-weed filer -port=8889 -peers=localhost:8888
-```
-
-### Active-Active Replication
-
-```
-weed filer -port=8888 -peers=localhost:8888,localhost:8889
-weed filer -port=8889 -peers=localhost:8888,localhost:8889
-```
+If the filer is running on an embedded store, the metadata updates from its peers are saved locally.
+This basically synchronizes the metadata across all the filer stores. If filers are using shared filer stores, this is optional.
 
 # Example Topologies
 
@@ -56,41 +43,35 @@ weed filer -port=8889 -peers=localhost:8888,localhost:8889
 
 ```
 filer1(leveldb) <-> filer2(leveldb) <-> filer3(leveldb)
-weed filer -peers=,,
-
 ```
 
 * Two filers are fine. There is no requirements for number of filers.
 
 ```
 filer1(leveldb) <-> filer2(leveldb)
-
-weed filer -peers=,
-
 ```
 
-* Two filers with different stores are also fine. Of course, you will need a different `filer.toml`.
+* Two filers with different embedded stores are also fine. Of course, you will need a different `filer.toml` for each.
 
 ```
-filer1(leveldb) <-> filer2(elastic search)
-
-weed filer -peers=,
-
+filer1(leveldb) <-> filer2(rocksdb)
 ```
 
-* Master-Slave mode for filers with different stores.
+* Two filers with one shared store are fine.
 
 ```
-filer1(leveldb) --> filer2(elastic search)
+filer1(mysql) <-> filer2(mysql)
+```
 
-# start filer2 as this.
-weed filer -peers=
+* Two filers with a shared store and an embedded store are NOT fine.
+```
+filer1(leveldb) <--XX NOT WORKING XX---> filer2(mysql)
 ```
 
 # How is it implemented?
 
-Each filer has a local meta data change log. When starting with `-peers` setting, each filer will subscribe to meta data changes from its peers and apply to local filer store.
+Each filer has a local meta data change log. When starting, each filer will subscribe to meta data changes from its peers and apply them to its local filer store.
 
 Each filer store will auto generate a unique `filer.store.id`. So for shared filer stores, such as mysql/postgres/redis, there is no need to setup peers because the `filer.store.id` will be the same.
 
@@ -100,4 +81,4 @@ It is actually OK if you need to change filer IP or port. The replication can st
 
 # Limitation
 
-Multiple filers with local leveldb filer stores can work well with the `-peers` configured. However, this layout does not work well with `weed filer.sync` cross data center replication as of now. This is because currently `weed filer.sync` use `filer.store.id` to identify data that needs to be replicated. Having multiple `filer.store.id` will confuse the `weed filer.sync`.
+Multiple filers with local leveldb filer stores can work well. However, this layout does not work well with `weed filer.sync` cross data center replication as of now. This is because currently `weed filer.sync` uses `filer.store.id` to identify data that needs to be replicated. Having multiple `filer.store.id`s will confuse `weed filer.sync`.
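Where the change says filers with different embedded stores each need their own `filer.toml`, a hedged sketch of what the two configs might look like (the `dir` values are made-up examples; section names follow the filer store configuration convention):

```
# filer1's filer.toml: embedded LevelDB
[leveldb2]
enabled = true
dir = "./filer1-ldb"
```

```
# filer2's filer.toml: embedded RocksDB
[rocksdb]
enabled = true
dir = "./filer2-rdb"
```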
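
Note for reviewers: with this change, starting parallel filers no longer needs `-peers`. A minimal sketch of the new workflow, assuming a master on the default port 9333 and two filers on one machine for illustration (the ports are made-up; in practice each filer would run on its own host, or point its embedded store at its own directory via `filer.toml`):

```shell
# Start a master; filers register here and discover each other (since 2.77).
weed master -port=9333 &

# Start two filers, each backed by its own embedded store.
# No -peers flag: the master tells each filer about its peers, and each
# filer replays its peers' metadata changes into its own local store.
weed filer -port=8888 -master=localhost:9333 &
weed filer -port=8889 -master=localhost:9333 &
```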