Updated Filer Store Replication (markdown)

2024-01-19 02:48:24 +00:00 · 2021-11-07 14:41:51 -08:00 · 2021-11-07 14:41:51 -08:00 · f30e537390
parent b4966f6a01
commit f30e537390
1 changed files with 33 additions and 52 deletions
--- a/Filer-Store-Replication.md
+++ b/Filer-Store-Replication.md
@@ -1,12 +1,27 @@
-Here we talk about using `weed filer -peers=...`, which achieves two purposes:
+# Parallel filers with embedded filer store
 If one filer is not enough, you can add more filers. This seems easy with shared filer stores, such as Redis, MySql, Postgres, Cassandra, HBase, etc.
 But did you notice this also works for embedded filer stores, such as LevelDB, RocksDB, SQLite, etc?
 How is it possible?
 # Automatic Peer Discovery
 When a filer starts up, it will report itself to the master. So the master knows all the filers. It will keep each filer updated about its peers (Since version 2.77).
 # Metadata synchronization
 Knowing all the peers, one filer will keep its own metadata updated:
 1. Aggregate filer meta data changes from peers
-2. Replay filer meta data changes to local filer store
+2. Replay filer meta data changes to local filer store, if it is an embedded store.
-# FUSE mount with multiple filers
+## Aggregate metadata updates
-The first point is tightly related to FUSE Mount, which streams filer meta data changes from one filer. 
+This is tightly related to FUSE Mount, which streams filer meta data changes from one filer. When using multiple filers without aggregating metadata updates from peers, a FUSE mount can only see the changes applied to the connected filer.
-So when using multiple filers, this `-peers=xxx` option is needed. If not, a FUSE mount can only see the changes applied to the connected filer. This is required when the filers are using either shared or dedicated filer stores.
+So aggregating metadata updates from its peers is required whether the filers are using shared or dedicated filer stores.
 ```
  FUSE mount <----> filer1 -- filer2
@@ -15,39 +30,11 @@ So when using multiple filers, this `-peers=xxx` option is needed. If not, a FUS
                         filer3
 ```
-# File Store Replication
+# Persist metadata changes to local embedded store
-The second point is about metadata replication. 
+If the filer is running on an embedded store, the metadata updates from its peers are saved locally.
 This `-peers=...` can synchronize the meta data in the filer stores. If filers are using shared filer stores, this is optional.
 It can also enable Active-Active or one-directional replication.
 ## Use Cases
 For filers using shared filer stores, such as shared Mysql/Postgres/Cassandra/Redis/Sqlite/ElasticSearch/etc in [[Filer-Stores]], this is not really needed, since all filers are stateless and there is no need to replicate the meta data back to the same filer store.
 But if each filer has its own filer store, usually with the default local Leveldb, or even with a dedicated Mysql/Postgres/Cassandra/Redis/Sqlite/etc store, this would be very useful.
 This is also useful when you want to replicate the existing store to a new filer store, or move to a new filer store.
 ### One-Directional Replication
 When starting a filer, set the `-peers` option, to receive updates from the peers.
 Assuming there is a separate filer.toml for each filer, and a filer is already running at `localhost:8888`, this command will replicate metadata in `localhost:8888` to `localhost:8889`.
 ```
 weed filer -port=8889 -peers=localhost:8888
 ```
 ### Active-Active Replication
 ```
 weed filer -port=8888 -peers=localhost:8888,localhost:8889
 weed filer -port=8889 -peers=localhost:8888,localhost:8889
 ```
 This basically synchronizes the metadata across all the filer stores. If filers are using shared filer stores, this is optional.
 # Example Topologies
@@ -56,41 +43,35 @@ weed filer -port=8889 -peers=localhost:8888,localhost:8889
 ```
 filer1(leveldb) <-> filer2(leveldb) <-> filer3(leveldb) 
 weed filer -peers=<filer1:port1>,<filer2:port2>,<filer3:port3>
 ```
 * Two filers are fine. There is no requirement on the number of filers.
 ```
 filer1(leveldb) <-> filer2(leveldb)
 weed filer -peers=<filer1:port1>,<filer2:port2>
 ```
-* Two filers with different stores are also fine. Of course, you will need a different `filer.toml`.
+* Two filers with different embedded stores are also fine. Of course, you will need a different `filer.toml`.
 ```
-filer1(leveldb) <-> filer2(elastic search)
+filer1(leveldb) <-> filer2(rocksdb)
 weed filer -peers=<filer1:port1>,<filer2:port2>
 ```
-* Master-Slave mode for filers with different stores.
+* Two filers with one shared store are fine.
 ```
-filer1(leveldb) --> filer2(elastic search)
+filer1(mysql) <-> filer2(mysql)
 ```
-# start filer2 as this. 
+* Two filers with a shared store and an embedded store are NOT fine.
 weed filer -peers=<filer1:port1>
 ```
 filer1(leveldb) <--XX NOT WORKING XX---> filer2(mysql)
 ```
 # How is it implemented?
-Each filer has a local meta data change log. When starting with `-peers` setting, each filer will subscribe to meta data changes from its peers and apply to local filer store.
+Each filer has a local meta data change log. When starting, each filer subscribes to meta data changes from its peers and applies them to its local filer store.
 Each filer store will auto-generate a unique `filer.store.id`. So for shared filer stores, such as mysql/postgres/redis, there is no need to set up peers because the `filer.store.id` will be the same.
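
The subscribe-and-replay behavior described above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not SeaweedFS code: the `Filer` class, `MetaEvent`, `write`, and `sync_from` are hypothetical names, the store is a plain dict, and the change log is a Python list rather than a streamed subscription. It shows two ideas from the text: a filer replays its peers' change logs into its own store, and replay is skipped when both filers report the same store id, because a shared store already contains the change.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetaEvent:
    """One metadata change. entry=None models a delete (hypothetical shape)."""
    path: str
    entry: Optional[dict]


class Filer:
    """Toy filer: a store, a local change log, and per-peer read offsets."""

    def __init__(self, name: str, store_id: str, store: Optional[dict] = None):
        self.name = name
        self.store_id = store_id  # stands in for filer.store.id
        # Filers on a shared store are given the same dict object.
        self.store = {} if store is None else store
        self.change_log = []      # only changes that originated on this filer
        self.offsets = {}         # peer name -> next log index to read

    def _apply(self, event: MetaEvent) -> None:
        if event.entry is None:
            self.store.pop(event.path, None)
        else:
            self.store[event.path] = event.entry

    def write(self, path: str, entry: Optional[dict]) -> None:
        """Apply a local change and record it in the local change log."""
        event = MetaEvent(path, entry)
        self._apply(event)
        self.change_log.append(event)

    def sync_from(self, peer: "Filer") -> int:
        """Read the peer's new events; return how many were replayed locally."""
        start = self.offsets.get(peer.name, 0)
        events = peer.change_log[start:]
        self.offsets[peer.name] = len(peer.change_log)
        if peer.store_id == self.store_id:
            # Same store id means a shared store: nothing to replay.
            return 0
        for event in events:
            # Replayed events are not appended to the local change log,
            # so they are never echoed back to the peer.
            self._apply(event)
        return len(events)


# Two filers with separate embedded stores converge after syncing both ways.
filer1 = Filer("filer1", "store-a")
filer2 = Filer("filer2", "store-b")
filer1.write("/docs/readme.txt", {"size": 12})
filer2.write("/docs/notes.txt", {"size": 7})
filer2.sync_from(filer1)
filer1.sync_from(filer2)

# Two filers on one shared store skip the replay step.
shared = {}
filer3 = Filer("filer3", "store-c", shared)
filer4 = Filer("filer4", "store-c", shared)
filer3.write("/shared/a.txt", {"size": 1})
skipped = filer4.sync_from(filer3)
```

In this model the per-peer offset plays the role of a resume position in the peer's change log, so calling `sync_from` again only picks up events written since the last call.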
@@ -100,4 +81,4 @@ It is actually OK if you need to change filer IP or port. The replication can st
 # Limitation
-Multiple filers with local leveldb filer stores can work well with the `-peers` configured. However, this layout does not work well with `weed filer.sync` cross data center replication as of now. This is because currently `weed filer.sync` use `filer.store.id` to identify data that needs to be replicated. Having multiple `filer.store.id` will confuse the `weed filer.sync`.
+Multiple filers with local leveldb filer stores can work well. However, this layout does not work well with `weed filer.sync` cross data center replication as of now. This is because `weed filer.sync` currently uses `filer.store.id` to identify data that needs to be replicated. Having multiple `filer.store.id` values will confuse `weed filer.sync`.