# Parallel filers with embedded filer store
If one filer is not enough, you can add more filers. This seems easy with shared filer stores, such as Redis, MySQL, Postgres, Cassandra, HBase, etc.
But did you notice this also works for embedded filer stores, such as LevelDB, RocksDB, SQLite, etc.?
How is it possible?
# Automatic Peer Discovery
When a filer starts up, it reports itself to the master, so the master knows all the filers and keeps each filer updated about its peers (since version 2.77).
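Concretely, there is no replication-specific flag to set: each filer only needs to register with the same master, and peer discovery happens automatically. A minimal sketch, assuming a master at `localhost:9333` and illustrative filer ports:
```
# both filers register with the same master and automatically discover each other
weed filer -master=localhost:9333 -port=8888
weed filer -master=localhost:9333 -port=8889
```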
# Metadata synchronization
Knowing all its peers, each filer will keep its own metadata updated:
1. Aggregate filer metadata changes from peers.
2. Replay filer metadata changes to the local filer store, if it is an embedded store.
# FUSE mount with multiple filers
## Aggregate metadata updates
This is tightly related to FUSE mount, which streams filer metadata changes from one filer. When using multiple filers without peer metadata updates, a FUSE mount can only see the changes applied to the connected filer.
So aggregating metadata updates from its peers is required, whether the filers are using shared or dedicated filer stores.
```
FUSE mount <----> filer1 -- filer2
                     \
                      filer3
```
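For example, a mount connected to filer1 will also see files created through filer2 or filer3, because filer1 aggregates their metadata changes. A minimal sketch, with illustrative host names and mount directory:
```
# mount through filer1; metadata created via filer2/filer3 is
# aggregated by filer1 and therefore visible to this mount
weed mount -filer=filer1:8888 -dir=/mnt/seaweedfs
```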
# Persist metadata changes to local embedded store
If the filer is running on an embedded store, the metadata updates from its peers are saved locally.
This basically synchronizes the metadata across all the filer stores. If filers are using shared filer stores, this is optional.
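To watch the metadata change stream that peers subscribe to and replay, you can tail it from any filer. A minimal sketch, with an illustrative filer address:
```
# print the filer metadata change log as events arrive
weed filer.meta.tail -filer=localhost:8888
```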
# Example Topologies
* Three filers with embedded stores are fine.
```
filer1(leveldb) <-> filer2(leveldb) <-> filer3(leveldb)
```
* Two filers are fine. There are no requirements on the number of filers.
```
filer1(leveldb) <-> filer2(leveldb)
```
* Two filers with different embedded stores are also fine. Of course, you will need a different `filer.toml` for each (see the sketch after this list).
```
filer1(leveldb) <-> filer2(rocksdb)
```
* Two filers with one shared store are fine.
```
filer1(mysql) <-> filer2(mysql)
```
* Two filers with a shared store and an embedded store are NOT fine.
```
filer1(leveldb) <--XX NOT WORKING XX---> filer2(mysql)
```
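As noted above, each filer with its own embedded store needs its own `filer.toml` pointing at a local data directory. A minimal sketch for the LevelDB case, with illustrative paths (filer2 would use its own copy with a different `dir`):
```
# filer.toml for filer1
[leveldb2]
enabled = true
dir = "/data/filer1/leveldb"
```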
# How is it implemented?
Each filer has a local metadata change log. When starting, each filer will subscribe to metadata changes from its peers and apply them to the local filer store.
Each filer store will auto-generate a unique `filer.store.id`. So for shared filer stores, such as MySQL/Postgres/Redis, there is no need to set up peer replication, because the `filer.store.id` will be the same.
It is actually OK if you need to change the filer IP or port. The replication can still continue.
# Limitation
Multiple filers with local leveldb filer stores can work well. However, this layout does not work well with `weed filer.sync` cross-data-center replication as of now, because `weed filer.sync` currently uses `filer.store.id` to identify data that needs to be replicated, and having multiple `filer.store.id` values will confuse it.