# Parallel filers with embedded filer store
If one filer is not enough, you can add more filers. This seems easy with shared filer stores, such as Redis, MySQL, Postgres, Cassandra, HBase, etc.
But did you notice this also works for embedded filer stores, such as LevelDB, RocksDB, SQLite, etc.?
How is it possible?
# Automatic Peer Discovery
When a filer starts up, it reports itself to the master, so the master knows all the filers and keeps each filer updated about its peers (since version 2.77).
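Concretely, there is no replication-specific flag to set: each filer only needs to register with the same master, and peer discovery happens automatically. A minimal sketch, assuming a master at `localhost:9333` and illustrative filer ports:
```
# both filers register with the same master and automatically discover each other
weed filer -master=localhost:9333 -port=8888
weed filer -master=localhost:9333 -port=8889
```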
# Metadata synchronization
Knowing all its peers, each filer will keep its own metadata updated:
1. Aggregate filer metadata changes from peers.
2. Replay filer metadata changes to the local filer store, if it is an embedded store.
# FUSE mount with multiple filers
## Aggregate metadata updates
This is tightly related to FUSE mount, which streams filer metadata changes from one filer. When using multiple filers without peer metadata updates, a FUSE mount can only see the changes applied to the connected filer.
So aggregating metadata updates from its peers is required, whether the filers are using shared or dedicated filer stores.
```
FUSE mount <----> filer1 -- filer2
                     \
                      filer3
```
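For example, a mount connected to filer1 will also see files created through filer2 or filer3, because filer1 aggregates their metadata changes. A minimal sketch, with illustrative host names and mount directory:
```
# mount through filer1; metadata created via filer2/filer3 is
# aggregated by filer1 and therefore visible to this mount
weed mount -filer=filer1:8888 -dir=/mnt/seaweedfs
```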
# Persist metadata changes to local embedded store
If the filer is running on an embedded store, the metadata updates from its peers are saved locally.
This basically synchronizes the metadata across all the filer stores. If filers are using shared filer stores, this is optional.
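To watch the metadata change stream that peers subscribe to and replay, you can tail it from any filer. A minimal sketch, with an illustrative filer address:
```
# print the filer metadata change log as events arrive
weed filer.meta.tail -filer=localhost:8888
```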
# Example Topologies
* Three filers with embedded stores are fine.
```
filer1(leveldb) <-> filer2(leveldb) <-> filer3(leveldb)
```
* Two filers are fine. There are no requirements on the number of filers.
```
filer1(leveldb) <-> filer2(leveldb)
```
* Two filers with different embedded stores are also fine. Of course, you will need a different `filer.toml` for each (see the sketch after this list).
```
filer1(leveldb) <-> filer2(rocksdb)
```
* Two filers with one shared store are fine.
```
filer1(mysql) <-> filer2(mysql)
```
* Two filers with a shared store and an embedded store are NOT fine.
```
filer1(leveldb) <--XX NOT WORKING XX---> filer2(mysql)
```
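As noted above, each filer with its own embedded store needs its own `filer.toml` pointing at a local data directory. A minimal sketch for the LevelDB case, with illustrative paths (filer2 would use its own copy with a different `dir`):
```
# filer.toml for filer1
[leveldb2]
enabled = true
dir = "/data/filer1/leveldb"
```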
# How is it implemented?
Each filer has a local metadata change log. When starting, each filer will subscribe to metadata changes from its peers and apply them to the local filer store.
Each filer store will auto-generate a unique `filer.store.id`. So for shared filer stores, such as MySQL/Postgres/Redis, there is no need to set up peer replication, because the `filer.store.id` will be the same.
It is actually OK if you need to change the filer IP or port. The replication can still continue.
# Limitation
Multiple filers with local leveldb filer stores can work well. However, this layout does not work well with `weed filer.sync` cross-data-center replication as of now, because `weed filer.sync` currently uses `filer.store.id` to identify data that needs to be replicated, and having multiple `filer.store.id` values will confuse it.