# Filer Metadata Events
On the filer, there is a `/topics/.system/log` folder that stores all filer metadata change events.
## Metadata Event Format
The events are organized by timestamp into files named `yyyy-MM-dd/hh-mm.segment`.
The events are encoded with protobuf, as defined in https://github.com/chrislusf/seaweedfs/blob/master/weed/pb/filer.proto . The relevant sections are:
```proto
service SeaweedFiler {
    rpc SubscribeMetadata (SubscribeMetadataRequest) returns (stream SubscribeMetadataResponse) {
    }
}

message SubscribeMetadataRequest {
    string client_name = 1;
    string path_prefix = 2;
    int64 since_ns = 3;
}

message SubscribeMetadataResponse {
    string directory = 1;
    EventNotification event_notification = 2;
    int64 ts_ns = 3;
}

message EventNotification {
    Entry old_entry = 1;
    Entry new_entry = 2;
    bool delete_chunks = 3;
    string new_parent_path = 4;
}

message LogEntry {
    int64 ts_ns = 1;
    int32 partition_key_hash = 2;
    bytes data = 3;
}
```
The on-disk file is a repeated sequence of the following two parts:
1. 4 bytes of `LogEntry` size
2. serialized `LogEntry`
The `LogEntry.data` field stores a serialized `SubscribeMetadataResponse`.
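For illustration, here is a minimal Go sketch that decodes one segment file copied out of `/topics/.system/log`. It assumes the generated `filer_pb` package from the proto above, a big-endian 4-byte size prefix (as in the `see_log_entry.go` example linked in the next section), and a hypothetical local file name:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io/ioutil"
	"log"

	"github.com/chrislusf/seaweedfs/weed/pb/filer_pb"
	"github.com/golang/protobuf/proto"
)

func main() {
	// a segment file copied out of /topics/.system/log (hypothetical name)
	data, err := ioutil.ReadFile("18-49.segment")
	if err != nil {
		log.Fatal(err)
	}

	for pos := 0; pos+4 <= len(data); {
		// 1. 4 bytes of LogEntry size (assumed big-endian)
		size := int(binary.BigEndian.Uint32(data[pos : pos+4]))
		pos += 4
		if pos+size > len(data) {
			break // truncated tail
		}

		// 2. the serialized LogEntry itself
		logEntry := &filer_pb.LogEntry{}
		if err := proto.Unmarshal(data[pos:pos+size], logEntry); err != nil {
			log.Fatal(err)
		}
		pos += size

		// LogEntry.data holds a serialized SubscribeMetadataResponse
		event := &filer_pb.SubscribeMetadataResponse{}
		if err := proto.Unmarshal(logEntry.Data, event); err != nil {
			log.Fatal(err)
		}
		fmt.Printf("%d %s %+v\n", event.TsNs, event.Directory, event.EventNotification)
	}
}
```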
## Read Metadata Events
Because the events are stored as plain files, they can be read by any program. One example is here: https://github.com/chrislusf/seaweedfs/blob/master/unmaintained/see_log_entry/see_log_entry.go
## Subscribe to Metadata
The protobuf definition also defines an RPC call with which you can subscribe to all metadata changes asynchronously.
This is how `weed mount` asynchronously receives metadata updates from the filer, avoiding cross-wire metadata reads and dramatically cutting operation latency.
```proto
service SeaweedFiler {
    rpc SubscribeMetadata (SubscribeMetadataRequest) returns (stream SubscribeMetadataResponse) {
    }
}

message SubscribeMetadataRequest {
    string client_name = 1;
    string path_prefix = 2;
    int64 since_ns = 3;
}

message SubscribeMetadataResponse {
    string directory = 1;
    EventNotification event_notification = 2;
    int64 ts_ns = 3;
}
```
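As a rough sketch, a standalone subscriber could look like the following. The filer gRPC address (`localhost:18888` assumes the default filer HTTP port 8888 plus 10000), the insecure connection, the client name, and the path prefix are all assumptions to adapt to your setup:

```go
package main

import (
	"context"
	"io"
	"log"
	"time"

	"github.com/chrislusf/seaweedfs/weed/pb/filer_pb"
	"google.golang.org/grpc"
)

func main() {
	// connect to the filer gRPC port (assumed to be the HTTP port + 10000)
	conn, err := grpc.Dial("localhost:18888", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	client := filer_pb.NewSeaweedFilerClient(conn)

	// subscribe to all changes under /buckets, starting from now
	stream, err := client.SubscribeMetadata(context.Background(), &filer_pb.SubscribeMetadataRequest{
		ClientName: "example-subscriber",
		PathPrefix: "/buckets",
		SinceNs:    time.Now().UnixNano(),
	})
	if err != nil {
		log.Fatal(err)
	}

	// each received response carries one EventNotification
	for {
		resp, err := stream.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		log.Printf("dir=%s notification=%+v", resp.Directory, resp.EventNotification)
	}
}
```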
This is similar to `inotify` in a normal file system, but much more powerful. Usually `inotify` can only monitor one directory and its direct children, while the SeaweedFS metadata subscription can monitor a directory together with all its sub-directories and files.
This could be useful for building powerful event-driven data processing pipelines. Please let me know if you have some great ideas or accomplishments!
### What kinds of events does it contain?
The `SubscribeMetadataResponse` contains an `EventNotification`, which in turn contains `old_entry` and `new_entry`. The following events can be derived from them (a Go sketch follows the list):
* Create Event: `new_entry` is not null, and `old_entry` is null.
* Delete Event: `old_entry` is not null, and `new_entry` is null.
* Update Event: `old_entry` is not null, and `new_entry` is not null.
* Rename Event: similar to an Update Event, but `SubscribeMetadataResponse.directory` and `EventNotification.new_parent_path` are also needed to tell where the entry moved from and to.
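A minimal Go sketch of this classification, assuming the generated `filer_pb` types; the returned labels are illustrative, and the rename check is a simplification (a rename within the same directory would show up as old and new entries with different names):

```go
package events

import (
	"github.com/chrislusf/seaweedfs/weed/pb/filer_pb"
)

// classifyEvent derives a coarse event type from one SubscribeMetadataResponse,
// following the rules listed above. The labels are illustrative only.
func classifyEvent(resp *filer_pb.SubscribeMetadataResponse) string {
	n := resp.EventNotification
	switch {
	case n.OldEntry == nil && n.NewEntry != nil:
		return "create"
	case n.OldEntry != nil && n.NewEntry == nil:
		return "delete"
	case n.NewParentPath != "" && n.NewParentPath != resp.Directory:
		// the entry moved from resp.Directory to n.NewParentPath
		return "rename"
	default:
		return "update"
	}
}
```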
## Purging Metadata
It is safe to delete the metadata files for now. However, future development will rely on these metadata events to implement new features, such as:
* Async filer replication (to replace the existing dependency on an external message queue, e.g., Kafka)
* Recover deleted files
* Snapshot
The metadata should not take up too much space. Just leave it there.