On filer, there is a /topics/.system/log
folder, it stores all filer metadata change events.
Metadata Event Format
The events are stored in files organized by timestamp, yyyy-MM-dd/hh-mm.segment
.
The events are encoded by protobuf, defined in https://github.com/seaweedfs/seaweedfs/blob/master/weed/pb/filer.proto . The related sections are:
service SeaweedFiler {
rpc SubscribeMetadata (SubscribeMetadataRequest) returns (stream SubscribeMetadataResponse) {
}
}
message SubscribeMetadataRequest {
string client_name = 1;
string path_prefix = 2;
int64 since_ns = 3;
}
message SubscribeMetadataResponse {
string directory = 1;
EventNotification event_notification = 2;
int64 ts_ns = 3;
}
message EventNotification {
Entry old_entry = 1;
Entry new_entry = 2;
bool delete_chunks = 3;
string new_parent_path = 4;
}
message LogEntry {
int64 ts_ns = 1;
int32 partition_key_hash = 2;
bytes data = 3;
}
The ondisk file is a repeated bytes of the following format:
- 4 bytes of
LogEntry
size - serialized
LogEntry
The LogEntry.data
stores serialized SubscribeMetadataResponse
Read Metadata Events
The events can be read by any program as files. One example is here: https://github.com/seaweedfs/seaweedfs/blob/master/unmaintained/see_log_entry/see_log_entry.go
Subscribe to Metadata
The protobuf definition also defined a RPC call, where you can subscribe to all metadata changes asynchronously.
This is how the weed mount
can asynchronously get metadata updates from the filer, and avoid cross-wire metadata read operations to dramatically cut down operation latency.
service SeaweedFiler {
rpc SubscribeMetadata (SubscribeMetadataRequest) returns (stream SubscribeMetadataResponse) {
}
}
message SubscribeMetadataRequest {
string client_name = 1;
string path_prefix = 2;
int64 since_ns = 3;
}
message SubscribeMetadataResponse {
string directory = 1;
EventNotification event_notification = 2;
int64 ts_ns = 3;
}
This is similar to inotify
in normal file system, but actually much more powerful. Usually inotify
can only monitor one directory and its direct children, but SeaweedFS metadata subscription can monitor one directory and all its sub-directories and files.
This API also allows you to read from any point of time and replay all metadata events.
This API could be useful to build powerful event driven data processing pipelines. Please let me know if you have some great ideas or accomplishment!
What kind of events it contains?
The SubscribeMetadataResponse
contains EventNotification
, which contains old_entry
and new_entry
. The following events can be derived:
- Create Event:
new_entry
is not null, andold_entry
is null. - Delete Event:
old_entry
is not null, andnew_entry
is null. - Update Event:
old_entry
is not null, andnew_entry
is not null. - Rename Event: similar to Update Event, but also need
SubscribeMetadataResponse.directory
andEventNotification.new_parent_path
.
Purging Metadata Logs
It is ok to delete the metadata logs. However, many features rely on these metadata events, such as:
- Async filer replication
- Sync metadata with
weed mount
- Async filer metadata backup
- Filer backup
So better not to mess with the metadata logs, which should not take too much space. Just leave them there.
Introduction
API
Configuration
- Replication
- Store file with a Time To Live
- Failover Master Server
- Erasure coding for warm storage
- Server Startup Setup
- Environment Variables
Filer
- Filer Setup
- Directories and Files
- Data Structure for Large Files
- Filer Data Encryption
- Filer Commands and Operations
- Filer JWT Use
Filer Stores
- Filer Cassandra Setup
- Filer Redis Setup
- Super Large Directories
- Path-Specific Filer Store
- Choosing a Filer Store
- Customize Filer Store
Advanced Filer Configurations
- Migrate to Filer Store
- Add New Filer Store
- Filer Store Replication
- Filer Active Active cross cluster continuous synchronization
- Filer as a Key-Large-Value Store
- Path Specific Configuration
- Filer Change Data Capture
FUSE Mount
WebDAV
Cloud Drive
- Cloud Drive Benefits
- Cloud Drive Architecture
- Configure Remote Storage
- Mount Remote Storage
- Cache Remote Storage
- Cloud Drive Quick Setup
- Gateway to Remote Object Storage
AWS S3 API
- Amazon S3 API
- AWS CLI with SeaweedFS
- s3cmd with SeaweedFS
- rclone with SeaweedFS
- restic with SeaweedFS
- nodejs with Seaweed S3
- S3 API Benchmark
- S3 API FAQ
- S3 Bucket Quota
- S3 API Audit log
- S3 Nginx Proxy
AWS IAM
Machine Learning
HDFS
- Hadoop Compatible File System
- run Spark on SeaweedFS
- run HBase on SeaweedFS
- run Presto on SeaweedFS
- Hadoop Benchmark
- HDFS via S3 connector
Replication and Backup
- Async Replication to another Filer [Deprecated]
- Async Backup
- Async Filer Metadata Backup
- Async Replication to Cloud [Deprecated]
- Kubernetes Backups and Recovery with K8up
Messaging
Use Cases
Operations
Advanced
- Large File Handling
- Optimization
- Volume Management
- Tiered Storage
- Cloud Tier
- Cloud Monitoring
- Load Command Line Options from a file
- SRV Service Discovery