mirror of
https://github.com/seaweedfs/seaweedfs.git
synced 2024-01-19 02:48:24 +00:00
Updated Distributed Filer (markdown)
parent
8bfe42da4f
commit
16212899e2
|
@ -1,17 +1,10 @@
|
||||||
The default weed filer is in standalone mode, storing file metadata on disk.
|
The default weed filer runs in standalone mode, storing file metadata in a local LevelDB.
|
||||||
It is quite efficient at traversing deep directory paths and can handle
|
It is quite efficient at traversing deep directory paths and can handle
|
||||||
millions of files.
|
millions of files.
|
||||||
|
|
||||||
However, having no single point of failure (SPOF) is a must-have requirement for many projects.
|
However, having no single point of failure (SPOF) is a must-have requirement for many projects.
|
||||||
|
|
||||||
Luckily, SeaweedFS is so flexible that we can use a completely different way
|
SeaweedFS can use an existing, familiar data store, e.g., Cassandra, MySQL, Postgres, or Redis, to store the filer metadata.
|
||||||
to manage file metadata.
|
|
||||||
|
|
||||||
This distributed filer uses Redis or Cassandra to store the metadata.
|
|
||||||
|
|
||||||
## Redis Setup
|
|
||||||
|
|
||||||
No setup required.
|
|
||||||
|
|
||||||
## Cassandra Setup
|
## Cassandra Setup
|
||||||
|
|
||||||
|
@ -35,21 +28,24 @@ CREATE TABLE seaweed_files (
|
||||||
);
|
);
|
||||||
```
|
```
|
||||||
|
|
||||||
## Sample usage
|
## Create a filer.toml
|
||||||
|
|
||||||
To start a weed filer in distributed mode with Redis:
|
Try running ```weed filer -h``` to see an example filer.toml file. The file should be placed in one of these folders: the current directory, $HOME/.seaweedfs/, or /etc/seaweedfs/.
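
As a rough illustration of the three documented locations, the sketch below checks each one and prints the first `filer.toml` it finds. The function name `find_filer_toml` is hypothetical, and the order of precedence (current directory, then `$HOME/.seaweedfs/`, then `/etc/seaweedfs/`) is an assumption taken from the order in which the locations are listed; it is not confirmed as the filer's actual lookup order.

```shell
# Hypothetical helper (not part of the weed CLI): print the first filer.toml
# found in the documented locations. The search order is an assumption.
find_filer_toml() {
  for dir in "." "$HOME/.seaweedfs" "/etc/seaweedfs"; do
    if [ -f "$dir/filer.toml" ]; then
      echo "$dir/filer.toml"
      return 0
    fi
  done
  # Nothing found in any of the three locations.
  return 1
}
```

Calling `find_filer_toml || echo "no filer.toml found"` before starting the filer is one quick way to confirm which file, if any, is in place.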
|
||||||
|
|
||||||
|
Here is the shortest example for Cassandra:
|
||||||
|
|
||||||
```bash
|
```toml
|
||||||
# assuming you already started weed master and weed volume
|
[cassandra]
|
||||||
weed filer -redis.server=localhost:6379
|
enabled = true
|
||||||
|
keyspace="seaweedfs"
|
||||||
|
hosts=[
|
||||||
|
"localhost:9042",
|
||||||
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
To start a weed filer in distributed mode with Cassandra:
|
With the filer.toml file created, you can start ```weed filer```.
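
The steps above could be scripted roughly as follows. This is a minimal sketch: `write_filer_toml` is a hypothetical helper, and the config body simply repeats the minimal Cassandra example, so adjust the keyspace and hosts to your cluster.

```shell
# Hypothetical helper: write the minimal Cassandra filer.toml from the example
# above into the directory given as the first argument.
write_filer_toml() {
  conf_dir="$1"
  mkdir -p "$conf_dir" || return 1
  cat > "$conf_dir/filer.toml" <<'EOF'
[cassandra]
enabled = true
keyspace = "seaweedfs"
hosts = [
  "localhost:9042",
]
EOF
}

# Usage (assumes weed master and weed volume are already running):
#   write_filer_toml "$HOME/.seaweedfs" && weed filer
```

Writing to `$HOME/.seaweedfs` keeps the config per-user; `/etc/seaweedfs` would be the system-wide alternative among the documented locations.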
|
||||||
|
|
||||||
```bash
|
## See it in action
|
||||||
# assuming you already started weed master and weed volume
|
|
||||||
weed filer -cassandra.server=localhost
|
|
||||||
```
|
|
||||||
|
|
||||||
Now you can add/delete files
|
Now you can add/delete files
|
||||||
|
|
||||||
|
@ -62,33 +58,7 @@ curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
|
||||||
curl "http://localhost:8888/path/to/sources/new_name"
|
curl "http://localhost:8888/path/to/sources/new_name"
|
||||||
```
|
```
|
||||||
|
|
||||||
## Limitation
|
Or you can visit ```http://localhost:8888/``` to see the files and click around.
|
||||||
|
|
||||||
Listing subfolders and files is not supported because neither Redis nor Cassandra
|
|
||||||
supports prefix search.
|
|
||||||
|
|
||||||
## Flat Namespace Design
|
|
||||||
|
|
||||||
Instead of using both directory and file metadata, this implementation uses
|
|
||||||
a flat namespace.
|
|
||||||
|
|
||||||
If storing each directory metadata separately, there would be multiple
|
|
||||||
network round trips to fetch directory information for deep directories,
|
|
||||||
impeding system performance.
|
|
||||||
|
|
||||||
A flat namespace would take more space because the parent directories are
|
|
||||||
repeatedly stored. But disk space is a lesser concern especially for
|
|
||||||
distributed systems.
|
|
||||||
|
|
||||||
So either Redis or Cassandra is a simple file_full_path ~ file_id mapping.
|
|
||||||
(Actually Cassandra is a file_full_path ~ list_of_file_ids mapping
|
|
||||||
with the hope to support easy file appending for streaming files.)
|
|
||||||
|
|
||||||
## Complexity
|
|
||||||
|
|
||||||
For one file retrieval, the full_filename=>file_id lookup will be O(logN)
|
|
||||||
using Redis or Cassandra. But very likely the one additional network hop would
|
|
||||||
take longer than the actual lookup.
|
|
||||||
|
|
||||||
## Deployment Notes
|
## Deployment Notes
|
||||||
|
|
||||||
|
@ -100,20 +70,3 @@ Replication is controlled by the client side. The filer's default replication is
|
||||||
|
|
||||||
The same setting on master server would not take effect since filer will always use the specified or filer's default replication to write.
|
The same setting on master server would not take effect since filer will always use the specified or filer's default replication to write.
|
||||||
|
|
||||||
## Use Cases
|
|
||||||
|
|
||||||
Clients can access one "weed filer" via HTTP, create files via HTTP POST,
|
|
||||||
and read files via HTTP GET directly.
|
|
||||||
|
|
||||||
## Future
|
|
||||||
|
|
||||||
SeaweedFS can support other distributed databases. It will be better
|
|
||||||
if that database can support prefix search, in order to list files
|
|
||||||
under a directory.
|
|
||||||
|
|
||||||
## Help Wanted
|
|
||||||
|
|
||||||
Please implement your preferred metadata store!
|
|
||||||
|
|
||||||
Just follow the cassandra_store/cassandra_store.go file and send me a pull
|
|
||||||
request. I will handle the rest.
|
|