adjust filer doc

Chris Lu 2018-06-17 17:18:15 -07:00
parent 04723d71f4
commit 7d404749bb
5 changed files with 81 additions and 200 deletions

@ -6,6 +6,14 @@ When talking about file systems, many people would assume directories, list file
First, run ```weed filer -h``` to see an example ```filer.toml``` file. Copy it out, read it, and create the data store if needed.
The simplest filer.toml can be:
```
[leveldb]
enabled = true
dir = "." # directory to store level db files
```
Two ways to start a weed filer
```bash
@ -53,11 +61,15 @@ For reads:
1. Client Read File Metadata => Weed Filer => Weed Filer database (LevelDB, Cassandra, Redis, MySQL, Postgres, etc)
2. Client Read File Chunks => Weed Volume Servers
![](FilerRead.png)
For writes:
1. Client streams files to the Filer.
2. Filer uploads the data to Weed Volume Servers, breaking large files into chunks.
3. Filer writes the metadata and chunk information into Filer database.
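The write path above can be sketched in a few lines. This is only an illustration under stated assumptions, not the filer's actual code: `Chunk`, `split_into_chunks`, and the `upload` callable are hypothetical names, and the chunk size is an arbitrary parameter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Chunk:
    file_id: str  # hypothetical id returned by a volume server for this piece
    offset: int   # byte offset of this chunk within the whole file
    size: int     # number of bytes in this chunk


def split_into_chunks(data: bytes, chunk_size: int,
                      upload: Callable[[bytes], str]) -> List[Chunk]:
    """Upload data piece by piece and return the chunk list to keep as metadata."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for offset in range(0, len(data), chunk_size):
        piece = data[offset:offset + chunk_size]
        file_id = upload(piece)  # step 2: send one piece to a volume server
        chunks.append(Chunk(file_id, offset, len(piece)))
    # step 3: the filer would persist this list together with the file metadata
    return chunks
```

A read then needs only the stored chunk list: the client fetches each `file_id` from the volume servers and concatenates the pieces in `offset` order.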
## Filer Store
#### Complexity
For one file retrieval, the (file_parent_directory, fileName) => metadata lookup is O(logN) for LSM tree or B-tree implementations, where N is the number of existing entries, or O(1) for Redis.
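A toy model may make the lookup cost concrete. This is a sketch only, not how any real filer store is implemented: `SortedStore` and its methods are hypothetical names, and a sorted Python list stands in for an LSM tree or B-tree. A plain `dict` keyed by `(directory, name)` would play the role of the O(1) hash lookup.

```python
import bisect


class SortedStore:
    """Toy stand-in for an ordered (LSM tree or B-tree style) filer store."""

    def __init__(self):
        self._keys = []  # sorted list of (directory, name) keys
        self._meta = {}  # key -> metadata payload

    def insert(self, directory, name, meta):
        key = (directory, name)
        if key not in self._meta:
            # A real store inserts in O(log N); a Python list insert is O(N).
            bisect.insort(self._keys, key)
        self._meta[key] = meta

    def find(self, directory, name):
        key = (directory, name)
        i = bisect.bisect_left(self._keys, key)  # O(log N) binary search
        if i < len(self._keys) and self._keys[i] == key:
            return self._meta[key]
        return None

    def list_dir(self, directory, start_after="", limit=100):
        # Entries of one directory are adjacent in key order, so a listing
        # is one O(log N) seek followed by a sequential scan.
        i = bisect.bisect_right(self._keys, (directory, start_after))
        names = []
        while (i < len(self._keys) and len(names) < limit
               and self._keys[i][0] == directory):
            names.append(self._keys[i][1])
            i += 1
        return names
```

Keeping keys ordered by `(directory, name)` is what makes paginated listing cheap in the ordered stores, which a single hash lookup cannot offer on its own.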
@ -72,13 +84,16 @@ For directory renaming, it will be O(N) operations, with N as the number of file
### Comparing Storage Options
Here is a comparison of different filer store options.
The Filer Store persists all file metadata and directory information.
1. "memory": for testing and examples only.
2. "leveldb": simple, single machine, fast, scalable, but no failover.
3. "mysql"/"postgres": robust and well-understood, fast enough for most cases, scalable.
4. "cassandra": robust and well-understood, fast, scalable.
5. "redis": very fast and scalable with clustering, but persistent storage must be enabled. File listing is limited because one directory's sub file names are stored in one key~value entry.
| Filer Store Name | Lookup | Number of entries in a folder | Scalability | Note |
| ---------------- | -- | -- | -- | -- |
| memory | O(1) | limited by memory | Local, Fast | For testing only, no persistent storage |
| LevelDB | O(logN)| unlimited | Local, Very Fast | Default, fairly scalable |
| Redis | O(1) | limited | Local or Distributed, Fastest | One directory's sub file names are stored in one key~value entry |
| Cassandra | O(logN)| unlimited | Local or Distributed, Very Fast| |
| MySQL | O(logN)| unlimited | Local or Distributed, Fast | Easy to manage, export |
| Postgres | O(logN)| unlimited | Local or Distributed, Fast | Easy to manage, export |
### Extending Storage Options
@ -97,3 +112,13 @@ Filer has two use cases.
When the filer is used directly to upload and download files, it needs to process the file content during reads and writes, in addition to the file metadata. So it is a good idea to run multiple filer servers, with an nginx server in front of them to load balance the requests.
When the filer is used to support "weed mount", the filer only provides file metadata retrieval. The actual file content is read and written directly between "weed mount" and the "weed volume" servers. So the filer is limited only by the capability of the filer storage.
## Upgrading from previous Filer storage
Upgrading is complicated because the new storage format is very different from the previous one.
Here are the basic steps:
1. Export all files from the existing storage, including the full path and fileId.
2. For each fileId, find out the size and MIME type.
3. Register the file in the new filer via the SeaweedFiler CreateEntry() gRPC API. See [[Filer Commands and Operations]]
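The three steps above can be outlined as a loop. This is a rough sketch, not a migration tool: `migrate`, `lookup_size_and_mime`, and `create_entry` are hypothetical names. In particular, `create_entry` stands for whatever wrapper you write around the CreateEntry() gRPC call, whose request format is not shown here.

```python
def migrate(exported, lookup_size_and_mime, create_entry):
    """Re-register exported files in the new filer and return how many were done.

    exported: iterable of (full_path, file_id) pairs from the old storage (step 1).
    lookup_size_and_mime: hypothetical callable, file_id -> (size, mime) (step 2).
    create_entry: hypothetical wrapper around the CreateEntry() gRPC call (step 3).
    """
    migrated = 0
    for full_path, file_id in exported:
        size, mime = lookup_size_and_mime(file_id)
        # Split "/dir/sub/name" into its parent directory and file name.
        directory, _, name = full_path.rpartition("/")
        create_entry(directory or "/", name, file_id, size, mime)
        migrated += 1
    return migrated
```

Passing the two lookups in as callables keeps the loop independent of how the old storage is read and of how the gRPC client is generated.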

@ -1,76 +0,0 @@
The default weed filer runs in standalone mode, storing file metadata in a local LevelDB.
It traverses deep directory paths efficiently and can handle
millions of files.
However, many projects require that there be no single point of failure (SPOF).
SeaweedFS can use existing, familiar data stores, e.g., Cassandra, MySQL, Postgres, Redis, to store the filer metadata.
The following takes Cassandra as an example.
## Cassandra Setup
Here is the CQL to create the table used by the Cassandra store.
Optionally you can adjust the keyspace name and replication settings.
For production, you would want to set replication_factor to 3
if there are at least 3 Cassandra servers.
```cql
create keyspace seaweedfs WITH replication = {
'class':'SimpleStrategy',
'replication_factor':1
};
use seaweedfs;
CREATE TABLE filemeta (
directory varchar,
name varchar,
meta blob,
PRIMARY KEY (directory, name)
) WITH CLUSTERING ORDER BY (name ASC);
```
## Create a filer.toml
Try running ```weed filer -h``` to see an example filer.toml file. The file should be in one of these folders: the current directory, $HOME/.seaweedfs/, or /etc/seaweedfs/.
Here is the shortest example for Cassandra:
```toml
[cassandra]
enabled = true
keyspace="seaweedfs"
hosts=[
"localhost:9042",
]
```
With the filer.toml file created, you can start ```weed filer```.
## See it in action
Now you can add/delete files
```bash
# POST a file and read it back
curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"
# POST a file with a new name and read it back
curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"
```
Or you can visit ```http://localhost:8888/``` to see the files and click around.
## Deployment Notes
Replication is controlled on the client side. The filer's default replication is "000". To enable replication, start the filer with an option like this:
```bash
-defaultReplicaPlacement=001
```
The same setting on the master server does not take effect, since the filer always writes with either the explicitly specified replication or the filer's own default.

49
Filer-Cassandra-Setup.md Normal file

@ -0,0 +1,49 @@
SeaweedFS can use existing, familiar data stores, e.g., Cassandra, MySQL, Postgres, Redis, to store the filer metadata.
The following takes Cassandra as an example.
## Cassandra Setup
Here is the CQL to create the table used by the Cassandra store.
Optionally you can adjust the keyspace name and replication settings.
For production, you would want to set replication_factor to 3
if there are at least 3 Cassandra servers.
```cql
create keyspace seaweedfs WITH replication = {
'class':'SimpleStrategy',
'replication_factor':1
};
use seaweedfs;
CREATE TABLE filemeta (
directory varchar,
name varchar,
meta blob,
PRIMARY KEY (directory, name)
) WITH CLUSTERING ORDER BY (name ASC);
```
## Create a filer.toml
Try running ```weed filer -h``` to see an example filer.toml file. The file should be in one of these folders: the current directory, $HOME/.seaweedfs/, or /etc/seaweedfs/.
Here is the shortest example for Cassandra:
```toml
[cassandra]
enabled = true
keyspace="seaweedfs"
hosts=[
"localhost:9042",
]
```
## Starting the Filer
```bash
weed filer
```

117
Filer.md

@ -1,117 +0,0 @@
This page aims to consolidate the pages on the [[single-node filer|Directories and Files]] and [[distributed filer]] into one.
## Background
SeaweedFS comes with a lightweight "filer" server, which provides a RESTful wrapper around SeaweedFS's blob API, mapping content to a traditional file directory of paths. The files in filer can also be mounted to Linux or Mac with FUSE support.
## Backends
SeaweedFS's built-in filer supports three different backends (although pull requests to add more are always welcome).
The default backend, LevelDB, is for simple, non-distributed single nodes.
The other backends, Redis and Cassandra, are for clustering backing stores that can be distributed across several nodes at high scale.
The LevelDB backend is very capable and efficient; the main disadvantage it has, relative to the distributed backends, is that it presents a single point of failure. In "[pets vs. cattle][pvc]" terms, the LevelDB backend is only suitable for "pet" servers, while the Redis and Cassandra backends are suitable for "cattle" servers.
[pvc]: https://blog.engineyard.com/2014/pets-vs-cattle
## Initialization
The LevelDB and Redis backends need no initialization.
### Initializing the Cassandra backend
Here is the CQL to create the table used by SeaweedFS's Cassandra store, as well as a keyspace for specifying the replication strategy to use.
While the table name and field structure must match what is written here, you are free to rename the keyspace and use whatever replication settings you wish. For production, you would want to set replication_factor to 3
if there are at least 3 Cassandra servers.
```cql
create keyspace seaweedfs WITH replication = {
'class':'SimpleStrategy',
'replication_factor':1
};
use seaweedfs;
CREATE TABLE filemeta (
directory varchar,
name varchar,
meta blob,
PRIMARY KEY (directory, name)
) WITH CLUSTERING ORDER BY (name ASC);
```
## Create a filer.toml file
Please create a filer.toml file in the current directory, "$HOME/.seaweedfs/", or "/etc/seaweedfs/".
Just run "weed filer -h" to see an up-to-date example. Here is one simpler copy. Remember to set enabled=true to pick one option.
```
[leveldb]
enabled = false
dir = "." # directory to store level db files
[cassandra]
enabled = false
keyspace="seaweedfs"
hosts=[
"localhost:9042",
]
[redis]
enabled = true
address = "localhost:6379"
password = ""
db = 0
```
## Starting the Filer
To start the filer, after you have started the master and volume servers (with `weed server`, or `weed master` and `weed volume` respectively), you can start a filer server with `weed filer`:
```bash
weed filer
```
Alternatively, to start all servers in one shot, you can start a filer server alongside a master server and volume server with the `-filer` option to `weed server`:
```
# this is equivalent to `weed master`, `weed volume`, and `weed filer` together
weed server -filer
```
## Using the Filer
The filer provides a simple RESTful interface, where POST requests to a path upload the file content for that path, and GET requests retrieve the content for that path.
```
# POST a file and read it back
curl -F "filename=@README.md" "http://localhost:8888/path/to/sources/"
curl "http://localhost:8888/path/to/sources/README.md"
# POST a file with a new name and read it back
curl -F "filename=@Makefile" "http://localhost:8888/path/to/sources/new_name"
curl "http://localhost:8888/path/to/sources/new_name"
```
You may also request a "listing" for a directory:
```
# list sub folders and files
curl "http://localhost:8888/path/to/sources/?pretty=y"
# if there are lots of files under this folder, this is a way to paginate through them efficiently
curl "http://localhost:8888/path/to/sources/?lastFileName=abc.txt&limit=50&pretty=y"
```
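The pagination pattern above can be wrapped in a small loop. The sketch below is illustrative only. It assumes that `lastFileName` means "return entries after this name" and that a page shorter than `limit` is the last one. `build_listing_url`, `iter_directory`, and the `fetch_names` callable are hypothetical names, and how names are extracted from the filer's response is left to `fetch_names`, since the response format is not described here.

```python
from urllib.parse import urlencode


def build_listing_url(base_url, directory, last_file_name="", limit=50):
    """Build a listing URL that uses the lastFileName and limit parameters."""
    query = {"limit": limit}
    if last_file_name:
        query["lastFileName"] = last_file_name
    return f"{base_url}{directory}?{urlencode(query)}"


def iter_directory(fetch_names, base_url, directory, limit=50):
    """Yield every entry name in a directory, one page at a time.

    fetch_names is a hypothetical callable: it takes a listing URL and
    returns the list of entry names found on that page.
    """
    last = ""
    while True:
        url = build_listing_url(base_url, directory, last, limit)
        names = fetch_names(url)
        yield from names
        if len(names) < limit:  # assumed: a short page means nothing is left
            return
        last = names[-1]  # continue after the last name seen so far
```

Because each request carries the last name seen rather than a page number, the loop keeps no state on the server between requests.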
## Upgrading from previous Filer storage
Upgrading is complicated because the new storage format is very different from the previous one.
Here are the basic steps:
1. Export all files from the existing storage, including the full path and fileId.
2. For each fileId, find out the size and MIME type.
3. Register the file in the new filer via the SeaweedFiler CreateEntry() gRPC API. See [[Filer Commands and Operations]]

@ -13,7 +13,7 @@
* [[Failover Master Server]]
* Filer
* [[Directories and Files]]
* [[Distributed Filer]]
* [[Filer Cassandra Setup]]
* [[Filer Commands and Operations]]
* [[Mount]]
* [[Customize Filer Store]]