From 22c7c2c1aa7c255c8225634f67d0bc87133896c4 Mon Sep 17 00:00:00 2001
From: Chris Lu
Date: Mon, 6 Apr 2020 22:54:56 -0700
Subject: [PATCH] Created Production Setup (markdown)

The simple setup can be deceptive. Going to production requires a more involved setup.

There are multiple components. Please follow the steps to set them up one by one.

* Setup object storage
  * Setup Masters
  * Add volume servers
* Setup file storage
  * Choose filer store
  * Setup Filer

Then choose the optional components you want:
* Setup S3
* Setup FUSE mount

For operations, metrics are also needed:
* Setup metrics

#### For single node setup
You can simply run `weed server -filer -s3 -ip=xx.xx.xx.xx` to get one master, one volume server, one filer, and one S3 API server running.

It is better to have several volumes running on one machine, so that while one volume is compacting, the other volumes can still serve read and write requests. The default volume size is 30GB, so if your server does not have room for multiple 30GB volumes, you need to reduce the volume size:

```
weed server -filer -s3 -ip=xx.xx.xx.xx -volume.max=0 -master.volumeSizeLimitMB=1024
```

# Setup object storage

## Setup Masters

### One master is fine

With only 2 machines it is not possible to achieve consensus, so do not bother setting up multiple masters.

Even for small clusters, it is totally fine to have one single master. The load on the master is very light, so it is unlikely to go down, and you can always just restart it.

### Setup multiple masters

OK. Your CTO just wants multiple masters. To do so, see [[Failover Master Server]] for details.
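Whether you run one master or several, in production it helps to keep each `weed master` under a process supervisor so a crashed process is restarted automatically. A minimal systemd unit sketch (the binary path, data directory, and peer list below are assumptions to adapt to your layout):

```
[Unit]
Description=SeaweedFS master
After=network-online.target

[Service]
# Paths and flags are examples only; match them to your own setup.
ExecStart=/usr/local/bin/weed master -mdir=/data/seaweedfs/master -peers=ip1:9333,ip2:9333,ip3:9333
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
```

Enable it with `systemctl enable --now` on each master machine; the same pattern works for volume servers and filers.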
Assuming each machine has a directory `/data/seaweedfs`, run the following on 3 machines with IP addresses ip1, ip2, and ip3:

```
weed master -mdir=/data/seaweedfs/master -peers=ip1:9333,ip2:9333,ip3:9333
weed master -mdir=/data/seaweedfs/master -peers=ip1:9333,ip2:9333,ip3:9333
weed master -mdir=/data/seaweedfs/master -peers=ip1:9333,ip2:9333,ip3:9333
```

Additional notes:
* Depending on the disk space of each volume server, the master may need a smaller volume size limit, e.g., add `-volumeSizeLimitMB=1024`.
* Since this is for production, you may also want to add `-metrics.address=`. See [[System Metrics]].

### Add volume servers

Adding volume servers is easy. Actually, this is much easier than in most other systems.

#### For a machine with one disk
Run this to set it up:
```
weed volume -master=ip1:9333,ip2:9333,ip3:9333 -dataCenter=dc1 -rack=rack1 -dir=/data/seaweedfs/volume -ip=xxx.xxx.xxx.xxx -max=0
```

#### For a machine with multiple disks
Configure `-dir` as a comma-separated directory list, and set a corresponding `-max` for each directory, assuming the `/data/seaweedfs/volume[x]` directories are on different disks.
```
weed volume -master=ip1:9333,ip2:9333,ip3:9333 -dataCenter=dc1 -rack=rack1 -ip=xxx.xxx.xxx.xxx -dir=/data/seaweedfs/volume1,/data/seaweedfs/volume2,/data/seaweedfs/volume3 -max=0,0,0
```
Do not use multiple directories on the same disk; the automatic volume count limit would double-count the capacity.

Alternatively, you can run multiple volume server processes, one per disk, on different ports:
```
weed volume -master=ip1:9333,ip2:9333,ip3:9333 -dataCenter=dc1 -rack=rack1 -dir=/data/seaweedfs/volume1 -ip=xxx.xxx.xxx.xxx -port=8080 -max=0
weed volume -master=ip1:9333,ip2:9333,ip3:9333 -dataCenter=dc1 -rack=rack1 -dir=/data/seaweedfs/volume2 -ip=xxx.xxx.xxx.xxx -port=8081 -max=0
```

Additional notes:
* If the disk space is huge and there will be a lot of volumes, configure `-index=leveldb` to reduce the memory load.
* On Windows, `-max=0` does not work; you need to set the volume count manually.
* For busy volume servers, `-compactionMBps` can help throttle the background jobs, e.g., compaction, balancing, encoding/decoding, etc.
* Adding volume servers does not rebalance existing data; data is only written to them once new volumes are created on them. You can use `weed shell` and run `volume.balance -force` to balance manually.

## Check the object store setup
Now the object store setup is complete. You can visit `http://<master_ip>:9333/` to look around. You can also assign some file ids to trigger volume allocation.

If you only use the SeaweedFS object store, that is all.

# Setup file storage

## Choose filer store

If only one filer is needed for now, just use one filer with the default filer store. It is very scalable.

You can always migrate to another scalable filer store later by exporting and importing the filer metadata. See [[Filer Stores]].

Run `weed scaffold -conf=filer` to generate an example `filer.toml` file.

Which filer store to choose depends on your requirements, your existing data stores, etc.

## Setup filer

```
weed filer -ip=xxx.xxx.xxx.xxx -master=ip1:9333,ip2:9333,ip3:9333
```

Additional notes:
* Both `weed filer` and `weed master` have the option `-defaultReplicaPlacement`. `weed master` uses it for the object store, while `weed filer` uses it for files. The `weed filer` default is "000", which overrides the master's setting.
* Use the `-encryptVolumeData` option when you need to encrypt the data on volume servers. See [[Filer Data Encryption]].

## Setup multiple filers

If using a shared filer store, the filers themselves are stateless, so you can create multiple filers.

If using the default embedded filer store, the filers are unaware of each other, but they share the same SeaweedFS object store.
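For illustration, a shared-store entry in `filer.toml` looks roughly like the fragment below, shown here for MySQL. Treat the keys and values as placeholders; `weed scaffold -conf=filer` generates the authoritative template for every supported store.

```
[mysql]
# Example only - generate the real template with `weed scaffold -conf=filer`.
enabled = true
hostname = "localhost"
port = 3306
username = "seaweedfs"
password = "secret"
database = "seaweedfs"
```

With a shared store like this, every filer instance points at the same database, which is what makes the filers stateless and safe to run in multiples.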
# Additional components

## Setup S3 API

Follow [[Amazon S3 API]] to generate a JSON config file that assigns an accessKey and secretKey to each identity and grants read/write permissions on different buckets.

Run

`weed s3 -filer=<filer_ip>:8888 -config=<config.json> -port=8333`

The S3 server's ip address is not needed. When running S3-related tools, remember to set the endpoint to `http://<s3_server>:8333`.

## Setup FUSE mount

Run

`weed mount -filer=<filer_ip>:8888 -chunkCacheCountLimit=xxx -chunkSizeLimitMB=4`

* `-chunkCacheCountLimit` is how many chunks are cached in memory, defaulting to 1000. With the default `-chunkSizeLimitMB` of 4, the cache may take up to 4x1000 MB of memory if all files are larger than 4MB.
* `-replication` is the replication level for each file. It overrides the replication settings on both the filer and the master.
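The memory bound above is simple arithmetic; a quick sketch using the default values quoted in the note (the numbers are only the documented defaults, not measurements):

```
# Worst-case mount cache memory: every cached entry holds a full-size chunk.
chunk_cache_count_limit=1000   # default for -chunkCacheCountLimit
chunk_size_limit_mb=4          # default for -chunkSizeLimitMB
worst_case_mb=$((chunk_cache_count_limit * chunk_size_limit_mb))
echo "worst case: ${worst_case_mb} MB"   # worst case: 4000 MB
```

Lowering either flag reduces the bound proportionally, so size them against the memory you can spare on the mount host.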