1. each file can choose its replication factor
2. replication granularity is at the volume level
3. if there is not enough space, we can automatically decrease some volumes' replication factor, especially for cold data
4. plan to support migrating data to cheaper storage
5. plan to support manual volume placement, access-based volume placement, and auction-based volume placement

When a new volume server is started, it reports:
1. how many volumes it can hold
2. the current list of existing volumes and each volume's replication type
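
As a rough sketch, this startup report could be a small JSON message like the one below; the struct and field names are illustrative assumptions, not the actual weed-fs wire format.

// Sketch of the volume server's startup report; all names are assumed.
package main

import (
    "encoding/json"
    "fmt"
)

type VolumeInfo struct {
    Id              uint32 `json:"vid"`
    ReplicationType string `json:"replicationType"` // e.g. "00", "01", "10", "11", "20"
}

type JoinMessage struct {
    PublicUrl      string       `json:"publicUrl"`
    MaxVolumeCount int          `json:"maxVolumeCount"` // how many volumes this server can hold
    Volumes        []VolumeInfo `json:"volumes"`        // existing volumes and their replication types
}

func main() {
    m := JoinMessage{
        PublicUrl:      "good.com:8080",
        MaxVolumeCount: 50,
        Volumes:        []VolumeInfo{{Id: 3, ReplicationType: "01"}},
    }
    b, _ := json.Marshal(m)
    fmt.Println(string(b)) // a possible body for the join request
}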

Each volume server remembers:
1. its current volume ids
2. replica locations, which are read from the master

The master assigns volume ids based on:
1. replication factor (data center, rack)
2. concurrent write support

The master stores the replication configuration, e.g.:

{
  "replication": [
    {"type":"00", "min_volume_count":3, "weight":10},
    {"type":"01", "min_volume_count":2, "weight":20},
    {"type":"10", "min_volume_count":2, "weight":20},
    {"type":"11", "min_volume_count":3, "weight":30},
    {"type":"20", "min_volume_count":2, "weight":20}
  ],
  "port": 9333
}
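
A minimal sketch of how the master might load this file, assuming the JSON field names above and a hypothetical file name weed.json:

// Sketch of loading the master's replication configuration; type names
// and the "weed.json" path are assumptions for illustration.
package main

import (
    "encoding/json"
    "fmt"
    "os"
)

type ReplicationSetting struct {
    Type           string `json:"type"`             // e.g. "00", "01", "10", "11", "20"
    MinVolumeCount int    `json:"min_volume_count"` // keep at least this many writable volumes
    Weight         int    `json:"weight"`
}

type MasterConfig struct {
    Replication []ReplicationSetting `json:"replication"`
    Port        int                  `json:"port"`
}

func main() {
    data, err := os.ReadFile("weed.json")
    if err != nil {
        panic(err) // a real master would generate the default file instead
    }
    var conf MasterConfig
    if err := json.Unmarshal(data, &conf); err != nil {
        panic(err)
    }
    fmt.Printf("port %d, %d replication types\n", conf.Port, len(conf.Replication))
}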

Or manually via the command line:
1. add a volume with a specified replication factor
2. add a volume with a specified volume id

If duplicate volume ids are reported from different volume servers, the master checks the number of copies against the volume's replication factor:
if there are fewer copies than the replication factor, the volume is put into read-only mode
if there are more copies, the smallest/oldest copies are purged
if the counts are equal, the volume functions as usual
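
That decision could be sketched as follows; the function and its return values are illustrative only.

// Sketch of the duplicate-volume-id reconciliation rule described above.
package main

import "fmt"

// reconcile decides what to do with a volume given how many servers
// reported a copy versus the configured replication factor.
func reconcile(copies, replicationFactor int) string {
    switch {
    case copies < replicationFactor:
        return "readonly" // too few copies: stop writes until repaired
    case copies > replicationFactor:
        return "purge" // too many copies: drop the smallest/oldest ones
    default:
        return "normal" // exactly enough copies: serve as usual
    }
}

func main() {
    fmt.Println(reconcile(1, 2)) // readonly
    fmt.Println(reconcile(3, 2)) // purge
    fmt.Println(reconcile(2, 2)) // normal
}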

Use cases:

on the volume server:
1. weed volume -mserver="xx.xx.xx.xx:9333" -publicUrl="good.com:8080" -dir="/tmp" -volumes=50

on the weed master:
1. weed master -port=9333
   generates a default JSON configuration file if one doesn't exist

Bootstrap
1. at the very beginning, the system has no volumes at all.
|
2012-09-11 00:08:52 +00:00
|
|
|
When data node starts:
|
|
|
|
1. each data node send to master its existing volumes and max volume blocks
|
|
|
|
2. master remembers the topology/data_center/rack/data_node/volumes
|
|
|
|
for each replication level, stores
|
|
|
|
volume id ~ data node
|
|
|
|
writable volume ids
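
A rough sketch of that per-replication-level bookkeeping, with all names assumed for illustration:

// Sketch of the master's per-replication-level bookkeeping.
package main

import "fmt"

type VolumeId uint32

type ReplicationLayout struct {
    Locations map[VolumeId][]string // volume id ~ data node addresses
    Writable  []VolumeId            // volumes still accepting writes
}

func main() {
    // one layout per replication type ("00", "01", ...)
    layouts := map[string]*ReplicationLayout{
        "01": {
            Locations: map[VolumeId][]string{7: {"10.0.0.1:8080", "10.0.0.2:8080"}},
            Writable:  []VolumeId{7},
        },
    }
    fmt.Println(layouts["01"].Locations[7]) // where volume 7 lives
}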

If an "assign" request comes in:
1. find a writable volume with the right replicationLevel
2. if none is found, grow volumes with the right replication level
3. return a writable volume to the user
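
A sketch of that assign flow, reusing the layout shape from the previous sketch; growVolume is a stub standing in for the real volume-growing command.

// Sketch of the master's assign flow; names are illustrative assumptions.
package main

import (
    "errors"
    "fmt"
)

type VolumeId uint32

type ReplicationLayout struct {
    Writable []VolumeId
}

func assign(layouts map[string]*ReplicationLayout, replicationType string) (VolumeId, error) {
    layout, ok := layouts[replicationType]
    if !ok {
        return 0, errors.New("unknown replication type: " + replicationType)
    }
    if len(layout.Writable) == 0 {
        // step 2: grow a volume with the right replication level
        layout.Writable = append(layout.Writable, growVolume(replicationType))
    }
    // steps 1 and 3: pick a writable volume and return it
    return layout.Writable[0], nil
}

// growVolume stands in for the master telling a data node to create a volume.
func growVolume(replicationType string) VolumeId {
    return 42 // placeholder; the real master would allocate a fresh id
}

func main() {
    layouts := map[string]*ReplicationLayout{"01": {}}
    vid, err := assign(layouts, "01")
    fmt.Println(vid, err) // 42 <nil>
}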

Plan:
Step 1. implement one copy (no replication), automatically assign volume ids
Step 2. add replication

For the above operations, here is the todo list:

for the data node:
0. detect existing volumes DONE
1. on startup, and periodically, send existing volumes and maxVolumeCount via store.Join() DONE
2. accept a command to grow a volume (id + replication level) DONE
   /admin/assign_volume?volume=some_id&replicationType=01

3. accept setting the volumeLocationList DONE
   /admin/set_volume_locations_list?volumeLocationsList=[{Vid:xxx,Locations:[loc1,loc2,loc3]}]
4. for each write, pass the write on to the next location (Step 2)
   the POST method should accept an index, like a TTL, that is decremented on every hop (see the sketch below)
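
Item 4 could work like the following sketch, where the write carries a hop counter that each server decrements before forwarding; the handler path, the "hops" parameter name, and the nextLocation lookup are all hypothetical.

// Sketch of TTL-style write forwarding between replica locations.
package main

import (
    "net/http"
    "net/url"
    "strconv"
)

func handleWrite(w http.ResponseWriter, r *http.Request) {
    // ... write the object to the local volume first ...

    hops, _ := strconv.Atoi(r.FormValue("hops")) // hypothetical TTL parameter
    if hops <= 0 {
        return // last hop in the chain: nothing left to forward
    }
    next := nextLocation(r) // next replica's address, from the location list
    data := url.Values{"hops": {strconv.Itoa(hops - 1)}} // decrement per hop
    http.PostForm("http://"+next+r.URL.Path, data)
}

func nextLocation(r *http.Request) string {
    return "10.0.0.2:8080" // stub; real code would consult volumeLocationList
}

func main() {
    http.HandleFunc("/", handleWrite)
    http.ListenAndServe(":8080", nil)
}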

for the master:
1. accept a data node's report of existing volumes and maxVolumeCount ALREADY EXISTS /dir/join
2. periodically check for active data nodes, and adjust the writable volumes
3. send a command to grow a volume (id + replication level) DONE
4. NOT_IMPLEMENTING: if dead/stale data nodes are found, send stale info for the affected volumes to the other data nodes, BECAUSE the master will stop sending writes to these data nodes anyway
5. accept lookups for volume locations ALREADY EXISTS /dir/lookup (see the sketch after this list)
6. read the topology/datacenter/rack layout
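
For item 5, a client-side lookup against the existing /dir/lookup endpoint might look like this; the volumeId parameter name and the response shape are assumptions, not confirmed by this document.

// Sketch of a lookup against the master's /dir/lookup endpoint.
package main

import (
    "fmt"
    "io"
    "net/http"
)

func main() {
    resp, err := http.Get("http://localhost:9333/dir/lookup?volumeId=3")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(string(body)) // expected: the data nodes holding volume 3
}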

TODO:
1. replicate content to the other servers if the replication type needs replicas