
FiLeStore

fls is a tool for easily, efficiently, and reliably storing your files across a pool of multiple disks, servers, racks, zones, regions, or even datacenters.

What is the state of the project?

This project is very early in its development. It is no more than an experiment at this point. It lacks many of the features needed to make it useful, and many more that would make it "good".

Do not use it to store any data you care about.

TODO

  • Chunk file validation
  • Chunk file repair/rebuilding
  • Input file reconstruction (with data validation)
  • Input file reconstruction (with missing chunk files/shards, without rebuilding)
  • Networking features
    • Chunk storage
      • Tracking health of stored chunks
      • Rebuilding lost chunks
      • Balancing of chunks
    • Filesystem features
    • FUSE mount of network filesystem

DONE

  • Chunk file generation (data + parity)
  • Input file reconstruction (requires all data chunks, does not validate reconstructed data)

How does it work?

Files are striped (with a configurable stripe width, 10 MiB by default) across a configurable number of data chunks (10 by default), and parity chunks (4 by default) are generated with Reed-Solomon erasure coding. Chunks can be stored anywhere you can put a file. If the shards are distributed across enough disks/servers/whatever, you can survive the loss of up to as many chunk files as there are parity chunks: with the defaults, any 4 data or parity chunk files can be lost while maintaining data availability.
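
The heavy lifting here is standard Reed-Solomon erasure coding. As a rough illustration only (not this repository's actual code), the sketch below uses the widely used github.com/klauspost/reedsolomon Go package, which fls is assumed, not confirmed, to build on; the 10+4 shard counts and the 10 MiB stripe mirror the defaults described above.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 4 parity shards, matching the defaults described above.
	enc, err := reedsolomon.New(10, 4)
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in for one 10 MiB stripe of an input file.
	stripe := bytes.Repeat([]byte("0123456789ABCDEF"), 640*1024)

	// Split the stripe into 10 equal data shards, then compute the 4 parity shards.
	shards, err := enc.Split(stripe)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}
	// Each of the 14 shards could now be written out as a separate chunk file.

	// Simulate losing any 4 chunks (a mix of data and parity here)...
	shards[0], shards[3], shards[11], shards[13] = nil, nil, nil, nil

	// ...and rebuild them from the 10 survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	// Reassemble the original stripe from the data shards.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(stripe)); err != nil {
		log.Fatal(err)
	}
	fmt.Println("stripe recovered:", bytes.Equal(out.Bytes(), stripe))
}
```

In fls the 14 shards would be written out as separate chunk files on whatever disks or servers are available, rather than kept in memory as they are here.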

Why?

For fun, and to solve a specific problem I have with the existing options for distributed replicated file systems: some are overly complex, some are difficult to administer, some scale poorly, some lack adequate data integrity features, and some require full file replication. The primary goal of this project is reliable file storage, and hopefully it will address all of these shortcomings and more for this specific problem space.