
FiLeStore

fls is a tool for easily, efficiently, and reliably storing your files across a pool of multiple disks, servers, racks, zones, regions, or even datacenters.

What is the state of the project?

This project is very early in its development. It is no more than an experiment at this point. It lacks many of the features needed to make it useful, and many more that would make it "good".

Do not use it to store any data you care about.

TODO

  • Chunk file validation
  • Chunk file repair/rebuilding
  • Input file reconstruction (with data validation)
  • Input file reconstruction (with missing chunk files/shards, without rebuilding)
  • Networking features
    • Chunk storage
      • Tracking health of stored chunks
      • Rebuilding lost chunks
      • Balancing of chunks
    • Filesystem features
    • FUSE mount of network filesystem

DONE

  • Chunk file generation (data + parity)
  • Input file reconstruction (requires all data chunks, does not validate reconstructed data)

How does it work?

Files are striped (with a configurable stripe width, 10 MiB by default) across a configurable number of data chunks (10 by default), and parity chunks (4 by default) are generated with Reed-Solomon erasure coding. Chunks can be stored anywhere you can put a file. If the shards are distributed across enough disks/servers/whatever, you can survive the loss of up to as many chunk files as there are parity chunks: with the defaults, any 4 data or parity chunk files can be lost while maintaining data availability.
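
The heavy lifting here is standard Reed-Solomon erasure coding. As a rough illustration only (not this repository's actual code), the sketch below uses the widely used github.com/klauspost/reedsolomon Go package, which fls is assumed, not confirmed, to build on; the 10+4 shard counts and the 10 MiB stripe mirror the defaults described above.

```go
package main

import (
	"bytes"
	"fmt"
	"log"

	"github.com/klauspost/reedsolomon"
)

func main() {
	// 10 data shards + 4 parity shards, matching the defaults described above.
	enc, err := reedsolomon.New(10, 4)
	if err != nil {
		log.Fatal(err)
	}

	// Stand-in for one 10 MiB stripe of an input file.
	stripe := bytes.Repeat([]byte("0123456789ABCDEF"), 640*1024)

	// Split the stripe into 10 equal data shards, then compute the 4 parity shards.
	shards, err := enc.Split(stripe)
	if err != nil {
		log.Fatal(err)
	}
	if err := enc.Encode(shards); err != nil {
		log.Fatal(err)
	}
	// Each of the 14 shards could now be written out as a separate chunk file.

	// Simulate losing any 4 chunks (a mix of data and parity here)...
	shards[0], shards[3], shards[11], shards[13] = nil, nil, nil, nil

	// ...and rebuild them from the 10 survivors.
	if err := enc.Reconstruct(shards); err != nil {
		log.Fatal(err)
	}

	// Reassemble the original stripe from the data shards.
	var out bytes.Buffer
	if err := enc.Join(&out, shards, len(stripe)); err != nil {
		log.Fatal(err)
	}
	fmt.Println("stripe recovered:", bytes.Equal(out.Bytes(), stripe))
}
```

In fls the 14 shards would be written out as separate chunk files on whatever disks or servers are available, rather than kept in memory as they are here.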

Why?

For fun, and to solve a specific problem I have with the existing options for distributed replicated file systems: some are overly complex, some are difficult to administer, some scale poorly, some lack adequate data integrity features, and some require full file replication. The primary goal of this project is reliable file storage, and hopefully it will address all of these shortcomings and more for this specific problem space.