Updated Filer Data Encryption (markdown)

Chris Lu 2020-03-08 22:21:52 -07:00
parent d81c983356
commit f2d4f4758a

@ -6,9 +6,14 @@ For filer,
However, there could be many volume servers, and the volumes may be tiered to the cloud. What if there is a security breach?
### Encrypt data on volume servers
-`weed filer -encryptVolumeData` is an option to encrypt the data on volume servers. The encryption key is randomly generated during write time, and is different for different files. The encryption key is stored as metadata in filer store.
-So the volume data on the volume servers are encrypted and should be safe. As long as the filer store is not exposed, it is nearly impossible to guess the encryption keys for each file.
+`weed filer -encryptVolumeData` is an option to encrypt the data on volume servers.
+The encryption keys are randomly generated at write time and differ from file to file. The encryption keys are stored as metadata in the filer store.
+So the volume data on the volume servers is encrypted. As long as the filer store is not exposed, it is nearly impossible to guess the encryption keys for all the files.
+### Safe Data Storage
+The volume servers have no concept of encryption. With the file content encrypted, it is safe to put volume servers anywhere you want. The volume servers never see any unencrypted data, in either storage or transmission.
### Safely Forget Data
Another side is, with GDPR, companies are required to "forget" customer data after some time. If the volume data is stored on a glacial storage system, it is cumbersome to dig it out and destroy it. It is much easier to just delete the metadata, which holds the encryption keys, and the volume data is automatically "destroyed".
@ -16,7 +21,7 @@ Another side is, with GDPR, companies are required to "forget" customer data aft
### Encryption Algorithm
The encryption is through GCM https://en.wikipedia.org/wiki/Galois/Counter_Mode
-There is one randomly generated cipher key of 256 bits for each file chunk. One file has one or many file chunks. By default the chunk size is 32MB. The cipher code is here https://github.com/chrislusf/seaweedfs/blob/master/weed/util/cipher.go
+There is one randomly generated cipher key of 256 bits for each file chunk. The cipher code is here https://github.com/chrislusf/seaweedfs/blob/master/weed/util/cipher.go
### Note
The volume servers are agnostic to encryption. There is no encryption if you only use master and volume servers as an object store.