From a4f35512d186394b5783957fbb9e3803813bfff9 Mon Sep 17 00:00:00 2001 From: Chris Lu Date: Fri, 25 Dec 2020 13:18:45 -0800 Subject: [PATCH] Created TensorFlow with SeaweedFS (markdown) --- TensorFlow-with-SeaweedFS.md | 93 ++++++++++++++++++++++++++++++++++++ 1 file changed, 93 insertions(+) create mode 100644 TensorFlow-with-SeaweedFS.md diff --git a/TensorFlow-with-SeaweedFS.md b/TensorFlow-with-SeaweedFS.md new file mode 100644 index 0000000..556a66a --- /dev/null +++ b/TensorFlow-with-SeaweedFS.md @@ -0,0 +1,93 @@ +# Simplest Example +``` +import tensorflow as tf +import os + +os.environ["S3_ENDPOINT"] = "http://localhost:8333" + +# ... + +train_dataset = tf.data.TFRecordDataset(filenames=[ + "s3://bucketname/path/to/file1.tfrecord", + "s3://bucketname/path/to/file2.tfrecord", +]).map(record_parser).batch(BATCH_SIZE) + +# ... + +model.fit(train_dataset, ...) + +``` +# TensorFlow on SeaweedFS S3 +[TensorFlow already supports S3](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md) +Here is an adaption of it with unnecessary content removed. + +## Configuration + +When reading or writing data on S3 with your TensorFlow program, the behavior +can be controlled by various environmental variables: + +* **S3_ENDPOINT**: The endpoint could be overridden explicitly with + `S3_ENDPOINT` specified. + +To read or write objects in a bucket that is not publicly accessible, +AWS credentials must be provided through one of the following methods: + +* Set credentials in the AWS credentials profile file on the local system, + located at: `~/.aws/credentials` on Linux, macOS, or Unix, or + `C:\Users\USERNAME\.aws\credentials` on Windows. +* Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment + variables. + +## Example Setup + +Using the above information, we can configure Tensorflow to communicate to an S3 +endpoint by setting the following environment variables: + +```bash +S3_ENDPOINT=http://localhost:8333 +AWS_ACCESS_KEY_ID=XXXXX # Credentials if configured +AWS_SECRET_ACCESS_KEY=XXXXX + +``` + +## Usage + +Once setup is completed, Tensorflow can interact with S3 in a variety of ways. +Anywhere there is a Tensorflow IO function, an S3 URL can be used. + +### Smoke Test + +To test your setup, stat a file: + +```python +from tensorflow.python.lib.io import file_io +print file_io.stat('s3://bucketname/path/') +``` + +You should see output similar to this: + +```console + > +``` + +### Reading Data + +```python +filenames = ["s3://bucketname/path/to/file1.tfrecord", + "s3://bucketname/path/to/file2.tfrecord"] +dataset = tf.data.TFRecordDataset(filenames) +``` + +### Tensorflow Tools + +Many Tensorflow tools, such as Tensorboard or model serving, can also take S3 +URLS as arguments: + +```bash +tensorboard --logdir s3://bucketname/path/to/model/ +tensorflow_model_server --port=9000 --model_name=model --model_base_path=s3://bucketname/path/to/model/export/ +``` + +This enables an end to end workflow using S3 for all data needs. + +