Created TensorFlow with SeaweedFS (markdown)

2024-01-19 02:48:24 +00:00 · 2020-12-25 13:18:45 -08:00 · 2020-12-25 13:18:45 -08:00 · a4f35512d1
parent 42b3b5d706
commit a4f35512d1
1 changed files with 93 additions and 0 deletions
--- a/TensorFlow-with-SeaweedFS.md
+++ b/TensorFlow-with-SeaweedFS.md
@@ -0,0 +1,93 @@
+# Simplest Example
+```python
+import tensorflow as tf
+import os
+
+os.environ["S3_ENDPOINT"] = "http://localhost:8333"
+
+# ...
+
+train_dataset = tf.data.TFRecordDataset(filenames=[
+    "s3://bucketname/path/to/file1.tfrecord",
+    "s3://bucketname/path/to/file2.tfrecord",
+]).map(record_parser).batch(BATCH_SIZE)
+
+# ...
+
+model.fit(train_dataset, ...)
+
+```
+# TensorFlow on SeaweedFS S3
+[TensorFlow already supports S3](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md).
+The following is an adaptation of that guide with unnecessary content removed.
+
+## Configuration
+
+When a TensorFlow program reads or writes data on S3, its behavior is
+controlled by environment variables:
+
+*   **S3_ENDPOINT**: Overrides the default S3 endpoint. For SeaweedFS, set it
+    to the address of the S3 gateway, for example `http://localhost:8333`.
+
+To read or write objects in a bucket that is not publicly accessible,
+AWS credentials must be provided through one of the following methods:
+
+*   Set credentials in the AWS credentials profile file on the local system,
+    located at: `~/.aws/credentials` on Linux, macOS, or Unix, or
+    `C:\Users\USERNAME\.aws\credentials` on Windows.
+*   Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment
+    variables.
+
+## Example Setup
+
+Using the above information, we can configure TensorFlow to communicate with an
+S3 endpoint by setting the following environment variables:
+
+```bash
+export S3_ENDPOINT=http://localhost:8333
+export AWS_ACCESS_KEY_ID=XXXXX          # Only needed if credentials are configured
+export AWS_SECRET_ACCESS_KEY=XXXXX
+```
+
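+The same variables can also be set from inside a Python program, as the
+simplest example at the top does for `S3_ENDPOINT`. The following is a minimal
+sketch rather than part of the original guide: `configure_s3_env` is a
+hypothetical helper name, and it assumes the variables are set before
+TensorFlow first accesses an `s3://` path.
+
+```python
+import os
+
+
+def configure_s3_env(endpoint, access_key=None, secret_key=None):
+    """Set the S3-related environment variables for the current process."""
+    os.environ["S3_ENDPOINT"] = endpoint
+    # Credentials are optional: set them only when both values are given.
+    if access_key is not None and secret_key is not None:
+        os.environ["AWS_ACCESS_KEY_ID"] = access_key
+        os.environ["AWS_SECRET_ACCESS_KEY"] = secret_key
+
+
+# Placeholder values matching the shell example above.
+configure_s3_env("http://localhost:8333", "XXXXX", "XXXXX")
+print(os.environ["S3_ENDPOINT"])
+```
+
+Depending on the TensorFlow release, `s3://` support may also require the
+separate `tensorflow-io` package; that is outside the scope of this sketch.
+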
+## Usage
+
+Once setup is complete, TensorFlow can interact with S3 in a variety of ways.
+An S3 URL can be used anywhere a TensorFlow IO function accepts a file path.
+
+### Smoke Test
+
+To test your setup, stat a file:
+
+```python
+from tensorflow.python.lib.io import file_io
+print(file_io.stat('s3://bucketname/path/'))
+```
+
+You should see output similar to this (the exact representation depends on the TensorFlow version):
+
+```console
+<tensorflow.python.pywrap_tensorflow_internal.FileStatistics; proxy of <Swig Object of type 'tensorflow::FileStatistics *' at 0x10c2171b0> >
+```
+
+### Reading Data
+
+```python
+filenames = ["s3://bucketname/path/to/file1.tfrecord",
+             "s3://bucketname/path/to/file2.tfrecord"]
+dataset = tf.data.TFRecordDataset(filenames)
+```
+
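+When a dataset is split across many files, the list of URLs can be generated
+instead of typed out. This is a minimal sketch, assuming the files follow the
+`fileN.tfrecord` naming used above; `s3_tfrecord_urls` is a hypothetical
+helper, not a TensorFlow API.
+
+```python
+def s3_tfrecord_urls(bucket, prefix, count):
+    """Return s3:// URLs for file1.tfrecord through file<count>.tfrecord."""
+    prefix = prefix.strip("/")
+    return [
+        f"s3://{bucket}/{prefix}/file{i}.tfrecord"
+        for i in range(1, count + 1)
+    ]
+
+
+# Reproduces the two-file list from the example above.
+filenames = s3_tfrecord_urls("bucketname", "path/to", 2)
+print(filenames)
+```
+
+The resulting list can be passed to `tf.data.TFRecordDataset` in place of the
+hand-written one.
+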
+### TensorFlow Tools
+
+Many TensorFlow tools, such as TensorBoard or model serving, can also take S3
+URLs as arguments:
+
+```bash
+tensorboard --logdir s3://bucketname/path/to/model/
+tensorflow_model_server --port=9000 --model_name=model --model_base_path=s3://bucketname/path/to/model/export/
+```
+
+This enables an end-to-end workflow using S3 for all data needs.
+
+