mirror of https://github.com/seaweedfs/seaweedfs.git (synced 2024-01-19 02:48:24 +00:00)
Created TensorFlow with SeaweedFS (markdown)
parent 42b3b5d706
commit a4f35512d1
TensorFlow-with-SeaweedFS.md (new file, 93 lines)
# Simplest Example

```python
import tensorflow as tf
import os

os.environ["S3_ENDPOINT"] = "http://localhost:8333"

# ...

train_dataset = tf.data.TFRecordDataset(filenames=[
    "s3://bucketname/path/to/file1.tfrecord",
    "s3://bucketname/path/to/file2.tfrecord",
]).map(record_parser).batch(BATCH_SIZE)

# ...

model.fit(train_dataset, ...)
```
# TensorFlow on SeaweedFS S3

[TensorFlow already supports S3](https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/s3.md)

Here is an adaptation of it with the unnecessary content removed.
## Configuration

When reading or writing data on S3 with your TensorFlow program, the behavior
can be controlled by various environment variables:

* **S3_ENDPOINT**: The endpoint can be overridden explicitly by setting
  `S3_ENDPOINT`.

To read or write objects in a bucket that is not publicly accessible,
AWS credentials must be provided through one of the following methods:

* Set credentials in the AWS credentials profile file on the local system,
  located at `~/.aws/credentials` on Linux, macOS, or Unix, or
  `C:\Users\USERNAME\.aws\credentials` on Windows.
* Set the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment
  variables.
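For reference, the credentials profile file uses a plain INI layout, so it can be inspected with Python's standard `configparser`. A minimal sketch (the `default` profile name follows the usual AWS convention; the key values here are placeholders):

```python
import configparser

# Hypothetical contents of ~/.aws/credentials; key values are placeholders.
credentials_text = """\
[default]
aws_access_key_id = XXXXX
aws_secret_access_key = XXXXX
"""

# The file is plain INI, so configparser can read it directly.
config = configparser.ConfigParser()
config.read_string(credentials_text)

print(config["default"]["aws_access_key_id"])  # XXXXX
```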
## Example Setup

Using the above information, we can configure TensorFlow to communicate with an S3
endpoint by setting the following environment variables:

```bash
export S3_ENDPOINT=http://localhost:8333
export AWS_ACCESS_KEY_ID=XXXXX       # credentials, if configured
export AWS_SECRET_ACCESS_KEY=XXXXX
```
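The same configuration can be done from inside Python with `os.environ`, as long as the variables are set before TensorFlow first touches the S3 filesystem. A sketch, with placeholder key values:

```python
import os

# Set these before any S3 access; TensorFlow reads them when the
# filesystem is first used.
os.environ["S3_ENDPOINT"] = "http://localhost:8333"
os.environ["AWS_ACCESS_KEY_ID"] = "XXXXX"        # credentials, if configured
os.environ["AWS_SECRET_ACCESS_KEY"] = "XXXXX"

print(os.environ["S3_ENDPOINT"])  # http://localhost:8333
```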
## Usage

Once setup is complete, TensorFlow can interact with S3 in a variety of ways.
Anywhere a TensorFlow IO function accepts a path, an S3 URL can be used.

### Smoke Test

To test your setup, stat a file:

```python
from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://bucketname/path/'))
```

You should see output similar to this:

```console
<tensorflow.python.pywrap_tensorflow_internal.FileStatistics; proxy of <Swig Object of type 'tensorflow::FileStatistics *' at 0x10c2171b0> >
```
### Reading Data

```python
filenames = ["s3://bucketname/path/to/file1.tfrecord",
             "s3://bucketname/path/to/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
```
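An S3 URL of the form `s3://bucket/key` decomposes into a bucket name and an object key, which is how SeaweedFS (like any S3 endpoint) addresses objects. A hypothetical helper to split one, purely illustrative and not part of TensorFlow:

```python
from urllib.parse import urlparse

def split_s3_url(url):
    """Split an s3://bucket/key URL into (bucket, key)."""
    parsed = urlparse(url)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URL: {url}")
    # netloc is the bucket; the path (minus its leading slash) is the key.
    return parsed.netloc, parsed.path.lstrip("/")

bucket, key = split_s3_url("s3://bucketname/path/to/file1.tfrecord")
print(bucket)  # bucketname
print(key)     # path/to/file1.tfrecord
```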
### TensorFlow Tools

Many TensorFlow tools, such as TensorBoard or model serving, can also take S3
URLs as arguments:

```bash
tensorboard --logdir s3://bucketname/path/to/model/
tensorflow_model_server --port=9000 --model_name=model --model_base_path=s3://bucketname/path/to/model/export/
```

This enables an end-to-end workflow using S3 for all data needs.