Simplest Example
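import tensorflow as tf
import os

# Point TensorFlow's S3 filesystem at the SeaweedFS S3 gateway.
# Set this before any s3:// path is accessed.
os.environ["S3_ENDPOINT"] = "http://localhost:8333"
# On recent TensorFlow releases the s3:// filesystem ships in the separate
# tensorflow-io package; if s3:// paths are not recognized, also add:
# import tensorflow_io
# ...
# record_parser, BATCH_SIZE and model are defined elsewhere in your program.
train_dataset = tf.data.TFRecordDataset(filenames=[
    "s3://bucketname/path/to/file1.tfrecord",
    "s3://bucketname/path/to/file2.tfrecord",
]).map(record_parser).batch(BATCH_SIZE)
# ...
model.fit(train_dataset, ...)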
TensorFlow on SeaweedFS S3
TensorFlow already supports S3. Here is an adaptation of its S3 documentation with unnecessary content removed.
Configuration
When reading or writing data on S3 with your TensorFlow program, the behavior can be controlled by various environment variables:
- S3_ENDPOINT: overrides the default AWS endpoint. Set it to the address of the SeaweedFS S3 gateway, e.g. http://localhost:8333.
To read or write objects in a bucket that is not publicly accessible, AWS credentials must be provided through one of the following methods:
- Set credentials in the AWS credentials profile file on the local system, located at ~/.aws/credentials on Linux, macOS, or Unix, or C:\Users\USERNAME\.aws\credentials on Windows.
- Set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables.
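The credentials file uses an INI layout: a [default] section (or another named profile) holding aws_access_key_id and aws_secret_access_key. As a minimal sketch, the hypothetical helper write_aws_credentials below writes one profile with Python's standard configparser and keeps any other profiles already in the file. The key values are placeholders for the credentials configured on your SeaweedFS S3 gateway.

```python
import configparser
from pathlib import Path


def write_aws_credentials(path, access_key_id, secret_access_key, profile="default"):
    """Write (or update) one profile in an AWS-style credentials file.

    Hypothetical helper: other profiles already present in the file are kept.
    """
    path = Path(path).expanduser()  # "~" -> home directory (USERPROFILE on Windows)
    config = configparser.ConfigParser()
    if path.exists():
        config.read(path)  # load existing profiles so they are not lost
    config[profile] = {
        "aws_access_key_id": access_key_id,
        "aws_secret_access_key": secret_access_key,
    }
    path.parent.mkdir(parents=True, exist_ok=True)  # create ~/.aws if missing
    with path.open("w") as f:
        config.write(f)
    return path
```

For example, write_aws_credentials("~/.aws/credentials", "XXXXX", "XXXXX") writes the default profile in the location listed above.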
Example Setup
Using the above information, we can configure TensorFlow to communicate with an S3 endpoint by setting the following environment variables:
S3_ENDPOINT=http://localhost:8333
AWS_ACCESS_KEY_ID=XXXXX # Credentials if configured
AWS_SECRET_ACCESS_KEY=XXXXX
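If you would rather keep these settings in the training script than in the shell, a minimal sketch like the following copies them into os.environ. The helper configure_s3_env is hypothetical; it leaves variables that are already exported alone unless override=True. We assume, as in the simplest example at the top of this page, that it runs before TensorFlow first accesses an s3:// path.

```python
import os

# Values from the example setup; XXXXX are placeholders for your own keys.
SEAWEEDFS_S3_ENV = {
    "S3_ENDPOINT": "http://localhost:8333",
    "AWS_ACCESS_KEY_ID": "XXXXX",
    "AWS_SECRET_ACCESS_KEY": "XXXXX",
}


def configure_s3_env(settings, environ=None, override=False):
    """Copy S3 settings into the process environment (hypothetical helper).

    With override=False, values already exported in the shell win.
    Returns the list of keys that were actually set.
    """
    environ = os.environ if environ is None else environ
    applied = []
    for key, value in settings.items():
        if override or key not in environ:
            environ[key] = value
            applied.append(key)
    return applied
```

Calling configure_s3_env(SEAWEEDFS_S3_ENV) near the top of the script, before building any tf.data pipeline on s3:// paths, applies the example setup.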
Usage
Once setup is complete, TensorFlow can interact with S3 in a variety of ways: anywhere a TensorFlow IO function takes a path, an S3 URL can be used.
Smoke Test
To test your setup, stat a path:
from tensorflow.python.lib.io import file_io
print(file_io.stat('s3://bucketname/path/'))
You should see output similar to this:
<tensorflow.python.pywrap_tensorflow_internal.FileStatistics; proxy of <Swig Object of type 'tensorflow::FileStatistics *' at 0x10c2171b0> >
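If the stat call hangs or fails, it can help to first confirm that something is listening at S3_ENDPOINT, which separates "gateway not running" from problems such as wrong credentials or bucket names. The sketch below uses hypothetical helper names and only opens a TCP connection; it does not speak the S3 protocol or check credentials. It assumes plain HTTP when the endpoint has no scheme.

```python
import os
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint):
    """Split an S3_ENDPOINT value such as "http://localhost:8333" into (host, port)."""
    if "://" not in endpoint:
        endpoint = "http://" + endpoint  # assumption: no scheme means plain HTTP
    parts = urlsplit(endpoint)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname, parts.port or default_port


def endpoint_reachable(endpoint, timeout=2.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    endpoint = os.environ.get("S3_ENDPOINT", "http://localhost:8333")
    print(endpoint, "reachable:", endpoint_reachable(endpoint))
```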
Reading Data
filenames = ["s3://bucketname/path/to/file1.tfrecord",
             "s3://bucketname/path/to/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
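With many shards, writing the list by hand gets tedious. A small hypothetical helper can assemble the S3 URLs from a bucket, a key prefix, and file names; bucketname and path/to are the same placeholders used throughout this page.

```python
def s3_urls(bucket, prefix, names):
    """Build s3://bucket/prefix/name URLs (hypothetical helper)."""
    prefix = prefix.strip("/")  # tolerate "path/to/" or "/path/to"
    base = f"s3://{bucket}/{prefix}/" if prefix else f"s3://{bucket}/"
    return [base + name.lstrip("/") for name in names]
```

The result can be passed straight to the dataset, e.g. tf.data.TFRecordDataset(s3_urls("bucketname", "path/to", ["file1.tfrecord", "file2.tfrecord"])).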
TensorFlow Tools
Many TensorFlow tools, such as TensorBoard or model serving, can also take S3 URLs as arguments:
tensorboard --logdir s3://bucketname/path/to/model/
tensorflow_model_server --port=9000 --model_name=model --model_base_path=s3://bucketname/path/to/model/export/
This enables an end-to-end workflow using S3 for all data needs.