The currently recommended way for Hadoop to access SeaweedFS is the [SeaweedFS Hadoop Compatible File System](Hadoop-Compatible-File-System). It is the most efficient approach: the client reads metadata directly from the filer and file content directly from the volume servers.

However, the downside is that you need to add the SeaweedFS jar to the classpath and change some Hadoop settings.
# HDFS Access to SeaweedFS via the S3 Connector

The S3A connector is already included in Hadoop distributions, so you can use it directly.

Use the following Maven properties:
```
<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <scala.version>2.12.11</scala.version>
    <spark.version>3.1.2</spark.version>
    <hadoop.version>3.3.1</hadoop.version>
    <spark.pom.scope>compile</spark.pom.scope>
</properties>
```
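To make the S3A filesystem classes available, the build also needs the `hadoop-aws` artifact on the classpath. A minimal sketch of the dependency, reusing the `hadoop.version` property above (the exact set of transitive AWS SDK jars may vary by Hadoop version):

```
<dependencies>
    <!-- Provides org.apache.hadoop.fs.s3a.S3AFileSystem -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-aws</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
</dependencies>
```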
And add this to your code:
```
import org.apache.hadoop.conf.Configuration;
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.SparkSession;

// Build a local Spark session for the demo
SparkSession spark = SparkSession.builder()
    .master("local[*]")
    .config("spark.eventLog.enabled", "false")
    .config("spark.driver.memory", "1g")
    .config("spark.executor.memory", "1g")
    .appName("SparkDemoFromS3")
    .getOrCreate();

// Point the S3A connector at the SeaweedFS S3 gateway and relax the
// S3A checks that assume AWS-specific behavior
Configuration hadoopConf = spark.sparkContext().hadoopConfiguration();
hadoopConf.set("fs.s3a.access.key", "admin");
hadoopConf.set("fs.s3a.secret.key", "xx");
hadoopConf.set("fs.s3a.endpoint", "ip:8333");
hadoopConf.set("com.amazonaws.services.s3a.enableV4", "true");
hadoopConf.set("fs.s3a.path.style.access", "true");
hadoopConf.set("fs.s3a.connection.ssl.enabled", "false");
hadoopConf.set("fs.s3a.multiobjectdelete.enable", "false");
hadoopConf.set("fs.s3a.directory.marker.retention", "keep");
hadoopConf.set("fs.s3a.change.detection.version.required", "false");
hadoopConf.set("fs.s3a.change.detection.mode", "warn");

// Read a file from a SeaweedFS bucket, count its lines, and write the RDD back
RDD<String> rdd = spark.sparkContext().textFile("s3a://bk002/test1.txt", 1);
System.out.println(rdd.count());
rdd.saveAsTextFile("s3a://bk002/testcc/t2");
```
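The same `fs.s3a.*` settings can also be supplied through Hadoop configuration instead of in code, which is how a plain HDFS client (for example `hdfs dfs`) would reach SeaweedFS. A minimal `core-site.xml` sketch, assuming the same endpoint and credentials as the example above:

```
<configuration>
    <property>
        <name>fs.s3a.access.key</name>
        <value>admin</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>xx</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>ip:8333</value>
    </property>
    <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
    </property>
    <property>
        <name>fs.s3a.connection.ssl.enabled</name>
        <value>false</value>
    </property>
</configuration>
```

With `hadoop-aws` on the classpath and these properties in place, commands such as `hdfs dfs -ls s3a://bk002/` should work against the SeaweedFS S3 gateway.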