The currently recommended way for Hadoop to access SeaweedFS is the [SeaweedFS Hadoop Compatible File System](Hadoop-Compatible-File-System), which is the most efficient option: the client talks to the filer directly for metadata and to the volume servers directly for file content.
However, the downside is that you need to add a SeaweedFS client jar to the classpath and change some Hadoop settings.
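
For reference, the settings in question look roughly like the sketch below. This is only an illustration; the implementation class, URL scheme, and filer address are assumptions here, and the [Hadoop Compatible File System](Hadoop-Compatible-File-System) page is the authoritative source:

```xml
<!-- core-site.xml sketch for the Hadoop Compatible File System approach.
     Class name, scheme, and filer address are illustrative assumptions. -->
<configuration>
  <property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://localhost:8888</value>
  </property>
</configuration>
```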
# HDFS Access to SeaweedFS via the S3 Connector
The S3A connector ships with Hadoop distributions, so you can use it directly against the SeaweedFS S3 gateway.
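
For Hadoop tools outside of Spark (for example `hdfs dfs` or MapReduce jobs), the equivalent S3A settings can go into `core-site.xml`. This is a minimal sketch; the endpoint, access key, and secret key below are placeholders that must match your SeaweedFS S3 gateway:

```xml
<!-- core-site.xml: point the S3A connector at the SeaweedFS S3 gateway.
     Endpoint and credentials are placeholders. -->
<configuration>
  <property>
    <name>fs.s3a.endpoint</name>
    <value>ip:8333</value>
  </property>
  <property>
    <name>fs.s3a.access.key</name>
    <value>admin</value>
  </property>
  <property>
    <name>fs.s3a.secret.key</name>
    <value>xx</value>
  </property>
  <property>
    <name>fs.s3a.path.style.access</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.s3a.connection.ssl.enabled</name>
    <value>false</value>
  </property>
</configuration>
```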
For a Spark job built with Maven, use version properties along these lines in your `pom.xml`:
```xml
<properties>
  <maven.compiler.source>8</maven.compiler.source>
  <maven.compiler.target>8</maven.compiler.target>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  <scala.version>2.12.11</scala.version>
  <spark.version>3.1.2</spark.version>
  <hadoop.version>3.3.1</hadoop.version>
  <spark.pom.scope>compile</spark.pom.scope>
</properties>
```
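
The properties above only pin versions; the project also needs the corresponding dependencies on the classpath, in particular `hadoop-aws`, which provides the `s3a://` filesystem implementation. The artifact list below is a sketch, not taken from the original page, so adjust it to your project:

```xml
<!-- Sketch of matching dependencies; hadoop-aws supplies the S3A connector. -->
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>${spark.version}</version>
    <scope>${spark.pom.scope}</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-aws</artifactId>
    <version>${hadoop.version}</version>
  </dependency>
</dependencies>
```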
Then add this to your code:
```java
import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.SparkSession;

SparkSession spark = SparkSession.builder()
    .master("local[*]")
    .config("spark.eventLog.enabled", "false")
    .config("spark.driver.memory", "1g")
    .config("spark.executor.memory", "1g")
    .appName("SparkDemoFromS3")
    .getOrCreate();

// Point the S3A connector at the SeaweedFS S3 gateway (default port 8333).
spark.sparkContext().hadoopConfiguration().set("fs.s3a.access.key", "admin");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.secret.key", "xx");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.endpoint", "ip:8333");
spark.sparkContext().hadoopConfiguration().set("com.amazonaws.services.s3a.enableV4", "true");

// SeaweedFS is not AWS S3: use path-style access over plain HTTP, and relax
// S3A behaviors that assume AWS semantics.
spark.sparkContext().hadoopConfiguration().set("fs.s3a.path.style.access", "true");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.connection.ssl.enabled", "false");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.multiobjectdelete.enable", "false");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.directory.marker.retention", "keep");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.change.detection.version.required", "false");
spark.sparkContext().hadoopConfiguration().set("fs.s3a.change.detection.mode", "warn");

// Read a file from a SeaweedFS bucket, count its lines, and write the RDD back.
RDD<String> rdd = spark.sparkContext().textFile("s3a://bk002/test1.txt", 1);
System.out.println(rdd.count());
rdd.saveAsTextFile("s3a://bk002/testcc/t2");
```
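
With the endpoint, bucket name, and credentials replaced by your own, this reads `test1.txt` from the SeaweedFS S3 gateway, prints its line count, and writes the RDD back under `testcc/t2`. The access/secret key pair must match an identity configured on the SeaweedFS S3 gateway, if authentication is enabled there.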