# Run Spark on SeaweedFS

## Installation not inheriting from Hadoop cluster configuration
Copy the seaweedfs-hadoop2-client-1.4.8.jar to all executor machines.
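For example, a minimal sketch that pushes the jar to each executor over SSH; the hostnames and destination directory below are placeholders for your cluster:
```
# hypothetical executor hostnames and jar location; adjust for your cluster
for host in executor1 executor2 executor3; do
  scp seaweedfs-hadoop2-client-1.4.8.jar $host:/path/to/seaweedfs-hadoop2-client-1.4.8.jar
done
```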
Then add the following to spark/conf/spark-defaults.conf on every node running Spark:
```
spark.driver.extraClassPath=/path/to/seaweedfs-hadoop2-client-1.4.8.jar
spark.executor.extraClassPath=/path/to/seaweedfs-hadoop2-client-1.4.8.jar
```
And modify the configuration at runtime:
```
./bin/spark-submit \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
myApp.jar
```
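To sanity-check the setup before submitting a real job, you can open a Spark shell with the same two properties and read back a path that already exists on the filer (the file name below is a placeholder):
```
./bin/spark-shell \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888

scala> spark.read.textFile("/some/existing/file.txt").count()
```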
# Example
1. Change spark-defaults.conf:
```
spark.driver.extraClassPath=/Users/chris/go/src/github.com/chrislusf/seaweedfs/other/java/hdfs2/target/seaweedfs-hadoop2-client-1.4.8.jar
spark.executor.extraClassPath=/Users/chris/go/src/github.com/chrislusf/seaweedfs/other/java/hdfs2/target/seaweedfs-hadoop2-client-1.4.8.jar
spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem
```
2. Create the Spark history folder:
```
$ curl -X POST http://192.168.2.3:8888/spark2-history/
```
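Creating the folder up front matters because Spark does not create the event log directory itself; it fails at startup if the directory is missing. If you also run the Spark history server, you can point it at the same folder by adding this to spark-defaults.conf (assuming the standard history server that ships with Spark) and then starting it with `$ sbin/start-history-server.sh`:
```
spark.history.fs.logDirectory=seaweedfs://192.168.2.3:8888/spark2-history/
```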
3. Run a Spark job:
```
$ bin/spark-submit --name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.jars.ivy=/tmp/.ivy \
--conf spark.eventLog.enabled=true \
--conf spark.hadoop.fs.defaultFS=seaweedfs://192.168.2.3:8888 \
--conf spark.eventLog.dir=seaweedfs://192.168.2.3:8888/spark2-history/ \
file:///usr/local/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```
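As a quick check that the event log actually landed on SeaweedFS, you can list the history folder through the filer's HTTP interface; requesting JSON via the Accept header is a filer listing convention, so treat this as a sketch:
```
$ curl -H "Accept: application/json" "http://192.168.2.3:8888/spark2-history/?pretty=y"
```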