# Run Spark on SeaweedFS

## Installation not inheriting from Hadoop cluster configuration
Copy the seaweedfs-hadoop2-client-1.4.8.jar to all executor machines.
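For example, a minimal sketch that pushes the jar to each executor over SSH; the hostnames and destination directory below are placeholders for your cluster:
```
# hypothetical executor hostnames and jar location; adjust for your cluster
for host in executor1 executor2 executor3; do
  scp seaweedfs-hadoop2-client-1.4.8.jar $host:/path/to/seaweedfs-hadoop2-client-1.4.8.jar
done
```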
Then add the following to spark/conf/spark-defaults.conf on every node running Spark:
```
spark.driver.extraClassPath=/path/to/seaweedfs-hadoop2-client-1.4.8.jar
spark.executor.extraClassPath=/path/to/seaweedfs-hadoop2-client-1.4.8.jar
```
And modify the configuration at runtime:
```
./bin/spark-submit \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
myApp.jar
```
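To sanity-check the setup before submitting a real job, you can open a Spark shell with the same two properties and read back a path that already exists on the filer (the file name below is a placeholder):
```
./bin/spark-shell \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888

scala> spark.read.textFile("/some/existing/file.txt").count()
```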
# Example
1. Change spark-defaults.conf:
```
spark.driver.extraClassPath=/Users/chris/go/src/github.com/chrislusf/seaweedfs/other/java/hdfs2/target/seaweedfs-hadoop2-client-1.4.8.jar
spark.executor.extraClassPath=/Users/chris/go/src/github.com/chrislusf/seaweedfs/other/java/hdfs2/target/seaweedfs-hadoop2-client-1.4.8.jar
spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem
```
2. Create the Spark history folder:
```
$ curl -X POST http://192.168.2.3:8888/spark2-history/
```
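Creating the folder up front matters because Spark does not create the event log directory itself; it fails at startup if the directory is missing. If you also run the Spark history server, you can point it at the same folder by adding this to spark-defaults.conf (assuming the standard history server that ships with Spark) and then starting it with `$ sbin/start-history-server.sh`:
```
spark.history.fs.logDirectory=seaweedfs://192.168.2.3:8888/spark2-history/
```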
3. Run a Spark job:
```
$ bin/spark-submit --name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.jars.ivy=/tmp/.ivy \
--conf spark.eventLog.enabled=true \
--conf spark.hadoop.fs.defaultFS=seaweedfs://192.168.2.3:8888 \
--conf spark.eventLog.dir=seaweedfs://192.168.2.3:8888/spark2-history/ \
file:///usr/local/spark/examples/jars/spark-examples_2.12-3.0.0.jar
```
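As a quick check that the event log actually landed on SeaweedFS, you can list the history folder through the filer's HTTP interface; requesting JSON via the Accept header is a filer listing convention, so treat this as a sketch:
```
$ curl -H "Accept: application/json" "http://192.168.2.3:8888/spark2-history/?pretty=y"
```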