Created run Spark on SeaweedFS (markdown)

Chris Lu 2019-09-03 01:16:20 -07:00
parent 9e0a43b3f1
commit 61dc41624f

# Installation for Spark
Follow the instructions in the Spark documentation:
* https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration
* https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
## Installation inheriting from Hadoop cluster configuration
Inheriting from an existing Hadoop cluster configuration is the easiest way.
To make the Hadoop configuration files visible to Spark, set `HADOOP_CONF_DIR` in `$SPARK_HOME/conf/spark-env.sh` to a directory containing the configuration file `core-site.xml`, usually `/etc/hadoop/conf`.
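For this to work, `core-site.xml` needs the SeaweedFS filesystem registered. A minimal sketch, assuming the SeaweedFS filer listens on `localhost:8888` (adjust the host and port for your deployment):

```
<configuration>
  <!-- register the SeaweedFS Hadoop client for seaweedfs:// URIs -->
  <property>
    <name>fs.seaweedfs.impl</name>
    <value>seaweed.hdfs.SeaweedFileSystem</value>
  </property>
  <!-- optional: make SeaweedFS the default filesystem;
       localhost:8888 is an assumed filer address -->
  <property>
    <name>fs.defaultFS</name>
    <value>seaweedfs://localhost:8888</value>
  </property>
</configuration>
```

With `fs.defaultFS` set, paths without a scheme (e.g. `/user/data`) resolve to SeaweedFS; without it, jobs must use explicit `seaweedfs://` URIs.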
## Installation not inheriting from Hadoop cluster configuration
Copy `seaweedfs-hadoop2-client-x.x.x.jar` to all executor machines.
Add the following to `spark/conf/spark-defaults.conf` on every node running Spark:
```
spark.driver.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
spark.executor.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
```
Then supply the SeaweedFS configuration at runtime, for example:
```
./bin/spark-submit \
--name "My app" \
--master local[4] \
--conf spark.eventLog.enabled=false \
--conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
--conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
--conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
myApp.jar
```
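Instead of repeating these `--conf` flags on every `spark-submit`, the same `spark.hadoop.*` properties can be added to `spark/conf/spark-defaults.conf` alongside the classpath entries above; a sketch, reusing the filer address from the example (an assumption, replace it with your own):

```
spark.hadoop.fs.seaweedfs.impl seaweed.hdfs.SeaweedFileSystem
spark.hadoop.fs.defaultFS seaweedfs://localhost:8888
```

Spark strips the `spark.hadoop.` prefix and passes the remainder to the Hadoop `Configuration` used by every job, so this has the same effect as setting the properties in `core-site.xml`.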