diff --git a/Hadoop-Compatible-File-System.md b/Hadoop-Compatible-File-System.md
index fde5381..ae4219f 100644
--- a/Hadoop-Compatible-File-System.md
+++ b/Hadoop-Compatible-File-System.md
@@ -91,39 +91,6 @@ $ bin/hdfs dfs -ls /
 $ bin/hdfs dfs -ls seaweedfs://localhost:8888/
 ```
 
-# Installation for Spark
-Follow instructions on spark doc:
-* https://spark.apache.org/docs/latest/configuration.html#inheriting-hadoop-cluster-configuration
-* https://spark.apache.org/docs/latest/configuration.html#custom-hadoophive-configuration
-
-## installation inheriting from Hadoop cluster configuration
-
-Inheriting from Hadoop cluster configuration should be the easiest way.
-
-To make these files visible to Spark, set HADOOP_CONF_DIR in $SPARK_HOME/conf/spark-env.sh to a location containing the configuration file `core-site.xml`, usually `/etc/hadoop/conf`
-
-## installation not inheriting from Hadoop cluster configuration
-
-Copy the seaweedfs-hadoop2-client-x.x.x.jar to all executor machines.
-
-Add the following to spark/conf/spark-defaults.conf on every node running Spark
-```
-spark.driver.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
-spark.executor.extraClassPath /path/to/seaweedfs-hadoop2-client-x.x.x.jar
-```
-
-And modify the configuration at runntime:
-
-```
-./bin/spark-submit \
-  --name "My app" \
-  --master local[4] \
-  --conf spark.eventLog.enabled=false \
-  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" \
-  --conf spark.hadoop.fs.seaweedfs.impl=seaweed.hdfs.SeaweedFileSystem \
-  --conf spark.hadoop.fs.defaultFS=seaweedfs://localhost:8888 \
-  myApp.jar
-```
 
 # Supported HDFS Operations
 