I am trying to manually add Spark 2.0.0 to an existing HDP 2.4 installation (on HDInsight 3.4). The necessary environment variables (such as HADOOP_HOME) are set correctly. The spark-defaults.conf file is identical to the one from the existing Spark 1.6 installation, except for the spark.yarn.jars parameter, which I updated (the relevant line is shown below the error). I was also able to resolve some Azure-specific dependencies (e.g. by using hadoop-azure-2.7.2.jar). However, when I launch pyspark I hit the following YARN error:
16/11/14 12:33:18 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
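For reference, the only line I changed in spark-defaults.conf is spark.yarn.jars; the path below is just where I unpacked Spark 2.0.0 on my cluster and is a placeholder, so it may well differ elsewhere:

spark.yarn.jars    local:/opt/spark-2.0.0/jars/*

Everything else in the file, including the WASB/Azure storage settings, is copied verbatim from the Spark 1.6 configuration.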
Does anyone have experience with this issue / any tips or pointers? Thanks!
Thanks for your answer. However, the error message actually originates from the python/pyspark/shell.py script included in Spark 2.0, which tries to create a SparkSession:
spark = SparkSession.builder\
    .enableHiveSupport()\
    .getOrCreate()
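I would expect the same failure from a minimal standalone script along these lines, run via spark-submit (the app name is arbitrary, and I am assuming it picks up the same spark-defaults.conf):

from pyspark.sql import SparkSession

# Mirrors what shell.py does at startup; on my cluster this dies during
# SparkContext initialization with the same "Yarn application has already
# ended!" error before any work is submitted.
spark = (SparkSession.builder
         .master("yarn")               # matches spark.master in my conf
         .appName("spark2-yarn-test")  # hypothetical name, not significant
         .enableHiveSupport()
         .getOrCreate())

spark.stop()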