"bad substitution" error running Spark on Yarn
Labels: Apache Spark
Created 03-18-2016 11:32 PM
I'm attempting to run a Spark job via YARN using Gremlin (graph traversal language). However, the Application Master dies with a "bad substitution" error. I can see in the error message that ${hdp.version} isn't being resolved. According to various sources online I should be able to set the following property when I submit my job to fix the issue:
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485
It sure seems like this should work, but it doesn't. Can anybody help?
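For reference, a minimal sketch of how the property can be passed at submit time (the class and jar names are placeholders, and the version string must match the cluster's HDP build):
# Sketch only: placeholder class/jar; -Dhdp.version must match the cluster's build.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485" \
  --class com.example.GremlinJob \
  gremlin-job.jar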
Created 03-18-2016 11:45 PM
Background:
Starting with HDP 2.2, which is based on Hadoop 2.6, Hortonworks added support for rolling upgrades (a detailed description is available here: http://hortonworks.com/blog/introducing-rolling-upgrades-downgrades-apache-hadoop-yarn-cluster/). A fundamental assumption of rolling upgrades is that jobs must not rely implicitly on the current version of artefacts such as jar files and native libraries, since those could change while a job is running in the middle of a rolling upgrade. Instead, the system is configured to require an explicit value for hdp.version at job-submission time.
Solution:
1. One option is to modify mapred-site.xml, replacing the hdp.version property with the correct value for your cluster. CAUTION: if you modify mapred-site.xml on a cluster node, this can break rolling upgrades in scenarios where a program such as Oozie submits a job from that node, because it will use the hardcoded version instead of the version specified by the client.
2. The better option is to:
a) create a file called java-opts containing -Dhdp.version=2.3.4.0-3485. You can also specify the same value using SPARK_JAVA_OPTS, i.e. export SPARK_JAVA_OPTS="-Dhdp.version=2.3.4.0-3485"
b) modify /usr/hdp/current/spark-client/conf/spark-defaults.conf and add the two lines below (a combined sketch follows this list):
spark.driver.extraJavaOptions -Dhdp.version=2.3.4.0-3485
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.4.0-3485
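Taken together, a minimal shell sketch of option 2 (the conf path assumes the standard HDP spark-client layout, and the version string is only an example; adjust both for your cluster):
# Sketch only: adjust SPARK_CONF_DIR and HDP_VERSION for your own cluster.
SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
HDP_VERSION=2.3.4.0-3485
# a) write the java-opts file
echo "-Dhdp.version=${HDP_VERSION}" > "${SPARK_CONF_DIR}/java-opts"
# b) append the driver and YARN AM options to spark-defaults.conf
cat >> "${SPARK_CONF_DIR}/spark-defaults.conf" <<EOF
spark.driver.extraJavaOptions -Dhdp.version=${HDP_VERSION}
spark.yarn.am.extraJavaOptions -Dhdp.version=${HDP_VERSION}
EOF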
Created 03-18-2016 11:45 PM
I observed a similar issue with the ${hdp.version} variable and got the same "bad substitution" error message. I hard-coded the version number in the configuration files and the jobs are now running.
Created 03-22-2016 07:44 PM
We tried all of Ali's suggestions but we were only successful by hard-coding the ${hdp.version} in the mapreduce.application.classpath:
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.0-3485.jar:/etc/hadoop/conf/secure
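For anyone trying the same thing, a rough sketch of the substitution itself, assuming the client config lives in /etc/hadoop/conf; on an Ambari-managed cluster it is safer to make the change through the MapReduce2 configs so it is not overwritten:
# Sketch only: replaces the unresolved placeholder with the concrete build number.
HDP_VERSION=2.3.4.0-3485
# count lines that still contain the unresolved placeholder
grep -c 'hdp.version' /etc/hadoop/conf/mapred-site.xml
# substitute in place, keeping a backup copy of the original file
sed -i.bak "s/\${hdp\.version}/${HDP_VERSION}/g" /etc/hadoop/conf/mapred-site.xml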
Created 07-27-2016 07:00 PM
You can also set the hdp.version property through the Ambari GUI. Below are the steps:
1. Go to Ambari -> YARN -> Configs and open the 'Advanced' tab.
2. Scroll to the bottom of the page; there you will find an option to add a custom property for yarn-site.
3. Click 'Add Property' and enter 'hdp.version' as the key and the version as the value.
4. Save the changes and restart the required services. This will deploy the hdp.version property in yarn-site.xml.
These steps worked for me; they should work for you as well 🙂 !
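To double-check the result, something like the following should show the new entry once the services have been restarted (assuming the client config is under /etc/hadoop/conf; the version shown is just an example):
# Sketch: verify that the custom property reached yarn-site.xml after the restart.
grep -A 1 '<name>hdp.version</name>' /etc/hadoop/conf/yarn-site.xml
# expected output, roughly:
#   <name>hdp.version</name>
#   <value>2.3.4.0-3485</value>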
Created 08-02-2016 03:46 PM
+1 from me on this workaround
Created 10-06-2016 11:47 AM
Me too! Several hours wasted before I found this. Thanks Rama!
Created 01-26-2020 11:18 AM
This worked!
I already made these changes prior to running the last command.
Created 02-13-2018 06:20 PM
Using HDP 2.5, submitting from a vanilla Spark 2.1.0 (i.e., not HDP), and deploying with deploy-mode "cluster", we were successful by using a variation on the suggestions of @Ali Bajwa:
- Creating a file named "java-opts" in our Spark conf directory containing "-Dhdp.version=2.5.x.x-xx" (substituting our specific version)
- Adding into our Spark configuration (via the Spark submit --conf option) "spark.driver.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" and "spark.executor.extraJavaOptions=-Dhdp.version=2.5.x.x-xx"
Both the creation of the "java-opts" file and the Spark configuration modifications were required for success in our case.
The "spark.driver.extraJavaOptions" option was definitely necessary in our case, but the "spark.executor.extraJavaOptions" may not be necessary. As I understand it, the "spark.yarn.am.extraJavaOptions" option that Ali mentioned is not relevant in cluster mode.
