"bad substitution" error running Spark on Yarn

Explorer

I'm attempting to run a Spark job via YARN using Gremlin (a graph traversal language). However, the Application Master dies with a "bad substitution" error, and I can see in the error message that ${hdp.version} isn't being resolved. According to various sources online, I should be able to fix the issue by setting the following property when I submit my job:

spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485

It sure seems like this should work, but it doesn't. Can anybody help?
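For reference, this is the shape of what I'm trying (the application jar here is just a placeholder; my real job is launched through Gremlin):

spark-submit \
  --master yarn \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485" \
  my-job.jar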

8 Replies


Background:

Starting with HDP 2.2, which is based on Hadoop 2.6, Hortonworks added support for rolling upgrades (a detailed description is available at http://hortonworks.com/blog/introducing-rolling-upgrades-downgrades-apache-hadoop-yarn-cluster/). A fundamental assumption of rolling upgrades is that jobs should not rely implicitly on the current version of artifacts such as jar files and native libraries, since these could change mid-job during a rolling upgrade. Instead, the system is configured to require a specific value for hdp.version at the time of job submission.

Solution:

1. One option is to modify mapred-site.xml and replace the hdp.version property with the right value for your cluster. CAUTION: if you modify mapred-site.xml on a cluster node, this will break rolling upgrades in certain scenarios, because a program such as Oozie submitting a job from that node will use the hardcoded version instead of the version specified by the client.
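A rough sketch of that edit (GNU sed assumed, and the config path is the usual HDP default; back the file up first):

cp /etc/hadoop/conf/mapred-site.xml /etc/hadoop/conf/mapred-site.xml.bak
# Hard-code the cluster's version everywhere ${hdp.version} appears
sed -i 's/\${hdp\.version}/2.3.4.0-3485/g' /etc/hadoop/conf/mapred-site.xml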

2. A better option is to:

a) Create a file called java-opts containing -Dhdp.version=2.3.4.0-3485. You can also specify the same value using SPARK_JAVA_OPTS, i.e. export SPARK_JAVA_OPTS="-Dhdp.version=2.3.4.0-3485" (see the sketch after this list).

b) Modify /usr/hdp/current/spark-client/conf/spark-defaults.conf and add the lines below:

spark.driver.extraJavaOptions   -Dhdp.version=2.3.4.0-3485
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.3.4.0-3485
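For step (a), a minimal sketch, assuming the standard HDP spark-client conf directory:

# Write the java-opts file into the Spark conf directory
echo "-Dhdp.version=2.3.4.0-3485" > /usr/hdp/current/spark-client/conf/java-opts

# Or set the same flag through the environment before submitting
export SPARK_JAVA_OPTS="-Dhdp.version=2.3.4.0-3485"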


@Jerrell Schivers

I observed a similar issue with the ${hdp.version} variable and got the same "bad substitution" error message. I hard-coded the version number in the configuration files and the jobs are now running.

Rising Star

We tried all of Ali's suggestions, but we were only successful after hard-coding ${hdp.version} in the mapreduce.application.classpath:

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.0-3485.jar:/etc/hadoop/conf/secure

New Contributor

You can also update the hdp.version property through the Ambari GUI. Below are the steps:

1. Go to Ambari -> YARN -> Configs and open the 'Advanced' tab.

2. Scroll to the bottom of the page, where you will find an option to add a custom property for yarn-site.

3. Click 'Add Property' and enter 'hdp.version' along with the version value.

4. Save the changes and restart the required services. This will deploy the hdp.version property in yarn-site.xml.
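After the restart, you can verify the property landed (config path assumed for a standard HDP layout):

# hdp.version should now appear as a property in yarn-site.xml
grep -A2 'hdp.version' /etc/hadoop/conf/yarn-site.xml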

These steps worked for me, and they should work for you as well 🙂

New Contributor

+1 from me on this workaround

Expert Contributor

Me too! Several hours wasted before I found this. Thanks Rama!

Explorer

This worked!

I had already made these changes prior to running the last command.

Check the active HDP version:

hdp-select status hadoop-client

Set a couple of parameters:

export HADOOP_OPTS="-Dhdp.version=2.6.1.0-129"
export HADOOP_CONF_DIR=/etc/hadoop/conf

Source in the environment:

source ~/get_env.sh

Added the last two lines to $SPARK_HOME/conf/spark-defaults.conf:

spark.driver.extraJavaOptions   -Dhdp.version=2.6.1.0-129
spark.yarn.am.extraJavaOptions  -Dhdp.version=2.6.1.0-129

Added the Hadoop version under Ambari / YARN / Advanced / Custom:

hdp.version=2.6.1.0-129

Ensure this runs okay:

yarn jar hadoop-mapreduce-examples.jar pi 5 5

Then run the Spark Pi example under YARN:

cd /home/spark/spark-2.4.4-bin-hadoop2.7

spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master yarn \
  --deploy-mode cluster \
  --executor-memory 2G \
  --num-executors 5 \
  --executor-cores 2 \
  --conf spark.authenticate.enableSaslEncryption=true \
  --conf spark.network.sasl.serverAlwaysEncrypt=true \
  --conf spark.authenticate=true \
  examples/jars/spark-examples_2.11-2.4.4.jar 100

New Contributor

Using HDP 2.5, submitting from a vanilla Spark 2.1.0 (i.e., not HDP's build), and deploying with deploy-mode "cluster", we were successful with a variation on the suggestions of @Ali Bajwa (combined sketch below):

  • Creating a file named "java-opts" in our Spark conf directory containing "-Dhdp.version=2.5.x.x-xx" (substituting our specific version)
  • Adding to our Spark configuration (via the spark-submit --conf option) "spark.driver.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" and "spark.executor.extraJavaOptions=-Dhdp.version=2.5.x.x-xx"
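Put together, the submission looked roughly like this (the main class and application jar are placeholders, and 2.5.x.x-xx stands for the real version string):

# java-opts in the conf directory of the vanilla Spark install
echo "-Dhdp.version=2.5.x.x-xx" > $SPARK_HOME/conf/java-opts

spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --conf "spark.driver.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" \
  --conf "spark.executor.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" \
  my-app.jar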

Both the creation of the "java-opts" file and the Spark configuration modifications were required for success in our case.

The "spark.driver.extraJavaOptions" option was definitely necessary in our case, but the "spark.executor.extraJavaOptions" may not be necessary. As I understand it, the "spark.yarn.am.extraJavaOptions" option that Ali mentioned is not relevant in cluster mode.