"bad substitution" error running Spark on Yarn

New Contributor

I'm attempting to run a Spark job via YARN using Gremlin (graph traversal language). However, the Application Master dies with a "bad substitution" error. I can see in the error message that ${hdp.version} isn't being resolved. According to various sources online I should be able to set the following property when I submit my job to fix the issue:

spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485

It sure seems like this should work, but it doesn't. Can anybody help?
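For anyone hitting the same message: the "bad substitution" text typically comes from the shell that runs the container launch script, because a name containing a dot, such as hdp.version, is not a valid shell variable name. Below is a minimal Python sketch for spotting such placeholders in a classpath string. The sample path is a made-up illustration, not a value from any real cluster, and the check is deliberately simple (it would also flag shell forms such as ${VAR:-default}).

```python
import re

# Matches ${...} placeholders and captures the name inside the braces.
PLACEHOLDER = re.compile(r"\$\{([^}]*)\}")
# A plain shell variable name: letters, digits, underscores, no leading digit.
SHELL_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def unresolved_placeholders(text):
    """Return placeholder names in text that are not plain shell variable names."""
    return [name for name in PLACEHOLDER.findall(text)
            if not SHELL_NAME.match(name)]


# Hypothetical classpath fragment, for illustration only.
sample = "$PWD/mr-framework/hadoop/share/hadoop/common/*:/usr/hdp/${hdp.version}/hadoop/lib/*"
print(unresolved_placeholders(sample))
```

If the list is non-empty, the value still contains a placeholder that has to be resolved before the shell sees it.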

7 REPLIES

Re: "bad substitution" error running Spark on Yarn

Background:

Starting with HDP 2.2, which is based on Hadoop 2.6, Hortonworks added support for rolling upgrades (a detailed description is available here: http://hortonworks.com/blog/introducing-rolling-upgrades-downgrades-apache-hadoop-yarn-cluster/). A fundamental assumption of rolling upgrades is that jobs should not rely implicitly on the current version of artefacts such as jar files and native libraries, since these could change while a job is running in the middle of a rolling upgrade. Instead, the system is configured to require a particular value for hdp.version at the time of job submission.

Solution:

1. One option is to modify mapred-site.xml to replace the hdp.version property with the right value for your cluster. CAUTION: if you modify mapred-site.xml on a node in the cluster, this will break rolling upgrades in certain scenarios, because a program like Oozie submitting a job from that node will use the hard-coded version instead of the version specified by the client.

2. A better option is to:

a) Create a file called java-opts containing the config value -Dhdp.version=2.3.4.0-3485. You can also specify the same value using SPARK_JAVA_OPTS, i.e. export SPARK_JAVA_OPTS="-Dhdp.version=2.3.4.0-3485"

b) Modify /usr/hdp/current/spark-client/conf/spark-defaults.conf and add the lines below:

spark.driver.extraJavaOptions    -Dhdp.version=2.3.4.0-3485
spark.yarn.am.extraJavaOptions   -Dhdp.version=2.3.4.0-3485
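To avoid typos when the version string appears in several places, the entries above can be generated from a single value. This is only a sketch: the helper names java_opts_line and spark_defaults_lines are made up for illustration, and it prints the lines rather than touching any files, so you still have to copy them into java-opts and spark-defaults.conf yourself.

```python
# Property names taken from the spark-defaults.conf lines in this reply.
SPARK_KEYS = ("spark.driver.extraJavaOptions", "spark.yarn.am.extraJavaOptions")


def java_opts_line(hdp_version):
    """Content for the java-opts file."""
    return f"-Dhdp.version={hdp_version}"


def spark_defaults_lines(hdp_version):
    """Lines to add to spark-defaults.conf, one per property."""
    opt = java_opts_line(hdp_version)
    return [f"{key} {opt}" for key in SPARK_KEYS]


# Version string from this thread; substitute the one for your cluster.
print(java_opts_line("2.3.4.0-3485"))
for line in spark_defaults_lines("2.3.4.0-3485"):
    print(line)
```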

Re: "bad substitution" error running Spark on Yarn

@Jerrell Schivers

I observed a similar issue with the ${hdp.version} variable and got the same "bad substitution" error message. I hard-coded the version number in the configuration files, and jobs are now running.

Re: "bad substitution" error running Spark on Yarn

Contributor

We tried all of Ali's suggestions, but we only succeeded by hard-coding the version in place of ${hdp.version} in mapreduce.application.classpath:

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.0-3485.jar:/etc/hadoop/conf/secure
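The hard-coding itself is a plain text substitution. Here is a small Python sketch of it, assuming the original entry used ${hdp.version} wherever the version number now appears. The template string is my reconstruction of a single classpath entry for illustration, not a copy of the stock setting.

```python
HDP_VERSION = "2.3.4.0-3485"  # version used in this thread; replace with yours


def pin_hdp_version(value, hdp_version):
    """Replace every ${hdp.version} placeholder with a concrete version string."""
    return value.replace("${hdp.version}", hdp_version)


# Assumed shape of one classpath entry before hard-coding.
template = "/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar"
print(pin_hdp_version(template, HDP_VERSION))
```

The printed entry matches the hadoop-lzo path in the classpath above.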

Re: "bad substitution" error running Spark on Yarn

New Contributor

You can also set the hdp.version property through the Ambari GUI. The steps are:

1. Go to 'Ambari -> YARN -> configs' and open the 'Advanced' tab.

2. Scroll down to the end of the page, where you will find an option to add a custom property for yarn-site.

3. Click 'add property' and enter 'hdp.version' and the version value.

4. Save the changes and restart the required services. This deploys the hdp.version property in yarn-site.xml.

These steps worked for me, and they should work for you as well :) !
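To check the result afterwards, it helps to know roughly what to look for in yarn-site.xml. The sketch below builds a property element in the usual Hadoop name/value layout using Python's standard library. The version value is the one from earlier in this thread and is only an assumption about your cluster, and hadoop_property is a made-up helper name.

```python
import xml.etree.ElementTree as ET


def hadoop_property(name, value):
    """Build a <property> element with <name> and <value> children."""
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value
    return prop


# Assumed version string; use the one that matches your cluster.
element = hadoop_property("hdp.version", "2.3.4.0-3485")
print(ET.tostring(element, encoding="unicode"))
```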

Re: "bad substitution" error running Spark on Yarn

New Contributor

+1 from me on this workaround

Re: "bad substitution" error running Spark on Yarn

Expert Contributor

Me too! Several hours wasted before I found this. Thanks Rama!

Re: "bad substitution" error running Spark on Yarn

New Contributor

Using HDP 2.5, submitting from a vanilla Spark 2.1.0 (i.e., not HDP), and deploying with deploy-mode "cluster", we were successful by using a variation on the suggestions of @Ali Bajwa:

  • Creating a file named "java-opts" in our Spark conf directory containing "-Dhdp.version=2.5.x.x-xx" (substituting our specific version)
  • Adding to our Spark configuration (via the spark-submit --conf option) "spark.driver.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" and "spark.executor.extraJavaOptions=-Dhdp.version=2.5.x.x-xx"

Both the creation of the "java-opts" file and the Spark configuration changes were required for success in our case.

The "spark.driver.extraJavaOptions" option was definitely necessary in our case, but the "spark.executor.extraJavaOptions" may not be necessary. As I understand it, the "spark.yarn.am.extraJavaOptions" option that Ali mentioned is not relevant in cluster mode.
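For reference, here is a rough Python sketch of how such a submit command could be assembled. The application name my-app.jar is a placeholder I made up, the version string is left elided exactly as above, and the "java-opts" file still has to exist in the Spark conf directory separately.

```python
import shlex


def spark_submit_command(hdp_version, app, app_args=()):
    """Build a spark-submit argument list for YARN cluster mode."""
    opt = f"-Dhdp.version={hdp_version}"
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.driver.extraJavaOptions={opt}",
        # Possibly unnecessary, as noted above; kept to mirror what worked for us.
        "--conf", f"spark.executor.extraJavaOptions={opt}",
        app,
        *app_args,
    ]


# "my-app.jar" is a hypothetical application; the version is left elided.
cmd = spark_submit_command("2.5.x.x-xx", "my-app.jar")
print(shlex.join(cmd))
```

Building the command as a list keeps each --conf value as a single argument, so no extra quoting is needed if it is passed to something like subprocess.run.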