"bad substitution" error running Spark on Yarn
Labels: Apache Spark
Created 03-18-2016 11:32 PM
I'm attempting to run a Spark job via YARN using Gremlin (graph traversal language). However, the Application Master dies with a "bad substitution" error. I can see in the error message that ${hdp.version} isn't being resolved. According to various sources online I should be able to set the following property when I submit my job to fix the issue:
spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485
It sure seems like this should work, but it doesn't. Can anybody help?
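For reference, a minimal sketch of how the property can be passed at submit time (the class and jar names are placeholders, and the version string must match the cluster's HDP build):
# Sketch only: placeholder class/jar; -Dhdp.version must match the cluster's build.
spark-submit \
  --master yarn \
  --deploy-mode client \
  --conf "spark.yarn.am.extraJavaOptions=-Dhdp.version=2.3.4.0-3485" \
  --class com.example.GremlinJob \
  gremlin-job.jar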
Created 03-18-2016 11:45 PM
Background:
Starting with HDP 2.2, which is based on Hadoop 2.6, Hortonworks added support for rolling upgrades (a detailed description is available here: http://hortonworks.com/blog/introducing-rolling-upgrades-downgrades-apache-hadoop-yarn-cluster/). A fundamental assumption of rolling upgrades is that jobs must not rely implicitly on the current version of artefacts such as jar files and native libraries, since those could change while a job is running in the middle of a rolling upgrade. Instead, the system is configured to require an explicit value for hdp.version at job-submission time.
Solution:
1. One option is to modify mapred-site.xml, replacing the hdp.version property with the correct value for your cluster. CAUTION: if you modify mapred-site.xml on a cluster node, this can break rolling upgrades in scenarios where a program such as Oozie submits a job from that node, because it will use the hardcoded version instead of the version specified by the client.
2. The better option is to:
a) create a file called java-opts containing -Dhdp.version=2.3.4.0-3485. You can also specify the same value using SPARK_JAVA_OPTS, i.e. export SPARK_JAVA_OPTS="-Dhdp.version=2.3.4.0-3485"
b) modify /usr/hdp/current/spark-client/conf/spark-defaults.conf and add the two lines below (a combined sketch follows this list):
spark.driver.extraJavaOptions -Dhdp.version=2.3.4.0-3485
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.4.0-3485
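Taken together, a minimal shell sketch of option 2 (the conf path assumes the standard HDP spark-client layout, and the version string is only an example; adjust both for your cluster):
# Sketch only: adjust SPARK_CONF_DIR and HDP_VERSION for your own cluster.
SPARK_CONF_DIR=/usr/hdp/current/spark-client/conf
HDP_VERSION=2.3.4.0-3485
# a) write the java-opts file
echo "-Dhdp.version=${HDP_VERSION}" > "${SPARK_CONF_DIR}/java-opts"
# b) append the driver and YARN AM options to spark-defaults.conf
cat >> "${SPARK_CONF_DIR}/spark-defaults.conf" <<EOF
spark.driver.extraJavaOptions -Dhdp.version=${HDP_VERSION}
spark.yarn.am.extraJavaOptions -Dhdp.version=${HDP_VERSION}
EOF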
Created 03-18-2016 11:45 PM
I observed a similar issue with the ${hdp.version} variable and got the same "bad substitution" error message. I hard-coded the version number in the configuration files and the jobs are now running.
Created 03-22-2016 07:44 PM
We tried all of Ali's suggestions but we were only successful by hard-coding the ${hdp.version} in the mapreduce.application.classpath:
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/hdp/2.3.4.0-3485/hadoop/lib/hadoop-lzo-0.6.0.2.3.4.0-3485.jar:/etc/hadoop/conf/secure
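For anyone trying the same thing, a rough sketch of the substitution itself, assuming the client config lives in /etc/hadoop/conf; on an Ambari-managed cluster it is safer to make the change through the MapReduce2 configs so it is not overwritten:
# Sketch only: replaces the unresolved placeholder with the concrete build number.
HDP_VERSION=2.3.4.0-3485
# count lines that still contain the unresolved placeholder
grep -c 'hdp.version' /etc/hadoop/conf/mapred-site.xml
# substitute in place, keeping a backup copy of the original file
sed -i.bak "s/\${hdp\.version}/${HDP_VERSION}/g" /etc/hadoop/conf/mapred-site.xml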
Created 07-27-2016 07:00 PM
You can also set the hdp.version property through the Ambari GUI. Below are the steps:
1. Go to Ambari -> YARN -> Configs and open the 'Advanced' tab.
2. Scroll to the bottom of the page; there you will find an option to add a custom property for yarn-site.
3. Click 'Add Property' and enter 'hdp.version' as the key and the version as the value.
4. Save the changes and restart the required services. This will deploy the hdp.version property in yarn-site.xml.
These steps worked for me; they should work for you as well 🙂 !
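To double-check the result, something like the following should show the new entry once the services have been restarted (assuming the client config is under /etc/hadoop/conf; the version shown is just an example):
# Sketch: verify that the custom property reached yarn-site.xml after the restart.
grep -A 1 '<name>hdp.version</name>' /etc/hadoop/conf/yarn-site.xml
# expected output, roughly:
#   <name>hdp.version</name>
#   <value>2.3.4.0-3485</value>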
Created 08-02-2016 03:46 PM
+1 from me on this workaround
Created 10-06-2016 11:47 AM
Me too! Several hours wasted before I found this. Thanks Rama!
Created 01-26-2020 11:18 AM
This worked!
I already made these changes prior to running the last command.
Created 02-13-2018 06:20 PM
Using HDP 2.5, submitting from a vanilla Spark 2.1.0 (i.e., not HDP), and deploying with deploy-mode "cluster", we were successful by using a variation on the suggestions of @Ali Bajwa:
- Creating a file named "java-opts" in our Spark conf directory containing "-Dhdp.version=2.5.x.x-xx" (substituting our specific version)
- Adding into our Spark configuration (via the Spark submit --conf option) "spark.driver.extraJavaOptions=-Dhdp.version=2.5.x.x-xx" and "spark.executor.extraJavaOptions=-Dhdp.version=2.5.x.x-xx"
Both the creation of the "java-opts" file and the Spark configuration modifications were required for success in our case.
The "spark.driver.extraJavaOptions" option was definitely necessary in our case, but the "spark.executor.extraJavaOptions" may not be necessary. As I understand it, the "spark.yarn.am.extraJavaOptions" option that Ali mentioned is not relevant in cluster mode.
