Run Spark 2.4 jobs on HDP 3.1

Due to the requirements of my Spark Structured Streaming application, I must run it on Spark 2.4(.3).

I tried installing Spark 2.4 alongside the HDP installation and linking it with the HDP services, loosely following the steps provided here https://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/Multiple-Spark-version-on-the-same... for a similar case with CDH.

I got things running to some extent, and I can run my job in local mode, with the caveat that even this local job cannot connect to HDFS (e.g. for storing the checkpoint files) using URLs like hdfs:///foo/bar. It always requires the hostname of the NameNode to be specified.
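
For what it's worth, this smells like the standalone Spark not picking up the cluster's Hadoop client configuration (fs.defaultFS from core-site.xml). A minimal sketch of pointing it there, assuming the standard HDP config path and a hypothetical install location for Spark 2.4:

# Point the external Spark 2.4 at the cluster's Hadoop client configs,
# so hdfs:///foo/bar resolves via fs.defaultFS instead of needing a NameNode host.
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/spark-2.4.3   # assumed install path, adjust to your setup
$SPARK_HOME/bin/spark-shell --master 'local[2]'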

But my jobs fail completely when I try to submit them in YARN cluster mode. The basic hurdle seems to be the setting of hdp.version. I could fix this for local mode by putting "spark.executor.extraJavaOptions=-Dhdp.version=3.1.0.0-78" into spark-defaults.conf. But even when I pass the same setting on the spark-submit command line, the job fails with:

[2019-06-03 12:18:20.797]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
/app/hadoop/yarn/local/usercache/xxx/appcache/application_1557437878895_0077/container_e138_1557437878895_0077_02_000001/launch_container.sh: line 39: $PWD:$PWD/__spark_conf__:$PWD/__spark_libs__/*:$HADOOP_CONF_DIR:/usr/hdp/3.1.0.0-78/hadoop/*:/usr/hdp/3.1.0.0-78/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/__spark_conf__/__hadoop_conf__: bad substitution
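
For reference, the settings described above amount to roughly the following (a sketch; the stack version 3.1.0.0-78 is from this cluster, and the application class and jar names below are placeholders):

# spark-defaults.conf -- sufficient for local mode
spark.driver.extraJavaOptions    -Dhdp.version=3.1.0.0-78
spark.executor.extraJavaOptions  -Dhdp.version=3.1.0.0-78

# the submit in YARN cluster mode, with the same options on the command line
# (class and jar are placeholders)
$SPARK_HOME/bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
  --conf spark.executor.extraJavaOptions=-Dhdp.version=3.1.0.0-78 \
  --class com.example.MyStreamingApp \
  my-app.jar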

Any hints on how I can get this running?

2 REPLIES

Re: Run Spark 2.4 jobs on HDP 3.1


To resolve the error message above, I followed the instructions from https://help.talend.com/reader/94B268T~nLXQ60ogwYcJxA/qjFmkaPs6bapYtWH4ZW~WA and added the hdp.version property to the custom yarn-site.xml (via Ambari).
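
For reference, the Ambari entry boils down to this property in yarn-site.xml (the value 3.1.0.0-78 is this cluster's stack version; substitute your own so the ${hdp.version} placeholder in the classpath resolves):

<property>
  <!-- must match the installed HDP stack version -->
  <name>hdp.version</name>
  <value>3.1.0.0-78</value>
</property>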

After restarting all affected services, my jobs started working.

Re: Run Spark 2.4 jobs on HDP 3.1


Never mind, a colleague pointed me to https://community.hortonworks.com/articles/244059/steps-to-install-supplementary-spark-on-hdp-cluste... which is exactly what I was looking for.