Created 06-03-2016 02:06 PM
Hi, I'm running the Hortonworks sandbox 2.4, which comes with Spark 1.6.0. I'm trying to run a sample Spark program, and I'm able to submit it successfully with spark-submit. Since I need to use this Spark jar file in an external application, I'm trying to use the "java -jar" command to achieve the same output as spark-submit.
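For reference, the two invocations look roughly like this (the class and jar names are just from my sample project):

spark-submit --class com.example.SparkApp --master yarn-client target/spark-app.jar
java -jar target/spark-app.jar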
I'm using Maven to build, and I used the maven-shade-plugin to build a fat jar because I was previously hitting class-not-found exceptions for the "spark-core_2.10" and "spark-yarn_2.10" dependencies. Those issues are now resolved.
However, the jar file refers to the yarn-default.xml bundled with the dependencies inside the fat jar rather than the yarn-site.xml present on the Hortonworks sandbox. This causes the issue below, as it is not able to copy the file into HDFS.
java.io.FileNotFoundException: File file:/tmp/spark-1b318406-c7a1-4a94-9605-d6a46f0170d4/__spark_conf__5199620712586647591.zip does not exist
How can I make this jar file point to the Hortonworks sandbox settings instead of these defaults? If I don't build the fat jar, it throws the exception below for the sample program.
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/function/Function
Created 06-03-2016 02:47 PM
Hello @Pradeep K
Have you tried setting the HADOOP_CONF_DIR environment variable on the target environment where you are running the Spark job? The "yarn-site.xml" file is typically located in that conf directory.
We recommend that you set HADOOP_CONF_DIR to the appropriate directory; for example:
export HADOOP_CONF_DIR=/etc/hadoop/conf
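Note that "java -jar" ignores any classpath supplied via -cp or the CLASSPATH variable, so a plain "java -jar" launch will not see the sandbox configuration even with HADOOP_CONF_DIR set (spark-submit is what normally puts that directory on the classpath for you). Launching with an explicit classpath instead lets the sandbox config directory take precedence over the yarn-default.xml bundled in the fat jar; for example (the main class name here is just a placeholder):

java -cp /etc/hadoop/conf:/path/to/your-fat-jar.jar com.example.SparkApp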
In addition, make sure you configure "spark-defaults.conf" via Ambari under the Spark service "Config" tab (or directly in $SPARK_HOME/conf if you are not running with Ambari). More instructions here:
spark-defaults.conf

Edit the spark-defaults.conf file in the Spark client /conf directory. Make sure the following values are specified, including hostname and port. (Note: if you installed the tech preview, these will already be in the file.) For example:

spark.yarn.historyServer.address c6401.ambari.apache.org:18080
spark.history.ui.port 18080
spark.yarn.services org.apache.spark.deploy.yarn.history.YarnHistoryService
spark.driver.extraJavaOptions -Dhdp.version=2.3.0.0-2800
spark.history.provider org.apache.spark.deploy.yarn.history.YarnHistoryProvider
spark.yarn.am.extraJavaOptions -Dhdp.version=2.3.0.0-2800
Created 06-06-2016 05:42 AM
Hello @Paul Hargis
I provided the properties below and gave it a try, but a similar error persists. The problem is that the JAR file refers to the yarn-default.xml embedded within the JAR, which Maven downloaded while building the fat jar. It needs to refer to yarn-site.xml and the other files on the sandbox for things to work. The goal is to deploy and launch this JAR through Spring Cloud Data Flow, so I'm also exploring whether there is an option there to override these properties.
Created 06-06-2016 02:29 PM
Okay, then have you tried copying the target file "yarn-site.xml" into the "src/main/resources" directory and rebuilding the jar? That is the directory where Maven typically looks for config files to be packaged with the fat jar. Granted, it is not a wonderful solution, because it means each jar is already "targeted" for a given system (or cluster), but this is sometimes what is required.
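For example, on the sandbox (assuming the standard Maven project layout):

cp /etc/hadoop/conf/yarn-site.xml src/main/resources/
mvn clean package

Anything under src/main/resources lands at the root of the jar, which is where Hadoop's Configuration looks for yarn-site.xml on the classpath.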
Please note: yarn-default.xml and yarn-site.xml are both read by Hadoop YARN daemons, with yarn-default.xml denoting the defaults and yarn-site.xml representing custom configuration values that override those in yarn-default.xml.
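If you want to confirm at runtime which file a given value actually came from, Hadoop's Configuration can report the source of each property. A minimal sketch (the property name is just an example):

import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ConfSourceCheck {
    public static void main(String[] args) {
        // YarnConfiguration layers yarn-default.xml and, if found on the
        // classpath, yarn-site.xml on top of the core Hadoop defaults.
        Configuration conf = new YarnConfiguration();
        String key = "yarn.resourcemanager.hostname"; // example property
        System.out.println(key + " = " + conf.get(key));
        // Prints the resource(s) that set the value, e.g. [yarn-default.xml]
        // if only the bundled defaults were found, or [yarn-site.xml] once
        // your cluster configuration is being picked up.
        System.out.println("sources: " + Arrays.toString(conf.getPropertySources(key)));
    }
}

If this prints only yarn-default.xml when run with "java -jar", you know the sandbox configuration is not on the classpath.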