Created 04-30-2018 01:05 PM
I am trying to set up Spark on a YARN cluster using the Hadoop cookbook with the HDP distribution. As part of this, I am using yarn-env.sh to configure the YARN ResourceManager.
I have the following in yarn-env.sh
export YARN_RESOURCEMANAGER_OPTS="-Dyarn.resourcemanager.hostname=192.168.33.33"
I can see the cluster at http://192.168.33.33:8088/cluster/nodes, which also shows 2 NodeManagers connected to it. But when I run
yarn node --list
it tries to connect to 0.0.0.0 and prints the following log
18/04/30 06:09:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
18/04/30 06:09:31 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
After some retries it fails.
I get the same error when I run spark-submit with the following command
spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --executor-memory 1G /usr/hdp/2.6.3.0-235/spark2/examples/jars/spark-examples_2.11-2.2.0.2.6.3.0-235.jar
as Spark uses $HADOOP_CONF_DIR to get the YARN configuration.
What is the cause? Does the yarn command read only from yarn-site.xml and not from yarn-env.sh?
Created 05-01-2018 06:33 AM
yarn-env.sh is used whenever you run a yarn command, so it works if you use the yarn command to submit a MapReduce job, as below.
yarn jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar pi 5 5
But the spark-submit command doesn't invoke yarn-env.sh, so it reads yarn-site.xml from $HADOOP_CONF_DIR and gets the ResourceManager address from it.
Created 05-03-2018 07:05 AM
@Tarun Parimi But yarn commands are also trying to connect to 0.0.0.0 instead of the ResourceManager IP address. This happens on both master and slave machines.
Created 05-03-2018 10:24 AM
I didn't notice that you were only setting YARN_RESOURCEMANAGER_OPTS. This environment variable is used only by the ResourceManager daemon. To specify the opts for all hadoop and yarn client commands, you can use HADOOP_CLIENT_OPTS in hadoop-env.sh.
export HADOOP_CLIENT_OPTS="-Dyarn.resourcemanager.hostname=192.168.33.33"
But I am not sure why you would need to do this when you can just set it in yarn-site.xml, which is the recommended approach.
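For reference, a minimal sketch of what that yarn-site.xml entry might look like, assuming the same ResourceManager IP as in the question and that the file lives in $HADOOP_CONF_DIR on every node and client machine. The property goes inside the existing <configuration> element. When yarn.resourcemanager.hostname is not set, it defaults to 0.0.0.0, and yarn.resourcemanager.address defaults to that hostname on port 8032, which matches the address in the client log above.

```xml
<!-- Fragment of yarn-site.xml (place inside <configuration>). -->
<!-- The IP is taken from the question; adjust it for your cluster. -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>192.168.33.33</value>
</property>
```

Since both the yarn commands and spark-submit read yarn-site.xml from the configuration directory, setting it there should cover both cases without any extra JVM options.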