Created 05-08-2017 10:24 AM
Hi All,
When I try to run my Spark application in YARN mode using the HDFS file system, it works fine when I provide the properties below.
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname",resourcemanagerHostname); sparkConf.set("spark.hadoop.yarn.resourcemanager.address",resourcemanagerAddress); sparkConf.set("spark.yarn.stagingDir",stagingDirectory );
But there are two problems here:
1.
Since my HDFS is NameNode HA enabled, it does not work when I give spark.yarn.stagingDir the nameservice URL, for example hdfs://hdcluster/user/tmp/; it fails with an "unknown host hdcluster" error. It works fine when I give the URL as hdfs://<ActiveNameNode>/user/tmp/, but we don't know in advance which NameNode will be active, so how do I resolve this? (One possible approach is sketched after this list.)
One thing I have noticed is that SparkContext takes a Hadoop configuration, but the SparkConf class has no method to accept one.
2.
How do I provide the ResourceManager address when the ResourceManagers are running in HA?
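For reference on point 1, a sketch rather than a verified fix: Spark copies any property prefixed with spark.hadoop. into the Hadoop configuration it builds, so the HDFS HA client settings can be supplied the same way as the ResourceManager properties above. Assuming a nameservice hdcluster with NameNodes nn1 and nn2 on hypothetical hosts:

// Declare the logical nameservice and its NameNode ids.
sparkConf.set("spark.hadoop.dfs.nameservices", "hdcluster");
sparkConf.set("spark.hadoop.dfs.ha.namenodes.hdcluster", "nn1,nn2");
// Hypothetical hostnames; replace with the real NameNode RPC addresses.
sparkConf.set("spark.hadoop.dfs.namenode.rpc-address.hdcluster.nn1", "namenode1.example.com:8020");
sparkConf.set("spark.hadoop.dfs.namenode.rpc-address.hdcluster.nn2", "namenode2.example.com:8020");
// The failover proxy provider is what lets the client locate the active NameNode.
sparkConf.set("spark.hadoop.dfs.client.failover.proxy.provider.hdcluster",
        "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
// Now the nameservice URL should resolve regardless of which NameNode is active.
sparkConf.set("spark.yarn.stagingDir", "hdfs://hdcluster/user/tmp/");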
Thanks in advance,
Param.
Created 05-08-2017 06:12 PM
1. Can you try setting spark.yarn.stagingDir to hdfs:///user/tmp/?
2. Can you please share which Spark config you are trying to set that requires the RM address?
Created 05-09-2017 07:34 AM
@yvora Thanks for the response.
1. Can you try setting spark.yarn.stagingDir to hdfs:///user/tmp/?
This is not working.
2. Can you please share which Spark config you are trying to set that requires the RM address?
I am trying to run the Spark application from a Java program, so when the master is yarn it connects to the ResourceManager at 0.0.0.0:8032 by default. To override this I need to set the corresponding properties in the Spark configuration, i.e.
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname",resourcemanagerHostname); sparkConf.set("spark.hadoop.yarn.resourcemanager.address",resourcemanagerAddress);
But the problem is: when the ResourceManager has HA enabled, how do I connect to it?
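For what it's worth, YARN has analogous client-side HA settings that can be passed the same spark.hadoop. way, so the client fails over to whichever ResourceManager is active instead of targeting one fixed address. A sketch continuing the snippet above, with the rm1/rm2 ids and hostnames as assumptions:

// Enable RM HA on the client side and list the RM ids.
sparkConf.set("spark.hadoop.yarn.resourcemanager.ha.enabled", "true");
sparkConf.set("spark.hadoop.yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
// Hypothetical hostnames; replace with the real ResourceManager hosts.
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname.rm1", "rm1.example.com");
sparkConf.set("spark.hadoop.yarn.resourcemanager.hostname.rm2", "rm2.example.com");
sparkConf.set("spark.hadoop.yarn.resourcemanager.address.rm1", "rm1.example.com:8032");
sparkConf.set("spark.hadoop.yarn.resourcemanager.address.rm2", "rm2.example.com:8032");

With yarn.resourcemanager.ha.enabled set, the YARN client tries each configured RM id in turn until it reaches the active one, so no single address has to be guessed in advance.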
Separately, one idea I got about my question: there is a way to achieve this on the SparkContext, as below.
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "core-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "hdfs-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "mapred-site.xml"));
jsc.hadoopConfiguration().addResource(new Path(hadoopClusterSiteFilesBasePath + "yarn-site.xml"));
But the ResourceManager and staging directory configuration are needed even before the context is created, so this does not solve the problem.
What I am looking for is something like the above for the SparkConf class/object.
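One possible workaround along those lines, sketched under assumptions (the class and method names here are made up for illustration, and the *-site.xml files must be readable by the driver): load the site files into a plain Hadoop Configuration first, then copy every entry into the SparkConf under the spark.hadoop. prefix, so everything is in place before the context is created.

import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.spark.SparkConf;

public class ClusterConfLoader {
    // Builds a SparkConf carrying every property from the cluster site files,
    // prefixed with "spark.hadoop.", so they take effect before any context exists.
    public static SparkConf withClusterSiteFiles(String basePath) {
        Configuration hadoopConf = new Configuration(false); // skip built-in defaults
        hadoopConf.addResource(new Path(basePath + "core-site.xml"));
        hadoopConf.addResource(new Path(basePath + "hdfs-site.xml"));
        hadoopConf.addResource(new Path(basePath + "mapred-site.xml"));
        hadoopConf.addResource(new Path(basePath + "yarn-site.xml"));

        SparkConf sparkConf = new SparkConf();
        // Configuration is iterable over its key/value entries.
        for (Map.Entry<String, String> entry : hadoopConf) {
            sparkConf.set("spark.hadoop." + entry.getKey(), entry.getValue());
        }
        return sparkConf;
    }
}

Then new JavaSparkContext(ClusterConfLoader.withClusterSiteFiles(hadoopClusterSiteFilesBasePath)) would see the NameNode and ResourceManager HA settings from the start.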
Thanks,
Param.
Created 10-17-2018 10:13 AM
I am facing the same issue when trying to start a SparkSession on YARN. Did you solve this?