Created 07-26-2019 05:19 PM
I use the IntelliJ IDE on my Windows 10 laptop and am trying to run a Spark job in YARN mode on my 5-node HDP 3.1.1 cluster.
My code:
package p1

import org.apache.spark.sql.{SparkSession, functions => F}
import org.apache.log4j.{Logger, Level}

object SparkDeneme extends App {
  Logger.getLogger("org").setLevel(Level.INFO)

  val spark = SparkSession.builder()
    .appName("SparkDeneme")
    .master("yarn")
    .config("spark.hadoop.fs.defaultFS", "hdfs://node1.impektra.com:8020")
    .config("spark.hadoop.yarn.resourcemanager.address", "node1.impektra.com:8030")
    .getOrCreate()

  import spark.implicits._
  val sc = spark.sparkContext
  val dfFromList = sc.parallelize(List(1, 2, 3, 4, 5, 6)).toDF("rakamlar")
  // dfFromList.printSchema()
  dfFromList.show()
}
When I run it, I get the following error:
19/07/26 20:00:32 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: 'yarn'
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2744)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:492)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2493)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:933)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:924)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:924)
    at p1.SparkDeneme$.delayedEndpoint$p1$SparkDeneme$1(SparkDeneme.scala:17)
    at p1.SparkDeneme$delayedInit$body.apply(SparkDeneme.scala:8)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.App$$anonfun$main$1.apply(App.scala:76)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
    at scala.App$class.main(App.scala:76)
    at p1.SparkDeneme$.main(SparkDeneme.scala:8)
    at p1.SparkDeneme.main(SparkDeneme.scala)
I tried to get help from this tutorial.
Has anyone succeeded in running Spark in YARN mode from IntelliJ?
Created 07-28-2019 10:45 PM
Did you try using yarn-client (or yarn-cluster) instead of yarn in .master()?
If the error still exists, add spark-yarn.jar to the build path and then try to submit the job again.
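For example, with an sbt build the dependency might look like the sketch below (the Spark and Scala versions here are assumptions; match them to what your cluster runs):

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.2",
  "org.apache.spark" %% "spark-sql"  % "2.3.2",
  // spark-yarn provides the cluster manager that knows how to parse master("yarn")
  "org.apache.spark" %% "spark-yarn" % "2.3.2"
)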
Refer to this link for more details about a similar issue.
Created 07-30-2019 09:34 AM
Hi @Shu, I tried yarn-client and spark-yarn.jar, but it still fails with the Could not parse Master URL: 'yarn' error.
Created 07-31-2019 02:39 AM
Try specifying the defaultFS and resourcemanager addresses:
val spark = SparkSession.builder()
  .master("yarn")
  .appName("<job_name>")
  .config("spark.hadoop.fs.defaultFS", "<name_node_address>")
  .config("spark.hadoop.yarn.resourcemanager.address", "<resourcemanager_address>")
  .enableHiveSupport()
  .getOrCreate()
and then add the spark-yarn_x.x jar as a Maven dependency and try to run again.
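Once the jar is on the classpath, a quick sanity check from the driver can confirm the master was parsed (a sketch; the printed values are illustrative, not from your cluster):

// master and applicationId are both fields on SparkContext;
// applicationId is assigned once YARN accepts the application
println(spark.sparkContext.master)        // should print "yarn"
println(spark.sparkContext.applicationId) // e.g. application_1564..._0001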
Created 07-31-2019 06:46 AM
Hi @Shu, thank you. Adding the spark-yarn_x.x jar as a Maven dependency solved the problem. I have since come across other errors, but the issue here was parsing the 'yarn' master URL, and that is solved.