
“User did not initialize spark context” error when using Scala code in Spark YARN cluster mode

New Contributor

I have the following code:

import org.slf4j.LoggerFactory
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.lit
// Config() below is my own configuration helper (from config-1.3.1.jar)

object LoaderProcessor extends App {
  val logger = LoggerFactory.getLogger(this.getClass())
  execute()

  def execute(): Unit = {
    val spark = get_spark()
    import spark.implicits._

    var df = spark.read
      .format("csv")
      .option("delimiter", ",")
      .option("header", true)
      .option("inferSchema", "true")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
      .load(args(2))

    df = df.withColumn("zs_source", lit(1)) // the only operation on the dataframe

    val o_file = Config().getString("myapp.dataFolder") + "/8/1/data.csv"
    logger.info("Writing output to: {}", o_file)

    df.write.mode("overwrite").option("header", "true").csv(o_file)
  }

  def get_spark(): SparkSession = {
    val env = System.getenv("MYAPP_ENV")
    var spark: SparkSession = null
    if (env == null || env == "dev_local") {
      spark = org.apache.spark.sql.SparkSession.builder
        .master("local").appName("MyApp").getOrCreate
    } else {
      spark = org.apache.spark.sql.SparkSession.builder
        .appName("MyApp") //.enableHiveSupport().getOrCreate
    }
    spark.sparkContext.setCheckpointDir(Config().getString("myapp.rddcp"))
    return spark
  }
}

It works well in client mode, but I could not figure out what the problem is in cluster mode. My clusters are on HDInsight.

I also noticed that the "write" operation keeps writing files like this to the output folder:

part-00000-3e9566ae-c13c-468a-8732-e7b8a8df5335-c000.csv


and then, a few seconds later:

part-00000-4f4979a0-d9f9-481b-aac4-115e63b9f59c-c000.csv


The YARN ApplicationMaster log in cluster mode shows:

18/12/01 15:08:53 INFO ApplicationMaster: Starting the user application in a separate Thread
18/12/01 15:08:53 INFO ApplicationMaster: Waiting for spark context initialization...
18/12/01 15:08:55 INFO Config$: Environment: dev
18/12/01 15:08:55 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:510)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

This fails in cluster mode:

spark-submit --master yarn --deploy-mode cluster --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false

But this works in client mode:

spark-submit --deploy-mode client --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false

3 Replies

New Contributor


@Debjyoti Das Did this get resolved in cluster mode? PySpark code deployed on HDInsight in cluster mode has the same issue, and I can't switch to client mode. Thanks!

Rising Star

@Debjyoti Das have you tried replacing:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp")//.enableHiveSupport().getOrCreate;

with:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp").getOrCreate();

Expert Contributor

Can you please try removing the master("local[*]") from the spark code and pass it as a parameter in spark submit -- master yarn --deploy-mode cluster.. It should work..