“User did not initialize spark context” Error when using Scala code in SPARK YARN Cluster mode

New Contributor

I have the following code:

object LoaderProcessor extends App {
  val logger = LoggerFactory.getLogger(this.getClass())
  execute()

  def execute(): Unit = {
    val spark = get_spark()
    import spark.implicits._

    var df = spark.read
      .format("csv")
      .option("delimiter", ",")
      .option("header", true)
      .option("inferSchema", "true")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
      .load(args(2))

    df = df.withColumn("zs_source", lit(1)) // the only operation on dataframe

    val o_file = Config().getString("myapp.dataFolder") + "/8/1/data.csv"
    logger.info("Writing output to: {}", o_file)

    df.write.mode("overwrite").option("header", "true").csv(o_file)
  }

  def get_spark(): SparkSession = {
    val env = System.getenv("MYAPP_ENV")
    var spark: SparkSession = null
    if (env == null || env == "dev_local") {
      spark = org.apache.spark.sql.SparkSession.builder
        .master("local").appName("MyApp").getOrCreate;
    } else {
      spark = org.apache.spark.sql.SparkSession.builder
        .appName("MyApp")//.enableHiveSupport().getOrCreate;
    }
    spark.sparkContext.setCheckpointDir(Config().getString("myapp.rddcp"))
    return spark
  }
}

It works well in client mode, but it fails in cluster mode and I could not figure out why. My clusters are on HDInsight.

I also noticed that the "write" operation keeps writing to the output folder, first like this:

part-00000-3e9566ae-c13c-468a-8732-e7b8a8df5335-c000.csv


and then, a few seconds later:

part-00000-4f4979a0-d9f9-481b-aac4-115e63b9f59c-c000.csv


18/12/01 15:08:53 INFO ApplicationMaster: Starting the user application in a separate Thread
18/12/01 15:08:53 INFO ApplicationMaster: Waiting for spark context initialization...
18/12/01 15:08:55 INFO Config$: Environment: dev
18/12/01 15:08:55 ERROR ApplicationMaster: Uncaught exception:
java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:510)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

Cluster mode (PROBLEM):

spark-submit --master yarn --deploy-mode cluster --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false

Client mode (WORKS):

spark-submit --deploy-mode client --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false

3 REPLIES

New Contributor


@Debjyoti Das Did you get this resolved in cluster mode? My PySpark code has the same issue when deployed on HDInsight in cluster mode, and I can't switch to client mode. Thanks!

Rising Star

@Debjyoti Das have you tried replacing:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp")//.enableHiveSupport().getOrCreate;

with:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp").getOrCreate();

Expert Contributor

Can you please try removing master("local[*]") from the Spark code and passing the master as a parameter to spark-submit instead: --master yarn --deploy-mode cluster. It should work.
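
One way to act on this suggestion is to let the code choose a master only when local mode is requested explicitly. In the original get_spark(), a missing MYAPP_ENV also selects master("local"), and a variable exported in the submitting shell is not necessarily visible to the driver running inside the YARN application master. Whether that is what happened here is an assumption, since the log does not show where "Environment: dev" was read from. The sketch below is only an illustration: the names MasterChoice and masterFor are hypothetical and are not part of the original code or of Spark.

```scala
// Hypothetical helper (not from the original post, not a Spark API):
// decides whether the application should hard-code a Spark master at all.
object MasterChoice {
  // Only an explicit "dev_local" value selects local mode.
  // A missing or different value returns None, so whatever was passed
  // to spark-submit via --master stays in effect.
  def masterFor(env: Option[String]): Option[String] =
    env.map(_.trim) match {
      case Some("dev_local") => Some("local")
      case _                 => None
    }
}
```

In get_spark() this could be applied to a builder held in a val, for example MasterChoice.masterFor(sys.env.get("MYAPP_ENV")).foreach(m => builder.master(m)), followed by getOrCreate() as suggested in the previous reply. If the variable is needed by the driver in cluster mode, Spark on YARN can pass it with --conf spark.yarn.appMasterEnv.MYAPP_ENV=<value>. Separately, the Spark quick start advises defining a main() method instead of extending scala.App, so that may also be worth trying.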