Support Questions
Find answers, ask questions, and share your expertise

“User did not initialize spark context” Error when using Scala code in SPARK YARN Cluster mode


New Contributor

I have the following code:

object LoaderProcessor extends App {
  val logger = LoggerFactory.getLogger(this.getClass())
  execute()

  def execute(): Unit = {
    val spark = get_spark()
    import spark.implicits._

    var df = spark.read
      .format("csv")
      .option("delimiter", ",")
      .option("header", true)
      .option("inferSchema", "true")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
      .load(args(2))

    df = df.withColumn("zs_source", lit(1)) // the only operation on the dataframe

    val o_file = Config().getString("myapp.dataFolder") + "/8/1/data.csv"
    logger.info("Writing output to: {}", o_file)

    df.write.mode("overwrite").option("header", "true").csv(o_file)
  }

  def get_spark(): SparkSession = {
    val env = System.getenv("MYAPP_ENV")
    var spark: SparkSession = null
    if (env == null || env == "dev_local") {
      spark = org.apache.spark.sql.SparkSession.builder
        .master("local").appName("MyApp").getOrCreate
    } else {
      spark = org.apache.spark.sql.SparkSession.builder
        .appName("MyApp") //.enableHiveSupport().getOrCreate
    }
    spark.sparkContext.setCheckpointDir(Config().getString("myapp.rddcp"))
    return spark
  }
}

It works well in client mode, but fails in cluster mode and I could not figure out the problem. My clusters are on HDInsight.

I also noticed that the "write" operation keeps rewriting files in the output folder, like this:

part-00000-3e9566ae-c13c-468a-8732-e7b8a8df5335-c000.csv


and then, a few seconds later:

part-00000-4f4979a0-d9f9-481b-aac4-115e63b9f59c-c000.csv


18/12/01 15:08:53 INFO ApplicationMaster: Starting the user application in a separate Thread
18/12/01 15:08:53 INFO ApplicationMaster: Waiting for spark context initialization...
18/12/01 15:08:55 INFO Config$: Environment: dev
18/12/01 15:08:55 ERROR ApplicationMaster: Uncaught exception: java.lang.IllegalStateException: User did not initialize spark context!
    at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:510)
    at org.apache.spark.deploy.yarn.ApplicationMaster.org$apache$spark$deploy$yarn$ApplicationMaster$$runImpl(ApplicationMaster.scala:345)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply$mcV$sp(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$run$2.apply(ApplicationMaster.scala:260)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$5.run(ApplicationMaster.scala:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.spark.deploy.yarn.ApplicationMaster.doAsUser(ApplicationMaster.scala:814)
    at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:259)
    at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:839)
    at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)

spark-submit --master yarn --deploy-mode cluster --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false -->PROBLEM

spark-submit --deploy-mode client --jars "wasb://xx@yy/zs/jars/config-1.3.1.jar" --class myapp.LoaderProcessor "wasb://xx@yy/zs/jars/myapp.jar" l 8 /data/8_data.csv 1 , true false -->WORKS!!!

3 REPLIES

Re: “User did not initialize spark context” Error when using Scala code in SPARK YARN Cluster mode

New Contributor


@Debjyoti Das Did this get resolved in cluster mode? PySpark code deployed on HDInsight in cluster mode has the same issue, and I can't switch to client mode. Thanks!

Re: “User did not initialize spark context” Error when using Scala code in SPARK YARN Cluster mode

Contributor

@Debjyoti Das have you tried replacing:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp")//.enableHiveSupport().getOrCreate;

With:

spark = org.apache.spark.sql.SparkSession.builder.appName("MyApp").getOrCreate();
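
This points at the likely root cause: in the posted code, the `//` comments out `.getOrCreate`, so on the non-local branch the session is never created and the YARN ApplicationMaster times out waiting for a SparkContext. A minimal sketch of the corrected `get_spark` (assuming the `Config()` helper and the `myapp.rddcp` key from the original post) could look like this:

```scala
import org.apache.spark.sql.SparkSession

def get_spark(): SparkSession = {
  val env = System.getenv("MYAPP_ENV")
  // .getOrCreate() is called on BOTH branches, so the method always
  // returns an initialized SparkSession, never a bare Builder.
  val spark =
    if (env == null || env == "dev_local")
      SparkSession.builder.master("local").appName("MyApp").getOrCreate()
    else
      SparkSession.builder.appName("MyApp").getOrCreate()
  spark.sparkContext.setCheckpointDir(Config().getString("myapp.rddcp"))
  spark
}
```

Returning the result of the `if` expression directly also removes the mutable `var spark: SparkSession = null`, which is what allowed the missing `.getOrCreate` to go unnoticed.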

Re: “User did not initialize spark context” Error when using Scala code in SPARK YARN Cluster mode

Contributor

Can you please try removing `master("local[*]")` from the Spark code and passing the master as a parameter to spark-submit instead: `--master yarn --deploy-mode cluster`. It should work.
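
As a sketch of that suggestion (paths and arguments are placeholders, not the original poster's exact values): build the session without any hard-coded master, and let spark-submit supply it per environment:

```shell
# Code change: SparkSession.builder.appName("MyApp").getOrCreate()
# (no .master(...) call), so the master is decided at submit time.

# Cluster run on YARN:
spark-submit --master yarn --deploy-mode cluster \
  --class myapp.LoaderProcessor myapp.jar <args>

# Local development run:
spark-submit --master "local[*]" \
  --class myapp.LoaderProcessor myapp.jar <args>
```

This also removes the need for the `MYAPP_ENV` branching in `get_spark`, since the same jar runs unchanged in both environments.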
