Support Questions

Find answers, ask questions, and share your expertise

How do I run Spark 2.2 on YARN?

avatar
Expert Contributor

I am trying to run Spark 2.2 with HDP 2.6. I stop Spark2 from Ambari, then I run:

/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/bin/spark-shell --jars /home/ed/.ivy2/jars/stanford-corenlp-3.6.0-models.jar,/home/ed/.ivy2/jars/jersey-bundle-1.19.1.jar --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 \--master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --executor-cores 2 --num-executors 11 --conf spark.hadoop.yarn.timeline-service.enabled=false

It used to run fine, then it started giving me:

Error initializing SparkContext.org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

now it just hangs after:

17/12/05 07:41:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

I can run it OK, without --master yarn --deploy-mode client but then I get the driver only as executor.

I have tried spark.hadoop.yarn.timeline-service.enabled = true.

yarn.nodemanager.vmem-check-enabled and pmem are set to false.

Can anyone help or point me where to look for errors? TIA!

PS spark-defaults.conf:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address master.royble.co.uk:18081
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
# spark.eventLog.dir hdfs:///spark-history
# spark.eventLog.enabled true
# spark.history.fs.logDirectory hdfs:///spark-history
# spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
# spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
spark.yarn.historyServer.address spark-server:18081
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.jars.packages com.databricks:spark-csv_2.11:1.4.0
spark.io.compression.codec lzf
spark.yarn.queue default
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
1 ACCEPTED SOLUTION

avatar

@ed day: You need to copy spark jars to hdfs and configure the properties spark.yarn.jars or spark.yarn.archive appropriately.

Please refer official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#preparations

View solution in original post

2 REPLIES 2

avatar
Expert Contributor

I've also tried the Dhdp.version= fixes from here. I've not put the new Spark on my other machines, could that be the problem, if so where do I put it? I created a new folder on master but if I use the same folder on the nodes, how does master knwo about it?

avatar

@ed day: You need to copy spark jars to hdfs and configure the properties spark.yarn.jars or spark.yarn.archive appropriately.

Please refer official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#preparations