Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

How do I run Spark 2.2 on YARN?

Expert Contributor

I am trying to run Spark 2.2 with HDP 2.6. I stop Spark2 from Ambari, then I run:

/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/bin/spark-shell --jars /home/ed/.ivy2/jars/stanford-corenlp-3.6.0-models.jar,/home/ed/.ivy2/jars/jersey-bundle-1.19.1.jar --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 --master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --executor-cores 2 --num-executors 11 --conf spark.hadoop.yarn.timeline-service.enabled=false

It used to run fine, then it started giving me:

Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.

Now it just hangs after:

17/12/05 07:41:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

I can run it OK without --master yarn --deploy-mode client, but then only the driver acts as an executor.

I have tried spark.hadoop.yarn.timeline-service.enabled = true.

yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.pmem-check-enabled are both set to false.

Can anyone help or point me where to look for errors? TIA!
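For reference, the usual place to look is the YARN application logs, which record why an ApplicationMaster failed to launch. Assuming log aggregation is enabled on the cluster, they can be pulled from the command line (the application ID below is a placeholder):

```shell
# List recently failed/killed applications to find the application ID
yarn application -list -appStates FAILED,KILLED

# Fetch the aggregated container logs for that application
yarn logs -applicationId application_XXXXXXXXXXXXX_XXXX
```

The same logs are also reachable from the ResourceManager web UI by clicking through to the failed application's attempt.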

PS spark-defaults.conf:

spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address master.royble.co.uk:18081
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
# spark.eventLog.dir hdfs:///spark-history
# spark.eventLog.enabled true
# spark.history.fs.logDirectory hdfs:///spark-history
# spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
# spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
spark.yarn.historyServer.address spark-server:18081
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.jars.packages com.databricks:spark-csv_2.11:1.4.0
spark.io.compression.codec lzf
spark.yarn.queue default
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
1 ACCEPTED SOLUTION


@ed day: You need to copy the Spark jars to HDFS and configure the property spark.yarn.jars or spark.yarn.archive accordingly.

Please refer to the official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#preparations
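For example (the HDFS location here is illustrative, not prescriptive): bundle the jars that ship with the Spark 2.2 distribution into a single archive, push it to HDFS, and point spark.yarn.archive at it:

```shell
# Local Spark 2.2 install from the question; adjust to your install path
SPARK_HOME=/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7

# Bundle every jar under $SPARK_HOME/jars into one uncompressed archive
jar cv0f spark-libs.jar -C "$SPARK_HOME/jars/" .

# Publish it to a location all NodeManagers can read
hdfs dfs -mkdir -p /spark2/jars
hdfs dfs -put spark-libs.jar /spark2/jars/

# Then add to spark-defaults.conf:
#   spark.yarn.archive hdfs:///spark2/jars/spark-libs.jar
```

With spark.yarn.archive set, spark-submit no longer uploads the SPARK_HOME libraries on every launch, and the "Neither spark.yarn.jars nor spark.yarn.archive is set" warning goes away.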


2 REPLIES

Expert Contributor

I've also tried the -Dhdp.version= fixes from here. I have not installed the new Spark on my other machines; could that be the problem, and if so, where do I put it? I created a new folder on the master, but if I use the same folder on the nodes, how does the master know about it?
