Created 12-05-2017 07:54 AM
I am trying to run Spark 2.2 with HDP 2.6. I stop Spark2 from Ambari, then I run:
/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/bin/spark-shell \
  --jars /home/ed/.ivy2/jars/stanford-corenlp-3.6.0-models.jar,/home/ed/.ivy2/jars/jersey-bundle-1.19.1.jar \
  --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 \
  --master yarn --deploy-mode client \
  --driver-memory 4g --executor-memory 4g --executor-cores 2 --num-executors 11 \
  --conf spark.hadoop.yarn.timeline-service.enabled=false
It used to run fine, then it started giving me:
Error initializing SparkContext.
org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Now it just hangs after:
17/12/05 07:41:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
I can run it OK without --master yarn --deploy-mode client (i.e. in local mode), but then I only get the driver as an executor.
I have tried setting spark.hadoop.yarn.timeline-service.enabled = true.
Both yarn.nodemanager.vmem-check-enabled and yarn.nodemanager.pmem-check-enabled are set to false.
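In case it matters, those checks correspond to the following yarn-site.xml entries (I set them via Ambari; shown here as I believe they end up in the file):

<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>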
Can anyone help or point me where to look for errors? TIA!
PS spark-defaults.conf:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address master.royble.co.uk:18081
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
# spark.eventLog.dir hdfs:///spark-history
# spark.eventLog.enabled true
# spark.history.fs.logDirectory hdfs:///spark-history
# spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
# spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
spark.yarn.historyServer.address spark-server:18081
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.jars.packages com.databricks:spark-csv_2.11:1.4.0
spark.io.compression.codec lzf
spark.yarn.queue default
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
Created 12-18-2017 07:08 PM
@ed day: You need to copy the Spark jars to HDFS and configure the spark.yarn.jars or spark.yarn.archive property appropriately.
Please refer to the official documentation: https://spark.apache.org/docs/latest/running-on-yarn.html#preparations
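For example, a rough sketch (the HDFS path and archive name below are just placeholders, adjust them for your setup):

# build a single archive from the local Spark jars
jar cv0f spark-libs.jar -C /home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/jars/ .
# upload it to HDFS
hdfs dfs -mkdir -p /spark2.2-jars
hdfs dfs -put spark-libs.jar /spark2.2-jars/

Then in spark-defaults.conf, point at the archive:

spark.yarn.archive hdfs:///spark2.2-jars/spark-libs.jar

Alternatively, upload the jars individually and set spark.yarn.jars hdfs:///spark2.2-jars/*.jar instead. Once the jars are in HDFS, YARN localizes them on every node, so you do not need to copy the Spark install to the worker machines yourself.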
Created 12-05-2017 05:19 PM
I've also tried the -Dhdp.version= fixes from here. I haven't put the new Spark on my other machines; could that be the problem, and if so, where do I put it? I created a new folder on the master, but if I use the same folder on the nodes, how does the master know about it?