Member since
06-23-2016
136
Posts
8
Kudos Received
8
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1019 | 11-24-2017 08:17 PM |
| | 1108 | 07-28-2017 06:40 AM |
| | 293 | 07-05-2017 04:32 PM |
| | 327 | 05-11-2017 03:07 PM |
| | 2190 | 02-08-2017 02:49 PM |
05-07-2018
08:14 PM
Duh! Of course it's a different machine! I'll check it in the morning. Thanks!!
05-07-2018
02:29 PM
I am trying to follow the procedure here, but /usr/hdp/current/kafka-broker is a broken symlink and kafka-topics.sh is nowhere to be found. HDP shows the Kafka service running OK. TIA!
02-11-2018
02:14 PM
Does anyone have a basic guide to moving my HDP 2.6 cluster to AWS? I need to move the data, which seems to be a fairly straightforward copy, right? But then how do I use Amazon processing that matches the Hive, Spark, etc. installation from my cluster? TIA!
12-06-2017
10:29 AM
@Jay Kumar SenSharma Thanks! Sorry, I forgot to say I am trying to run Spark 2.2 as an independent service that uses HDP 2.6. I assume this won't work for it.
12-06-2017
10:06 AM
Thanks! Unfortunately it already has that line.
12-06-2017
07:52 AM
EDIT: I forgot to say I am trying to run Spark 2.2 as an independent service that uses HDP 2.6. Please help, I am running out of time! I run:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 2g --executor-cores 1 --queue thequeue examples/jars/spark-examples*.jar 10 --executor-cores 4 --num-executors 11 --driver-java-options="-Dhdp.version=2.6.0.3-8" --conf "spark.executor.extraJavaOptions=-Dhdp.version=2.6.0.3-8"
In YARN cluster mode I get this error: "Could not find or load main class org.apache.spark.deploy.yarn.ApplicationMaster". I have tried all the fixes I can find except: 1. Classpath issues. Where do I set this and to what? 2. This question suggests it may be due to missing jars. Which jars do I need and what do I do with them? TIA!
12-05-2017
05:19 PM
I've also tried the -Dhdp.version= fixes from here. I've not put the new Spark on my other machines; could that be the problem? If so, where do I put it? I created a new folder on master, but if I use the same folder on the nodes, how does master know about it?
12-05-2017
07:54 AM
I am trying to run Spark 2.2 with HDP 2.6. I stop Spark2 from Ambari, then I run:
/home/ed/spark2.2/spark-2.2.0-bin-hadoop2.7/bin/spark-shell --jars /home/ed/.ivy2/jars/stanford-corenlp-3.6.0-models.jar,/home/ed/.ivy2/jars/jersey-bundle-1.19.1.jar --packages databricks:spark-corenlp:0.2.0-s_2.11,edu.stanford.nlp:stanford-corenlp:3.6.0 --master yarn --deploy-mode client --driver-memory 4g --executor-memory 4g --executor-cores 2 --num-executors 11 --conf spark.hadoop.yarn.timeline-service.enabled=false
It used to run fine, then it started giving me:
Error initializing SparkContext. org.apache.spark.SparkException: Yarn application has already ended! It might have been killed or unable to launch application master.
Now it just hangs after:
17/12/05 07:41:17 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
I can run it OK without --master yarn --deploy-mode client, but then I only get the driver as executor. I have tried spark.hadoop.yarn.timeline-service.enabled = true. yarn.nodemanager.vmem-check-enabled and pmem-check-enabled are set to false. Can anyone help or point me to where to look for errors? TIA!
PS spark-defaults.conf:
spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.eventLog.dir hdfs:///spark2-history/
spark.eventLog.enabled true
spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native:/usr/hdp/current/hadoop-client/lib/native/Linux-amd64-64
spark.history.fs.logDirectory hdfs:///spark2-history/
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
spark.history.ui.port 18081
spark.yarn.historyServer.address master.royble.co.uk:18081
spark.driver.extraJavaOptions -Dhdp.version=2.6.0.3-8
spark.yarn.am.extraJavaOptions -Dhdp.version=2.6.0.3-8
# spark.eventLog.dir hdfs:///spark-history
# spark.eventLog.enabled true
# spark.history.fs.logDirectory hdfs:///spark-history
# spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
# spark.history.ui.port 18080
spark.history.kerberos.keytab none
spark.history.kerberos.principal none
spark.yarn.containerLauncherMaxThreads 25
spark.yarn.driver.memoryOverhead 384
spark.yarn.executor.memoryOverhead 384
spark.yarn.historyServer.address spark-server:18081
spark.yarn.max.executor.failures 3
spark.yarn.preserve.staging.files false
spark.yarn.queue default
spark.yarn.scheduler.heartbeat.interval-ms 5000
spark.yarn.submit.file.replication 3
spark.jars.packages com.databricks:spark-csv_2.11:1.4.0
spark.io.compression.codec lzf
spark.yarn.queue default
spark.blockManager.port 38000
spark.broadcast.port 38001
spark.driver.port 38002
spark.executor.port 38003
spark.fileserver.port 38004
spark.replClassServer.port 38005
11-29-2017
09:34 PM
I am getting desperate here! My Spark2 jobs take hours then get stuck! I have a 4 node cluster each with 16GB RAM and 8 cores. I run HDP 2.6, Spark 2.1 and Zeppelin 0.7. I have:
spark.executor.instances 11
spark.executor.cores 2
spark.executor.memory 4G
yarn.nodemanager.resource.memory-mb = 14336
yarn.nodemanager.resource.cpu-vcores = 7
Via Zeppelin (same notebook) I do an INSERT into a Hive table:
dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")
for a 50-column table with about 12 million records. This gets split into 3 stages of 75, 75 and 200 tasks. The two 75-task stages get stuck at tasks 73 and 74, and the garbage collection lasts for hours. Any idea what I can try? EDIT: I have not looked at tweaking partitions; can anyone give me pointers on how to do that, please?
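Since the question above asks for pointers on tweaking partitions, here is a minimal sketch of one way to do it, assuming the spark session and dfPredictions DataFrame from the post; the target partition count is only an illustrative guess (roughly a small multiple of the total executor cores), not a recommendation.
// Sketch: repartition before writing so the insert is spread across more, smaller tasks.
// Names (spark, dfPredictions, default.predictions) are taken from the post above;
// targetPartitions is a hypothetical tuning value, not a known-good setting.
import org.apache.spark.sql.SaveMode

val targetPartitions = 44 // e.g. ~2x (11 executors * 2 cores)
dfPredictions
  .repartition(targetPartitions)
  .write.mode(SaveMode.Append)
  .insertInto("default.predictions")

// The shuffle parallelism that produces the 200-task stage can also be adjusted:
spark.conf.set("spark.sql.shuffle.partitions", "100")
Whether more or fewer partitions helps depends on where the time actually goes (per the post, it looks like GC), so this is a starting point to measure against rather than a fix.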
11-29-2017
10:25 AM
Wow thanks. I'll try these tomorrow when my latest slow job finishes.
11-29-2017
09:45 AM
My Hive query fails with:
java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
I cannot see any logs in the Tez View. It looks like the parse stage is hiding off to the right, but I cannot access it. How do I look at it, and where are the logs? TIA!! For some reason I cannot upload a jpg, so there is a png here.
11-28-2017
09:04 PM
I have a 4-node cluster, each node with 16GB RAM and 8 cores. I run HDP 2.6, Spark 2.1 and Zeppelin 0.7. I have:
spark.executor.instances 11
spark.executor.cores 2
spark.executor.memory 4G
yarn.nodemanager.resource.memory-mb = 14336
yarn.nodemanager.resource.cpu-vcores = 7
In an earlier Zeppelin paragraph I did spark.sql("set hive.execution.engine=tez;"), although I do not know if Tez is actually doing the job. How do I tell? (Tez does not work if I run a job via Hive View 2.0.) Via Zeppelin (same notebook) I do an INSERT into a Hive table:
dfPredictions.write.mode(SaveMode.Append).insertInto("default.predictions")
for a 50-column table with about 12 million records. This gets split into 3 stages of 75, 75 and 200 tasks. The first stage is running and has already taken 3.2 hours to do 45 out of the 75 tasks. Does this seem right with this size cluster? UPDATE: nearly finished! Some tasks have hours of garbage collection.
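As a quick aside on where those 75/75/200 task counts come from: here is a small sketch, assuming the spark session and dfPredictions DataFrame from the post, for inspecting the input partitioning and the shuffle setting (a 200-task stage typically matches spark.sql.shuffle.partitions, whose default is 200).
// Sketch: check the DataFrame's partition count and the shuffle-partition setting.
// Names (spark, dfPredictions) come from the post above.
val inputPartitions = dfPredictions.rdd.getNumPartitions
println(s"input partitions: $inputPartitions")
println("shuffle partitions: " + spark.conf.get("spark.sql.shuffle.partitions"))
Per-task GC time for the slow stage is easiest to read off the stage detail page in the Spark UI.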
11-24-2017
08:17 PM
The answer is: because I am an idiot. Only s3 had a DataNode and NodeManager installed. Hopefully this helps someone.
11-24-2017
11:59 AM
Hi. I am running Spark2 from Zeppelin (0.7 in HDP 2.6) and I am doing an IDF transformation which crashes after many hours. It is run on a cluster with a master and 3 datanodes: s1, s2 and s3. All nodes have a Spark2 client, and each has 8 cores and 16GB RAM. I just noticed it is only running on one node, s3, with 5 executors. In zeppelin-env.sh I have set zeppelin.executor.instances to 32 and zeppelin.executor.mem to 12g, and it has the line:
export MASTER=yarn-client
I have set yarn.resourcemanager.scheduler.class to org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler. I also set spark.executor.instances to 32 in the Spark2 interpreter. Anyone have any ideas what else I can try to get the other nodes doing their share?
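One way to confirm which hosts actually received executors, rather than trusting the configured values, is to ask the status tracker from a Spark paragraph; a minimal sketch, assuming the SparkContext sc that the Zeppelin Spark2 interpreter provides.
// Sketch: list the executors YARN actually granted and the host each one runs on.
// Assumes an active SparkContext named sc (as in the Zeppelin Spark2 interpreter).
val infos = sc.statusTracker.getExecutorInfos
infos.foreach(e => println(s"executor at ${e.host}:${e.port}"))
println(s"total entries (driver included): ${infos.length}")

// Compare against what was actually applied from the configuration:
println("spark.executor.instances = " + sc.getConf.getOption("spark.executor.instances"))
If only s3 shows up here, the request for more executors is not reaching YARN (or YARN cannot place them), which narrows down where to look next.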
10-18-2017
10:33 AM
I am using HDP-2.6.0.3 but I need Zeppelin 0.8, so I have installed it as an independent service. When I run:
%sql
show tables
I get nothing back, and I get 'table not found' when I run Spark2 SQL commands. These tables exist in the 0.7 Zeppelin. Can anyone tell me what I am missing? The steps I performed to create the Zeppelin 0.8 build are as follows:
mvn clean package -DskipTests -Pspark-2.1 -Phadoop-2.7 -Dhadoop.version=2.7.3 -Pyarn -Ppyspark -Psparkr -Pr -Pscala-2.11
Copied zeppelin-site.xml and shiro.ini from /usr/hdp/2.6.0.3-8/zeppelin/conf to /home/ed/zeppelin/conf. Created /home/ed/zeppelin/conf/zeppelin-env.sh, in which I put the following:
export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
export HADOOP_CONF_DIR=/etc/hadoop/conf
export ZEPPELIN_JAVA_OPTS="-Dhdp.version=2.6.0.3-8"
Copied /etc/hive/conf/hive-site.xml to /home/ed/zeppelin/conf
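A small sketch that may help narrow this down from a paragraph in the 0.8 build, assuming the SparkSession named spark injected by the interpreter: if the new interpreter did not pick up hive-site.xml, it will be using a fresh local metastore and warehouse instead of the cluster one, and the checks below make that visible (the property names are standard Spark 2.x ones, not anything specific to this setup).
// Sketch: check whether this interpreter session is wired to the cluster Hive metastore.
// Assumes the Zeppelin-provided SparkSession named spark.
println("catalog implementation: " + spark.conf.get("spark.sql.catalogImplementation", "unknown"))
println("warehouse dir: " + spark.conf.get("spark.sql.warehouse.dir", "unknown"))
spark.catalog.listTables().show(false)
If the warehouse directory points at a local path rather than the cluster location, the interpreter is not reading the copied hive-site.xml.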
08-29-2017
08:20 AM
FYI it seems the files disappear first, after a while the folder goes too. Before they disappear they look like: ed@master:~$ HADOOP_USER_NAME=hdfs hadoop fs -ls /data
Found 2 items
-rw-r--r-- 3 admin hdfs 4273495 2017-08-29 09:19 /data/abbo0.txt
-rw-r--r-- 3 admin hdfs 4211602 2017-08-29 09:19 /data/zip0.txt
08-29-2017
08:16 AM
Hi, I am running: HDP-2.6.0.3, Spark2 2.1.0, Hive 1.2.1000, HDFS 2.7.3. This is driving me mad, so any help is appreciated. I am trying to load a Hive table. I have tried in Hive View 2.0 and also in Spark. I get no errors and it works if I run it quickly, but my HDFS data keeps disappearing!
hc.sql("SET hive.support.sql11.reserved.keywords=false;")
hc.sql("add jar /usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar;")
hc.sql("DROP TABLE tweets11")
hc.sql("create table tweets15 ( racist boolean, contributors string, coordinates string, created_at string, entities struct < hashtags: array <string>, symbols: array <string>, urls: array <struct <display_url: string, expanded_url: string, indices: array <int>, url: string>>, user_mentions: array <string>>, favorite_count int, favorited boolean, filter_level string, geo string, id bigint, id_str string, in_reply_to_screen_name string, in_reply_to_status_id string, in_reply_to_status_id_str string, in_reply_to_user_id string, in_reply_to_user_id_str string, is_quote_status boolean, lang string, place string, possibly_sensitive boolean, retweet_count int, retweeted boolean, source string, text string, timestamp_ms string, truncated boolean, `user` struct < contributors_enabled: boolean, created_at: string, default_profile: boolean, default_profile_image: boolean, description: string, favourites_count: int, follow_request_sent: string, followers_count: int, `following`: string, friends_count: int, geo_enabled: boolean, id: bigint, id_str: string, is_translator: boolean, lang: string, listed_count: int, location: string, name: string, notifications: string, profile_background_color: string, profile_background_image_url: string, profile_background_image_url_https: string, profile_background_tile: boolean, profile_image_url: string, profile_image_url_https: string, profile_link_color: string, profile_sidebar_border_color: string, profile_sidebar_fill_color: string, profile_text_color: string, profile_use_background_image: boolean, protected: boolean, screen_name: string, statuses_count: int, time_zone: string, url: string, utc_offset: string, verified: boolean>) ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe' STORED AS TEXTFILE")
hc.sql("LOAD DATA INPATH '/data' OVERWRITE INTO TABLE tweets15") /data is on hdfs and contains a couple of text files. But when I run this I get: res59: org.apache.spark.sql.DataFrame = [key: string, value: string]
res60: org.apache.spark.sql.DataFrame = [result: int]
res61: org.apache.spark.sql.DataFrame = []
res62: org.apache.spark.sql.DataFrame = []
res63: org.apache.spark.sql.DataFrame = []
and /data is gone or disappears after a while! The same thing happens in Hive:
add jar /usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar;
DROP TABLE tweets0;
create table tweets0
( racist boolean, contributors string, coordinates string, created_at string, entities struct < hashtags: array <string>, symbols: array <string>, urls: array <struct <display_url: string, expanded_url: string, indices: array <int>, url: string>>, user_mentions: array <string>>, favorite_count int, favorited boolean, filter_level string, geo string, id bigint, id_str string, in_reply_to_screen_name string, in_reply_to_status_id string, in_reply_to_status_id_str string, in_reply_to_user_id string, in_reply_to_user_id_str string, is_quote_status boolean, lang string, place string, possibly_sensitive boolean, retweet_count int, retweeted boolean, source string, text string, timestamp_ms string, truncated boolean, `user` struct < contributors_enabled: boolean, created_at: string, default_profile: boolean, default_profile_image: boolean, description: string, favourites_count: int, follow_request_sent: string, followers_count: int, `following`: string, friends_count: int, geo_enabled: boolean, id: bigint, id_str: string, is_translator: boolean, lang: string, listed_count: int, location: string, name: string, notifications: string, profile_background_color: string, profile_background_image_url: string, profile_background_image_url_https: string, profile_background_tile: boolean, profile_image_url: string, profile_image_url_https: string, profile_link_color: string, profile_sidebar_border_color: string, profile_sidebar_fill_color: string, profile_text_color: string, profile_use_background_image: boolean, protected: boolean, screen_name: string, statuses_count: int, time_zone: string, url: string, utc_offset: string, verified: boolean>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED AS TEXTFILE;
LOAD DATA INPATH '/data' OVERWRITE INTO TABLE tweets0;
It is not necessarily immediately after running this code (code may have nothing to do with it) but nothing else is happening on the cluster. TIA!!
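One Hive behaviour that may be relevant here: a non-LOCAL LOAD DATA INPATH moves the source files into the table's warehouse directory rather than copying them, so /data emptying out after the load is at least consistent with the load itself (and dropping a managed table afterwards deletes that data as well). If the goal is to keep the files under /data, one hedged alternative is an external table over that directory; a sketch reusing the hc handle and SerDe from the post, with the column list shortened for brevity (the real schema is the long one above) and tweets_ext as a hypothetical table name:
// Sketch: declare an EXTERNAL table over /data instead of LOADing (which moves files).
// hc and the SerDe jar path are from the post; tweets_ext and the trimmed columns are illustrative.
hc.sql("add jar /usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar")
hc.sql("""
  CREATE EXTERNAL TABLE IF NOT EXISTS tweets_ext (
    id BIGINT,
    text STRING,
    lang STRING
  )
  ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  STORED AS TEXTFILE
  LOCATION '/data'
""")
// Dropping an external table later removes only the metadata, not the files in /data.
Whether this explains the disappearance in this particular case is a guess; the timing described in the post does not line up perfectly with the load.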
07-28-2017
06:40 AM
It was a setting in tez.lib.uris. I changed it to: /hdp/apps/${hdp.version}/tez/tez.tar.gz,hdfs://master.royble.co.uk:8020/jars/json-serde-1.3.7-jar-with-dependencies.jar (Note: no space after the comma, and the jar is given as a full hdfs:// path.)
07-28-2017
05:38 AM
Thanks Deepesh. It is: HIVE_AUX_JARS_PATH=/usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar
if [ "${HIVE_AUX_JARS_PATH}" != "" ]; then
if [ -f "${HIVE_AUX_JARS_PATH}" ]; then
export HIVE_AUX_JARS_PATH=${HIVE_AUX_JARS_PATH}
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
elif [ -d "/usr/hdp/current/hive-webhcat/share/hcatalog" ]; then
export HIVE_AUX_JARS_PATH=/usr/hdp/current/hive-webhcat/share/hcatalog/hive-hcatalog-core.jar
fi
07-26-2017
04:13 PM
I am trying to run Hive from the CLI:
HADOOP_USER_NAME=hdfs hive -hiveconf hive.cli.print.header=true -hiveconf hive.support.sql11.reserved.keywords=false -hiveconf hive.aux.jars.path=/usr/hdp/2.6.0.3-8/hive/lib/json-serde-1.3.7-jar-with-dependencies.jar -hiveconf hive.root.logger=DEBUG,console
but I get this error:
java.lang.RuntimeException: java.io.FileNotFoundException: File does not exist: hdfs://master.royble.co.uk:8020/user/hdfs/ /home/ed/Downloads/serde/json-serde-1.3.7-jar-with-dependencies.jar
I have had so many problems with that jar, which I originally used to create a Hive table. Normally I would do an 'add jar', but I cannot start Hive to do that. I have tried adding the jar to hive-env, to /usr/hdp/<version>/hive/auxlib (on the Hive machine), and to hive.aux.jars.path, but nothing works. Any idea why Hive is looking for that odd path, or in fact why it is looking for it at all? FYI: master is not the machine with Hive on it, but it is where I run Ambari. The path /home/ed/Downloads/serde is one I have used in the past but can't remember when. Using HDP-2.6.0.3. Any help is much appreciated as this is driving me mad!
07-25-2017
12:07 PM
Ironically, I am unable to access a question I asked today:
Tags: hcc
07-25-2017
10:33 AM
1 Kudo
In RStudio I do:
library(sparklyr)
library(dplyr)
Sys.setenv(SPARK_HOME="/usr/hdp/current/spark2-client") # got from ambari spark2 configs
config <- spark_config()
sc <- spark_connect(master = "yarn-client", config = config, version = '2.1.0')
which gives:
Failed during initialize_connection: org.apache.hadoop.security.AccessControlException: Permission denied: user=ed, access=WRITE, inode="/user/ed/.sparkStaging/application_1500959138473_0003":admin:hadoop:drwxr-xr-x
Normally I fix this sort of problem with HADOOP_USER_NAME=hdfs hadoop fs -put, but I do not know how to do this in R. I thought maybe change /user/ed's owner and group to hdfs:
ed@master:~$ hdfs dfs -ls /user
Found 11 items
drwx------ - accumulo hdfs 0 2017-05-14 15:38 /user/accumulo
drwxr-xr-x - admin hadoop 0 2017-06-27 06:52 /user/admin
drwxrwx--- - ambari-qa hdfs 0 2017-06-02 10:46 /user/ambari-qa
drwxr-xr-x - admin hadoop 0 2017-06-02 11:00 /user/ed
drwxr-xr-x - hbase hdfs 0 2017-05-14 15:35 /user/hbase
drwxr-xr-x - hcat hdfs 0 2017-05-14 15:44 /user/hcat
drwxr-xr-x - hdfs hdfs 0 2017-06-20 12:43 /user/hdfs
drwxr-xr-x - hive hdfs 0 2017-05-14 15:44 /user/hive
drwxrwxr-x - oozie hdfs 0 2017-05-14 15:46 /user/oozie
drwxrwxr-x - spark hdfs 0 2017-05-14 15:40 /user/spark
drwxr-xr-x - zeppelin hdfs 0 2017-07-24 09:29 /user/zeppelin
but I am worried, as /user/ed is currently owned by admin/hadoop and admin is how I log into Ambari, so I do not want to mess up other stuff. Any help is much appreciated!
07-21-2017
08:50 AM
I am using Spark's MultilayerPerceptronClassifier. This generates a column 'prediction' in the 'predictions' DataFrame. When I try to show it I get the error:
SparkException: Failed to execute user defined function($anonfun$1: (vector) => double) ...
Caused by: java.lang.IllegalArgumentException: requirement failed: A & B Dimension mismatch!
Other columns, for example 'vector', display OK. Part of the predictions schema:
|-- vector: vector (nullable = true)
|-- prediction: double (nullable = true)
My code is:
// racist is boolean, needs to be string:
val train2 = train.withColumn("racist", 'racist.cast("String"))
val test2 = test.withColumn("racist", 'racist.cast("String"))
val indexer = new StringIndexer().setInputCol("racist").setOutputCol("indexracist")
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector") //.setVectorSize(3).setMinCount(0)
val layers = Array[Int](4,5, 2)
val mpc = new MultilayerPerceptronClassifier().setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100).setFeaturesCol("vector").setLabelCol("indexracist")
val pipeline = new Pipeline().setStages(Array(indexer, word2Vec, mpc))
val model = pipeline.fit(train2)
val predictions = model.transform(test2)
predictions.select("prediction").show() Any pointers are much appreciated!
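One possible cause worth checking, offered as a guess rather than a confirmed diagnosis: MultilayerPerceptronClassifier requires the first entry of the layers array to equal the length of the feature vector, and Word2Vec produces 100-dimensional vectors by default unless setVectorSize is called, so layers = Array(4, 5, 2) would not match and could produce exactly this kind of dimension-mismatch error at prediction time. A minimal sketch of the aligned version, reusing the column names from the post (the value 4 simply mirrors the existing layers(0); any agreed size works):
// Sketch: keep the Word2Vec vector size and the network's input layer size in step.
// Column names (lemma, vector, indexracist) are from the post; vectorSize = 4 is illustrative.
import org.apache.spark.ml.feature.Word2Vec
import org.apache.spark.ml.classification.MultilayerPerceptronClassifier

val vectorSize = 4
val word2Vec = new Word2Vec().setInputCol("lemma").setOutputCol("vector").setVectorSize(vectorSize)
val layers = Array[Int](vectorSize, 5, 2) // input layer = feature length, output = 2 classes
val mpc = new MultilayerPerceptronClassifier()
  .setLayers(layers).setBlockSize(128).setSeed(1234L).setMaxIter(100)
  .setFeaturesCol("vector").setLabelCol("indexracist")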
07-05-2017
04:32 PM
Here is how you do it. I got its 'name' from here. Spark 2.1 needs the Scala 2.11 version, so the name is databricks:spark-corenlp:0.2.0-s_2.11. Edit the Spark2 interpreter and add the name, then save it and allow it to restart. In Zeppelin:
%spark.dep
z.reset()
z.load("databricks:spark-corenlp:0.2.0-s_2.11")
07-04-2017
01:27 PM
Can someone explain what I need to do to get the Stanford CoreNLP wrapper for Apache Spark to work in Zeppelin/Spark, please? I have done this:
%spark.dep
z.reset() // clean up previously added artifact and repository
// add artifact recursively
z.load("databricks:spark-corenlp:0.2.0-s_2.10")
and this:
import com.databricks.spark.corenlp.functions._
val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")
dfLemmas.show(20, false)
but I get this:
<console>:42: error: not found: value lemmas
val dfLemmas= filteredDF.withColumn("lemmas", lemmas('noURL)).select("racist", "filtered","noURL", "lemmas")
Do I have to download the files and build them or something? If so, how do I do that? Or is there an easier way? TIA!!!!