
Can we have a long-running Spark application that runs on YARN? Does YARN support long-running applications?

Rising Star

We are trying to run the Data Science team's Spark code JAR on our Hortonworks cluster. They have a long-running Spark application which runs on Mesos in their DataStax cluster. We just need to make it run on our Hortonworks cluster using YARN as the cluster manager. We have a deploy script which starts the application and then posts the input arguments as JSON to that long-running application via HTTP POST.
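
Roughly, the deploy script does something like the following; the endpoint and payload here are made-up stand-ins, not the real service's API:

import java.io.OutputStreamWriter
import java.net.{HttpURLConnection, URL}
import scala.io.Source

// Illustrative only: endpoint and JSON payload are hypothetical.
// The deploy script POSTs the job arguments to the long-running driver.
object SubmitJob extends App {
  val payload = """{"job.type":"pipeline","pipeline":"usagemon"}"""
  val conn = new URL("http://app-host:8090/jobs")
    .openConnection().asInstanceOf[HttpURLConnection]
  conn.setRequestMethod("POST")
  conn.setRequestProperty("Content-Type", "application/json")
  conn.setDoOutput(true)
  val out = new OutputStreamWriter(conn.getOutputStream)
  out.write(payload)
  out.close()
  println(s"HTTP ${conn.getResponseCode}")
  println(Source.fromInputStream(conn.getInputStream).mkString)
}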

1 ACCEPTED SOLUTION

Master Guru

Yes, you can. I found this article useful for understanding how long-running Spark jobs run on YARN.

http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
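
For a quick flavor of what it covers, here is a sketch of the kind of resilience settings it discusses for long-running applications on YARN; the values below are illustrative, the article explains how to choose them:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Illustrative values; tune per cluster.
val conf = new SparkConf()
  .set("spark.yarn.maxAppAttempts", "4")                      // let YARN restart a failed AM
  .set("spark.yarn.am.attemptFailuresValidityInterval", "1h") // age out old AM failures
  .set("spark.yarn.executor.failuresValidityInterval", "1h")  // age out old executor failures
  .set("spark.task.maxFailures", "8")                         // tolerate transient task failures

val spark = SparkSession.builder().config(conf).getOrCreate()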


4 REPLIES


Rising Star

Thanks, Sunile!

This article is really informative. For now we are trying to run this Spark application as a single-run service instead of as a web service, passing all of the arguments on the command line. We have found a different entry point to the service and are now trying to run it that way.

Since I'm new to Scala and Spark, I'm finding it difficult to start this application on my Hortonworks Sandbox. I've already set up the required JAR files and am able to submit the application to Spark2 on Hortonworks 2.5, but I'm stuck at the point where I have to run it on YARN in cluster mode. I'm getting the error below, even though I've kept all the required JARs in the root directory from which I run the command. Note that the job executes fine in local mode; I don't know what the issue is with YARN cluster mode.

16/12/22 09:25:40 ERROR ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver
java.sql.SQLException: No suitable driver

The command I'm using is given below:

su root --command "/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit \
  --class com.kronos.research.svc.datascience.DataScienceApp \
  --verbose --master yarn --deploy-mode cluster \
  --driver-memory 2g --executor-memory 2g --executor-cores 1 \
  --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar \
  --conf spark.scheduler.mode=FIFO \
  --conf spark.cassandra.output.concurrent.writes=5 \
  --conf spark.cassandra.output.batch.size.bytes=4096 \
  --conf spark.cassandra.output.consistency.level=ALL \
  --conf spark.cassandra.input.consistency.level=LOCAL_ONE \
  --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar \
  --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" \
  --driver-class-path sqljdbc4.jar:ojdbc6.jar \
  --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" \
  /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar \
  -job.type=pipeline -pipeline=usagemon -jdbc.type=tenant -jdbc.tenant=10013 \
  -date.start=\"2013-02-01\" -date.end=\"2016-05-01\" -labor.level=4 \
  -output.type=console -jdbc.throttle=15 \
  -jdbc.cache.dir=hdfs://sandbox.hortonworks.com:8020/etl"
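
For context, the failure happens on a plain JDBC read. A simplified sketch of that read follows (the URL and table are made up); one known cause of this exact exception in cluster mode is that the driver class is never registered with java.sql.DriverManager on the cluster nodes, which passing the explicit "driver" option avoids:

import org.apache.spark.sql.SparkSession

// Hypothetical URL and table; the "driver" option is the point here.
val spark = SparkSession.builder().getOrCreate()
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://dbhost:1433;databaseName=tenantdb")
  .option("dbtable", "usage_events")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") // registers the class explicitly
  .load()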

Master Guru

Zeppelin is a good example of a long-running application on YARN.

Super Collaborator

Without the full exception stack trace, it's difficult to know what happened.

If you are instantiating Hive, you may need to add hive-site.xml and the DataNucleus JARs to the job, e.g.:

--jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar  --files /usr/hdp/current/spark-client/conf/hive-site.xml
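
On the code side, a Spark 2 session also has to opt in to Hive support before hive-site.xml is consulted; a minimal sketch:

import org.apache.spark.sql.SparkSession

// hive-site.xml (shipped with --files above) is only picked up when the
// session enables Hive support.
val spark = SparkSession.builder()
  .appName("DataScienceApp")
  .enableHiveSupport()
  .getOrCreate()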