Can we have a long-running Spark application on YARN? Does YARN support long-running applications?

New Contributor

We are trying to run the Data Science team's Spark application jar on our Hortonworks cluster. They have a long-running Spark application that runs on Mesos in their DataStax cluster. We need to make it run on our Hortonworks cluster with YARN as the cluster manager. We have a deploy script that starts the application and then sends the input arguments, as JSON, to the long-running application via HTTP POST.

Accepted Solutions

Re: Can we have a long-running Spark application on YARN? Does YARN support long-running applications?

Super Guru

Yes, you can. I found this article useful for understanding how long-running Spark jobs run on YARN:

http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
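
As a rough illustration of what preparing a long-running job for YARN can involve, the sketch below assembles a few restart-related Spark properties into spark-submit arguments. The property names are standard Spark-on-YARN settings, but the values are illustrative assumptions and <application-jar> is a placeholder, so check them against your Spark version before relying on them.

```shell
#!/bin/sh
# Sketch only: collects --conf flags for a long-running job and prints the
# resulting command line. Nothing is submitted to the cluster.
# All values below are illustrative assumptions, not recommendations.

conf_args=

# Append one "--conf key=value" pair to conf_args.
add_conf() {
  conf_args="${conf_args:+$conf_args }--conf $1=$2"
}

# Allow several application attempts, and forget old failures after an hour
# so that rare failures spread over weeks do not end the job.
add_conf spark.yarn.maxAppAttempts 4
add_conf spark.yarn.am.attemptFailuresValidityInterval 1h

# The same idea for executor failures.
add_conf spark.yarn.max.executor.failures 16
add_conf spark.yarn.executor.failuresValidityInterval 1h

echo "spark-submit --master yarn --deploy-mode cluster $conf_args <application-jar>"
```

The validity intervals matter for jobs that run for weeks: without them, failures accumulate over the whole lifetime of the application rather than over a recent window.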

Re: Can we have a long-running Spark application on YARN? Does YARN support long-running applications?

New Contributor

Thanks Sunile,

This article is really informative. For now we are trying to run this Spark application as a single-run service rather than as a web service, passing all the arguments on the command line. We have found a different entry point to the service and are trying to run it from there.

Since I'm new to Scala and Spark, I'm finding it difficult to start this application on my Hortonworks Sandbox. I have set up the required jar files and can submit the application to Spark2 on Hortonworks 2.5, but I'm stuck at running it on YARN in cluster mode. It runs fine in local mode, but in YARN cluster mode I get the error below. All the required jars are under the root directory from which I'm running the command.

16/12/22 09:25:40 ERROR ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver
java.sql.SQLException: No suitable driver

The command which I'm using is given below:

su root --command "/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit \
  --class com.kronos.research.svc.datascience.DataScienceApp \
  --verbose \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 2g \
  --executor-memory 2g \
  --executor-cores 1 \
  --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar \
  --conf spark.scheduler.mode=FIFO \
  --conf spark.cassandra.output.concurrent.writes=5 \
  --conf spark.cassandra.output.batch.size.bytes=4096 \
  --conf spark.cassandra.output.consistency.level=ALL \
  --conf spark.cassandra.input.consistency.level=LOCAL_ONE \
  --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar \
  --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" \
  --driver-class-path sqljdbc4.jar:ojdbc6.jar \
  --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" \
  /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar \
  -job.type=pipeline \
  -pipeline=usagemon \
  -jdbc.type=tenant \
  -jdbc.tenant=10013 \
  -date.start=\"2013-02-01\" \
  -date.end=\"2016-05-01\" \
  -labor.level=4 \
  -output.type=console \
  -jdbc.throttle=15 \
  -jdbc.cache.dir=hdfs://sandbox.hortonworks.com:8020/etl"
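
One possible cause worth ruling out, stated as an assumption rather than a confirmed diagnosis: in cluster mode the driver runs inside a YARN container on some cluster node, so relative class path entries such as sqljdbc4.jar resolve against that container's working directory, not the directory the command was launched from. The command above ships only the -deps jar via --jars. The sketch below builds a --jars list that also includes the two JDBC driver jars, which should place them in the container working directory where the relative class path entries can find them.

```shell
#!/bin/sh
# Sketch only: builds the jar-related spark-submit arguments and prints
# them. Jar names are taken from the command above; whether this resolves
# the "No suitable driver" error is an assumption, not a verified fix.

deps_jar=./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar
jdbc_jars="sqljdbc4.jar ojdbc6.jar"

jars_csv=$deps_jar   # comma-separated list for --jars
class_path=          # colon-separated list for the class path options

for jar in $jdbc_jars; do
  jars_csv="$jars_csv,./$jar"
  class_path="${class_path:+$class_path:}$jar"
done

echo "--jars $jars_csv"
echo "--driver-class-path $class_path"
echo "--conf spark.executor.extraClassPath=$class_path"
```

The class path options stay as bare file names on purpose: they are meant to match the names the shipped jars have inside each container.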

Re: Can we have a long-running Spark application on YARN? Does YARN support long-running applications?

Super Guru

Zeppelin is a good example of a long-running Spark application on YARN.

Re: Can we have a long-running Spark application on YARN? Does YARN support long-running applications?

Expert Contributor

Without the full exception stack trace, it's difficult to know what happened.
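
In cluster mode the driver's stack trace ends up in the YARN container logs rather than on the submitting console. A minimal sketch of getting at it, assuming log aggregation is enabled on the cluster: pull the application ID out of the spark-submit output, then pass it to yarn logs. The sample log line and its application ID are made up for illustration.

```shell
#!/bin/sh
# Sketch: extract the YARN application ID from spark-submit output so the
# aggregated container logs can be fetched afterwards.

# Print the first token on stdin that looks like a YARN application ID.
extract_app_id() {
  grep -o 'application_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Made-up sample of the kind of line spark-submit prints in cluster mode.
sample='16/12/22 09:25:01 INFO Client: Submitted application application_1482400000000_0007'

app_id=$(printf '%s\n' "$sample" | extract_app_id)

# Print the command to run on a cluster node; it is not executed here.
echo "yarn logs -applicationId $app_id"
```

Searching the resulting log for "Caused by" usually leads to the underlying exception.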

If you are instantiating Hive, you may need to add hive-site.xml and the DataNucleus jars to the job, for example:

--jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar  --files /usr/hdp/current/spark-client/conf/hive-site.xml
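
The version numbers in those jar names differ between HDP builds, so a hard-coded list can silently point at files that do not exist. A small sketch that builds the --jars value from whatever DataNucleus jars are present; the helper name datanucleus_jars_csv and the LIB_DIR variable are hypothetical, and the default directory is the one used above.

```shell
#!/bin/sh
# Sketch: build a comma-separated --jars value from the datanucleus jars
# found in a directory, instead of hard-coding version numbers.

# Print the datanucleus-*.jar files in directory $1, joined with commas.
datanucleus_jars_csv() {
  csv=
  for jar in "$1"/datanucleus-*.jar; do
    [ -e "$jar" ] || continue   # glob matched nothing, skip the literal pattern
    csv="${csv:+$csv,}$jar"
  done
  printf '%s\n' "$csv"
}

# LIB_DIR is an assumed override; the default matches the path above.
jars_csv=$(datanucleus_jars_csv "${LIB_DIR:-/usr/hdp/current/spark-client/lib}")

echo "--jars $jars_csv --files /usr/hdp/current/spark-client/conf/hive-site.xml"
```

If the directory holds no matching jars the list comes out empty, which is worth checking before passing it to spark-submit.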