Created on 12-22-2016 04:17 AM - edited 09-16-2022 03:52 AM
We are trying to run the Data Science team's Spark JAR on our Hortonworks cluster. They have a long-running Spark application which runs on Mesos in their DataStax cluster; we just need to make it run on our Hortonworks cluster with YARN as the cluster manager. We have a deploy script which starts the application and then posts the input arguments as JSON to that long-running application via HTTP POST.
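To illustrate the pattern (the host, port, path, and payload below are purely hypothetical and not the actual interface of their service; the argument names are just taken from the submit command further down), such a POST might look like:

curl -X POST http://dsc-app-host:8080/run \
  -H "Content-Type: application/json" \
  -d '{"job.type":"pipeline","pipeline":"usagemon","date.start":"2013-02-01","date.end":"2016-05-01"}'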
Created 12-22-2016 04:21 AM
Yes, you can. I found this article useful for understanding how long-running Spark jobs run on YARN.
http://mkuthan.github.io/blog/2016/09/30/spark-streaming-on-yarn/
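To give a flavour of what that involves: the job is submitted in yarn-cluster mode, and the YARN re-attempt settings usually need tuning so the application survives driver failures. A minimal sketch (the class name, jar name, and values here are placeholders, not recommendations):

spark-submit \
  --master yarn --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=4 \
  --conf spark.yarn.am.attemptFailuresValidityInterval=1h \
  --class com.example.StreamingApp \
  streaming-app-assembly.jar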
Created 12-22-2016 10:28 AM
Thanks Sunile,
This article is really informative. For now we are trying to run this Spark application as a single-run service instead of as a web service, and we are going to pass all the arguments on the command line. We have actually found a different entry point to this service and are now trying to run it through the command line. Since I'm new to Scala and Spark, I'm finding it difficult to start this application on my Hortonworks Sandbox. I've already set up the required JAR files and am able to submit the application to Spark2 on Hortonworks 2.5, but I'm stuck at the point where I have to run it on YARN in cluster mode. I'm getting the error below. I've kept all the required JARs under the root directory from which I'm running the command. Please note that I'm able to execute this in local mode, but I don't know what the issue is with YARN cluster mode.
16/12/22 09:25:40 ERROR ApplicationMaster: User class threw exception: java.sql.SQLException: No suitable driver
java.sql.SQLException: No suitable driver
The command I'm using is given below:
su root --command "/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit --class com.kronos.research.svc.datascience.DataScienceApp --verbose --master yarn --deploy-mode cluster --driver-memory 2g --executor-memory 2g --executor-cores 1 --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar --conf spark.scheduler.mode=FIFO --conf spark.cassandra.output.concurrent.writes=5 --conf spark.cassandra.output.batch.size.bytes=4096 --conf spark.cassandra.output.consistency.level=ALL --conf spark.cassandra.input.consistency.level=LOCAL_ONE --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar --conf spark.executor.extraJavaOptions=\"-Duser.timezone=UTC \" --driver-class-path sqljdbc4.jar:ojdbc6.jar --driver-java-options \"-Dspark.ui.port=0 -Doracle.jdbc.timezoneAsRegion=false -Dspark.cassandra.connection.host=catalyst-1.int.kronos.com,catalyst-2.int.kronos.com,catalyst-3.int.kronos.com -Duser.timezone=UTC\" /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar -job.type=pipeline -pipeline=usagemon -jdbc.type=tenant -jdbc.tenant=10013 -date.start=\"2013-02-01\" -date.end=\"2016-05-01\" -labor.level=4 -output.type=console -jdbc.throttle=15 -jdbc.cache.dir=hdfs://sandbox.hortonworks.com:8020/etl"
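One thing worth checking for this particular error: in yarn-cluster mode the driver runs inside the YARN application master on a cluster node, so --driver-class-path entries such as sqljdbc4.jar only resolve if those files actually exist at that path on that node. A sketch of the relevant flags with the JDBC driver jars shipped via --jars instead (jar names taken from the command above, all other options left as they are):

/usr/hdp/2.5.0.0-1245/spark2/bin/spark-submit --master yarn --deploy-mode cluster \
  --jars ./dst-svc-reporting-assembly-0.0.1-SNAPSHOT-deps.jar,./sqljdbc4.jar,./ojdbc6.jar \
  --conf spark.driver.extraClassPath=sqljdbc4.jar:ojdbc6.jar \
  --conf spark.executor.extraClassPath=sqljdbc4.jar:ojdbc6.jar \
  ... /root/dst-svc-reporting-assembly-0.0.1-SNAPSHOT.jar ...

With --jars, YARN localizes the files into each container's working directory, which is why the bare file names in the extraClassPath settings can resolve on the cluster nodes.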
Created 12-22-2016 09:32 PM
Zeppelin is a good example of one.
Created 12-22-2016 10:16 PM
Without the full exception stack trace, it's difficult to know what happened.
If you are instantiating Hive, then you may need to add hive-site.xml and the DataNucleus jars to the job, e.g.:
--jars /usr/hdp/current/spark-client/lib/datanucleus-api-jdo-3.2.6.jar,/usr/hdp/current/spark-client/lib/datanucleus-rdbms-3.2.9.jar,/usr/hdp/current/spark-client/lib/datanucleus-core-3.2.10.jar --files /usr/hdp/current/spark-client/conf/hive-site.xml