
Running Spark in Oozie using yarn-cluster

Hi,

 

I would like to run a Spark program from Oozie.

The Spark program depends on some external JARs, such as spark-csv.

 

I tried two approaches:

First: I copied the external JAR into the Oozie share folder on HDFS.

Second: in the options list, I added the following option: --jars hdfs://<SERVER>:<PORT>/user/oozie/share/lib/spark/spark-csv_2.10-1.1.0.jar

 

However, I am still getting the following error:

 

 <<< Invocation of Main class completed <<<
  
  Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SparkMain], main() threw exception, Failed to load class for data source: com.databricks.spark.csv
  java.lang.RuntimeException: Failed to load class for data source: com.databricks.spark.csv
  at scala.sys.package$.error(package.scala:27)
  at org.apache.spark.sql.sources.ResolvedDataSource$.lookupDataSource(ddl.scala:268)
  at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:279)
  at org.apache.spark.sql.SQLContext.load(SQLContext.scala:679)
  at com.fsx.bda.spark.core.FXCommon$.sourceFile(FXCommon.scala:18)
  at com.fsx.bda.spark.Transformation.SourceFlatFileStep.run(SourceFlatFileStep.scala:35)
  at com.fsx.bda.spark.core.SparkProcess.process(SparkProcess.scala:7)
  at com.fsx.bda.spark.core.Test1$.main(Test1.scala:95)
  at com.fsx.bda.spark.core.Test1.main(Test1.scala)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
  at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
  at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
  at org.apache.oozie.action.hadoop.SparkMain.runSpark(SparkMain.java:105)
  at org.apache.oozie.action.hadoop.SparkMain.run(SparkMain.java:96)
  at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:46)
  at org.apache.oozie.action.hadoop.SparkMain.main(SparkMain.java:40)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:497)
  at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:228)
  at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
  at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
  at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
  at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runSubtask(LocalContainerLauncher.java:370)
  at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.runTask(LocalContainerLauncher.java:295)
  at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler.access$200(LocalContainerLauncher.java:181)
  at org.apache.hadoop.mapred.LocalContainerLauncher$EventHandler$1.run(LocalContainerLauncher.java:224)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
  
  Oozie Launcher failed, finishing Hadoop job gracefully
  
  Oozie Launcher, uploading action data to HDFS sequence file: hdfs://MYRNDSVRVM350.bison.local:8020/user/teck.lee.tay/oozie-oozi/0000014-150910171550573-oozie-oozi-W/spark-21ed--spark/action-data.seq
  
  Oozie Launcher ends
              

Please help.

 

Thanks

 


Re: Running Spark in Oozie using yarn-cluster

Instead, please try adding spark-csv_2.10-1.1.0.jar to your workflow directory, or to its lib/ subdirectory (workflow/lib/).
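
As a rough sketch of that suggestion, the JAR could be uploaded to the workflow's lib/ directory on HDFS with something like the following. The function name upload_workflow_jar, the HDFS_CMD override, and the workflow path in the usage line are all hypothetical and only there for illustration; the actual workflow application path depends on your setup. It assumes the hdfs client is on the PATH and that the JAR is in the current local directory.

```shell
# Minimal sketch: copy a dependency JAR into <workflow-app-dir>/lib on HDFS.
# Assumption: Oozie adds JARs found in the workflow application's lib/
# directory to the action's classpath.
# HDFS_CMD is a hypothetical override (e.g. HDFS_CMD=echo for a dry run).
upload_workflow_jar() {
  jar="$1"          # local path to the JAR, e.g. spark-csv_2.10-1.1.0.jar
  wf_app_dir="$2"   # HDFS directory that contains workflow.xml
  hdfs_cmd="${HDFS_CMD:-hdfs}"

  # Create lib/ next to workflow.xml if it does not exist yet,
  # then upload the JAR, overwriting any older copy.
  "$hdfs_cmd" dfs -mkdir -p "$wf_app_dir/lib" &&
  "$hdfs_cmd" dfs -put -f "$jar" "$wf_app_dir/lib/"
}
```

It might be called as `upload_workflow_jar spark-csv_2.10-1.1.0.jar /user/<your-user>/<your-workflow-app>`, where the second argument is whatever directory holds your workflow.xml. After re-submitting the workflow, the --jars option from the original attempt should no longer be needed for this JAR, though that is worth verifying on your cluster.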