Created 04-27-2018 10:18 PM
Hello All
I am having problems running an Oozie Spark action. The same job runs fine with spark-submit; this is the command:
spark-submit --driver-class-path=/filesystem/path/terajdbc4.jar:/filesystem/path/tdgssconfig.jar --class myapp.MainClass my-app-1.jar
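(For reference, a cluster-mode equivalent of this command would be roughly the following sketch; here --jars ships the listed jars to the driver and executors instead of relying on a path that exists on the local filesystem:)

```
spark-submit --master yarn --deploy-mode cluster \
  --jars /filesystem/path/terajdbc4.jar,/filesystem/path/tdgssconfig.jar \
  --class myapp.MainClass my-app-1.jar
```

Note that --jars takes a comma-separated list, unlike the colon-separated --driver-class-path.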
Then I created the Oozie action, which looks like this:
<action name="cargarTD">
    <spark xmlns="uri:oozie:spark-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <job-xml>/some_path/generic/hive-site.xml</job-xml>
        <configuration>
            <property>
                <name>mapred.job.queue.name</name>
                <value>${queueName}</value>
            </property>
            <property>
                <name>oozie.use.system.libpath</name>
                <value>true</value>
            </property>
            <property>
                <name>oozie.libpath</name>
                <value>/some_other_path/oozie/share/lib/lib_20170127130322/hive</value>
            </property>
        </configuration>
        <master>yarn-master</master>
        <mode>cluster</mode>
        <name>Test spark oozie</name>
        <class>myapp.MainClass</class>
        <jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar</jar>
        <spark-opts>--driver-class-path hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar: --conf "spark.executor.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar" --conf "spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar"</spark-opts>
    </spark>
    <ok to="End"/>
    <error to="Kill"/>
</action>
I ran it and got this error:
18/04/27 16:50:58 INFO ExecutorRunnable:
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/current/hadoop-client/*<CPS>/usr/hdp/current/hadoop-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-1245.jar:/etc/hadoop/conf/secure
    SPARK_YARN_CACHE_ARCHIVES -> hdfs://mycompanyhdfs/user/dataloaderusr/.sparkStaging/application_1524487536165_7870/__spark_conf__1770230405461321022.zip#__spark_conf__
    SPARK_LOG_URL_STDERR -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderu...
    SPARK_YARN_CACHE_FILES_FILE_SIZES -> 188727178,39194
    SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1524487536165_7870
    SPARK_USER -> dataloaderusr
    SPARK_YARN_CACHE_ARCHIVES_FILE_SIZES -> 141577
    SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PRIVATE
    SPARK_YARN_CACHE_ARCHIVES_TIME_STAMPS -> 1524865844229
    SPARK_YARN_MODE -> true
    SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1485286912562,1524773043221
    SPARK_LOG_URL_STDOUT -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderu...
    SPARK_YARN_CACHE_ARCHIVES_VISIBILITIES -> PRIVATE
    SPARK_YARN_CACHE_FILES -> hdfs://mycompanyhdfs/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar#__spark__.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar.jar#__app__.jar
  command:
    {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m '-Dlog4j.configuration=spark-log4j.properties' -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=46512' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@30.30.30.27:46512 --executor-id 2 --hostname probighhww006 --cores 1 --app-id application_1524487536165_7870 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
18/04/27 16:50:58 INFO ContainerManagementProtocolProxy: Opening proxy : probighhww006:45454
18/04/27 16:51:01 INFO AMRMClientImpl: Received new token for : probighhww003:45454
18/04/27 16:51:01 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
18/04/27 16:51:02 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww012:34654) with ID 1
18/04/27 16:51:02 INFO BlockManagerMasterEndpoint: Registering block manager probighhww012:39342 with 511.1 MB RAM, BlockManagerId(1, probighhww012, 39342)
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww006:41530) with ID 2
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/04/27 16:51:03 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
I am not sure why the driver is not on the executor classpath. I would rather not put the Teradata JDBC jars on the local filesystem of every node.
I have also tried with <mode>client</mode>, but nothing changed.
Does anyone know what could be happening?
Is there an option to configure the job so it runs only on the master node? Then I could put the jars in a filesystem folder on just that node.
Thank you so much
Created 05-24-2018 10:05 AM
Hi,
Try adding --jars PATH_TO_YOUR_TERADATA_DRIVER to your spark-opts.
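For example, reusing the HDFS paths from the original post (a sketch; substitute wherever your driver jars actually live), the element might look like:

```
<spark-opts>--jars hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</spark-opts>
```

Note that --jars takes a comma-separated list, unlike the colon-separated classpath options, and ships the jars to both the driver and the executors.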
Adil
Created 05-24-2018 01:49 PM
Hi @alvaro andres tovar martinez
You should add the Teradata driver jars using the jar element. This way the jars will be placed in the driver and executor working directories and automatically added to the classpath:
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar</jar> <jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</jar>
You don't need any of the spark-opts settings you were using, so you can leave the element empty or remove it:
<spark-opts></spark-opts>
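Putting this together with the original action, the relevant section might look like the following sketch (paths taken from the question):

```
<class>myapp.MainClass</class>
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar</jar>
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar</jar>
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</jar>
<spark-opts></spark-opts>
```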
HTH
Created 07-17-2018 04:25 PM
The problem was solved by adding the following line to assembly.sbt:
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
And in build.sbt:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.2" % "provided"
libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.2" % "provided"
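The thread does not say how the Teradata jars themselves end up inside the assembly. Since they are not published to public Maven repositories, one common approach (an assumption on my part, not something stated above) is to keep them as unmanaged jars: either drop them into the project's lib/ directory, which sbt bundles by default, or point build.sbt at them explicitly:

```
// build.sbt (sketch, assumed local paths): reference the Teradata
// driver jars as unmanaged dependencies so `sbt assembly` bundles
// them into the fat jar along with the application classes.
unmanagedJars in Compile ++= Seq(
  Attributed.blank(file("/filesystem/path/terajdbc4.jar")),
  Attributed.blank(file("/filesystem/path/tdgssconfig.jar"))
)
```

With the Spark dependencies marked "provided", the resulting fat jar then contains only the application classes plus the driver jars, so no extra classpath settings are needed in the Oozie action.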
Thank you so much.