Help: Oozie Spark action fails with java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver

Hello all,

I am having problems running an Oozie Spark action. The same application runs fine with spark-submit; this is the command:

spark-submit --driver-class-path=/filesystem/path/terajdbc4.jar:/filesystem/path/tdgssconfig.jar --class myapp.MainClass my-app-1.jar

Now I have created the Oozie action:

<action name="cargarTD">
        <spark xmlns="uri:oozie:spark-action:0.1">
             <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>/some_path/generic/hive-site.xml</job-xml>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
                <property>
                    <name>oozie.use.system.libpath</name>
                    <value>true</value>
                </property>
                <property>
                    <name>oozie.libpath</name>
                    <value>/some_other_path/oozie/share/lib/lib_20170127130322/hive</value>
                </property>                
            </configuration>
            <master>yarn-master</master>
            <mode>cluster</mode>
            <name>Test spark oozie</name>
            <class>myapp.MainClass</class>
            <jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar</jar>
            <spark-opts>--driver-class-path hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar: --conf "spark.executor.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar" --conf "spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar"</spark-opts>            
        </spark>
        <ok to="End"/>
        <error to="Kill"/>
    </action>

I run it and get this error:

18/04/27 16:50:58 INFO ExecutorRunnable: 
===============================================================================
YARN executor launch context:
  env:
    CLASSPATH -> hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar spark.driver.extraClassPath=hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar:hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar<CPS>{{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark__.jar<CPS>$HADOOP_CONF_DIR<CPS>/usr/hdp/current/hadoop-client/*<CPS>/usr/hdp/current/hadoop-client/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/2.5.0.0-1245/hadoop/lib/hadoop-lzo-0.6.0.2.5.0.0-1245.jar:/etc/hadoop/conf/secure
    SPARK_YARN_CACHE_ARCHIVES -> hdfs://mycompanyhdfs/user/dataloaderusr/.sparkStaging/application_1524487536165_7870/__spark_conf__1770230405461321022.zip#__spark_conf__
    SPARK_LOG_URL_STDERR -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderu...
    SPARK_YARN_CACHE_FILES_FILE_SIZES -> 188727178,39194
    SPARK_YARN_STAGING_DIR -> .sparkStaging/application_1524487536165_7870
    SPARK_USER -> dataloaderusr
    SPARK_YARN_CACHE_ARCHIVES_FILE_SIZES -> 141577
    SPARK_YARN_CACHE_FILES_VISIBILITIES -> PUBLIC,PRIVATE
    SPARK_YARN_CACHE_ARCHIVES_TIME_STAMPS -> 1524865844229
    SPARK_YARN_MODE -> true
    SPARK_YARN_CACHE_FILES_TIME_STAMPS -> 1485286912562,1524773043221
    SPARK_LOG_URL_STDOUT -> http://probighhww006:8042/node/containerlogs/container_e321_1524487536165_7870_01_000003/dataloaderu...
    SPARK_YARN_CACHE_ARCHIVES_VISIBILITIES -> PRIVATE
    SPARK_YARN_CACHE_FILES -> hdfs://mycompanyhdfs/hdp/apps/2.5.0.0-1245/spark/spark-hdp-assembly.jar#__spark__.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar.jar#__app__.jar


  command:
    {{JAVA_HOME}}/bin/java -server -XX:OnOutOfMemoryError='kill %p' -Xms1024m -Xmx1024m '-Dlog4j.configuration=spark-log4j.properties' -Djava.io.tmpdir={{PWD}}/tmp '-Dspark.driver.port=46512' '-Dspark.ui.port=0' -Dspark.yarn.app.container.log.dir=<LOG_DIR> org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@30.30.30.27:46512 --executor-id 2 --hostname probighhww006 --cores 1 --app-id application_1524487536165_7870 --user-class-path file:$PWD/__app__.jar 1> <LOG_DIR>/stdout 2> <LOG_DIR>/stderr
===============================================================================
      
18/04/27 16:50:58 INFO ContainerManagementProtocolProxy: Opening proxy : probighhww006:45454
18/04/27 16:51:01 INFO AMRMClientImpl: Received new token for : probighhww003:45454
18/04/27 16:51:01 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 0 of them.
18/04/27 16:51:02 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww012:34654) with ID 1
18/04/27 16:51:02 INFO BlockManagerMasterEndpoint: Registering block manager probighhww012:39342 with 511.1 MB RAM, BlockManagerId(1, probighhww012, 39342)
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (probighhww006:41530) with ID 2
18/04/27 16:51:03 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/04/27 16:51:03 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
java.lang.ClassNotFoundException: com.teradata.jdbc.TeraDriver
	at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:357)	

I am not sure why the driver is not on the executor classpath. I would prefer not to put the Teradata JDBC jar on the local filesystem of every node.

I have also tried <mode>client</mode>, but nothing changed.

Does anyone know what could be happening?

Is there a way to configure the job to run only on the master node, so that I could put the jars in a filesystem folder on just that node?

Thank you so much

3 REPLIES


Hi,
Try adding --jars PATH_TO_YOUR_TERADATA_DRIVER to the spark-opts.

Adil
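
For this workflow that would look roughly like the following (a sketch reusing the HDFS paths from the question; --jars takes a comma-separated list and ships the jars into the driver and executor working directories):

<spark-opts>--jars hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</spark-opts>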

Hi @alvaro andres tovar martinez

You should add the Teradata driver using the jar element. This way the jars will be placed in the driver and executor working directories and automatically added to the classpath:

<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar</jar>
<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</jar>

You don't need any of the spark-opts settings you were using, so you can leave the element empty or remove it:

<spark-opts></spark-opts>
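
A variant worth noting, untested here: the Oozie Spark action documentation describes the jar element as a comma-separated list of jars, so depending on the schema version the same thing may be expressible in a single element:

<jar>hdfs://mycompanyhdfs/myAppHdfsPath/jar/my-app-1.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/terajdbc4.jar,hdfs://mycompanyhdfs/myAppHdfsPath/jar/tdgssconfig.jar</jar>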

HTH


The problem was solved by adding the following line to assembly.sbt:

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")

And the following to build.sbt:

libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.2" % "provided"

libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.2" % "provided"

libraryDependencies += "org.apache.spark" % "spark-hive_2.10" % "1.6.2" % "provided"

Thank you so much!
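
For completeness: the "provided" scope keeps the Spark libraries out of the assembly, while everything else on the compile classpath is bundled into the fat jar. The post does not show how the Teradata driver got onto that classpath; assuming the jars are not available from a repository, one common way is sbt's unmanaged lib/ directory:

my-app/
    build.sbt
    project/assembly.sbt
    lib/
        terajdbc4.jar      (jars in lib/ are picked up as unmanaged dependencies,
        tdgssconfig.jar     so "sbt assembly" folds them into the fat jar)

With the driver inside the assembly jar that the workflow's jar element points at, no extra classpath options are needed.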
