
Could not find the DB2Driver class running Spark app

Explorer

When I try this through spark-shell, it works:

> spark-shell --jars /usr/hdp/current/sqoop-client/lib/db2jcc.jar

Then I can run sqlContext.read.jdbc(url, table, prop) and it queries the DB2 database.

But when I use spark-submit --jars /usr/hdp/current/sqoop-client/lib/db2jcc.jar .... to run a Spark job that does the same JDBC read, it fails with the following error:

16/12/07 17:13:03 WARN JDBCRDD: Couldn't find class "com.ibm.db2.jcc.DB2Driver"
java.lang.ClassNotFoundException: "com.ibm.db2.jcc.DB2Driver"
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverRegistry$.register(DriverRegistry.scala:38)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$anonfun$getConnector$1.apply(JDBCRDD.scala:183)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$anonfun$getConnector$1.apply(JDBCRDD.scala:181)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:121)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation.<init>(JDBCRelation.scala:91)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:206)
    at org.apache.spark.sql.DataFrameReader.jdbc(DataFrameReader.scala:130)
    at com.bns.cbt.ingest.db.LoadDBDataApp$anonfun$runMain$1.apply(LoadDBDataApp.scala:98)
    at com.bns.cbt.ingest.db.LoadDBDataApp$anonfun$runMain$1.apply(LoadDBDataApp.scala:92)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
    at com.bns.cbt.ingest.db.LoadDBDataApp$.runMain(LoadDBDataApp.scala:92)
    at com.bns.cbt.ingest.db.LoadDBDataApp$delayedInit$body.apply(LoadDBDataApp.scala:68)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$anonfun$main$1.apply(App.scala:71)
    at scala.App$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.bns.cbt.ingest.db.LoadDBDataApp$.main(LoadDBDataApp.scala:28)
    at com.bns.cbt.ingest.db.LoadDBDataApp.main(LoadDBDataApp.scala)
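
One detail in the trace may be worth checking: the class name is printed wrapped in literal double quotes, both in the warning and in the ClassNotFoundException message. A JVM class loader normally reports the name exactly as it was passed in, so this could mean the driver name reached Spark with the quote characters included (for example from a properties file). The thread does not confirm that, so treat it as an assumption. A minimal sketch of the effect, using java.lang.String as a stand-in class and two hypothetical helpers (stripQuotes, describe):

```scala
object QuotedClassNameCheck {
  // Hypothetical helper: drop surrounding whitespace and one pair of double quotes.
  def stripQuotes(name: String): String =
    name.trim.stripPrefix("\"").stripSuffix("\"")

  // Hypothetical helper: try to load a class by name and report what happened.
  def describe(name: String): String =
    try {
      Class.forName(name)
      s"loaded: $name"
    } catch {
      case e: ClassNotFoundException => s"not found: ${e.getMessage}"
    }

  def main(args: Array[String]): Unit = {
    // Stand-in for a driver name that was read with its quotes still attached.
    val raw = "\"java.lang.String\""
    println(describe(raw))              // lookup with the quotes still in the name
    println(describe(stripQuotes(raw))) // lookup after removing the quotes
  }
}
```

If the "driver" entry in the connection properties turns out to contain quotes, cleaning the value before passing it to sqlContext.read.jdbc would be one thing to try.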


Explorer

Even when I built a fat jar that contains db2jcc.jar, I still got the same issue.

Super Guru

https://www.linkedin.com/pulse/accessing-db2-data-spark-via-stadalone-scala-java-programs-tatini

What JDK and Scala versions are you using to build? Any build issues?

db2jcc.jar will need to be on every Spark node you are running on.

Does it work if you run it on just the current node?

Is that JAR in a viewable path with correct permissions?

See: http://stackoverflow.com/questions/29552799/spark-unable-to-find-jdbc-driver

spark.driver.extraClassPath = /usr/hdp/current/sqoop-client/lib/db2jcc.jar
spark.executor.extraClassPath = /usr/hdp/current/sqoop-client/lib/db2jcc.jar
export SPARK_CLASSPATH=/usr/hdp/current/sqoop-client/lib/db2jcc.jar

Explorer

I tried everything you mentioned. The db2jcc.jar itself should be fine, since it works with spark-shell.

Here is a snippet of my spark-submit:

spark-submit \
  --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:/home/jzhou5/ingest/test/cbt_ingest/ingest_log4j.properties \
  --driver-class-path=$SPATH:$JDBCPATH/db2jcc.jar \
  --conf spark.driver.extraClassPath=$SPATH:$JDBCPATH/db2jcc.jar \
  --conf spark.executor.extraClassPath=$SPATH:$JDBCPATH/db2jcc.jar \
  --jars /usr/hdp/current/sqoop-client/lib/db2jcc.jar \
  --class com.bns.cbt.ingest.db.LoadDBDataApp \
  /home/jzhou5/ingest/test/cbt_ingest/ingest-db-1.0-jar-with-dependencies.jar \
  .....

I also tried local mode, but it failed with the same error. By the way, I also added Class.forName("com.ibm.db2.jcc.DB2Driver") to my code.

Explorer

In my code, I explicitly added a loadClass call, but it still failed. I'm running out of options.

import java.net.{URL, URLClassLoader}

// Load the driver class through a dedicated class loader pointing at the jar.
val urls: Array[URL] = Array(new URL("jar:file:///home/jzhou5/ingest/db2jcc.jar!/"))
val cl = new URLClassLoader(urls)
cl.loadClass("com.ibm.db2.jcc.DB2Driver")

Super Guru

Are the jar's permissions set to 777?

What is the "!/" at the end of the URL for?

Do these environment variables exist?

$SPATH:$JDBCPATH

Try copying the DB2 jar into the local path you are running from.

Is the jar valid? Is it large enough? Is it built for the correct JDK?

Try downloading a newer jar:

http://www-01.ibm.com/support/docview.wss?uid=swg21363866

see http://www.worldofdb2.com/profiles/blogs/accessing-db2-data-via-standalone-scala-java-programs-in-ec...

// Pass the driver class name explicitly in the options map.
val employeeDF = sqlContext.load("jdbc", Map(
  "url" -> "jdbc:db2://localhost:50000/sample:currentSchema=pallavipr;user=pallavipr;password=XXXX;",
  "driver" -> "com.ibm.db2.jcc.DB2Driver",
  "dbtable" -> "pallavipr.employee"))
employeeDF.show()
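
To answer the "is the jar valid?" question concretely, one option is to open the jar and look for the driver class entry. This is only a sketch: jarContainsClass is a hypothetical helper name, and the default path is the one quoted earlier in the thread, which may not match your machine.

```scala
import java.io.File
import java.util.jar.JarFile

object JarClassCheck {
  // Hypothetical helper: true if jarPath is a readable jar holding a .class entry
  // for className. Returns false when the file does not exist.
  def jarContainsClass(jarPath: String, className: String): Boolean = {
    if (!new File(jarPath).isFile) return false
    val entryName = className.replace('.', '/') + ".class"
    val jar = new JarFile(jarPath)
    try jar.getEntry(entryName) != null
    finally jar.close()
  }

  def main(args: Array[String]): Unit = {
    // Path taken from the thread; pass a different one as the first argument.
    val jarPath =
      if (args.nonEmpty) args(0) else "/usr/hdp/current/sqoop-client/lib/db2jcc.jar"
    println(jarContainsClass(jarPath, "com.ibm.db2.jcc.DB2Driver"))
  }
}
```

A result of false would point to a missing, truncated, or wrong jar; true only shows the class file is present, not that the class loader Spark uses can see it.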

Super Guru

Try --driver-class-path on the spark-submit command:

http://spark.apache.org/docs/latest/submitting-applications.html

https://spark.apache.org/docs/1.6.1/configuration.html

Add the jar in only one place; specifying it in multiple places can cause problems.

Do you have access to your DB2 server from the Spark server?

Explorer

I followed one of the posts and dumped all the jars on the driver and executor classpaths. It looks fine to me:

Driver classpath is: /etc/hadoop/2.3.4.78-6/0/
Driver classpath is: /home/jzhou5/ingest/test/cbt_ingest/db2jcc.jar
Driver classpath is: /etc/spark/2.3.4.78-6/0/
Driver classpath is: /usr/hdp/2.3.4.78-6/spark/lib/spark-assembly-1.5.2.2.3.4.78-6-hadoop2.7.1.2.3.4.78-6.jar
Driver classpath is: /usr/hdp/2.3.4.78-6/spark/lib/datanucleus-core-3.2.10.jar
Driver classpath is: /usr/hdp/2.3.4.78-6/spark/lib/datanucleus-rdbms-3.2.9.jar
Driver classpath is: /usr/hdp/2.3.4.78-6/spark/lib/datanucleus-api-jdo-3.2.6.jar
Driver classpath is: /etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/home/jzhou5/ingest/test/cbt_ingest/db2jcc.jar
Executor classpath is:/etc/spark/2.3.4.78-6/0/
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/spark-assembly-1.5.2.2.3.4.78-6-hadoop2.7.1.2.3.4.78-6.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-core-3.2.10.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-rdbms-3.2.9.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-api-jdo-3.2.6.jar
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/home/jzhou5/ingest/test/cbt_ingest/db2jcc.jar
Executor classpath is:/etc/spark/2.3.4.78-6/0/
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/spark-assembly-1.5.2.2.3.4.78-6-hadoop2.7.1.2.3.4.78-6.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-core-3.2.10.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-rdbms-3.2.9.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-api-jdo-3.2.6.jar
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
Executor classpath is:/home/jzhou5/ingest/test/cbt_ingest/db2jcc.jar
Executor classpath is:/etc/spark/2.3.4.78-6/0/
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/spark-assembly-1.5.2.2.3.4.78-6-hadoop2.7.1.2.3.4.78-6.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-core-3.2.10.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-rdbms-3.2.9.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-api-jdo-3.2.6.jar
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
testObj.id: 1
Executor classpath is:/home/jzhou5/ingest/test/cbt_ingest/db2jcc.jar
testObj.id: 3
testObj.id: 4
Executor classpath is:/etc/spark/2.3.4.78-6/0/
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/spark-assembly-1.5.2.2.3.4.78-6-hadoop2.7.1.2.3.4.78-6.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-core-3.2.10.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-rdbms-3.2.9.jar
Executor classpath is:/usr/hdp/2.3.4.78-6/spark/lib/datanucleus-api-jdo-3.2.6.jar
Executor classpath is:/etc/hadoop/2.3.4.78-6/0/
testObj.id: 2
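
For readers who want to produce a similar dump, here is one possible way to list classpath entries. It is a sketch, not necessarily how the output above was generated: the object and method names are hypothetical, and it reads only the java.class.path system property, so jars added later through a separate class loader may not appear. To get the executor side, the same call would have to run inside a Spark task, which is left out so the block runs without Spark.

```scala
import java.io.File

object ClasspathDump {
  // Split a classpath string on the platform separator, dropping empty entries.
  def entries(classpath: String): Seq[String] =
    classpath.split(File.pathSeparator).filter(_.nonEmpty).toSeq

  // Label each entry in the style of the dump above; role is e.g. "Driver" or "Executor".
  def labelled(role: String, classpath: String): Seq[String] =
    entries(classpath).map(e => s"$role classpath is: $e")

  def main(args: Array[String]): Unit =
    labelled("Driver", System.getProperty("java.class.path", "")).foreach(println)
}
```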

Super Guru

Try it without the DB2 code and see if it works. Then try with a different jar (anything).

Does this driver work with Sqoop, or with regular Scala or Java code?