
Phoenix driver not found in Spark job

Rising Star

I've created a Spark Streaming application (and I swear a month or two ago I had this working) and it runs fine in Eclipse. When I run the job with spark-submit and specify --jars including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skip the symlink and use /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get an error indicating ClassNotFound: org.apache.phoenix.jdbc.PhoenixDriver.

In the YARN log output, I can see the following entries in directory.info:

lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar

3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar

In launch_container.sh I see the following:

ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"

So it seems the right things are happening. I finally broke down and added the following to the driver to see what was actually on the classpath:

ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls)
    System.out.println(url.getFile());

And it shows none of the jar files I added via --jars on the spark-submit command line. What am I missing?

As a follow-up, should we build a fat JAR instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster nodes (HDP 2.5.3)?
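
For reference, the failing submission described above looks roughly like the following. This is only a sketch: the application jar, dependency jar, main class, and master setting are placeholders, not taken from the actual job.

spark-submit \
  --class com.example.MyStreamingApp \
  --master yarn-cluster \
  --jars my-app-deps.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar \
  my-streaming-app.jar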

13 REPLIES

Rising Star

I did get this working on Spark 1 (Spark 2 is a tech preview). The issue was needing to use both --jars as a comma-separated list and --conf with the driver/executor extraClassPath as a colon-separated list, as sketched below.
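
A minimal sketch of that combination, using the HDP paths mentioned elsewhere in this thread; the application jar, dependency jar, and main class are placeholders:

spark-submit \
  --class com.example.MyStreamingApp \
  --master yarn-client \
  --jars my-app-deps.jar,/usr/hdp/current/phoenix-client/phoenix-client.jar \
  --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar \
  --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar \
  my-streaming-app.jar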

However, I'm back to failing with "JDBC driver not found" when using sparkContext.newAPIHadoopRDD. The Phoenix driver is definitely in the --jars and --conf command-line args to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
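
For context, explicitly registering the driver and opening a Phoenix JDBC connection from Java typically looks like the sketch below. The ZooKeeper quorum and znode in the URL are assumptions for an HDP cluster and need to be adjusted.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcCheck {
    public static void main(String[] args) throws Exception {
        // Explicitly register the Phoenix JDBC driver (normally auto-registered via the ServiceLoader).
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        // URL format: jdbc:phoenix:<zk-quorum>:<zk-port>:<zk-znode> -- the values below are placeholders.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181:/hbase-unsecure");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT TABLE_NAME FROM SYSTEM.CATALOG LIMIT 5")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}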

avatar
  • On the Spark client node, create a symbolic link to 'hbase-site.xml' in /etc/spark/conf/:
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
  • Add the following configurations to 'spark-defaults.conf' through Ambari and restart the Spark service:
spark.executor.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar 


spark.driver.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar

Note: Change the jar versions according to the cluster version and ensure there are no spaces between the jar paths in the classpath.

  1. For secure clusters, obtain a Kerberos ticket using the kinit command.
  2. Launch the Spark shell using the command below:
spark-shell --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1
  3. To access a Phoenix table, use the following sample code:
val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "<zk-host>:2181")
)
df.show()

Rising Star

No joy. I checked, and the spark-submit job already contains those libraries. I'll post more above.

Rising Star

As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I also added what you recommended in Ambari and got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which ultimately makes a JDBC connection.
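
For context, a newAPIHadoopRDD-based Phoenix read from Java generally follows the PhoenixInputFormat/PhoenixMapReduceUtil pattern, roughly as sketched below. The table name, columns, and writable class are invented for illustration and do not come from the actual application.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PhoenixReadSketch {

    // Hypothetical writable mapping one row of TABLE1 (ID BIGINT, NAME VARCHAR).
    public static class Table1Writable implements DBWritable, Writable {
        private long id;
        private String name;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong("ID");
            name = rs.getString("NAME");
        }

        @Override
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setLong(1, id);
            stmt.setString(2, name);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            name = in.readUTF();
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(name);
        }
    }

    public static void main(String[] args) throws Exception {
        JavaSparkContext sc = new JavaSparkContext(new SparkConf().setAppName("phoenix-read-sketch"));

        // Picks up hbase-site.xml from the classpath, hence the symlink step above.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "phoenix-read-sketch");
        PhoenixMapReduceUtil.setInput(job, Table1Writable.class, "TABLE1", "SELECT ID, NAME FROM TABLE1");

        JavaPairRDD<NullWritable, Table1Writable> rows = sc.newAPIHadoopRDD(
                job.getConfiguration(),
                PhoenixInputFormat.class,
                NullWritable.class,
                Table1Writable.class);

        System.out.println("Row count: " + rows.count());
        sc.stop();
    }
}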