
Phoenix driver not found in Spark job

Rising Star

I've created a Spark streaming application (and I swear I had this working a month or two ago), and it runs fine in Eclipse. When I run the job using spark-submit and specify --jars including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skip the symlink and use /usr/hdp/current/phoenix-), I get an error indicating ClassNotFoundException: org.apache.phoenix.jdbc.PhoenixDriver.

In the YARN log output I can see the following entries:

lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar

3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar

In the container launch script I see the following:

ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"

So it seems the right things are happening. I finally broke down and put the following in the driver to see what I got for class files:

ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls)
    System.out.println(url.getFile());

And it shows none of the jar files I added via --jars on the spark-submit command line. What am I missing?
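One likely explanation, offered as a hypothesis: Spark loads jars passed via --jars through a separate child classloader, not the system classloader, so getSystemClassLoader() will never list them even when they are present. A sketch that walks the context-classloader chain instead (the class name ClassLoaderProbe is mine, not from the thread):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderProbe {
    public static void main(String[] args) {
        // Start from the thread's context classloader, which in a Spark
        // executor is the child loader holding the --jars entries, and
        // walk up to the system classloader, printing every URL seen.
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        while (cl != null) {
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println(url.getFile());
                }
            }
            cl = cl.getParent();
        }
    }
}
```

Run inside the driver or an executor, this should show the --jars entries on one of the child loaders even though getSystemClassLoader() does not.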

As a corollary, should we build a fat jar instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?


Rising Star

I did get this working on Spark 1 (Spark 2 is a tech preview). The issue was needing to pass the jars both via --jars, as a comma-separated list, and via the --conf extraClassPath settings, as a colon-separated list.
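Concretely, the working submission would look something like the sketch below (the app jar name, class name, and dependency jar are placeholders, not from the thread). Note the two different separators: commas for --jars, colons for extraClassPath:

```shell
spark-submit \
  --master yarn-client \
  --class com.example.StreamingApp \
  --jars /usr/hdp/current/phoenix-client/phoenix-client.jar,my-deps.jar \
  --conf spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar \
  --conf spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar \
  my-app.jar
```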

However, I'm back to failing with the JDBC driver not found when using sparkContext.newAPIHadoopRDD. The Phoenix driver is definitely in the --jars and --conf command-line args to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.

  • On the spark client node, create a symbolic link of 'hbase-site.xml' into /etc/spark/conf/
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
  • Add the following configurations in 'spark-defaults.conf' through Ambari and restart the Spark service:
spark.executor.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly- 

spark.driver.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-

Note: Change the jar versions to match the cluster version, and ensure there are no spaces between the jars in the classpath.

  1. For secure clusters, obtain a kerberos ticket using kinit command.
  2. Launch the Spark shell using the command below:
spark-shell --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1
  3. To access a Phoenix table, use the following sample code:
val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "<zk-host>:2181")
)
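Since the failing application is Java rather than Scala, the equivalent access path is plain JDBC. A minimal sketch, assuming the standard jdbc:phoenix:<zk-quorum> URL form and that phoenix-client.jar is on both the driver and executor classpaths (the class name, quorum host, and query are placeholders, not from the thread):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixJdbcSketch {
    // Returns true if the Phoenix driver class is loadable from this JVM;
    // in the failing job this is exactly the check that throws
    // ClassNotFoundException when phoenix-client.jar is missing.
    static boolean driverAvailable() {
        try {
            Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (!driverAvailable()) {
            System.err.println("phoenix-client.jar is not on this JVM's classpath");
            return;
        }
        // Placeholder quorum host; replace with the cluster's ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host:2181");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT * FROM TABLE1 LIMIT 10")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

Running driverAvailable() early on both the driver and inside a task is a quick way to tell which side of the job is missing the jar.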

Rising Star

No joy. I checked, and the spark-submit job already contains those libraries. I'll post more above.

Rising Star

As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I also added what you recommended in Ambari, and I got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which ultimately makes a JDBC connection.