Created 03-08-2017 10:41 PM
I've created a Spark Streaming application (and I swear a month or two ago I had this working) and it runs fine in Eclipse. When I run the job with spark-submit and specify --jars, including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skipping the symlink and using /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get a ClassNotFound error for org.apache.phoenix.jdbc.PhoenixDriver.
In the YARN log output, I can see the following entries in directory.info:
lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar
3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar
In launch_container.sh I see the following:
ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"
So it seems the right things are happening. I finally broke down and put the following in the driver to see which jars were actually on the classpath:
ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls) {
    System.out.println(url.getFile());
}
And it shows none of the jar files I added via the --jars option to spark-submit. What am I missing?
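For anyone else digging into this: as far as I can tell, spark-submit puts --jars entries on a child URLClassLoader of its own (which it installs as the thread context class loader) rather than on the system class loader, so the check above can come up empty even when the jars are actually there. A variant of the same check that walks the context class loader chain instead (a standalone sketch; the same loop can be dropped into the driver):

import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDump {
    public static void main(String[] args) {
        // spark-submit adds --jars to a child URLClassLoader and sets it as the
        // thread context class loader, so start there and walk up the chain.
        for (ClassLoader cl = Thread.currentThread().getContextClassLoader();
                cl != null; cl = cl.getParent()) {
            System.out.println("ClassLoader: " + cl);
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println("  " + url.getFile());
                }
            }
        }
    }
}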
As a corollary, should we build a fat jar instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?
Created 06-21-2017 06:36 PM
I did get this working on Spark 1 (Spark 2 is a tech preview). The issue was needing both --jars (a comma-separated list) and --conf spark.driver.extraClassPath / spark.executor.extraClassPath (colon-separated lists).
However, I'm back to failing with the JDBC driver not found when using sparkContext.newAPIHadoopRDD. The Phoenix driver is definitely in the --jars and --conf command-line arguments to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
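For what it's worth, the JDBC connection behind newAPIHadoopRDD is opened on the executors as well as the driver, so the Phoenix client jar has to be visible to both. A quick standalone sanity check for the classpath on any one node (the zkUrl below is a placeholder; /hbase-unsecure is typically the znode on an unsecured HDP cluster):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixDriverCheck {
    public static void main(String[] args) throws Exception {
        // Throws ClassNotFoundException if the Phoenix client jar is missing
        // from this JVM's classpath.
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        // Placeholder URL: jdbc:phoenix:<zk-quorum>:<port>:<znode>
        String url = "jdbc:phoenix:zk-host:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}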
Created 06-25-2017 04:32 PM
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
spark.executor.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
spark.driver.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
Note: Change the jar versions to match your cluster version, and ensure there are no spaces between the jars in the classpath.
spark-shell --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1
val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "<zk-host>:2181")
)
df.show()
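Since the application in question is Java, the rough Java equivalent of the Scala snippet above using the phoenix-spark DataFrame API would look something like the following sketch (the table name and zkUrl are placeholders; the phoenix-spark integration must be on the driver and executor classpaths):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixSparkRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("phoenix-spark-read");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Reads a Phoenix table through the phoenix-spark data source.
        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")          // placeholder table name
                .option("zkUrl", "zk-host:2181")    // placeholder ZooKeeper quorum
                .load();
        df.show();
        jsc.stop();
    }
}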
Created 07-04-2017 12:04 AM
No joy. I checked, and the spark-submit job already includes those libraries. I'll post more above.
Created 07-04-2017 12:48 AM
As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I also added what you recommended in Ambari, and I got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which ultimately makes a JDBC connection.
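For context, the call looks roughly like the sketch below (the table, columns, and writable class are illustrative placeholders rather than the actual application code). The record readers open Phoenix JDBC connections on the executors, which is where the driver-not-found error surfaces if the executor classpath is wrong.

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PhoenixHadoopRddSketch {

    // Placeholder record for a table with ID and NAME columns.
    public static class MyRecordWritable implements DBWritable {
        private long id;
        private String name;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong("ID");
            name = rs.getString("NAME");
        }

        @Override
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setLong(1, id);
            stmt.setString(2, name);
        }
    }

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("phoenix-newAPIHadoopRDD");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // hbase-site.xml must be visible so Phoenix can find ZooKeeper.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf);
        PhoenixMapReduceUtil.setInput(job, MyRecordWritable.class,
                "MY_TABLE", "SELECT ID, NAME FROM MY_TABLE");

        // The record readers open Phoenix JDBC connections on the executors,
        // so the Phoenix client jar must be on the executor classpath too.
        JavaPairRDD<NullWritable, MyRecordWritable> rdd = jsc.newAPIHadoopRDD(
                job.getConfiguration(),
                PhoenixInputFormat.class,
                NullWritable.class,
                MyRecordWritable.class);

        System.out.println("Rows: " + rdd.count());
        jsc.stop();
    }
}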