Created 03-08-2017 10:41 PM
I've created a Spark Streaming application (and I swear a month or two ago I had this working) and it runs fine in Eclipse. When I run the job with spark-submit and specify --jars, including my application jars and /usr/hdp/current/phoenix-client/phoenix-client.jar (or skipping the symlink and using /usr/hdp/current/phoenix-4.7.0.2.5.3.0-37-client.jar), I get a ClassNotFound error for org.apache.phoenix.jdbc.PhoenixDriver.
In the YARN log output, I can see the following entries in directory.info:
lrwxrwxrwx 1 yarn hadoop 70 Mar 7 15:37 phoenix-client.jar -> /hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar
3016594 100180 -r-x------ 1 yarn hadoop 102581542 Mar 7 15:37 ./phoenix-client.jar
In launch_container.sh I see the following:
ln -sf "/hadoop/yarn/local/usercache/jwatson/filecache/2288/phoenix-client.jar" "phoenix-client.jar"
So it seems the right things are happening. I finally broke down and put the following in the driver to see which jars were actually on the classpath:
ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls) {
    System.out.println(url.getFile());
}
And it shows none of the jar files I added via the --jars option to spark-submit. What am I missing?
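For anyone else digging into this: as far as I can tell, spark-submit puts --jars entries on a child URLClassLoader of its own (which it installs as the thread context class loader) rather than on the system class loader, so the check above can come up empty even when the jars are actually there. A variant of the same check that walks the context class loader chain instead (a standalone sketch; the same loop can be dropped into the driver):

import java.net.URL;
import java.net.URLClassLoader;

public class ClasspathDump {
    public static void main(String[] args) {
        // spark-submit adds --jars to a child URLClassLoader and sets it as the
        // thread context class loader, so start there and walk up the chain.
        for (ClassLoader cl = Thread.currentThread().getContextClassLoader();
                cl != null; cl = cl.getParent()) {
            System.out.println("ClassLoader: " + cl);
            if (cl instanceof URLClassLoader) {
                for (URL url : ((URLClassLoader) cl).getURLs()) {
                    System.out.println("  " + url.getFile());
                }
            }
        }
    }
}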
As a corollary, should we build a fat jar instead and toss everything into that? What's the most efficient approach that avoids copying jar files that are already on the cluster servers (HDP 2.5.3)?
Created 06-21-2017 06:36 PM
I did get this working on Spark 1 (Spark 2 is a tech preview). The issue was needing both --jars (a comma-separated list) and --conf spark.driver.extraClassPath / spark.executor.extraClassPath (colon-separated lists).
However, I'm back to failing with the JDBC driver not found when using sparkContext.newAPIHadoopRDD. The Phoenix driver is definitely in the --jars and --conf command-line arguments to spark-submit. I added Class.forName("org.apache.phoenix.jdbc.PhoenixDriver"). This is a Java app.
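For what it's worth, the JDBC connection behind newAPIHadoopRDD is opened on the executors as well as the driver, so the Phoenix client jar has to be visible to both. A quick standalone sanity check for the classpath on any one node (the zkUrl below is a placeholder; /hbase-unsecure is typically the znode on an unsecured HDP cluster):

import java.sql.Connection;
import java.sql.DriverManager;

public class PhoenixDriverCheck {
    public static void main(String[] args) throws Exception {
        // Throws ClassNotFoundException if the Phoenix client jar is missing
        // from this JVM's classpath.
        Class.forName("org.apache.phoenix.jdbc.PhoenixDriver");

        // Placeholder URL: jdbc:phoenix:<zk-quorum>:<port>:<znode>
        String url = "jdbc:phoenix:zk-host:2181:/hbase-unsecure";
        try (Connection conn = DriverManager.getConnection(url)) {
            System.out.println("Connected to " + conn.getMetaData().getDatabaseProductName());
        }
    }
}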
Created 06-25-2017 04:32 PM
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark/conf/hbase-site.xml
spark.executor.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
spark.driver.extraClassPath /usr/hdp/current/hbase-client/lib/hbase-common.jar:/usr/hdp/current/hbase-client/lib/hbase-client.jar:/usr/hdp/current/hbase-client/lib/hbase-server.jar:/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/usr/hdp/current/hbase-client/lib/guava-12.0.1.jar:/usr/hdp/current/hbase-client/lib/htrace-core-3.1.0-incubating.jar:/usr/hdp/current/spark-client/lib/spark-assembly-1.6.1.2.4.2.0-258-hadoop2.7.1.2.4.2.0-258.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar
Note: Change the jar versions to match your cluster version, and ensure there are no spaces between the jars in the classpath.
spark-shell --master yarn-client --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1
val df = sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "<zk-host>:2181")
)
df.show()
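Since the application in question is Java, the rough Java equivalent of the Scala snippet above using the phoenix-spark DataFrame API would look something like the following sketch (the table name and zkUrl are placeholders; the phoenix-spark integration must be on the driver and executor classpaths):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

public class PhoenixSparkRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("phoenix-spark-read");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Reads a Phoenix table through the phoenix-spark data source.
        DataFrame df = sqlContext.read()
                .format("org.apache.phoenix.spark")
                .option("table", "TABLE1")          // placeholder table name
                .option("zkUrl", "zk-host:2181")    // placeholder ZooKeeper quorum
                .load();
        df.show();
        jsc.stop();
    }
}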
Created 07-04-2017 12:04 AM
No joy. I checked, and the spark-submit job already includes those libraries. I'll post more above.
Created 07-04-2017 12:48 AM
As you can see from the comments above, I have the libraries defined in the spark-submit command, although for kicks I also added what you recommended in Ambari, and I got the same error. I'm writing in Java and calling newAPIHadoopRDD(), which ultimately makes a JDBC connection.
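For context, the call looks roughly like the sketch below (the table, columns, and writable class are illustrative placeholders rather than the actual application code). The record readers open Phoenix JDBC connections on the executors, which is where the driver-not-found error surfaces if the executor classpath is wrong.

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;
import org.apache.phoenix.mapreduce.PhoenixInputFormat;
import org.apache.phoenix.mapreduce.util.PhoenixMapReduceUtil;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class PhoenixHadoopRddSketch {

    // Placeholder record for a table with ID and NAME columns.
    public static class MyRecordWritable implements DBWritable {
        private long id;
        private String name;

        @Override
        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong("ID");
            name = rs.getString("NAME");
        }

        @Override
        public void write(PreparedStatement stmt) throws SQLException {
            stmt.setLong(1, id);
            stmt.setString(2, name);
        }
    }

    public static void main(String[] args) throws Exception {
        SparkConf sparkConf = new SparkConf().setAppName("phoenix-newAPIHadoopRDD");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);

        // hbase-site.xml must be visible so Phoenix can find ZooKeeper.
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf);
        PhoenixMapReduceUtil.setInput(job, MyRecordWritable.class,
                "MY_TABLE", "SELECT ID, NAME FROM MY_TABLE");

        // The record readers open Phoenix JDBC connections on the executors,
        // so the Phoenix client jar must be on the executor classpath too.
        JavaPairRDD<NullWritable, MyRecordWritable> rdd = jsc.newAPIHadoopRDD(
                job.getConfiguration(),
                PhoenixInputFormat.class,
                NullWritable.class,
                MyRecordWritable.class);

        System.out.println("Rows: " + rdd.count());
        jsc.stop();
    }
}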