Created 07-14-2017 05:55 PM
I want to use the Phoenix Apache Spark plugin (https://phoenix.apache.org/phoenix_spark.html) with the Hortonworks Sandbox (Version: HDP_2.6_vmware_19_04_2017_20_25_43_hdp_ambari_2_5_0_5_1), and I found a very informative link, https://community.hortonworks.com/questions/1942/spark-to-phoenix.html, to experiment with it. I have no problems with JDBC; however, "Load as DataFrame" fails with "java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame". After googling, I found that I should upgrade to Phoenix 4.10 to work with Spark 2.x (see http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-9-0-with-Spark-2-0-td3602.html). So my question is: how do I upgrade the sandbox to Phoenix 4.10?
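For reference, the JDBC route that does work for me looks roughly like this (a sketch rather than my exact code; it assumes the Phoenix client jar is already on the Spark driver classpath):

// Reading the Phoenix table over plain JDBC from Spark 2: this path works fine.
// TABLE1 and the zkUrl match the steps below; adjust for your cluster.
val jdbcDF = spark.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("url", "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure")
  .option("dbtable", "TABLE1")
  .load()

jdbcDF.show()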
Below are my steps:
CREATE TABLE and UPSERT INTO in Phoenix:

cd /usr/hdp/current/phoenix-client/bin
./sqlline.py sandbox.hortonworks.com

CREATE TABLE TABLE1 (ID BIGINT NOT NULL PRIMARY KEY, COL1 VARCHAR);
UPSERT INTO TABLE1 (ID, COL1) VALUES (1, 'test_row_1');
UPSERT INTO TABLE1 (ID, COL1) VALUES (2, 'test_row_2');
Start spark-shell:

spark-shell \
  --master yarn-client \
  --jars /usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar,/usr/hdp/2.6.0.3-8/phoenix/lib/phoenix-spark-4.7.0.2.6.0.3-8.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar:/usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar"
Inside spark-shell:

import org.apache.phoenix.spark._

val df = spark.sqlContext.load(
  "org.apache.phoenix.spark",
  Map("table" -> "TABLE1", "zkUrl" -> "sandbox.hortonworks.com:2181:/hbase-unsecure")
)
Error Message:

warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.getDeclaredMethod(Class.java:2128)
  at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
  at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
  at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2101)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.map(RDD.scala:369)
  at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:119)
  at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:59)
  at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
  at org.apache.spark.sql.SQLContext.load(SQLContext.scala:965)
  ... 53 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
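From what I can tell, the root cause is that in Spark 2.x, org.apache.spark.sql.DataFrame is no longer a real class, only a type alias for Dataset[Row], so the phoenix-spark 4.7 jar (built against Spark 1.x) cannot load it at runtime. A quick way to see that the class is simply absent from a Spark 2 shell:

// A type alias produces no class file, so loading it by name fails on Spark 2.x:
Class.forName("org.apache.spark.sql.DataFrame")  // java.lang.ClassNotFoundException

// The alias itself still works from source code:
val ok: org.apache.spark.sql.DataFrame = spark.range(1).toDF()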
Created 08-03-2017 09:31 PM
Same question here about accessing Phoenix tables from Spark 2 SQL on a full HDP 2.6 cluster.
Created 09-12-2017 05:52 PM
Being able to use Spark 2 with Phoenix would be really helpful. Any ETA on when HDP will upgrade Phoenix?
Created 11-18-2017 06:05 AM
If you are just testing things out on the sandbox, this should really help: https://superuser.blog/upgrading-apache-phoenix-hdp/
Created 05-08-2018 06:10 AM
Hi, the URL https://superuser.blog/upgrading-apache-phoenix-hdp/ is not opening at all when I try to access it.
Created 05-09-2018 09:53 AM
I just checked and it is opening; can you try again? @Krishna Srinivas
Created 11-18-2017 05:36 PM
HDP 2.6.2 already supports the latest Phoenix features. It says 4.7, but it's a custom patched version. I just wasn't using the HDP Phoenix driver.
That being said, it's very confusing for HDP not to stay in sync with the naming conventions and versioning of the open-source release. I'm assuming they forked it, but time will only make it harder to sync back up.
Created 01-31-2018 06:07 PM
We tried this too on our HDP 2.6.3 cluster. Sure enough, we got the same issue:
/usr/hdp/current/spark2-client/bin/spark-shell \
  --master yarn-client \
  --driver-memory 3g \
  --executor-memory 3g \
  --num-executors 2 \
  --executor-cores 2 \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf" \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf"
scala> val jobsDF = spark.read.format("org.apache.phoenix.spark").options(Map(
     |   "table" -> "ns.Jobs", "zkUrl" -> zkUrl)).load

ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,file:/usr/hdp/2.6.3.0-235/phoenix/phoenix-4.7.0.2.6.3.0-235-client.jar!/ivysettings.xml will be used
2018-01-30 16:24:33,254 INFO [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x79bb14d8 connecting to ZooKeeper ensemble=zkhost1:2181,zkhost2:2181,zkhost3:2181
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.getDeclaredMethod(Class.java:2128)
  at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1575)
  ...
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  ... 49 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 83 more
Tweaking extraClassPath and --jars with phoenix-client.jar, phoenix-4.7.0.2.6.3.0-235-spark2.jar, and spark-sql_2.11-2.2.0.2.6.3.0-235.jar made no difference. I am inclined to agree with this other fellow that Hortonworks' phoenix-client.jar is not actually Spark 2-compatible, release notes to the contrary.
Created 01-31-2018 06:07 PM
... forgot the Stack Overflow link
Created 01-31-2018 09:44 PM
We are seeing the same NoClassDefFound issue with Spark 2 using /usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar. The release notes say it includes the patch for PHOENIX-3333, but it doesn't look like that is the case.
Created 02-01-2018 06:39 PM
Found the solution: place phoenix-spark2.jar before phoenix-client.jar on the classpath, and everything worked.
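Concretely, that's the launch command posted above with only the jar order swapped in both extraClassPath settings (paths as on our 2.6.3 cluster; adjust for yours):

/usr/hdp/current/spark2-client/bin/spark-shell \
  --master yarn-client \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf" \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf"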
The Spark2/Scala 2.11 versions of org.apache.phoenix.spark classes need to overlay those included in the main phoenix-client.jar.
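If you want to confirm which jar actually supplied the classes after reordering, here's a quick check from the shell (plain JDK reflection, nothing Phoenix-specific):

// Prints the jar the Phoenix-Spark relation class was loaded from; after the
// reorder it should point at phoenix-spark2.jar rather than phoenix-client.jar.
println(Class.forName("org.apache.phoenix.spark.PhoenixRelation")
  .getProtectionDomain.getCodeSource.getLocation)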
Try it and let us know. 🙂