How to upgrade to Phoenix 4.10?

Contributor

I want to use the Phoenix Apache Spark Plugin (https://phoenix.apache.org/phoenix_spark.html) with the Hortonworks Sandbox (Version: HDP_2.6_vmware_19_04_2017_20_25_43_hdp_ambari_2_5_0_5_1), and I found a very informative link, https://community.hortonworks.com/questions/1942/spark-to-phoenix.html, to experiment with it. I have no problems with JDBC; however, "Load as a DataFrame" fails with the error "java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame". After Googling it, I found that I should upgrade to Phoenix 4.10 to work with Spark 2.x (see http://apache-phoenix-user-list.1124778.n5.nabble.com/Phoenix-4-9-0-with-Spark-2-0-td3602.html). So my question is: how do I upgrade the sandbox to Phoenix 4.10?

Below are my steps:

CREATE TABLE and UPSERT INTO in Phoenix
cd /usr/hdp/current/phoenix-client/bin 
./sqlline.py sandbox.hortonworks.com 
CREATE TABLE TABLE1 (ID BIGINT NOT NULL PRIMARY KEY, COL1 VARCHAR); 
UPSERT INTO TABLE1 (ID, COL1) VALUES (1, 'test_row_1'); 
UPSERT INTO TABLE1 (ID, COL1) VALUES (2, 'test_row_2');
Start spark-shell
spark-shell \
  --master yarn-client \
  --jars /usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar,/usr/hdp/2.6.0.3-8/phoenix/lib/phoenix-spark-4.7.0.2.6.0.3-8.jar \
  --conf "spark.executor.extraClassPath=/usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar:/usr/hdp/2.6.0.3-8/phoenix/phoenix-4.7.0.2.6.0.3-8-client.jar"
Inside spark-shell
import org.apache.phoenix.spark._ 

val df = spark.sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TABLE1", "zkUrl" -> "sandbox.hortonworks.com:2181:/hbase-unsecure") )
Error Message
warning: there was one deprecation warning; re-run with -deprecation for details
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.getDeclaredMethod(Class.java:2128)
  at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
  at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
  at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
  at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1134)
  at java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548)
  at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509)
  at java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432)
  at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178)
  at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
  at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
  at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
  at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2101)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
  at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:369)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
  at org.apache.spark.rdd.RDD.map(RDD.scala:369)
  at org.apache.phoenix.spark.PhoenixRDD.toDataFrame(PhoenixRDD.scala:119)
  at org.apache.phoenix.spark.PhoenixRelation.schema(PhoenixRelation.scala:59)
  at org.apache.spark.sql.execution.datasources.LogicalRelation.<init>(LogicalRelation.scala:40)
  at org.apache.spark.sql.SparkSession.baseRelationToDataFrame(SparkSession.scala:389)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:125)
  at org.apache.spark.sql.SQLContext.load(SQLContext.scala:965)
  ... 53 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
10 REPLIES

Contributor

Same question about accessing Phoenix tables from Spark 2 SQL on a full HDP 2.6 cluster.

Explorer

It would be really helpful to be able to use Spark 2 with Phoenix. Any ETA on when HDP will upgrade Phoenix?

Contributor

If you are just testing on the sandbox, this should really help: https://superuser.blog/upgrading-apache-phoenix-hdp/

We did it on prod.
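
In case the post is unreachable: the manual route is, roughly, to drop the Phoenix 4.10 server jar into HBase's lib directory, restart HBase, and then use the matching 4.10 client and spark jars instead of the bundled 4.7 ones. A minimal sketch, assuming the stock sandbox paths and the HBase 1.1 build of Phoenix (check the blog post and your HBase version before copying anything):

# Download apache-phoenix-4.10.0-HBase-1.1-bin.tar.gz from the Apache Phoenix
# download/archive site, then unpack it on the sandbox.
tar -xzf apache-phoenix-4.10.0-HBase-1.1-bin.tar.gz -C /tmp
# Make the new server jar visible to the HBase region server (path is an assumption).
cp /tmp/apache-phoenix-4.10.0-HBase-1.1-bin/phoenix-4.10.0-HBase-1.1-server.jar \
   /usr/hdp/current/hbase-regionserver/lib/
# Restart HBase from Ambari, then point sqlline.py and spark-shell at the
# 4.10 client and phoenix-spark jars from the unpacked distribution.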

Expert Contributor

Hi, while trying to access the URL https://superuser.blog/upgrading-apache-phoenix-hdp/ it's not opening at all.

Contributor

I just checked and it is opening; can you try again? @Krishna Srinivas

Explorer

HDP 2.6.2 already supports the latest Phoenix stuff. It says 4.7, but it's a custom patched version. I just wasn't using the HDP Phoenix driver.

That being said, it's very confusing for HDP not to stay in sync with the naming conventions and versioning of the open-source version. I'm assuming they forked it, but time will only make it harder to sync back up.
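
For reference, the plain JDBC route from Spark 2 (which the original poster said already works) looks roughly like this; the ZooKeeper URL and table name are reused from the question above, and the Phoenix client jar still has to be on the driver and executor classpath:

// Plain JDBC read of the Phoenix table created in the question above.
val df = spark.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("url", "jdbc:phoenix:sandbox.hortonworks.com:2181:/hbase-unsecure")
  .option("dbtable", "TABLE1")
  .load()
df.show()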

Contributor

We tried this too on our HDP 2.6.3 cluster. Sure enough, we got the same issue:

/usr/hdp/current/spark2-client/bin/spark-shell --master yarn-client \
  --driver-memory 3g --executor-memory 3g --num-executors 2 --executor-cores 2 \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf" \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/etc/hbase/conf"
scala> val jobsDF = spark.read.format("org.apache.phoenix.spark").options(Map(
     |       "table" -> "ns.Jobs", "zkUrl" -> zkUrl)).load
ivysettings.xml file not found in HIVE_HOME or HIVE_CONF_DIR,file:/usr/hdp/2.6.3.0-235/phoenix/phoenix-4.7.0.2.6.3.0-235-client.jar!/ivysettings.xml will be used
2018-01-30 16:24:33,254 INFO  [main] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x79bb14d8 connecting to ZooKeeper ensemble=zkhost1:2181,zkhost2:2181,zkhost3:2181
java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
  at java.lang.Class.getDeclaredMethods0(Native Method)
  at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
  at java.lang.Class.getDeclaredMethod(Class.java:2128)
  at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1575)
...
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:146)
  ... 49 elided
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.DataFrame
  at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
  ... 83 more

Tweaking extraClassPath and --jars using phoenix-client.jar, phoenix-4.7.0.2.6.3.0-235-spark2.jar, and spark-sql_2.11-2.2.0.2.6.3.0-235.jar made no difference. I am inclined to agree with the other poster that Hortonworks' phoenix-client.jar is not actually Spark 2-compatible, release notes to the contrary.
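
One rough way to check that (assuming the same stock HDP symlink as above) is to disassemble the bundled phoenix-spark classes and see which Spark SQL types they reference:

# Signatures mentioning org.apache.spark.sql.DataFrame point to a Spark 1.x build;
# a Spark 2 build would reference org.apache.spark.sql.Dataset instead.
javap -cp /usr/hdp/current/phoenix-client/phoenix-client.jar org.apache.phoenix.spark.PhoenixRDD | grep 'spark.sql'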

Contributor

... forgot the Stack Overflow link

New Contributor

We are seeing the same NoClassDefFoundError issue with Spark 2 using /usr/hdp/current/phoenix-client/phoenix-client.jar:/usr/hdp/current/phoenix-client/phoenix-spark2.jar. The release notes say it includes the patch for PHOENIX-3333, but it doesn't look like that is the case.

Contributor

Found the solution: place phoenix-spark2.jar before phoenix-client.jar on the classpath, and everything worked.

The Spark2/Scala 2.11 versions of org.apache.phoenix.spark classes need to overlay those included in the main phoenix-client.jar.
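
For example, the HDP 2.6.3 invocation from the earlier reply should work once the jars are reordered like this (same jars and stock HDP symlinks, only the order changes):

/usr/hdp/current/spark2-client/bin/spark-shell --master yarn-client \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf" \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar:/etc/hbase/conf"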

Try it and let us know. 🙂
