Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

New Member
 
1 ACCEPTED SOLUTION

@Ranjan Raut

Below are the steps to connect Spark 2.2 with Phoenix in HDP 2.6.3.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory.

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark jars on the executor and driver extra classpath.

spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the tables.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
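
Since sqlContext.load has been deprecated since Spark 1.4, the same read can probably also be written with the DataFrameReader API. This is only a minimal sketch, assuming the spark-shell session from step 2 (so `spark` is the SparkSession that spark-shell predefines) and the same TABLE1 and localhost:2181 values used above. The variable name dfViaReader is just for illustration.

```scala
// Run inside the spark-shell started in step 2; `spark` is the SparkSession
// that spark-shell creates automatically.
val dfViaReader = spark.read
  .format("org.apache.phoenix.spark")   // same data source name as in step 3
  .option("table", "TABLE1")            // Phoenix table from the example above
  .option("zkUrl", "localhost:2181")    // ZooKeeper quorum from the example above
  .load()

// Column names are taken from the sample output above.
dfViaReader
  .filter(dfViaReader("COL1") === "test_row_1")
  .select("ID", "COL2")
  .show()
```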


7 REPLIES

New Member

Thanks a lot @Sandeep Nemuri. It worked (y)


@Ranjan Raut Glad that it helped. Would you mind accepting this answer so that the thread is marked as answered?

Expert Contributor

Note that phoenix-spark2.jar MUST precede phoenix-client.jar in extraClassPath; otherwise the connection will fail with:

java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
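
To make the ordering rule concrete, here is a small sketch of a check on a classpath string. The helper spark2JarPrecedesClient is hypothetical (it is not part of Spark or Phoenix), and it only matches on the jar file name suffixes seen in this thread.

```scala
// Hypothetical helper (not part of Spark or Phoenix): returns true when the
// first entry ending in "spark2.jar" appears before the first entry ending in
// "phoenix-client.jar" in a ':'-separated classpath string.
def spark2JarPrecedesClient(classpath: String): Boolean = {
  val entries   = classpath.split(":").map(_.trim).filter(_.nonEmpty)
  val spark2Idx = entries.indexWhere(_.endsWith("spark2.jar"))
  val clientIdx = entries.indexWhere(_.endsWith("phoenix-client.jar"))
  spark2Idx >= 0 && clientIdx >= 0 && spark2Idx < clientIdx
}

// Paths copied from the spark-shell command in the accepted solution.
val good = "/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:" +
           "/usr/hdp/current/phoenix-client/phoenix-client.jar"
// Same two jars in the reverse (failing) order.
val bad  = "/usr/hdp/current/phoenix-client/phoenix-client.jar:" +
           "/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar"

println(spark2JarPrecedesClient(good)) // true
println(spark2JarPrecedesClient(bad))  // false
```

Inside a running spark-shell, the string to check could be taken from sc.getConf.getOption("spark.driver.extraClassPath"), which is None when the setting was not passed.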

New Member

How can I save this "df" to another Phoenix table?

New Member

I found the answer myself.

Use df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
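
Note that saveToPhoenix is an implicit extension, so it should need import org.apache.phoenix.spark._ in scope first. The write can also be expressed through the standard DataFrameWriter API. The following is only a sketch under a few assumptions: it runs in the same spark-shell session where df was loaded, OUTPUT_TABLE already exists in Phoenix with columns matching df, and hbaseConnectionString is the same ZooKeeper quorum used for the read (localhost:2181 here is a placeholder). As far as I know, the phoenix-spark data source accepts only SaveMode.Overwrite.

```scala
import org.apache.spark.sql.SaveMode

// Assumption: same ZooKeeper quorum as the read example; adjust for your cluster.
val hbaseConnectionString = "localhost:2181"

// Assumption: OUTPUT_TABLE was created in Phoenix beforehand and its columns
// match the columns of df (ID, COL1, COL2 in the example above).
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)              // the connector is expected to reject other modes
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", hbaseConnectionString)
  .save()
```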
