How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

 
1 ACCEPTED SOLUTION

@Ranjan Raut

Below are the steps to connect Spark 2.2 with Phoenix in HDP 2.6.3.

1) Create a symlink to hbase-site.xml in the spark2 conf directory:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark jars on the extra classpath:

spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the tables:

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
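
If you prefer the Data Source API (or if sqlContext.load is flagged as deprecated in your shell), the same read can be expressed as below. This is a minimal sketch assuming the same TABLE1 and ZooKeeper URL as above:

scala> val df2 = sqlContext.read.format("org.apache.phoenix.spark").options(Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181")).load()
scala> df2.show()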


7 REPLIES


Thanks a lot @Sandeep Nemuri. It worked (y)

@Ranjan Raut Glad that it helped you. Would you mind accepting this answer so that this thread will be marked as answered?


Note that phoenix-spark2.jar MUST precede phoenix-client.jar in extraClassPath; otherwise the connection will fail with:

java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
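
This is presumably because phoenix-client.jar bundles the Spark 1.x integration, which still references the org.apache.spark.sql.DataFrame class that no longer exists as a separate class in Spark 2. As a reminder, the working ordering (the spark2 jar listed first) is the one used in the accepted answer, e.g. for the driver side (and the same for spark.executor.extraClassPath):

--conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"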

How can I save this "df" to another Phoenix table?

I found the answer myself ...

use df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
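
For completeness, a minimal sketch of the write-back as it would look in the same shell session, assuming the phoenix-spark implicits are imported, OUTPUT_TABLE already exists in Phoenix with matching columns, and the same ZooKeeper URL as above stands in for hbaseConnectionString:

scala> import org.apache.phoenix.spark._
scala> df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))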
