How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

Accepted Solution

Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

@Ranjan Raut

Below are the steps to connect Spark 2.2 with Phoenix in HDP 2.6.3.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark jars on the driver and executor extra classpath:

spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the table:

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
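As a side note: sqlContext.load has been deprecated since Spark 1.4 in favor of the DataFrameReader API. An equivalent read in that style (a minimal sketch, assuming the same TABLE1 and localhost:2181 quorum as above):

scala> val df = sqlContext.read.format("org.apache.phoenix.spark").options(Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181")).load()
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]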

Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

New Contributor

Thanks a lot @Sandeep Nemuri. It worked (y)

Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

@Ranjan Raut Glad that it helped you. Would you mind accepting this answer so that this thread gets marked as answered?


Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

Contributor

Note that the spark2 Phoenix jar (phoenix-4.7.0.2.6.3.0-235-spark2.jar) MUST precede phoenix-client.jar in extraClassPath; otherwise the connection will fail with:

java.lang.NoClassDefFoundError: org/apache/spark/sql/DataFrame
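The likely cause: phoenix-client.jar bundles a phoenix-spark module built against Spark 1.x, where org.apache.spark.sql.DataFrame was a concrete class; in Spark 2 it is only a type alias for Dataset[Row], so the class no longer exists, and whichever jar the classloader finds first wins. To check which jar actually served the Phoenix data source class (a diagnostic sketch, run from the same spark-shell):

scala> classOf[org.apache.phoenix.spark.DefaultSource].getProtectionDomain.getCodeSource.getLocation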

Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

New Contributor

How can I save this "df" to another Phoenix table?

Re: How to connect Spark 2.2.0 with Phoenix 4.7 in HDP 2.6.3?

New Contributor

I found the answer myself:

use df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
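For completeness: saveToPhoenix comes from the implicits in the phoenix-spark module, so the import below is required, and OUTPUT_TABLE must already exist in Phoenix. A minimal end-to-end sketch (assuming the same localhost:2181 quorum as the read example):

scala> import org.apache.phoenix.spark._
import org.apache.phoenix.spark._

scala> df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))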
