
The following steps show how to connect to Phoenix tables using Spark2.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory, so Spark picks up the HBase and ZooKeeper connection settings.

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark jars on the driver and executor extra classpath. The versioned jar name below comes from the HDP 2.6.3.0-235 build; adjust it to match the Phoenix client version shipped with your installation.

spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the tables.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
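
The sqlContext.load API used above is deprecated in Spark2; the same read can also be written with the Spark2 DataFrameReader API. A minimal sketch, run in the same spark-shell session, assuming the same TABLE1 and ZooKeeper quorum as above:

// Spark2 style: read the Phoenix table through the DataFrameReader API.
// Assumes the Phoenix Spark jars from step 2 are on the classpath.
import org.apache.spark.sql.functions.col

val df2 = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", "localhost:2181")
  .load()

// Column pruning and simple comparison filters are pushed down to Phoenix.
df2.select(col("ID"), col("COL1"), col("COL2")).filter(col("COL2") > 10).show()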

Note: Spark2 and Phoenix integration was introduced in HDP 2.6.2.

Comments

Hi Sandeep,

Thanks for the post. Using the above steps I am able to read data from Phoenix, but unable to write. When I try to save a table with

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "tabel", "zkUrl" -> "zkurl"))

it fails with:

error: value save is not a member of org.apache.spark.sql.DataFrame

Could you please suggest how to resolve this issue?


Hi Anusuya,

In Spark2, to save your DataFrame to a Phoenix table, instead of

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))

use

df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
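
The reason the original call fails is that saveToPhoenix comes from an implicit conversion in the phoenix-spark module, so the import matters. A minimal sketch of the full write path, assuming OUTPUT_TABLE already exists in Phoenix with columns matching the DataFrame schema:

// The implicit that adds saveToPhoenix to DataFrame lives in this package.
import org.apache.phoenix.spark._

// OUTPUT_TABLE must already exist in Phoenix; rows are upserted by column name.
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))

The DataFrameWriter route should also work: df.write.format("org.apache.phoenix.spark").mode(SaveMode.Overwrite).option("table", "OUTPUT_TABLE").option("zkUrl", "localhost:2181").save() — note that Phoenix treats SaveMode.Overwrite as an upsert, not a truncate.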


Hello,

Thanks for the post. When I attempt the above commands I get the following error:

<console>:27: error: not found: value df

Any idea why that is?

