The following are the steps to connect to Phoenix tables from Spark2.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark JARs on the driver and executor extra classpaths (the JAR version shown matches HDP 2.6.3; adjust it to your release):

spark-shell \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the table:

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
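
Note that sqlContext.load is deprecated in Spark 2. A minimal equivalent sketch using the DataFrameReader API, assuming the same TABLE1 and ZooKeeper quorum as above:

// Equivalent read via the DataFrameReader API
// (spark is the SparkSession provided by spark-shell in Spark 2)
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", "localhost:2181")
  .load()

df.show()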

Note: Spark2 and Phoenix integration was introduced in HDP 2.6.2.

Comments

Hi Sandeep,

Thanks for the post. Using the above steps I am able to read data from Phoenix, but I am unable to write. When I try to save a table with the command

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "table", "zkUrl" -> "zkurl"))

it fails with the error:

error: value save is not a member of org.apache.spark.sql.DataFrame

Could you please suggest how to resolve this issue?


Hi Anusuya,

In Spark2, to save your DataFrame to a Phoenix table, instead of

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))

use

df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
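
For reference, a minimal sketch of such a write in spark-shell, assuming OUTPUT_TABLE already exists in Phoenix, the Phoenix Spark JARs from step 2 are on the classpath, and a local ZooKeeper quorum (the zkUrl value here is an assumption):

// Bring the saveToPhoenix implicit conversion into scope
import org.apache.phoenix.spark._

// Column names in df must match the columns of the Phoenix table
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))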


Hello,

Thanks for the post. When I attempt the above commands I get the following error:

<console>:27: error: not found: value df

Any idea why that is?

