The following are the steps to connect to Phoenix tables using Spark2.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory:

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark JARs on the extra classpath:

spark-shell \
  --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" \
  --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the table:

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+

Note: Spark2 and Phoenix integration was introduced in HDP 2.6.2.
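
A side note: sqlContext.load(...) as shown above is deprecated in Spark 2; the DataFrameReader API is the current equivalent. Here is a minimal sketch, assuming the same TABLE1 table and localhost:2181 zkUrl from step 3:

// sqlContext.load() is deprecated in Spark 2; the DataFrameReader API is the
// current equivalent. "TABLE1" and "localhost:2181" are the example values
// from step 3 -- substitute your own table name and ZooKeeper URL.
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", "localhost:2181")
  .load()

df.show()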

Comments

Hi Sandeep,

Thanks for the post. Using the above steps I am able to read data from Phoenix, but I am unable to write. While trying to save a table with the command

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "table", "zkUrl" -> "zkurl"))

it shows the error:

error: value save is not a member of org.apache.spark.sql.DataFrame

Could you please suggest how to resolve this issue?


Hi Anusuya,

In Spark2, to save your DataFrame to a Phoenix table, instead of

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))

use

df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
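
Note that saveToPhoenix is an implicit method contributed by the phoenix-spark module, so you need the import first. A quick sketch, assuming the same OUTPUT_TABLE and hbaseConnectionString placeholders as above:

// saveToPhoenix is not a native DataFrame method; it is added by the
// phoenix-spark implicits, so this import is required before calling it.
import org.apache.phoenix.spark._

// OUTPUT_TABLE and hbaseConnectionString are the placeholder values from above.
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))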


Hello,

Thanks for the post. When I attempt the above commands, I get the following error:

<console>:27: error: not found: value df

Any idea why that is?