
The following steps show how to connect to Phoenix tables using Spark2.

1) Create a symlink to hbase-site.xml in the Spark2 conf directory, so Spark picks up the HBase and ZooKeeper connection settings.

ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml

2) Launch spark-shell with the Phoenix Spark jars on the driver and executor extra classpath. The versioned jar name below comes from the HDP 2.6.3.0-235 build; adjust it to match the Phoenix client version shipped with your installation.

spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"

3) Create a Phoenix connection and query the tables.

scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark",Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
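
The sqlContext.load API used above is deprecated in Spark2; the same read can also be written with the Spark2 DataFrameReader API. A minimal sketch, run in the same spark-shell session, assuming the same TABLE1 and ZooKeeper quorum as above:

// Spark2 style: read the Phoenix table through the DataFrameReader API.
// Assumes the Phoenix Spark jars from step 2 are on the classpath.
import org.apache.spark.sql.functions.col

val df2 = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", "localhost:2181")
  .load()

// Column pruning and simple comparison filters are pushed down to Phoenix.
df2.select(col("ID"), col("COL1"), col("COL2")).filter(col("COL2") > 10).show()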

Note: Spark2 and Phoenix integration was introduced in HDP 2.6.2.

Comments

Hi Sandeep,

Thanks for the post. Using the above steps I am able to read data from Phoenix, but unable to write. When I try to save a table with

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "tabel", "zkUrl" -> "zkurl"))

it fails with:

error: value save is not a member of org.apache.spark.sql.DataFrame

Could you please suggest how to resolve this issue?


Hi Anusuya,

In Spark2, to save your DataFrame to a Phoenix table, instead of

df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))

use

df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
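
The reason the original call fails is that saveToPhoenix comes from an implicit conversion in the phoenix-spark module, so the import matters. A minimal sketch of the full write path, assuming OUTPUT_TABLE already exists in Phoenix with columns matching the DataFrame schema:

// The implicit that adds saveToPhoenix to DataFrame lives in this package.
import org.apache.phoenix.spark._

// OUTPUT_TABLE must already exist in Phoenix; rows are upserted by column name.
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> "localhost:2181"))

The DataFrameWriter route should also work: df.write.format("org.apache.phoenix.spark").mode(SaveMode.Overwrite).option("table", "OUTPUT_TABLE").option("zkUrl", "localhost:2181").save() — note that Phoenix treats SaveMode.Overwrite as an upsert, not a truncate.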


Hello,

Thanks for the post. When I attempt the above commands I get the following error:

<console>:27: error: not found: value df

Any idea why that is?

