Created on 03-17-2018 05:58 AM
The following steps show how to connect to Phoenix tables from Spark2.
1) Create a symlink of hbase-site.xml in spark2 conf
ln -s /etc/hbase/conf/hbase-site.xml /etc/spark2/conf/hbase-site.xml
2) Launch spark-shell with the Phoenix Spark jars on both the executor and driver extra classpath.
spark-shell --conf "spark.executor.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar" --conf "spark.driver.extraClassPath=/usr/hdp/current/phoenix-client/phoenix-4.7.0.2.6.3.0-235-spark2.jar:/usr/hdp/current/phoenix-client/phoenix-client.jar"
3) Create a phoenix connection and query the tables.
scala> import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext

scala> val sqlContext = new SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@495e8a3

scala> val df = sqlContext.load("org.apache.phoenix.spark", Map("table" -> "TABLE1", "zkUrl" -> "localhost:2181"))
df: org.apache.spark.sql.DataFrame = [ID: string, COL1: string ... 1 more field]

scala> df.show()
+-----+----------+----+
|   ID|      COL1|COL2|
+-----+----------+----+
|test1|test_row_1|  10|
|test2|test_row_2|  20|
+-----+----------+----+
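For reference, the same read can also be written with the Spark2 DataFrameReader API instead of SQLContext.load. This is only a minimal sketch: it assumes the shell was launched with the classpath from step 2, and it reuses the table name TABLE1 and the ZooKeeper URL localhost:2181 from the example above, which you would replace with your own values. The application name is a made-up placeholder.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell, getOrCreate() returns the session the shell already created.
// "phoenix-read-sketch" is a placeholder name, used only if a new session is built.
val spark = SparkSession.builder().appName("phoenix-read-sketch").getOrCreate()

// Table name and ZooKeeper quorum are taken from the example above (assumptions
// for any other cluster).
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", "localhost:2181")
  .load()

// Inspect the schema and a few rows.
df.printSchema()
df.show()
```

Note that df only exists in the session after the val df = ... line has run successfully, so it has to be defined before any later command that refers to it.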
Note: Spark2 and Phoenix integration is available starting with HDP 2.6.2.
Created on 04-16-2018 11:49 AM
Hi Sandeep,
Thanks for the post. Using the above steps I am able to read data from Phoenix, but I am unable to write. When I try to save a table using the command below:
df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "tabel", "zkUrl" -> "zkurl"))
it shows this error:
"error: value save is not a member of org.apache.spark.sql.DataFrame"
Could you please suggest how to resolve this issue?
Created on 05-31-2018 02:25 PM
Hi Anusuya,
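In Spark2, DataFrame no longer has a save method, so to save your DataFrame to a Phoenix table, instead of
df.save("org.apache.phoenix.spark", SaveMode.Overwrite, Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
use
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))
Note that saveToPhoenix is added to DataFrame by the phoenix-spark implicits, so import org.apache.phoenix.spark._ must be in scope.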
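To make the two write paths concrete, here is a rough sketch of both. It assumes the shell was started with the Phoenix jars on the classpath, that OUTPUT_TABLE already exists in Phoenix with columns matching the DataFrame, and that hbaseConnectionString holds your ZooKeeper quorum; the value localhost:2181 and the source table TABLE1 are placeholders borrowed from the original post. As far as I know, the phoenix-spark data source expects SaveMode.Overwrite but upserts rows rather than truncating the table, so treat that as something to verify on your version.

```scala
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.phoenix.spark._ // implicits that add saveToPhoenix to DataFrame

// In spark-shell, getOrCreate() returns the session the shell already created.
val spark = SparkSession.builder().getOrCreate()

// Placeholder ZooKeeper quorum (assumption); replace with your own.
val hbaseConnectionString = "localhost:2181"

// Load a source DataFrame; TABLE1 is the example table from the original post.
val df = spark.read
  .format("org.apache.phoenix.spark")
  .option("table", "TABLE1")
  .option("zkUrl", hbaseConnectionString)
  .load()

// Option A: the phoenix-spark helper, as suggested in this reply.
df.saveToPhoenix(Map("table" -> "OUTPUT_TABLE", "zkUrl" -> hbaseConnectionString))

// Option B: the generic Spark2 DataFrameWriter API with the same data source.
df.write
  .format("org.apache.phoenix.spark")
  .mode(SaveMode.Overwrite)
  .option("table", "OUTPUT_TABLE")
  .option("zkUrl", hbaseConnectionString)
  .save()
```

Either option on its own is enough; both are shown only for comparison.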
Created on 06-20-2019 02:37 PM
Hello,
Thanks for the post. When I attempt the above commands I get the following error:
<console>:27: error: not found: value df
Any idea why that is?