HBase PySpark Integration - Working Examples

New Contributor

I am trying to write a DataFrame back into an HBase table. I have been able to read data from an HBase table using the Hortonworks spark-hbase connector from https://github.com/hortonworks-spark/shc

I have seen this article https://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/ which gives some code for saving data back into HBase. However, I need something similar in PySpark, and I cannot find any working code, not even the syntax or the packages to import.
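
For context, this is roughly what the read side looks like for me in PySpark. The table name, column family, catalog, and connector version below are just placeholders for my actual setup, so treat it as a sketch rather than a verified example:

# Launched roughly like this (version is only an example):
# pyspark --packages com.hortonworks:shc-core:1.1.1-2.1-s_2.11 \
#         --repositories http://repo.hortonworks.com/content/groups/public/

import json
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("shc-read").getOrCreate()

# Placeholder catalog mapping the HBase table to a DataFrame schema
catalog = json.dumps({
    "table": {"namespace": "default", "name": "my_table"},
    "rowkey": "key",
    "columns": {
        "key":  {"cf": "rowkey", "col": "key",  "type": "string"},
        "col1": {"cf": "cf1",    "col": "col1", "type": "string"}
    }
})

# Reading through the SHC data source works for me
df = spark.read \
    .options(catalog=catalog) \
    .format("org.apache.spark.sql.execution.datasources.hbase") \
    .load()
df.show()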

The only other way I can think of is using the Phoenix jars, but I am trying to stick purely to the Hortonworks libraries and see if this is possible.
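
If I do end up going the Phoenix route, I believe a phoenix-spark write from PySpark would look something like the sketch below, but I have not verified this on my cluster and the table name and ZooKeeper quorum are placeholders:

# Hypothetical phoenix-spark write (unverified); phoenix-spark only supports overwrite mode
df.write \
    .format("org.apache.phoenix.spark") \
    .mode("overwrite") \
    .option("table", "MY_TABLE") \
    .option("zkUrl", "zkhost1:2181") \
    .save()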

Can anyone help?

2 REPLIES

Re: HBase PySpark Integration - Working Examples

@Sai Geetha M N

I don't know of a Python version of the tutorial/post you linked, but here are some resources worth checking out that cover Python, Spark, and Hive with Spark:

https://hortonworks.com/tutorial/hands-on-tour-of-apache-spark-in-5-minutes/

https://hortonworks.com/tutorial/setting-up-a-spark-development-environment-with-python/

https://hortonworks.com/tutorial/using-hive-with-orc-from-apache-spark/

Re: HBase PySpark Integration - Working Examples

New Contributor

@Edgar Orendain

I have already worked through what the above tutorials cover (Apache Spark, Python, Hive ORC from Spark, etc.). My need is specifically for HBase-Spark integration: even though I have been able to get the read working, I still need to be able to write to or update HBase from Spark.
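
To be concrete, this is the kind of write I am trying to get working in PySpark, mirroring the Scala df.write example from the blog post. The catalog is the same placeholder shown in my question, and the newtable option is my assumption from the SHC README, so this is a sketch of what I expect rather than code I have confirmed:

# Attempted write back to HBase via SHC (catalog as defined for the read;
# newtable="5" is assumed to create the table with 5 regions if it does not exist)
df.write \
    .options(catalog=catalog, newtable="5") \
    .format("org.apache.spark.sql.execution.datasources.hbase") \
    .save()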