Using spark-hbase-connector Package with Pyspark

FrozenWave — Fri, 16 Sep 2022 10:22:14 GMT

Hi all, I wanted to experiment with the "it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3" package (you can find it at spark-packages.org). It's an interesting add-on that lets you read and operate on HBase tables as RDDs via Spark.

If I run this extension library in a standard spark-shell (with Scala support), everything works smoothly:

spark-shell --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
--conf spark.hbase.host=<HBASE_HOST>

scala> import it.nerdammer.spark.hbase._
import it.nerdammer.spark.hbase._

My goal, however, is to use the extension from Python, and in a PySpark shell I can't import its functions or use anything at all:

PYSPARK_DRIVER_PYTHON=ipython pyspark --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
--conf spark.hbase.host=<HBASE_HOST>

In [1]: from it.nerdammer.spark.hbase import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-37dd5a5ffba0> in <module>()
----> 1 from it.nerdammer.spark.hbase import *

ImportError: No module named it.nerdammer.spark.hbase

I have tried different combinations of environment variables, parameters, etc. when launching PySpark, but to no avail.

Maybe I'm doing something fundamentally wrong here, or maybe there is simply no Python API for this library. As a matter of fact, the examples on the package's home page are all in Scala (although it says you can load the package in PySpark too, with the usual "--packages" parameter).
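A likely explanation for the ImportError: the connector is distributed as a JAR of compiled Scala/JVM classes, so it contains no Python module named it.nerdammer.spark.hbase for the interpreter's import system to find. The sketch below (the helper name is hypothetical) checks whether a dotted name resolves as a Python module, independently of what Spark has put on the JVM classpath:

```python
import importlib.util


def is_python_importable(module_name: str) -> bool:
    """Return True if module_name can be located by Python's import system."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ModuleNotFoundError:
        # find_spec imports the parent packages of a dotted name first;
        # it raises here when a parent (e.g. "it") does not exist at all.
        return False


# A stdlib module resolves, but the Scala package name from the JAR does not.
print(is_python_importable("json"))
print(is_python_importable("it.nerdammer.spark.hbase"))
```

In other words, loading the package with --packages makes its classes visible to the JVM side of PySpark, but that does not by itself create a Python wrapper you can import.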

Can anybody help out with the "ImportError: No module named it.nerdammer.spark.hbase" error message?

Thanks for any insight

Re: Using spark-hbase-connector Package with Pyspark

Harsh J — Wed, 27 Jul 2016 13:37:54 GMT

Here's one example that uses the native hbase-spark module via DataFrames in PySpark: http://community.cloudera.com/t5/Storage-Random-Access-HDFS/Include-latest-hbase-spark-in-CDH/m-p/43236/highlight/true#M2280
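For context, the native hbase-spark module is used from PySpark through the DataFrame reader rather than through a Python import. The sketch below assumes the "org.apache.hadoop.hbase.spark" data source and the "hbase.table" / "hbase.columns.mapping" option names seen in Cloudera's hbase-spark examples; check them against the hbase-spark version you actually run. The helper functions are hypothetical and only build the option values, so they run without Spark or HBase:

```python
def build_columns_mapping(row_key, columns):
    """Build an hbase.columns.mapping string (assumed format).

    row_key: (field_name, spark_type) for the HBase row key
    columns: list of (field_name, spark_type, column_family, qualifier)
    """
    key_field, key_type = row_key
    # The row key is mapped with the special ":key" qualifier.
    parts = [f"{key_field} {key_type} :key"]
    for field, spark_type, family, qualifier in columns:
        parts.append(f"{field} {spark_type} {family}:{qualifier}")
    return ", ".join(parts)


def hbase_read_options(table, row_key, columns):
    """Return the reader options for a given HBase table (hypothetical helper)."""
    return {
        "hbase.table": table,
        "hbase.columns.mapping": build_columns_mapping(row_key, columns),
    }


opts = hbase_read_options(
    "books",
    ("title", "STRING"),
    [("author", "STRING", "info", "author"), ("year", "STRING", "info", "year")],
)
print(opts["hbase.columns.mapping"])
# → title STRING :key, author STRING info:author, year STRING info:year

# In a PySpark shell with hbase-spark on the classpath, the options would
# then be passed to the DataFrame reader, e.g.:
# df = (sqlContext.read.format("org.apache.hadoop.hbase.spark")
#       .options(**opts)
#       .load())
```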

Re: Using spark-hbase-connector Package with Pyspark

FrozenWave — Thu, 28 Jul 2016 10:11:13 GMT

Thanks. It seems a good alternative; as a matter of fact, I was not aware it was available in CDH 5.7.

I'm marking the thread as solved, even though I don't yet know whether all the features I need will be available in the native hbase-spark connector.
