Using spark-hbase-connector Package with Pyspark
Labels: Apache HBase, Apache Spark
Created on ‎05-29-2016 11:39 AM - edited ‎09-16-2022 03:22 AM
Hi all, I wanted to experiment with the "it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3" package (you can find it at spark-packages.org). It's an interesting add-on that exposes HBase tables as RDDs, so they can be read and written from Spark.
If I run this extension library in a standard spark-shell (with Scala support), everything works smoothly:
spark-shell --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
  --conf spark.hbase.host=<HBASE_HOST>
scala> import it.nerdammer.spark.hbase._
import it.nerdammer.spark.hbase._
If I try to run it in a PySpark shell, since my goal is to use the extension from Python, I can't import the functions and I can't use anything:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
  --conf spark.hbase.host=<HBASE_HOST>
In [1]: from it.nerdammer.spark.hbase import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-37dd5a5ffba0> in <module>()
----> 1 from it.nerdammer.spark.hbase import *
ImportError: No module named it.nerdammer.spark.hbase
I have tried different combinations of environment variables, parameters, etc. when launching PySpark, but to no avail.
Maybe I'm just doing something deeply wrong here, or maybe there is simply no Python API for this library. As a matter of fact, the examples on the package's home page are all in Scala (but they say you can install the package in PySpark too, with the usual "--packages" parameter).
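The ImportError is consistent with the "no Python API" guess: "--packages" puts the connector's JAR on the JVM classpath, while Python's import statement only searches for Python modules on sys.path. A minimal sketch of that check, assuming Python 3 and only the standard library (the helper name python_module_available is hypothetical, not part of Spark or the connector):

```python
import importlib.util


def python_module_available(name):
    """Return True if `name` resolves to an importable Python module.

    A JVM package such as it.nerdammer.spark.hbase lives inside a JAR,
    so Python's import machinery has no way to find it on sys.path.
    """
    try:
        # For dotted names, find_spec imports the parent package first and
        # raises ModuleNotFoundError (an ImportError) if the parent is missing.
        return importlib.util.find_spec(name) is not None
    except ImportError:
        return False


# A standard-library module is found; the Scala package name is not,
# unless some Python package with that exact name happens to be installed.
print(python_module_available("json"))
print(python_module_available("it.nerdammer.spark.hbase"))
```

If the second call reports False inside a PySpark shell launched with "--packages", that would suggest the package ships no Python module under that name, and the Scala classes would have to be reached some other way.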
Can anybody help out with the "ImportError: No module named it.nerdammer.spark.hbase" error message?
Thanks for any insight
Created ‎07-27-2016 06:37 AM
Created ‎07-28-2016 03:11 AM
Thanks. That seems like a good alternative; as a matter of fact, I was not aware it was available in CDH 5.7.
I'm marking the thread as solved, even though I don't yet know whether the native hbase-spark connector has all the features I need.
