Created on 05-29-2016 11:39 AM - edited 09-16-2022 03:22 AM
Hi all, I wanted to experiment with the "it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3" package (you can find it at spark-packages.org). It's an interesting add-on that exposes HBase tables as Spark RDDs.
If I run this extension library in a standard spark-shell (with Scala support), everything works smoothly:
spark-shell --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
  --conf spark.hbase.host=<HBASE_HOST>

scala> import it.nerdammer.spark.hbase._
import it.nerdammer.spark.hbase._
If I try to run it in a PySpark shell instead (my goal is to use the extension from Python), I'm not able to import the functions, so I can't use anything:
PYSPARK_DRIVER_PYTHON=ipython pyspark --packages it.nerdammer.bigdata:spark-hbase-connector_2.10:1.0.3 \
  --conf spark.hbase.host=<HBASE_HOST>

In [1]: from it.nerdammer.spark.hbase import *
---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-1-37dd5a5ffba0> in <module>()
----> 1 from it.nerdammer.spark.hbase import *

ImportError: No module named it.nerdammer.spark.hbase
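As far as I can tell, --packages only puts the JAR on the JVM classpath; it doesn't generate any Python bindings, so there is no module for Python to import. For what it's worth, from IPython I can poke at the JVM side through py4j (sc._jvm is py4j's internal gateway, so treat this as a diagnostic sketch, not a supported API):

In [2]: sc._jvm.it.nerdammer.spark.hbase   # a py4j JavaPackage handle, not a Python package
Out[2]: <py4j.java_gateway.JavaPackage object at 0x7f...>

Note that py4j returns a generic JavaPackage for any dotted path, so this doesn't even prove the JAR loaded; it just illustrates that whatever is in the JAR lives on the JVM side of the gateway, out of reach of a plain Python import.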
I have tried different combinations of environment variables, parameters, etc. when launching PySpark, but to no avail.
Maybe I'm just doing something deeply wrong here, or maybe there is simply no Python API for this library. As a matter of fact, the examples on the package's home page are all in Scala (although they say you can install the package in PySpark too, with the classic "--packages" parameter).
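In the meantime, the only Python route I've found is to bypass the connector entirely and read HBase through the generic Hadoop InputFormat API. A minimal sketch, assuming the HBase client JARs and the spark-examples JAR (which provides the two converter classes) are on the classpath; the host and table names are placeholders:

# Hedged sketch: read an HBase table from PySpark via TableInputFormat,
# without the spark-hbase-connector (which seems to have no Python API).
conf = {
    "hbase.zookeeper.quorum": "<HBASE_HOST>",
    "hbase.mapreduce.inputtable": "mytable",   # placeholder table name
}
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
    "org.apache.hadoop.hbase.client.Result",
    keyConverter="org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter",
    valueConverter="org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter",
    conf=conf,
)
print(hbase_rdd.count())   # row count, just to verify the scan works

It works, but it's much more verbose than the nice RDD API the Scala connector gives you.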
Can anybody help out with the "ImportError: No module named it.nerdammer.spark.hbase" error message?
Thanks for any insight.
Created 07-27-2016 06:37 AM
Created 07-28-2016 03:11 AM
Thanks. That seems a good alternative, and as a matter of fact I was not aware that the native hbase-spark connector was available in CDH 5.7.
Marking the thread as solved, even though I don't yet know whether all the features I'd need are there in the native connector.