Failed to connect Hbase to Pyspark


I tried to use the jar "spark-hbase-connector-2.10-0.9.jar", but it works only for Scala, and I need it to work with PySpark. What is the structure of the commands I should use to make it work with PySpark?

3 REPLIES

Re: Failed to connect Hbase to Pyspark

Hi @Meryem Moumen,

This Apache HBase connector seems to work with PySpark. Example code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.hadoop.hbase.spark'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# ''.join(string.split()) in order to write a multi-line JSON string here.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())


# Writing (alternatively: .option('catalog', catalog))
df.write \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .save()

# Reading
df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
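The `''.join(...split())` idiom above collapses the multi-line JSON catalog into a single-line string, which is how the connector expects to receive it. A quick standalone check of what that idiom produces (this snippet only demonstrates the string handling; it does not touch HBase):

```python
import json

# Same idiom as in the reply: strip all whitespace so the
# multi-line JSON becomes one single-line string.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())

# The result is still valid JSON because no quoted value
# in this catalog contains whitespace.
parsed = json.loads(catalog)
print(parsed["table"]["name"])  # testtable
```

Note that this removes *all* whitespace, so it would corrupt any quoted value that itself contains a space; the catalog above is safe because none do.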

Re: Failed to connect Hbase to Pyspark


But we need a connector to provide this data source:

'org.apache.hadoop.hbase.spark'

How can we install it?


Re: Failed to connect Hbase to Pyspark

Hi @Meryem Moumen, from the documentation:
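The reply above is cut off. As a rough sketch (the jar and config paths below are assumptions for a typical HDP install, not taken from the thread; adjust them to your cluster's layout), the connector jar is usually put on the classpath when launching PySpark, along with the HBase client configuration:

```shell
# Illustrative paths -- adjust to your distribution.
# The hbase-spark jar provides the org.apache.hadoop.hbase.spark
# data source; hbase-site.xml tells Spark where ZooKeeper/HBase live.
pyspark --jars /usr/hdp/current/hbase-client/lib/hbase-spark.jar \
        --files /etc/hbase/conf/hbase-site.xml
```

The same `--jars`/`--files` flags work with `spark-submit` when running a PySpark script rather than an interactive shell.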