
Failed to connect Hbase to Pyspark


Explorer

I tried to use the jar "spark-hbase-connector-2.10-0.9.jar", but it only works with Scala, and I need it to work with PySpark. Could someone please tell me what commands I should use to make it work from PySpark?

3 REPLIES

Re: Failed to connect Hbase to Pyspark

Hi @Meryem Moumen,

This Apache HBase connector seems to work with PySpark. Example code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

# Data source name provided by the HBase-Spark connector
data_source_format = 'org.apache.hadoop.hbase.spark'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# ''.join(string.split()) in order to write a multi-line JSON string here.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())


# Writing
# alternatively: .option('catalog', catalog)
df.write\
.options(catalog=catalog)\
.format(data_source_format)\
.save()

# Reading
df = sqlc.read\
.options(catalog=catalog)\
.format(data_source_format)\
.load()
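
Once loaded, df is an ordinary Spark DataFrame, so the usual DataFrame operations apply. A short sketch, assuming the write and read above completed against the 'testtable' table:

# The loaded DataFrame behaves like any other Spark DataFrame
df.show()                              # prints the 'a'/'b' rows written above
df.filter(df.col1 == '2.0').show()     # ordinary DataFrame filter
df.select('col0').collect()            # bring the row keys back to the driver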

Re: Failed to connect Hbase to Pyspark

Explorer

But we need a connector to do this:

'org.apache.hadoop.hbase.spark'

How can we install it?


Re: Failed to connect Hbase to Pyspark

Hi @Meryem Moumen, from the documentation:
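
In general, the connector jar has to be on the Spark classpath when PySpark starts. A minimal sketch of one way to do that from a plain Python script (this is not the documentation text; the jar path below is a placeholder for wherever the hbase-spark jar lives on your cluster):

import os
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Placeholder path: point this at the hbase-spark connector jar you installed or built.
# Equivalent shell form: pyspark --jars /path/to/hbase-spark.jar
os.environ['PYSPARK_SUBMIT_ARGS'] = '--jars /path/to/hbase-spark.jar pyspark-shell'

sc = SparkContext()
sqlc = SQLContext(sc)

# hbase-site.xml (with your ZooKeeper quorum) typically also needs to be on the
# classpath so the connector can locate the HBase cluster.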