Failed to connect HBase to PySpark


I tried to use the jar "spark-hbase-connector-2.10-0.9.jar", but it works only with Scala, and I need to make it work with PySpark. What is the structure of the commands I should use to make it work with PySpark?


Hi @Meryem Moumen,

This Apache HBase connector seems to work with PySpark. Example code:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.hadoop.hbase.spark'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# ''.join(string.split()) strips all whitespace, so the JSON can be written across several lines.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())

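# Writing
# Parentheses replace the backslash continuations: a comment after a trailing "\" is a syntax error.
(df.write
    .options(catalog=catalog)  # alternatively: .option('catalog', catalog)
    .format(data_source_format)
    .save())

# Reading
df = (sqlc.read
    .options(catalog=catalog)
    .format(data_source_format)
    .load())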
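
As an alternative to the ''.join(string.split()) trick, the catalog can be built as a Python dict and serialized with json.dumps, which avoids hand-balancing braces. This is only a minimal sketch: the helper name build_catalog is hypothetical (it is not part of the connector), and the "rowkey"/"columns" layout is assumed to match what the connector expects for the table in the example.

```python
import json


def build_catalog(namespace, table, rowkey, columns):
    """Return a compact JSON catalog string (hypothetical helper).

    `columns` maps each DataFrame column name to a tuple of
    (column_family, qualifier, type).
    """
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": typ}
            for name, (cf, col, typ) in columns.items()
        },
    }
    # Compact separators produce a whitespace-free string,
    # like the ''.join(s.split()) approach does.
    return json.dumps(catalog, separators=(",", ":"))


# Same table and columns as in the example (assumed layout).
catalog = build_catalog(
    namespace="default",
    table="testtable",
    rowkey="key",
    columns={
        "col0": ("rowkey", "key", "string"),
        "col1": ("cf", "col1", "string"),
    },
)
```

The resulting string can then be passed to .options(catalog=catalog) in the same way as the hand-written one.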


But we need a connector to do this. How can we install it?

Hi @Meryem Moumen, from the documentation: