Created 05-09-2017 09:57 AM
I tried to use the jar "spark-hbase-connector-2.10-0.9.jar", but it only works for Scala, and I need it to work with PySpark. What commands should I use to make it work from PySpark?
Created 05-09-2017 12:53 PM
Hi @Meryem Moumen,
This Apache HBase connector seems to work with PySpark. Example code from here:
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext()
sqlc = SQLContext(sc)

data_source_format = 'org.apache.hadoop.hbase.spark'

df = sc.parallelize([('a', '1.0'), ('b', '2.0')]).toDF(schema=['col0', 'col1'])

# ''.join(string.split()) in order to write a multi-line JSON string here.
catalog = ''.join("""{
    "table":{"namespace":"default", "name":"testtable"},
    "rowkey":"key",
    "columns":{
        "col0":{"cf":"rowkey", "col":"key", "type":"string"},
        "col1":{"cf":"cf", "col":"col1", "type":"string"}
    }
}""".split())

# Writing (alternatively: .option('catalog', catalog))
df.write \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .save()

# Reading
df = sqlc.read \
    .options(catalog=catalog) \
    .format(data_source_format) \
    .load()
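For what it's worth, the catalog is just a JSON string, so (assuming the same "testtable" layout as in the snippet above) you can also build it with json.dumps instead of the split/join trick, which is a bit easier to read and edit:

import json

# Same catalog as above, expressed as a Python dict and serialized with json.dumps.
# The table, namespace, and column names here are just the ones from the example;
# adjust them to match your own HBase table.
catalog = json.dumps({
    "table": {"namespace": "default", "name": "testtable"},
    "rowkey": "key",
    "columns": {
        "col0": {"cf": "rowkey", "col": "key", "type": "string"},
        "col1": {"cf": "cf", "col": "col1", "type": "string"}
    }
})

# Then write exactly as before.
df.write.options(catalog=catalog).format(data_source_format).save()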
Created 05-09-2017 03:03 PM
But we need a connector that provides this data source:
'org.apache.hadoop.hbase.spark'
How can we install it?
Created 05-09-2017 04:27 PM