The first step in connecting to HBase is defining how Spark should interpret each row of data found in HBase. This is done by defining a catalog for the connector to use.
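As a minimal sketch of what such a catalog can look like (the table name "events", column family "d", and column names here are placeholders, not the actual schema from this article), the connector expects a JSON string that maps the HBase row key and column family/qualifier pairs to Spark SQL columns:

def catalog = """{
  |"table":{"namespace":"default", "name":"events"},
  |"rowkey":"key",
  |"columns":{
    |"rowkey":{"cf":"rowkey", "col":"key", "type":"string"},
    |"eventTime":{"cf":"d", "col":"event_time", "type":"string"},
    |"metricValue":{"cf":"d", "col":"metric_value", "type":"string"}
  |}
|}""".stripMargin

With a catalog string like this in scope, it can be passed to the DataFrame reader shown below.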
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

// Read the HBase table into a DataFrame using the catalog defined above
val df = sqlContext
  .read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog.toString))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
Since the data types available in the catalog are limited (I believe they are restricted to Avro data types), you will almost always need to cast columns into a more meaningful data type. You can also see here that I am using split to break the composite row key into its two parts.
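The original code for that step is not reproduced here, but a hedged sketch of the idea, assuming a hypothetical composite row key of the form "<deviceId>_<timestamp>" and placeholder column names, might look like this:

import org.apache.spark.sql.functions.{col, split}

// Split the composite row key into its two parts and cast strings to useful types
val parsed = df
  .withColumn("deviceId", split(col("rowkey"), "_").getItem(0))
  .withColumn("eventTime", split(col("rowkey"), "_").getItem(1).cast("long"))
  .withColumn("metricValue", col("metricValue").cast("double"))

The separator, key layout, and target types would of course depend on how the row key was actually composed when the data was written.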
Hi @wsalazar, thanks for the nice explanation. I was wondering how you do it with composite keys? We have spent some time exploring the Phoenix encoder. With it, data insertion works well, but reading and doing range scans is somehow very slow. It seems only the first part of the composite key is used for the filter and the rest of the key is not taken into account.