Predicate pushdown support in the Hortonworks HBase connector

We are using the Hortonworks HBase connector to read an HBase table from Spark. Our query filters on a couple of columns in a where clause. We do not want to load the entire table and then filter on top of it; what we are looking for is predicate pushdown at the time the HBase table is loaded.

HDP version is 2.4.2, Spark is 1.6.1.

Sample Code:

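import java.time.LocalDate
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog
import org.apache.spark.sql.functions.col

val statusList = List("status1", "status2", "status3", "status4")

// catalog is the connector's table catalog string (definition not shown here)
val df = sqlContext.read
  .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
  .format("org.apache.spark.sql.execution.datasources.hbase")
  .load()
// Filter on two non-row-key columns
val filteredDF = df.where(col("txn_date") === LocalDate.now().toString || col("status").isin(statusList: _*))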

The columns used in the where clause are not part of the row key. I would like to know how predicate pushdown works in this connector in that case.

What is the correct way to use a where clause in the scenario above?
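
In case it helps anyone answering, here is a minimal sketch of how the query plan could be inspected to see which filters Spark passes to the connector. It relies only on the standard DataFrame.explain method; PlanCheck and showPlan are hypothetical names used for illustration, and whether pushed filters show up in the output (and in what form) depends on the Spark version and on the connector, so this is a diagnostic aid rather than a statement of how the connector behaves.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper, not part of the connector or of Spark itself.
object PlanCheck {
  // Prints the logical and physical plans for the given DataFrame.
  // A filter that still appears as a separate Filter step above the scan
  // in the physical plan is being evaluated by Spark after the rows are read.
  def showPlan(df: DataFrame): Unit = df.explain(true)
}
```

With the sample code above, this would be called as PlanCheck.showPlan(filteredDF) before running any action on filteredDF.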