What's the best practice to get data from hbase and form dataframe for Python/R?

What's the best practice for getting data from HBase into a dataframe for Python/R? If we want to use our pandas/R libraries, how can we read data from HBase and build the dataframe automatically?

1 ACCEPTED SOLUTION

We have an experimental Spark HBase connector: https://github.com/zhzhan/shc

It offers the following features:

  • First-class support for the DataFrame API
  • JSON-based catalog with rich data type support
  • Performant, scalable, enterprise-ready
  • Partition Pruning
  • Predicate Pushdown
  • Scan optimizations
  • Data Locality
  • Composite Rowkey
  • Leverage existing work in the HBase community

Please take a look at the README of the above project.

Also see example https://github.com/zhzhan/shc/blob/master/src/main/scala/org/apache/spark/sql/execution/datasources/...
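
For the Python side of the question, the following is a minimal sketch of how the connector could be used from PySpark to end up with a pandas dataframe. It assumes the SHC jar is already on the Spark classpath and that the data source name and catalog layout match the project's README; the table name `events`, the column family `cf1`, the column names, and the helper functions `build_catalog` and `hbase_to_pandas` are hypothetical and exist only for illustration.

```python
import json

# Data source name used by the connector (assumption: as given in the SHC README).
SHC_FORMAT = "org.apache.spark.sql.execution.datasources.hbase"


def build_catalog(namespace, table, rowkey, columns):
    """Build a JSON catalog string mapping HBase cells to DataFrame columns.

    `columns` maps a DataFrame column name to a tuple of
    (column family, qualifier, type). The row key column uses the
    reserved column family name "rowkey".
    """
    catalog = {
        "table": {"namespace": namespace, "name": table},
        "rowkey": rowkey,
        "columns": {
            name: {"cf": cf, "col": col, "type": dtype}
            for name, (cf, col, dtype) in columns.items()
        },
    }
    return json.dumps(catalog)


def hbase_to_pandas(spark, catalog):
    """Read an HBase table through the connector and collect it into pandas.

    `spark` is an existing SparkSession started with the connector available.
    """
    df = spark.read.options(catalog=catalog).format(SHC_FORMAT).load()
    # toPandas() pulls every row to the driver, so filter or aggregate
    # the Spark DataFrame first if the table is large.
    return df.toPandas()


# Hypothetical table layout, for illustration only.
CATALOG = build_catalog(
    "default",
    "events",
    "key",
    {
        "id": ("rowkey", "key", "string"),
        "user": ("cf1", "user", "string"),
        "amount": ("cf1", "amount", "double"),
    },
)
```

With a running session, `pdf = hbase_to_pandas(spark, CATALOG)` would return the pandas dataframe. Because the catalog is plain JSON, the same string could in principle be reused from R through a Spark binding, though that path is not covered here.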

11 REPLIES

@Artem Ervits, is there any progress on Spark on HBase from Hortonworks? We are using the HDP platform, but I cannot find anything online confirming progress beyond the above discussion from 2016.