
Performance of Joining big HBase table on rowkey


I have read in many places that HBase does not perform well for joins but performs well for random reads and writes. My question is: would it still perform well for a bulk read of an HBase table by full rowkey? For example, reading 30% of the table, where the requested rowkeys are random and spread across the whole table rather than confined to a few regions.

Consider an HBase table whose regions are evenly distributed across many region servers. If an external table is created for it in Hive, and this external table is joined with a Hive managed table on the HBase rowkey, would the huge number of rowkey lookups against the HBase table during the join be a performance bottleneck?

If so, could you please explain why?



Re: Performance of Joining big HBase table on rowkey


I can't speak to the performance hit in this specific scenario, but with other workloads hitting HBase at the same time it could still be an issue. There's a great feature that lets you map a Hive schema to an HBase snapshot, which promises much better performance than querying HBase directly. Please take a look at for an example. Essentially, you map a Hive external table to an HBase snapshot, run your analysis, and then remove the snapshot. This bypasses the HBase region servers altogether and uses MapReduce instead.
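
To make the snapshot workflow concrete, here is a rough sketch of the three steps (take the snapshot, point Hive at it and run the join, delete the snapshot). It is a small Python helper that only builds the statements; it does not run them. All table, column, and directory names are hypothetical placeholders. The `hive.hbase.snapshot.name` and `hive.hbase.snapshot.restoredir` properties are what I understand Hive's HBase storage handler to use for snapshot reads, so please check them against your Hive version. The sketch also assumes the Hive external table over the HBase table already exists.

```python
# Sketch only: builds the statements for the snapshot workflow, does not run them.
# Every table, column, and directory name used here is a hypothetical placeholder.

def snapshot_join_plan(hbase_table, snapshot, hive_hbase_table, hive_table,
                       hbase_key, hive_key,
                       restore_dir="/tmp/hive_snapshot_restore"):
    """Return the HBase shell and Hive statements for one snapshot-backed join."""
    return {
        # Step 1 (HBase shell): take a snapshot of the live table.
        "hbase_shell_before": [
            f"snapshot '{hbase_table}', '{snapshot}'",
        ],
        # Step 2 (Hive): point the HBase-backed table at the snapshot,
        # then run the join as usual.
        "hive": [
            f"SET hive.hbase.snapshot.name={snapshot};",
            f"SET hive.hbase.snapshot.restoredir={restore_dir};",
            f"SELECT h.*, m.* FROM {hive_hbase_table} h "
            f"JOIN {hive_table} m ON h.{hbase_key} = m.{hive_key};",
        ],
        # Step 3 (HBase shell): remove the snapshot once the analysis is done.
        "hbase_shell_after": [
            f"delete_snapshot '{snapshot}'",
        ],
    }


if __name__ == "__main__":
    plan = snapshot_join_plan("orders", "orders_snap", "orders_hbase",
                              "customers", "rowkey", "order_id")
    for stage, statements in plan.items():
        print(f"-- {stage}")
        for statement in statements:
            print(statement)
```

Keep in mind that the join then sees the data as of the moment the snapshot was taken, so writes that land in HBase afterwards are not included.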