
Loading HBase from Hive ORC Tables

Rising Star

Looking for approaches to loading HBase tables when all I have is the data in an ORC-backed Hive table.

I would prefer a bulk load approach, given that there are several hundred million rows in the ORC-backed Hive table.

I found the following; does anyone have experience with Hive's HBase bulk load feature? Would it be better to CTAS from ORC into a CSV table and then use ImportTsv on the HBase side?

HiveHBaseBulkLoad
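
For concreteness, the CSV detour would be roughly the following (table names are placeholders), with ImportTsv then pointed at the staging table's HDFS files:

-- hypothetical CTAS from the ORC table into a tab-delimited text table,
-- whose files ImportTsv could then consume
create table csv_staging
row format delimited fields terminated by '\t'
stored as textfile
as select * from orc_source;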

Any experiences here would be appreciated.

1 ACCEPTED SOLUTION


Hey

You can bulk load into HBase in several different ways. The ImportTsv tool has been around for a while; however, if your data is in ORC with a Hive table on top, the Hive bulk load is an easier option with fewer moving parts.

This deck from Nick has a lot of info: http://fr.slideshare.net/HBaseCon/ecosystem-session-3a. Slide 12 is the one you want to look at.

Essentially:

set hive.hbase.generatehfiles=true

set hfile.family.path=/tmp/somewhere (this can also be set as a table property)

This allows you to do an INSERT with the result of a SQL statement, which is a little more agile than having to go down the CSV route. One caution: the HBase user will be picking up the generated files, so watch file ownership and permissions.
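
To make that concrete, here is a minimal sketch of the flow, assuming a source ORC table orc_source, a target Hive table hbase_target defined with the HBase storage handler, and a column family named cf; all names and paths here are placeholders:

set hive.hbase.generatehfiles=true;
-- the final path component should name the target column family
set hfile.family.path=/tmp/hbase_hfiles/cf;

-- writes HFiles under /tmp/hbase_hfiles/cf instead of going through the
-- normal HBase write path; HFiles must be written in sorted row key
-- order, hence the CLUSTER BY on the row key column
insert overwrite table hbase_target
select rowkey, col1, col2
from orc_source
cluster by rowkey;

From there the HFiles still have to be handed to HBase, for example with the completebulkload tool (something like hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hbase_hfiles <table_name> on HDP-era releases), run so that the HBase user can read the files per the caveat above.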


5 REPLIES


Rising Star

While I've yet to use this on the large table, it worked very well on a small sample. There were some gotchas that aren't explicitly called out anywhere. I will put together a guide and post it to AH, and link it back here when ready.

I've scripted out an example of using this feature here:

https://github.com/sakserv/hive-hbase-generatehfiles

Thanks!

Rising Star

Demo article has been added here:

creating-hbase-hfiles-from-an-existing-hive-table

@Randy Gelhausen recently got this to work after adjusting the classpath:

HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf \
  hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar \
  org.apache.phoenix.mapreduce.CsvBulkLoadTool \
  --table test --input /user/root/test \
  --zookeeper localhost:2181:/hbase-unsecure
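
For the original question's scenario, the ORC table would first need to be exported to delimited text to serve as the tool's --input; a hypothetical way to do that from Hive (paths and names are illustrative):

-- export the ORC-backed table to comma-delimited text in HDFS, to act as
-- the --input directory for CsvBulkLoadTool above
insert overwrite directory '/user/root/test'
row format delimited fields terminated by ','
select * from orc_source;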

Rising Star

This shows promise as well, and I plan to give it a try soon. However, the accepted answer avoids needing to go from ORC back to CSV, so it gets the win. 🙂