Loading HBase from Hive ORC Tables

Contributor

I'm looking for approaches to loading HBase tables when all I have is the data in an ORC-backed Hive table.

I would prefer a bulk load approach, given there are several hundred million rows in the ORC-backed Hive table.

I found the following. Does anyone have experience with Hive's HBase bulk load feature? Or would it be better to create a CSV table, CTAS from ORC into the CSV table, and then use ImportTsv on the HBase side?

HiveHBaseBulkLoad

Any experiences here would be appreciated.

1 ACCEPTED SOLUTION

Re: Loading HBase from Hive ORC Tables

Hey

You can bulk load into HBase in several different ways. The ImportTsv tool has been around for a while. However, if your data is in ORC with a Hive table on top, the Hive bulk load is an easier option with fewer moving parts.

This slide deck from Nick has a lot of info: http://fr.slideshare.net/HBaseCon/ecosystem-session-3a (slide 12 is the one you want to look at).

Essentially

set hive.hbase.generatehfiles=true

set hfile.family.path=/tmp/somewhere (this can also be a property)

This lets you run an INSERT INTO with the result of a SQL statement, which is a little more agile than going the CSV route. Be careful: the HBase user will be picking up the generated files, so make sure that user can access them.
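
For anyone who wants a concrete starting point, here is a rough sketch of how those two settings might fit into a full run. It is not taken from the slides. The table names, columns, column family, and output path are all made-up placeholders, and it assumes a single column family. As far as I understand it, the last directory in hfile.family.path needs to match the column family name and the rows need to arrive sorted by row key, hence the CLUSTER BY. The script only prints the HiveQL so you can review it before running anything.

```shell
#!/usr/bin/env bash
# Sketch only: prints HiveQL for generating HFiles from an ORC-backed Hive table.
# Every name below (tables, columns, column family, path) is a hypothetical placeholder.
set -eu

ORC_TABLE="${ORC_TABLE:-events_orc}"          # existing ORC-backed Hive table
HBASE_TABLE="${HBASE_TABLE:-events_hbase}"    # HBase-backed Hive table to create
COLUMN_FAMILY="${COLUMN_FAMILY:-cf}"          # single column family (assumption)
HFILE_DIR="${HFILE_DIR:-/tmp/events_hfiles}"  # parent directory for generated HFiles

generate_hql() {
  cat <<EOF
-- Hive table backed by HBase. Newer Hive versions may require CREATE EXTERNAL TABLE
-- against an HBase table that already exists.
CREATE TABLE IF NOT EXISTS ${HBASE_TABLE} (rowkey STRING, payload STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,${COLUMN_FAMILY}:payload');

-- Write HFiles instead of sending puts to HBase.
SET hive.hbase.generatehfiles=true;
-- Assumption: the final path component should match the column family name.
SET hfile.family.path=${HFILE_DIR}/${COLUMN_FAMILY};

-- HFiles need rows in row-key order, so cluster by the key column.
INSERT OVERWRITE TABLE ${HBASE_TABLE}
SELECT rowkey, payload FROM ${ORC_TABLE} CLUSTER BY rowkey;
EOF
}

# Print the HiveQL to stdout for review.
generate_hql
```

You could redirect the output to a file and run it with hive -f, then hand the generated directory to HBase's bulk load tool as the HBase user, for example: hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/events_hfiles events_hbase. That class name is the HBase 1.x location, so check it against your version.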

5 REPLIES

Re: Loading HBase from Hive ORC Tables

Contributor

While I've yet to use this on the large table, it worked very well on a small sample. There were some gotchas that aren't explicitly called out anywhere. I will put together a guide and post it to AH, and link it back here when ready.

I've scripted out an example of using this feature here:

https://github.com/sakserv/hive-hbase-generatehfiles

Thanks!

Re: Loading HBase from Hive ORC Tables

Contributor

Demo article has been added here:

creating-hbase-hfiles-from-an-existing-hive-table

Re: Loading HBase from Hive ORC Tables

@Randy Gelhausen was recently able to get the Phoenix CSV bulk load tool working after adjusting the classpath:
HADOOP_CLASSPATH=/usr/hdp/current/hbase-client/lib/hbase-protocol.jar:/etc/hbase/conf hadoop jar /usr/hdp/current/phoenix-client/phoenix-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table test --input /user/root/test --zookeeper localhost:2181:/hbase-unsecure

Re: Loading HBase from Hive ORC Tables

Contributor

This shows promise as well. I plan to give this a try soon. However, the accepted answer avoids needing to go from ORC back to CSV, so it gets the win. :)
