
Creation of HFiles using Hive to HBase integration


Contributor

Hi Guys,

I have successfully implemented the logic described in the article at https://community.hortonworks.com/articles/2745/creating-hbase-hfiles-from-an-existing-hive-table.ht....

However, that example covers an HBase table with a single column family. How can I do the same for an HBase table with multiple column families?

Need your help.

Thanks and Regards,

Rajdip


Re: Creation of HFiles using Hive to HBase integration

Hey Rajdip,

This link might have more info for you: http://hortonworks.com/blog/hbase-via-hive-part-1/

Essentially, in the SERDEPROPERTIES you map Hive columns to HBase columns, including the column family. Each mapping entry is written as columnfamily:column, so cf1:c1 means column family cf1 and column c1; in the snippet below the column family is simply 'f'.

WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,f:c1,f:c2')

So, assuming the column names coincide, you could do something like this to map Hive columns to two different column families:

CREATE TABLE foo(rowkey STRING, a STRING, b STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf1:a,cf2:b')
TBLPROPERTIES ('hbase.table.name' = 'bar');
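
If you create the HBase table 'bar' yourself (for example when mapping it as an external Hive table), it needs both column families defined up front; a minimal hbase shell command for that, using the table and family names from the mapping above, would be:

create 'bar', 'cf1', 'cf2'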

hope this helps

Re: Creation of HFiles using Hive to HBase integration

Super Collaborator

It's not possible in that completebulkload scenario, because the HBase storage handler in Hive uses the old HFileOutputFormat.

Re: Creation of HFiles using Hive to HBase integration

Contributor

I have done the following steps, but I am stuck at the last one. Need your help with it.

1) Created a CSV file with 5 records

2) Created an external table in Hive pointing to the HDFS directory containing the CSV
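
For illustration, an external table along these lines (the comma delimiter and HDFS location are placeholders; the column list mirrors the DDL in step 3):

CREATE EXTERNAL TABLE hbase_cdc_poc.warehouse (
  w_warehouse_sk int,
  w_warehouse_id char(16),
  w_warehouse_name varchar(20),
  w_warehouse_sq_ft int,
  w_street_number char(10),
  w_street_name varchar(60),
  w_street_type char(15),
  w_suite_number char(10),
  w_city varchar(60),
  w_county varchar(30),
  w_state char(2),
  w_zip char(10),
  w_country varchar(20),
  w_gmt_offset decimal(5,2)
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/user/tcs_ge_user/warehouse_csv';  -- placeholder HDFS directory holding the CSV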

3) Created a Hive-HBase table with the below DDL

CREATE TABLE hbase_cdc_poc.hbase_warehouse (
  w_warehouse_sk int,
  w_warehouse_id char(16),
  w_warehouse_name varchar(20),
  w_warehouse_sq_ft int,
  w_street_number char(10),
  w_street_name varchar(60),
  w_street_type char(15),
  w_suite_number char(10),
  w_city varchar(60),
  w_county varchar(30),
  w_state char(2),
  w_zip char(10),
  w_country varchar(20),
  w_gmt_offset decimal(5,2)
)
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileOutputFormat'
TBLPROPERTIES ('hfile.family.path' = '/user/tcs_ge_user/wrhs_hfiles/cf1');

4) Loaded data into the above table using the statement below and verified the count; it matched.

insert overwrite table hbase_cdc_poc.hbase_warehouse select * from hbase_cdc_poc.warehouse cluster by w_warehouse_sk;

5) The HFiles also got created at the path /user/tcs_ge_user/wrhs_hfiles/cf1. Screenshot attached.

[Screenshot attached: 10706-capture.png]

6) Now using completebulkload to load the data into HBase. The table 'warehouse' is already created in HBase.

yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /user/tcs_ge_user/wrhs_hfiles/cf1 warehouse

But while executing this command, it gives me the below error. Screenshot attached. Need your urgent help with this.

[Screenshot attached: 10708-capture.png]

Re: Creation of HFiles using Hive to HBase integration

Super Collaborator

@rajdip chaudhuri You should not include the column family in the path for completebulkload. So, try:

yarn jar /usr/hdp/current/hbase-client/lib/hbase-server.jar completebulkload /user/tcs_ge_user/wrhs_hfiles warehouse
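
For what it's worth, completebulkload expects the parent directory whose subdirectories are named after the column families and contain the generated HFiles, roughly:

/user/tcs_ge_user/wrhs_hfiles/
    cf1/
        <generated HFiles>

Pointing it at /user/tcs_ge_user/wrhs_hfiles lets it pick up cf1 as the column family directory.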