Hive to HBase Data Migration Missing Data


I have loaded data from Hive into HBase.

The Hive source table has 3000 rows, but after the load the HBase table has only 1200 records.

I don't understand why. Can anyone please explain?

-- The first column (src_util_id) is mapped to the HBase row key (:key);
-- the remaining columns are stored in column family cf.
CREATE TABLE events_Hbase(
src_util_id int,
event_log_id bigint,
event_id int,
event_text string,
partition_date date,
load_date date)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:event_log_id,cf:event_id,cf:event_text,cf:partition_date,cf:load_date")
TBLPROPERTIES ("hbase.table.name" = "HbaseEvents");


INSERT INTO TABLE events_Hbase select * from Meterevents;

-- returns 3000 (Hive source table)
select count(*) from Meterevents;

-- returns 1200 (HBase-backed table)
select count(*) from events_Hbase;

 

Can someone please explain why rows are missing?

Reply


Hello @Madhureddy

Thanks for using Cloudera Community. Based on your post, the table "Meterevents" holds 3,000 records, and after an INSERT ... SELECT from "Meterevents" into "events_Hbase", the "events_Hbase" table shows only 1,200 records.

Please check the following:

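1. Connect to the HBase shell and confirm the row count of the "HbaseEvents" table (for example, with count 'HbaseEvents').

2. If "HbaseEvents" also shows 1,200 rows, check whether the first column (src_util_id), which is mapped to ":key" during the load, is unique. The RowKey is most likely repeated: HBase keeps a single row per RowKey, so a later record with an existing RowKey is written as a newer version of that row's cells rather than as a new row, which reduces the row count.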

3. Your team can verify this with a small test. Create 2 Hive tables: insert 10 rows with unique RowKey values into the first, and 10 rows with only 5 unique RowKey values into the second. Next, create 2 Hive tables using HBaseStorageHandler, run the INSERT ... SELECT from each source table into its HBase-backed counterpart, and compare the row counts.
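A minimal HiveQL sketch of checks 2 and 3, assuming Hive with the HBase storage handler available (INSERT ... VALUES needs Hive 0.14 or later). The first two queries compare the total row count of "Meterevents" with the number of distinct src_util_id values (the column mapped to ":key") and list repeated keys. The rest builds the 10-row experiment; all test table names (rowkey_test_*, hbase_rowkey_test_*) are hypothetical.

```sql
-- Check 2: total rows vs. distinct row-key values in the source.
-- Note: COUNT(DISTINCT ...) does not count NULL keys.
SELECT COUNT(*)                    AS total_rows,
       COUNT(DISTINCT src_util_id) AS distinct_row_keys
FROM Meterevents;

-- Show the most frequently repeated row-key values.
SELECT src_util_id, COUNT(*) AS occurrences
FROM Meterevents
GROUP BY src_util_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
LIMIT 20;

-- Check 3 (hypothetical table names): plain Hive source tables.
CREATE TABLE rowkey_test_unique (id INT, val STRING);
CREATE TABLE rowkey_test_dup    (id INT, val STRING);

-- 10 rows, 10 unique keys.
INSERT INTO TABLE rowkey_test_unique VALUES
  (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),
  (6,'f'),(7,'g'),(8,'h'),(9,'i'),(10,'j');

-- 10 rows, only 5 unique keys (1..5, each used twice).
INSERT INTO TABLE rowkey_test_dup VALUES
  (1,'a'),(2,'b'),(3,'c'),(4,'d'),(5,'e'),
  (1,'f'),(2,'g'),(3,'h'),(4,'i'),(5,'j');

-- HBase-backed tables: id is mapped to the HBase row key.
CREATE TABLE hbase_rowkey_test_unique (id INT, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "rowkey_test_unique_hb");

CREATE TABLE hbase_rowkey_test_dup (id INT, val STRING)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:val")
TBLPROPERTIES ("hbase.table.name" = "rowkey_test_dup_hb");

-- Load each HBase-backed table from its source, then compare counts.
INSERT INTO TABLE hbase_rowkey_test_unique SELECT * FROM rowkey_test_unique;
INSERT INTO TABLE hbase_rowkey_test_dup    SELECT * FROM rowkey_test_dup;

SELECT COUNT(*) FROM hbase_rowkey_test_unique;
SELECT COUNT(*) FROM hbase_rowkey_test_dup;
```

If the repeated-RowKey explanation is right, distinct_row_keys should match the count seen in the HBase shell, and the two test tables should report 10 and 5 rows respectively.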

 

- Smarak
