Hive to HBase Data Migration Missing Data

New Contributor

I have loaded data from Hive into HBase.

The Hive source table has 3000 rows, but after loading, the HBase table has only 1200 records.

I don't understand the reason for this. Can anyone please explain?

CREATE TABLE events_Hbase(
src_util_id int,
event_log_id bigint,
event_id int,
event_text string,
partition_date date,
load_date date)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf:event_log_id,cf:event_id,cf:event_text,cf:partition_date,cf:load_date")
TBLPROPERTIES ("hbase.table.name" = "HbaseEvents");


INSERT INTO TABLE events_Hbase select * from Meterevents;

select count(*) from Meterevents; -- 3000 (Hive source table)

select count(*) from events_Hbase; -- 1200 (HBase table)

 

Can someone please explain?

 

1 REPLY

Super Collaborator

Hello @Madhureddy 

 

Thanks for using Cloudera Community. Based on the post, the "Meterevents" table holds 3000 records, and an INSERT ... SELECT was run from "Meterevents" into "events_Hbase". The "events_Hbase" table now shows only 1200 records.

 

Please check the following:

1. Connect to the HBase shell and confirm the row count of the "HbaseEvents" table.

2. If the count of "HbaseEvents" is also 1200, check the uniqueness of the first column, which is mapped to ":key" when the table is loaded. The RowKey is likely repeated. HBase keeps one row per RowKey, so each later record with an existing key is written as a new version of that row instead of a new row, which reduces the row count.

3. Your team can verify this with a small test. Create 2 source tables: insert 10 rows with 10 unique RowKey values into the first, and 10 rows with only 5 unique RowKey values into the second. Next, create 2 Hive tables using HBaseStorageHandler and run the INSERT ... SELECT into each. Then compare the row counts.
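
The uniqueness check in point 2 can also be run directly against the Hive source table. The queries below are a minimal sketch. They assume the first column of "Meterevents" is named src_util_id, as in the "events_Hbase" definition. The post does not show the source schema, so adjust the column name if it differs.

```sql
-- Assumption: src_util_id is the first column of Meterevents,
-- i.e. the column that "select *" maps to the HBase :key.

-- Compare the total row count with the number of distinct row keys.
SELECT COUNT(*)                    AS total_rows,
       COUNT(DISTINCT src_util_id) AS distinct_row_keys
FROM Meterevents;

-- List the row key values that occur more than once in the source.
SELECT src_util_id,
       COUNT(*) AS occurrences
FROM Meterevents
GROUP BY src_util_id
HAVING COUNT(*) > 1
ORDER BY occurrences DESC
LIMIT 20;
```

If distinct_row_keys comes back as 1200 while total_rows is 3000, the repeated keys account for the difference.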

 

- Smarak
