Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive HBase snapshot query result can't insert into new table

Hive HBase snapshot query result can't insert into new table

New Contributor

Hi, we are using CDH 5.16.1 ,hbase-1.2.0,hive 1.1.0  .

I'm trying to load hbase data into hive, but i got an issue when i use hbase snapshot, it only load the row key into hive table. But if i donn't use the snapshot, HbaseStorageHandler can successfully load data into new hive table.

Here are steps i took:

 

first i have created a hbase table:

 

create 'user_data_table','personal_data','professional_data';

 

Then I inserted few records into the table as

put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'

put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'

 

Then i specify the  snapshot.name to make Hive query Hbase bypasses HBase servers, and directly accesses the underlying files.

set hive.hbase.snapshot.name=snapshot-day-1;
set hive.hbase.snapshot.restoredir=/tmp;

CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");

 

 

when i try to load the query result into hive table as below, the result are  weirdly,all the value of fields are NULL ,except the row key.

 

 

#show hbase table data
hive> select * from default.hbase_user_data_snapshot1_table;
OK
user1	IL	Deb	D	IT	2000
user2	NULL	CH	AK	IT	80000
Time taken: 0.199 seconds, Fetched: 2 row(s)

#load the query result into new table
hive> create table test as select * from default.hbase_user_data_snapshot1_table;

.... OK

#but all the value of  fields are NULL ,expect the row key.
hive> select * from test;
OK
user1	NULL		NULL	NULL	NULL
user2	NULL		NULL	NULL	NULL

 

 

well i also try to load data without the param " set hive.hbase.snapshot.name ", now it insert query result into table as i expected. (but in this way ,it was very slow, because HBaseStorageHandler query the hbase with full scan with hbase server api )

CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");

 create table test1 as select * from default.hbase_user_data_snapshot1_table;

select * from test1;
OK
user1	IL	Deb	D	IT	2000
user2	NULL	CH	AK	IT	80000

the snapshot feature link:

https://issues.apache.org/jira/browse/HBASE-8369

 

Is there anybody can help me? 

 

1 REPLY 1
Highlighted

Re: Hive HBase snapshot query result can't insert into new table

New Contributor
i also tried to specify the hive.hbase.snapshot.name in TBLPROPERTIES,but the hive query request the Hbase region server API instead of access the underline file directly. I looked the HbaseStorageHandler source code, it choose whether or not use the HbaseSnapshot by judging the hive conf property (HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME)

So if you specify the hive.hbase.snapshot.name in any other place ,it will not available for hive to query hbase with snapshot. And i also changed all the hdfs dir permission to 777,but it still couldn't insert data into hive table,except i didn't use the snapshot. It was weirdly I checked the DEBUG log ,but there is not any error in it.

I'm wandering is this a bug about that feature?