Support Questions
Find answers, ask questions, and share your expertise

Hive HBase snapshot query result can't insert into new table

Hive HBase snapshot query result can't insert into new table

New Contributor
 

The target is i want to accelerate the speed of save some data from Hbase to hive. By default I used the HbaseStorageHandler, but it's very slow because it used full scan by request Hbase region server. So i tried to query Hbase bypass the region server, and directly access the data from underline file.

It was at least 6 times quicker than before when i query Hbase with snapshot. But, i can't write the data into hive table, all the fields were null except the row key.

drop table if exists test;
create table test as select * from default.hbase_user_data_snapshot1_table;

However if i don't use the snapshot, HbaseStorageHandler can successfully load data into new hive table.

Here are steps i took:

First i have created a Hbase table:

create 'user_data_table','personal_data','professional_data';

Then I inserted few records into the table as

put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'

put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'

Then i specify the snapshot.name to make Hive query Hbase bypasses Hbase server, and directly accesses the underlying files.

set hive.hbase.snapshot.name=snapshot-day-1;
set hive.hbase.snapshot.restoredir=/tmp;

CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string, salary string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");

Finally, when i try to load the query result into hive table as below, the result are weirdly, all the value of fields were NULL ,except the row key.

#show hbase table data
hive> select * from default.hbase_user_data_snapshot1_table;
OK
user1   IL  Deb D   IT  2000
user2   NULL    CH  AK  IT  80000
Time taken: 0.199 seconds, Fetched: 2 row(s)

#load the query result into new table
hive> create table test as select * from default.hbase_user_data_snapshot1_table;

.... OK

#but all the value of  fields are NULL ,expect the row key.
hive> select * from test;
OK
user1   NULL        NULL    NULL    NULL
user2   NULL        NULL    NULL    NULL

And i also try to load data without the param " set hive.hbase.snapshot.name ", now it insert query result into table as i expected. (but in this way ,it was very slow, because HBaseStorageHandler query the hbase with full scan with hbase server api )

CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string) 
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");

 create table test1 as select * from default.hbase_user_data_snapshot1_table;

select * from test1;
OK
user1   IL  Deb D   IT  2000
user2   NULL    CH  AK  IT  80000
the snapshot feature link:

 

I also tried to specify the hive.hbase.snapshot.name in TBLPROPERTIES as TBLPROPERTIES ("hive.hbase.snapshot.name"="snapshot-day-1") instead of hbase.table.name in TBLPROPERTIES ,but the hive query request the Hbase region server API instead of access the underline file directly. I looked the HbaseStorageHandler source code, it choose whether or not use the HbaseSnapshot by judging the hive conf property (HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME)

So if specify the hive.hbase.snapshot.name in any other place ,it will not available for hive to query hbase with snapshot.

And i also changed all the hdfs dir permission to 777,but it still couldn't insert data into hive table,except i didn't use the snapshot. It was weirdly I checked the DEBUG log ,but there is not any error in it.

 

I'm wandering maybe this is a bug about that feature.

(the snapshot feature link:https://issues.apache.org/jira/browse/HBASE-8369)

Anybody can help me ?

3 REPLIES 3

Re: Hive HBase snapshot query result can't insert into new table

New Contributor
Is there anybody can help me?

Re: Hive HBase snapshot query result can't insert into new table

New Contributor

@yaotian 

Were you able to figure out alternative way of loading the HBase snapshot in Hive instead of using the HBaseStorageHandler?

Re: Hive HBase snapshot query result can't insert into new table

New Contributor

Hello! Encountered the same problem, have you solved it? Thank you