Member since
03-24-2019
4
Posts
0
Kudos Received
0
Solutions
04-01-2019
01:58 AM
Is there anybody can help me?
... View more
03-27-2019
09:41 PM
The target is i want to accelerate the speed of save some data from Hbase to hive. By default I used the HbaseStorageHandler, but it's very slow because it used full scan by request Hbase region server. So i tried to query Hbase bypass the region server, and directly access the data from underline file.
It was at least 6 times quicker than before when i query Hbase with snapshot. But, i can't write the data into hive table, all the fields were null except the row key.
drop table if exists test;
create table test as select * from default.hbase_user_data_snapshot1_table;
However if i don't use the snapshot, HbaseStorageHandler can successfully load data into new hive table.
Here are steps i took:
First i have created a Hbase table:
create 'user_data_table','personal_data','professional_data';
Then I inserted few records into the table as
put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'
put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000'
Then i specify the snapshot.name to make Hive query Hbase bypasses Hbase server, and directly accesses the underlying files.
set hive.hbase.snapshot.name=snapshot-day-1;
set hive.hbase.snapshot.restoredir=/tmp;
CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string, salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");
Finally, when i try to load the query result into hive table as below, the result are weirdly, all the value of fields were NULL ,except the row key.
#show hbase table data
hive> select * from default.hbase_user_data_snapshot1_table;
OK
user1 IL Deb D IT 2000
user2 NULL CH AK IT 80000
Time taken: 0.199 seconds, Fetched: 2 row(s)
#load the query result into new table
hive> create table test as select * from default.hbase_user_data_snapshot1_table;
.... OK
#but all the value of fields are NULL ,expect the row key.
hive> select * from test;
OK
user1 NULL NULL NULL NULL
user2 NULL NULL NULL NULL
And i also try to load data without the param " set hive.hbase.snapshot.name ", now it insert query result into table as i expected. (but in this way ,it was very slow, because HBaseStorageHandler query the hbase with full scan with hbase server api )
CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");
create table test1 as select * from default.hbase_user_data_snapshot1_table;
select * from test1;
OK
user1 IL Deb D IT 2000
user2 NULL CH AK IT 80000
the snapshot feature link:
I also tried to specify the hive.hbase.snapshot.name in TBLPROPERTIES as TBLPROPERTIES ("hive.hbase.snapshot.name"="snapshot-day-1") instead of hbase.table.name in TBLPROPERTIES ,but the hive query request the Hbase region server API instead of access the underline file directly. I looked the HbaseStorageHandler source code, it choose whether or not use the HbaseSnapshot by judging the hive conf property (HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME)
So if specify the hive.hbase.snapshot.name in any other place ,it will not available for hive to query hbase with snapshot.
And i also changed all the hdfs dir permission to 777,but it still couldn't insert data into hive table,except i didn't use the snapshot. It was weirdly I checked the DEBUG log ,but there is not any error in it.
I'm wandering maybe this is a bug about that feature.
(the snapshot feature link:https://issues.apache.org/jira/browse/HBASE-8369)
Anybody can help me ?
... View more
Labels:
- Labels:
-
Apache Hadoop
-
Apache HBase
-
Apache Hive
03-26-2019
06:45 PM
i also tried to specify the hive.hbase.snapshot.name in TBLPROPERTIES,but the hive query request the Hbase region server API instead of access the underline file directly. I looked the HbaseStorageHandler source code, it choose whether or not use the HbaseSnapshot by judging the hive conf property (HiveConf.ConfVars.HIVE_HBASE_SNAPSHOT_NAME) So if you specify the hive.hbase.snapshot.name in any other place ,it will not available for hive to query hbase with snapshot. And i also changed all the hdfs dir permission to 777,but it still couldn't insert data into hive table,except i didn't use the snapshot. It was weirdly I checked the DEBUG log ,but there is not any error in it. I'm wandering is this a bug about that feature?
... View more
03-24-2019
11:07 PM
Hi, we are using CDH 5.16.1 ,hbase-1.2.0,hive 1.1.0 . I'm trying to load hbase data into hive, but i got an issue when i use hbase snapshot, it only load the row key into hive table. But if i donn't use the snapshot, HbaseStorageHandler can successfully load data into new hive table. Here are steps i took: first i have created a hbase table: create 'user_data_table','personal_data','professional_data'; Then I inserted few records into the table as put 'user_data_table','user1','personal_data:Location','IL'
put 'user_data_table','user1','personal_data:FName','Deb'
put 'user_data_table','user1','personal_data:LName','D'
put 'user_data_table','user1','professional_data:dept','IT'
put 'user_data_table','user1','professional_data:salary','2000'
put 'user_data_table','user2','personal_data:FName','CH'
put 'user_data_table','user2','personal_data:LName','AK'
put 'user_data_table','user2','professional_data:dept','IT'
put 'user_data_table','user2','professional_data:salary','80000' Then i specify the snapshot.name to make Hive query Hbase bypasses HBase servers, and directly accesses the underlying files. set hive.hbase.snapshot.name=snapshot-day-1;
set hive.hbase.snapshot.restoredir=/tmp;
CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table"); when i try to load the query result into hive table as below, the result are weirdly,all the value of fields are NULL ,except the row key. #show hbase table data
hive> select * from default.hbase_user_data_snapshot1_table;
OK
user1 IL Deb D IT 2000
user2 NULL CH AK IT 80000
Time taken: 0.199 seconds, Fetched: 2 row(s)
#load the query result into new table
hive> create table test as select * from default.hbase_user_data_snapshot1_table;
.... OK
#but all the value of fields are NULL ,expect the row key.
hive> select * from test;
OK
user1 NULL NULL NULL NULL
user2 NULL NULL NULL NULL well i also try to load data without the param " set hive.hbase.snapshot.name ", now it insert query result into table as i expected. (but in this way ,it was very slow, because HBaseStorageHandler query the hbase with full scan with hbase server api ) CREATE EXTERNAL TABLE if not exists default.hbase_user_data_snapshot1_table1(key string, Location string,FName string,LName string, dept string,salary string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,personal_data:Location,personal_data:FName,personal_data:LName,professional_data:dept,professional_data:salary")
TBLPROPERTIES ("hbase.table.name" = "user_data_table");
create table test1 as select * from default.hbase_user_data_snapshot1_table;
select * from test1;
OK
user1 IL Deb D IT 2000
user2 NULL CH AK IT 80000 the snapshot feature link: https://issues.apache.org/jira/browse/HBASE-8369 Is there anybody can help me?
... View more
- Tags:
- snapshot