Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive External table for Hbase can we create in a compressed way?

avatar
Rising Star
 
1 ACCEPTED SOLUTION

avatar
Master Guru

Hi @Gayathri Devi

You don't have to mention any compression format property in create Hive table statement.

Because hive is just pointing to HBase table, if HBase table is compressed then Hive automatically picks up the compression format by default.

Just create table statement without compression formats property like below,

CREATE EXTERNAL TABLE tablename(hbid string,Mvdouble, COUNTRY string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY") TBLPROPERTIES ("hbase.table.name"="tblname");

Example:-

i have created a HBase table with snappy compression and i put 3 records to it then scanned the table.

hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp' 
ROW                                               COLUMN+CELL
 1                                                column=cf:name, timestamp=1507641820083, value=hcc
 2                                                column=cf:name, timestamp=1507641848288, value=hdp
 3                                                column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds

Then i have created Hive table without compression property in the statement on top of HBase tbl_snp table

Create Table Statement:-

create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
("hbase.columns.mapping"=":key,cf:name") TBLPROPERTIES ("hbase.table.name"="tbl_snp");
select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id  | tbl_snp.name  |
+-------------+---------------+--+
| 1           | hcc           |
| 2           | hdp           |
| 3           | hdf           |
+-------------+---------------+--+
3 rows selected (0.876 seconds)

i did select from Hive table and we got all the records that existed in the HBase table, as i have created Hive table without compression property.

View solution in original post

1 REPLY 1

avatar
Master Guru

Hi @Gayathri Devi

You don't have to mention any compression format property in create Hive table statement.

Because hive is just pointing to HBase table, if HBase table is compressed then Hive automatically picks up the compression format by default.

Just create table statement without compression formats property like below,

CREATE EXTERNAL TABLE tablename(hbid string,Mvdouble, COUNTRY string) ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe' STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY") TBLPROPERTIES ("hbase.table.name"="tblname");

Example:-

i have created a HBase table with snappy compression and i put 3 records to it then scanned the table.

hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp' 
ROW                                               COLUMN+CELL
 1                                                column=cf:name, timestamp=1507641820083, value=hcc
 2                                                column=cf:name, timestamp=1507641848288, value=hdp
 3                                                column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds

Then i have created Hive table without compression property in the statement on top of HBase tbl_snp table

Create Table Statement:-

create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES 
("hbase.columns.mapping"=":key,cf:name") TBLPROPERTIES ("hbase.table.name"="tbl_snp");
select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id  | tbl_snp.name  |
+-------------+---------------+--+
| 1           | hcc           |
| 2           | hdp           |
| 3           | hdf           |
+-------------+---------------+--+
3 rows selected (0.876 seconds)

i did select from Hive table and we got all the records that existed in the HBase table, as i have created Hive table without compression property.