Can a Hive external table on top of HBase be created with compression?
Labels: Apache HBase, Apache Hive
Created 10-10-2017 09:00 AM
Created 10-10-2017 01:38 PM
You don't need to specify any compression property in the Hive CREATE TABLE statement.
Hive only points to the HBase table and reads and writes through HBase, so if the HBase table is compressed, HBase handles the compression and decompression transparently.
Just write the CREATE TABLE statement without any compression property, like this:
CREATE EXTERNAL TABLE tablename(hbid string, Mv double, COUNTRY string)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,RAW:Mv,RAW:COUNTRY")
TBLPROPERTIES ("hbase.table.name"="tblname");
Example:
I created an HBase table with Snappy compression, put 3 records into it, and then scanned the table.
hbase(main)#create 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }
hbase(main)#put 'tbl_snp','1','cf:name','hcc'
hbase(main)#put 'tbl_snp','2','cf:name','hdp'
hbase(main)#put 'tbl_snp','3','cf:name','hdf'
hbase(main)#scan 'tbl_snp'
ROW COLUMN+CELL
1 column=cf:name, timestamp=1507641820083, value=hcc
2 column=cf:name, timestamp=1507641848288, value=hdp
3 column=cf:name, timestamp=1507641855165, value=hdf
3 row(s) in 0.0190 seconds
Then I created a Hive table on top of the HBase tbl_snp table, without any compression property in the statement.
Create Table Statement:
create external table default.tbl_snp(id int, name string)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ("hbase.columns.mapping"=":key,cf:name")
TBLPROPERTIES ("hbase.table.name"="tbl_snp");
select * from default.tbl_snp;
+-------------+---------------+--+
| tbl_snp.id  | tbl_snp.name  |
+-------------+---------------+--+
| 1           | hcc           |
| 2           | hdp           |
| 3           | hdf           |
+-------------+---------------+--+
3 rows selected (0.876 seconds)
I ran a select on the Hive table and got all the records that exist in the HBase table, even though the Hive table was created without any compression property.
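To double-check that compression is actually configured on the HBase side, the column family settings can be inspected from the HBase shell. This is only a minimal sketch, assuming the tbl_snp table and cf column family from the example; the exact descriptor output differs between HBase versions, so it is not reproduced here. The alter and major_compact lines are an assumption about a follow-up case (a table that was originally created without compression) and are not part of the steps above.

```
# Show the table descriptor; the 'cf' family should list COMPRESSION => 'SNAPPY'
describe 'tbl_snp'

# Assumed follow-up case: enable Snappy on an existing, uncompressed column family.
# Older HBase versions may require disabling the table before altering it.
alter 'tbl_snp', { NAME => 'cf', COMPRESSION => 'SNAPPY' }

# Rewrite the existing store files so data written earlier is compressed too
major_compact 'tbl_snp'
```

In either case the Hive DDL stays the same, since compression is a property of the HBase column family and not of the Hive table definition.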
