Support Questions
Find answers, ask questions, and share your expertise

Read Hbase table into Hive

Read Hbase table into Hive

Expert Contributor

Hello All,

I am trying to read the hbase table into hive.

I have tried below command for read complete table data.

CREATE EXTERNAL TABLE hbase_table_2(key int, value string) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'WITH SERDEPROPERTIES ("hbase.columns.mapping" = "cd:value")TBLPROPERTIES("hbase.table.name" = "companydetail", "hbase.mapred.output.outputtable" = "companydetail_hive");

Here cd is a columnfamily in my table, and its value.

Here problem is, when i run this command it only gives the particular value where i have mentioned in hbase columns mapping.

But i would like to load the complete table in hive which contains more than 5000 rows in hbase table, and also column family(cd) contains different values.

In above command, it only gets the single value but i need to get the complete table to load into hive.

how can i do this,

Please suggest me.

Thanks in advance.

Mohan.V

2 REPLIES 2
Highlighted

Re: Read Hbase table into Hive

Hi Mohan,

You can use wildcards in the hbase columns mapping. For example `"hbase.columns.mapping" = "cf:col_prefix.*,:key"`. More info here: https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration#HBaseIntegration-HiveMAPtoHBaseCol...

Highlighted

Re: Read Hbase table into Hive

Expert Contributor

Thanks for the reply Ward Bekker.

I have tried what you have suggested but i still didnt get what i exactly need.

In my table column family is cd and table name is companydetail.

Sample table data:-

ROW COLUMN+CELL

\x00\x00\x00\x00\x00\x00\x06\xA6

column=cd:cct, timestamp=1475738991531, value=Atlanta

\x00\x00\x00\x00\x00\x00\x06\xA6

column=cd:cnt, timestamp=1475740226346, value=Network ICE Corp.

\x00\x00\x00\x00\x00\x00\x06\xA6

column=cd:ct, timestamp=1475740596684, value=ISYI srl

I believe here cct is a col_prefix. And here we are only giving the column family but we are not mentioning anywhere what is the table that we want to get the data.

When i tried this, I get nothing. 0 records.

CREATE TABLE hbase_11(value map<string,int>, row_key int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "cd:cct.*,:key" );

I have tried an another way.

CREATE EXTERNAL TABLE hbase_table_1(value map<string,int>, row_key int) STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler' WITH SERDEPROPERTIES ( "hbase.columns.mapping" = "cd:cct.*,:key" ) TBLPROPERTIES("hbase.table.name" = "companydetail", "hbase.mapred.output.outputtable" = "companydetail_hive");

I get the output as

{"cct":null}   NULL
{"cct":null}   NULL
{"cct":null}   NULL
{"cct":null}   NULL
{}   NULL
{"cct":null}   NULL
{}   NULL
{"cct":null}   NULL
{"cct":null}   NULL
{"cct":null}   NULL
{"cct":null}   NULL

Time taken: 0.45 seconds, Fetched: 1291 row(s)

Table consist of 1291 rows, and if as i said cct is col_prefix for the table, then it is having 1291 col_prefixes.

I cant get the whole table data because i have to do it for all the 1291 prefixes.

Please help me what i am missing here.