
Hive external table for HBase columns loaded through Java Put API


Expert Contributor

I am using the HBase Put API to write a long value to HBase with the code below:

  // this.newoffset is a long; Bytes.toBytes(long) produces 8 big-endian bytes
  p.add(Bytes.toBytes(this.hcol_fam_n), Bytes.toBytes(this.hcol_qual_n), Bytes.toBytes(this.newoffset));
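
For context, a fuller sketch of that write path, assuming the HBase 1.x client API (the table name kafka_conn, the offset:value column, and the row key are taken from the scan output later in this thread; addColumn replaces the deprecated add used above):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteOffset {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("kafka_conn"))) {
      long newoffset = 7L;
      Put p = new Put(Bytes.toBytes("test_0909"));
      // Bytes.toBytes(long) writes 8 big-endian bytes: 7L -> \x00...\x00\x07
      p.addColumn(Bytes.toBytes("offset"), Bytes.toBytes("value"), Bytes.toBytes(newoffset));
      table.put(p);
    }
  }
}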

When I run a scan in the HBase shell, the values are displayed as hex escapes of the binary representation:

  value=\x00\x00\x00\x00\x00\x00\x00\x07 

When I use the HBase shell's put command instead, the value is displayed properly as a number. Now, when I create a Hive external table on top of this HBase table and declare this column as string, I am unable to read the values that were loaded through the Java code; they are displayed as NULLs.

Only the rows loaded through the HBase shell's put command are returned when queried through Hive.

1. Any idea how to make Hive display this column for all the records?

2. I don't understand why the shell displays values in hex format (of the binary value) for the numbers inserted through Java, while values are displayed in ASCII for records inserted through the put command. Any reason?

3. How do I make the HBase shell display the values in proper ASCII format using the scan or get command?

4. Even in Java code, I am unable to convert the result of a Get to a string using Bytes.toString(); it returns null, whereas Bytes.toLong() works. Why is this difference?

4 REPLIES

Re: Hive external table for HBase columns loaded through Java Put API

> Any idea how to make Hive display this column for all the records?

When you are adding long values through Java, declare them as bigint (long) in Hive to read them.
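
For illustration, a sketch of such a table definition (the table and column names follow this thread). Hive's HBase integration also lets you mark a column's storage type as binary with a #b (or #binary) suffix in hbase.columns.mapping, which tells the SerDe to decode Bytes.toBytes(long) output rather than a printed number:

CREATE EXTERNAL TABLE hbase_table_binary(
  `key` string,
  `value` bigint)
STORED BY
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  'hbase.columns.mapping' = ':key,offset:value#b')
TBLPROPERTIES ('hbase.table.name' = 'kafka_conn');

Note the trade-off: with a binary mapping, rows whose values were written as strings from the shell will likely come back as NULL instead, since the two encodings are incompatible.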

> I don't understand why the shell displays values in hex format (of the binary value) for the numbers inserted through Java, while values are displayed in ASCII for records inserted through the put command. Any reason?

Because when you do a put from the shell, it always encodes the value with the string charset. And scan displays a string representation of the bytes without decoding them with the proper type, because it doesn't know whether your column contains a long value or a string value. So a string value is shown as-is, but a long has a different encoding, so the string representation of its bytes appears as hex escapes.

// The difference in encoding between a String and a long value
// (from org.apache.hadoop.hbase.util.Bytes):

public static byte[] toBytes(String s) {
  return s.getBytes(UTF8_CHARSET);
}

public static byte[] toBytes(long val) {
  byte[] b = new byte[8];
  for (int i = 7; i > 0; i--) {
    b[i] = (byte) val;
    val >>>= 8;
  }
  b[0] = (byte) val;
  return b;
}
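
A quick demo of the difference (EncodingDemo is a made-up name; Bytes.toStringBinary renders non-printable bytes with \xNN escapes, just like the shell does):

import org.apache.hadoop.hbase.util.Bytes;

public class EncodingDemo {
  public static void main(String[] args) {
    // The shell's put encodes the characters: "7" is the single byte 0x37
    System.out.println(Bytes.toStringBinary(Bytes.toBytes("7")));
    // prints: 7
    // A Java Put of a long writes 8 big-endian bytes
    System.out.println(Bytes.toStringBinary(Bytes.toBytes(7L)));
    // prints: \x00\x00\x00\x00\x00\x00\x00\x07
  }
}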

> 3. How do I make the HBase shell display the values in proper ASCII format using the scan or get command? 4. Even in Java code, I am unable to convert the result of a Get to a string using Bytes.toString(); it returns null, whereas Bytes.toLong() works. Why is this difference?

In the shell you can't, but in Java, after doing a Get, you need to call Bytes.toLong() on the returned bytes to decode the long value.
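
A sketch of that read path, assuming the HBase 1.x client API (the table, column, and row key come from the scan output later in this thread; connection setup is assumed):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadOffset {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Table table = conn.getTable(TableName.valueOf("kafka_conn"))) {
      Result result = table.get(new Get(Bytes.toBytes("test_0909")));
      byte[] raw = result.getValue(Bytes.toBytes("offset"), Bytes.toBytes("value"));
      // Bytes.toLong() undoes the 8-byte big-endian encoding of Bytes.toBytes(long);
      // Bytes.toString() would instead try to read those same bytes as UTF-8 text.
      System.out.println(Bytes.toLong(raw)); // 15 for value=\x00\x00\x00\x00\x00\x00\x00\x0F
    }
  }
}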

Re: Hive external table for HBase columns loaded through Java Put API

To reinforce what Ankit has said: HBase *only* stores bytes. HBase has no concept of "representation".

The Hive storage handler stores numeric data as bytes. When you use the HBase shell to store the String "1", it's storing the bytes for the character "1".


Re: Hive external table for HBase columns loaded through Java Put API

Expert Contributor

Thanks for your detailed reply. I created a Hive table with the column mapped to bigint in Hive; even then, the column is displayed as NULL. Only the rows whose values were loaded into HBase as strings get displayed in Hive (whether the Hive column is string or bigint).

Can you please let me know why? Is it always better to write values as strings when I am planning to read them through Hive or the HBase shell? Or is there something I am missing here?

CREATE EXTERNAL TABLE `hbase_table_4`(
  `key` string COMMENT 'from deserializer', 
  `value` bigint COMMENT 'from deserializer')
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.hbase.HBaseSerDe' 
STORED BY 
  'org.apache.hadoop.hive.hbase.HBaseStorageHandler' 
WITH SERDEPROPERTIES ( 
  'hbase.columns.mapping'='offset:value', 
  'serialization.format'='1')
TBLPROPERTIES (
  'hbase.table.name'='kafka_conn', 
  'transient_lastDdlTime'='1483506197')

hive> select * from hbase_table_4;
OK
test_0909	NULL
test_0910	NULL
test_0911	111
test_0919	5
test_0920	5

hbase(main):004:0> scan 'kafka_conn', {VERSIONS => 10}
ROW                                                   COLUMN+CELL                        
 test_0909                                            column=offset:value, timestamp=1483558076087, value=\x00\x00\x00\x00\x00\x00\x00\x0F
 test_0910                                            column=offset:value, timestamp=1483498353863, value=\x00\x00\x00\x00\x00\x00\x00\x0A
 test_0911                                            column=offset:value, timestamp=1483504038021, value=111                             
 test_0919                                            column=offset:value, timestamp=1483505296398, value=5             
 test_0920                                            column=offset:value, timestamp=1483505356278, value=5             

Re: Hive external table for HBase columns loaded through Java Put API

New Contributor

Can you please share how you solved this problem? I am facing the same issue.
