Created 02-08-2016 07:58 PM
I have created a view to map to an existing HBase table as following:
create view "wanghai:obsweather"(pk VARCHAR primary key, "d"."stationid" VARCHAR, "d"."temp" Float, ...);
In Hbase shell, my query on a row/column yields the following:
hbase(main):004:0> get 'wanghai:obsweather', 'AP611_mesonet|2014-02-01 00:01:30', 'd:temp:toFloat' COLUMN CELL d:temp timestamp=1454132157194, value=274.261108398437 1 row(s) in 0.0210 seconds
In Phoenix sqlline.py, I got the following:
0: jdbc:phoenix:localhost:2181:/hbase-unsecur> select pk, "d"."stationid", "d"."temp" from "wanghai:obsweather" . . . . . . . . . . . . . . . . . . . . . . .> where pk='AP611_mesonet|2014-02-01 00:01:30'; 'PK','stationid','temp' 'AP611_mesonet|2014-02-01 00:01:30','AP611_mesonet','-0.015067715' 1 row selected (0.11 seconds)
Any idea for the same (row, cf:q), why I get "274.2611" from Hbase shell, but "-0.015067715" in Phoenix?
What is wrong?
Created 02-08-2016 08:01 PM
HBase and Phoenix encode floats and some other data types differently.
This is from https://phoenix.apache.org/
The other caveat is that the way the bytes were serialized in HBase must match the way the bytes are expected to be serialized by Phoenix. For VARCHAR,CHAR, and UNSIGNED_* types, Phoenix uses the HBase Bytes utility methods to perform serialization. The CHAR type expects only single-byte characters and the UNSIGNED types expect values greater than or equal to zero.
Our composite row keys are formed by simply concatenating the values together, with a zero byte character used as a separator after a variable length type. For more information on our type system, see the Data Type.
Created 02-08-2016 08:01 PM
HBase and Phoenix encode floats and some other data types differently.
This is from https://phoenix.apache.org/
The other caveat is that the way the bytes were serialized in HBase must match the way the bytes are expected to be serialized by Phoenix. For VARCHAR,CHAR, and UNSIGNED_* types, Phoenix uses the HBase Bytes utility methods to perform serialization. The CHAR type expects only single-byte characters and the UNSIGNED types expect values greater than or equal to zero.
Our composite row keys are formed by simply concatenating the values together, with a zero byte character used as a separator after a variable length type. For more information on our type system, see the Data Type.
Created 02-08-2016 08:11 PM
Can you please elaborate on the encoding of float is different? I did not see that in Phoenix manual.
The most important thing, How do I handle this to make phoenix useful for float value in my Hbase table? It would be useless if I can not see the correct value in Phoenix. Is there any configuration to tweak?
Created 02-08-2016 08:18 PM
Enis was saying that Phoenix expects the value of your FLOAT column to be serialized using https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes%28float%29. The "ascii" representation of the float which you have in the value is not what Phoenix is expecting and, thus, is parsing it to a value that you do not expect.
Created 02-08-2016 09:11 PM
I am using Bytes class to serialize float. And pay attention How I get it from Hbase shell, I am telling HBase shell that the column d:temp is a float and use toFloat to deserialize it
get'wanghai:obsweather','AP611_mesonet|2014-02-01 00:01:30','d:temp:toFloat'
Created 02-08-2016 09:26 PM
Ah, sorry, I missed that detail. I thought you were just doing a normal get on the column.
Created 02-08-2016 10:32 PM
The above still applies. HBase's float -> byte[] serialization is different than Phoenix's FLOAT -> byte[] serialization. HBase's one does not sort correctly for negative values in the byte[] serialized format. That is why if you want to interoperate between HBase and Phoenix, you should use "compatible" types.
If you are doing serialization from HBase, you should declare the type as UNSIGNED_FLOAT in Phoenix. Check out the documentation:
https://phoenix.apache.org/language/datatypes.html#unsigned_float_type:
Created 02-08-2016 09:56 PM
One more information, these data was bulk loaded into HBase from CSV files. I verified CSV file, Hbase shell is returning the correct value. During bulk load process, Bytes.toBytes(float) was used to serialize all float type values.
Created 02-09-2016 02:51 PM
@Enis I think I finally understand your reply. For float serialized in HBase, if the float are non-negative, map it to Phoenix Unsigned_Float, you will get correct value in Phoenix.
If float serialized in HBase can be negative, if I map it to FLOAT in Phoenix, I got wrong value because of different encoding method. If I map to to Unsigned_Float, then I got run time Exception(
java.sql.SQLException: ERROR 201 (22000): Illegal data.)
when I try to get a negative value.
So, seems there is no way to properly handle existing negative float value serialized by HBase in Phoenix. Am I right?
Created 02-09-2016 07:08 PM
Yes, HBase's Bytes.toXXX utility methods are not designed to be used for negative values unfortunately. You can still use those for storing negative values as cell values. If you use it for storing primary key or column names for example, the byte[] sort orders will not be correct (negative values will be sorted AFTER the positive values). If you want to interoperate HBase and Phoenix, you can use PXXX classes in Phoenix (PFloat, PInteger) etc in your Java code instead of using Bytes.toBytes().