Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Created Phoenix view to map existing HBase table, but got different float value on a row

avatar

I have created a view to map to an existing HBase table as following:

create view "wanghai:obsweather"(pk  VARCHAR primary key,

"d"."stationid" VARCHAR,

"d"."temp"  Float,

...);

In Hbase shell, my query on a row/column yields the following:

hbase(main):004:0> get 'wanghai:obsweather', 'AP611_mesonet|2014-02-01 00:01:30', 'd:temp:toFloat'

COLUMN  CELL    

d:temp  timestamp=1454132157194, value=274.261108398437   

1 row(s) in 0.0210 seconds

In Phoenix sqlline.py, I got the following:

0: jdbc:phoenix:localhost:2181:/hbase-unsecur> select pk, "d"."stationid", "d"."temp" from "wanghai:obsweather"

. . . . . . . . . . . . . . . . . . . . . . .> where pk='AP611_mesonet|2014-02-01 00:01:30';

'PK','stationid','temp'

'AP611_mesonet|2014-02-01 00:01:30','AP611_mesonet','-0.015067715'

1 row selected (0.11 seconds)

Any idea for the same (row, cf:q), why I get "274.2611" from Hbase shell, but "-0.015067715" in Phoenix?

What is wrong?

1 ACCEPTED SOLUTION

avatar
Guru

HBase and Phoenix encode floats and some other data types differently.

This is from https://phoenix.apache.org/

The other caveat is that the way the bytes were serialized in HBase must match the way the bytes are expected to be serialized by Phoenix. For VARCHAR,CHAR, and UNSIGNED_* types, Phoenix uses the HBase Bytes utility methods to perform serialization. The CHAR type expects only single-byte characters and the UNSIGNED types expect values greater than or equal to zero.

Our composite row keys are formed by simply concatenating the values together, with a zero byte character used as a separator after a variable length type. For more information on our type system, see the Data Type.

View solution in original post

11 REPLIES 11

avatar
Guru

HBase and Phoenix encode floats and some other data types differently.

This is from https://phoenix.apache.org/

The other caveat is that the way the bytes were serialized in HBase must match the way the bytes are expected to be serialized by Phoenix. For VARCHAR,CHAR, and UNSIGNED_* types, Phoenix uses the HBase Bytes utility methods to perform serialization. The CHAR type expects only single-byte characters and the UNSIGNED types expect values greater than or equal to zero.

Our composite row keys are formed by simply concatenating the values together, with a zero byte character used as a separator after a variable length type. For more information on our type system, see the Data Type.

avatar

Can you please elaborate on the encoding of float is different? I did not see that in Phoenix manual.

The most important thing, How do I handle this to make phoenix useful for float value in my Hbase table? It would be useless if I can not see the correct value in Phoenix. Is there any configuration to tweak?

avatar
Super Guru

Enis was saying that Phoenix expects the value of your FLOAT column to be serialized using https://hbase.apache.org/apidocs/org/apache/hadoop/hbase/util/Bytes.html#toBytes%28float%29. The "ascii" representation of the float which you have in the value is not what Phoenix is expecting and, thus, is parsing it to a value that you do not expect.

avatar

I am using Bytes class to serialize float. And pay attention How I get it from Hbase shell, I am telling HBase shell that the column d:temp is a float and use toFloat to deserialize it

get'wanghai:obsweather','AP611_mesonet|2014-02-01 00:01:30','d:temp:toFloat'

avatar
Super Guru

Ah, sorry, I missed that detail. I thought you were just doing a normal get on the column.

avatar
Guru

The above still applies. HBase's float -> byte[] serialization is different than Phoenix's FLOAT -> byte[] serialization. HBase's one does not sort correctly for negative values in the byte[] serialized format. That is why if you want to interoperate between HBase and Phoenix, you should use "compatible" types.

If you are doing serialization from HBase, you should declare the type as UNSIGNED_FLOAT in Phoenix. Check out the documentation:

https://phoenix.apache.org/language/datatypes.html#unsigned_float_type:

avatar

One more information, these data was bulk loaded into HBase from CSV files. I verified CSV file, Hbase shell is returning the correct value. During bulk load process, Bytes.toBytes(float) was used to serialize all float type values.

avatar

@Enis I think I finally understand your reply. For float serialized in HBase, if the float are non-negative, map it to Phoenix Unsigned_Float, you will get correct value in Phoenix.

If float serialized in HBase can be negative, if I map it to FLOAT in Phoenix, I got wrong value because of different encoding method. If I map to to Unsigned_Float, then I got run time Exception(

java.sql.SQLException: ERROR 201 (22000): Illegal data.)

when I try to get a negative value.

So, seems there is no way to properly handle existing negative float value serialized by HBase in Phoenix. Am I right?

avatar
Guru

Yes, HBase's Bytes.toXXX utility methods are not designed to be used for negative values unfortunately. You can still use those for storing negative values as cell values. If you use it for storing primary key or column names for example, the byte[] sort orders will not be correct (negative values will be sorted AFTER the positive values). If you want to interoperate HBase and Phoenix, you can use PXXX classes in Phoenix (PFloat, PInteger) etc in your Java code instead of using Bytes.toBytes().