Created on 05-12-2018 08:30 AM - edited 09-16-2022 06:13 AM
Issue is pretty simple and I am not sure if there is a simple thing that I seem to be missing something pretty basic. CDH5.12 is being used.
1. Create Hive Table over HBase by specifying the numerical columns as binary.
2. Ingest data into the table through any means.
3. Query the table through hive and results are fine.
4. Impala gives an error.
1. Hive Table Created over HBase with following syntax:
hive -e "CREATE TABLE WeatherData(\
key STRING, \
wsid STRING, \
year INT,\
month INT,\
day INT,\
hour INT,\
temperature DOUBLE,\
dewpoint DOUBLE,\
pressure DOUBLE,\
windDirection INT,\
windSpeed DOUBLE,\
skyCondition INT,\
skyConditionText STRING, \
oneHourPrecip DOUBLE, \
sixHourPrecip DOUBLE) \
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\
WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \":key, weather:wsid, weather:year#b, weather:month#b, weather:day#b, weather:hour#b, weather:temperature#b, weather:dewpoint#b, weather:pressure#b, weather:windDirection#b, weather:windSpeed#b, weather:skyCondition#b, weather:skyConditionText, weather:oneHourPrecip#b, weather:sixHourPrecip#b \")\
TBLPROPERTIES (\"hbase.table.name\" = \"WeatherData\", \"hbase.mapred.output.outputtable\" = \"WeatherData\");"
2. Ingest data into the HBase table through any preferred means.
3. Results of query through Hive are as below:
hive> SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query ID = cloudera_20180512144646_baded4e1-8a5c-4fb4-bd47-8293fc910b44
Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
002020ed-8496-4780-8ae2-fdc820d0e4e0 725030:14732 2008 11 17 12 3.9 -4.4 1016.9 270 5.7 0 0.0 0.0 NULL
Time taken: 24.62 seconds, Fetched: 1 row(s)
4. Same query using Impala gives the below issue.
[quickstart.cloudera:21000] > SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query: select * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0"
Query submitted at: 2018-05-12 14:47:47 (Coordinator: http://quickstart.cloudera:25000)
ERROR: AnalysisException: Failed to load metadata for table: 'default.WeatherData'
CAUSED BY: TableLoadingException: Failed to load metadata for HBase table: weatherdata
CAUSED BY: SerDeException: Error: A column storage specification is one of the following: '-', a prefix of 'string', or a prefix of 'binary'. b is not a valid storage option specification for sixhourprecip
Looking into the Impala code, it seems that binary data type is not supported as column storage type. See link: https://github.com/cloudera/Impala/blob/0c713cf67959b9633d2fe6f5c21af218a43e4214/fe/src/main/java/or...
How are others dealing with this requirement?
Created 05-15-2018 07:57 AM
Hi,
I suppose the issue is that you have an additional space character in your query - right after "weather:sixHourPrecip#b"
Created 05-15-2018 07:57 AM
Hi,
I suppose the issue is that you have an additional space character in your query - right after "weather:sixHourPrecip#b"
Created 05-15-2018 09:29 AM
Vilyam
You were spot ON. This issue has been fixed. Excellent pair of eyes. I am getting my eyes inspected again :).
Thanks,
Kabeer.