Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Impala is not able to query hive table over HBase with binary column.

avatar
Contributor

Issue is pretty simple and I am not sure if there is a simple thing that I seem to be missing something pretty basic. CDH5.12 is being used.

 

1. Create Hive Table over HBase by specifying the numerical columns as binary.

2. Ingest data into the table through any means.

3. Query the table through hive and results are fine.

4. Impala gives an error.

 

1. Hive Table Created over HBase with following syntax:

 

hive -e "CREATE TABLE WeatherData(\
                key  STRING, \
                wsid STRING, \
                year INT,\
                month INT,\
                day INT,\
                hour INT,\
                temperature DOUBLE,\
                dewpoint DOUBLE,\
                pressure DOUBLE,\
                windDirection INT,\
                windSpeed DOUBLE,\
                skyCondition INT,\
                skyConditionText STRING, \
                oneHourPrecip DOUBLE, \
                sixHourPrecip DOUBLE) \
        STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'\
        WITH SERDEPROPERTIES (\"hbase.columns.mapping\" = \":key, weather:wsid, weather:year#b, weather:month#b, weather:day#b, weather:hour#b, weather:temperature#b, weather:dewpoint#b, weather:pressure#b, weather:windDirection#b, weather:windSpeed#b, weather:skyCondition#b, weather:skyConditionText, weather:oneHourPrecip#b, weather:sixHourPrecip#b \")\
        TBLPROPERTIES (\"hbase.table.name\" = \"WeatherData\", \"hbase.mapred.output.outputtable\" = \"WeatherData\");"

 

2. Ingest data into the HBase table through any preferred means.

 

3. Results of query through Hive are as below:

 

hive> SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query ID = cloudera_20180512144646_baded4e1-8a5c-4fb4-bd47-8293fc910b44

Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
002020ed-8496-4780-8ae2-fdc820d0e4e0    725030:14732    2008    11    17    12    3.9    -4.4    1016.9    270    5.7    0    0.0    0.0    NULL
Time taken: 24.62 seconds, Fetched: 1 row(s)

 

4. Same query using Impala gives the below issue.

 

[quickstart.cloudera:21000] > SELECT * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0";
Query: select * FROM default.WeatherData WHERE key="002020ed-8496-4780-8ae2-fdc820d0e4e0"
Query submitted at: 2018-05-12 14:47:47 (Coordinator: http://quickstart.cloudera:25000)
ERROR: AnalysisException: Failed to load metadata for table: 'default.WeatherData'
CAUSED BY: TableLoadingException: Failed to load metadata for HBase table: weatherdata
CAUSED BY: SerDeException: Error: A column storage specification is one of the following: '-', a prefix of 'string', or a prefix of 'binary'. b  is not a valid storage option specification for sixhourprecip

 

 

Looking into the Impala code, it seems that binary data type is not supported as column storage type. See link: https://github.com/cloudera/Impala/blob/0c713cf67959b9633d2fe6f5c21af218a43e4214/fe/src/main/java/or...

 

How are others dealing with this requirement? 

1 ACCEPTED SOLUTION

avatar
Explorer

Hi,

 

I suppose the issue is that you have an additional space character in your query - right after "weather:sixHourPrecip#b"

View solution in original post

2 REPLIES 2

avatar
Explorer

Hi,

 

I suppose the issue is that you have an additional space character in your query - right after "weather:sixHourPrecip#b"

avatar
Contributor

Vilyam

 

You were spot ON. This issue has been fixed. Excellent pair of eyes. I am getting my eyes inspected again :).

 

Thanks,

Kabeer.