Created 07-13-2016 02:38 PM
Hello everyone,
I am trying to integrate Hive with an HBase instance that uses Avro to store data. I have followed the procedure described in the documentation here. In HBase I have one table called 'events' with one column family (masterdata) and one payload column (payload).
The stack is running on Hortonworks 2.4:
Hive: 1.2.1
HBase: 1.1.2
HDFS: 2.7.1
The procedure to create the table in Hive is stored as setup.hql:
CREATE EXTERNAL TABLE masterdata_events
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseSerDe'
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES (
  "hbase.columns.mapping" = ":key,masterdata:payload",
  "masterdata.payload.serialization.type" = "avro",
  "masterdata.payload.avro.schema.url" = 'hdfs://cluster/schema.avsc',
  "hive.serialization.extend.nesting.levels" = "true"
)
TBLPROPERTIES (
  "hbase.table.name" = "events",
  "hbase.struct.autogenerate" = "true"
);
I run it with 'hive -f setup.hql'. The output is as follows:
OK
Time taken: 2.625 seconds
OK
Time taken: 3.301 seconds
OK
Time taken: 0.066 seconds
FAILED: IllegalArgumentException Error: : expected at the end of 'string:struct<product:struct<productmain:struct<id:string,...so on and so on ...>'
Note that the string between the two single quotes is 4008 characters long. After searching on Google, I found an answer from another user in this community saying that you have to increase the size of SERDE_PARAMS in the Hive metastore. I applied that workaround, but I still get the same error. I have also tried using "masterdata.payload.avro.schema.literal" instead of the schema URL, but nothing changed. The Avro schema is valid JSON (verified).
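For readers unfamiliar with the SERDE_PARAMS workaround mentioned above, here is a minimal sketch of what it usually looks like. It assumes a MySQL-backed metastore whose database is named hive; both are assumptions for illustration and not details taken from this cluster. The column type chosen (MEDIUMTEXT) is likewise only one possible choice. Back up the metastore before changing its schema.

```sql
-- Assumption: the metastore lives in a MySQL database called `hive`.
USE hive;

-- Inspect the current definition of the column that holds SerDe property values.
SHOW COLUMNS FROM SERDE_PARAMS LIKE 'PARAM_VALUE';

-- Widen the column so that long property values are not cut off.
-- MEDIUMTEXT is an illustrative choice, not a documented requirement.
ALTER TABLE SERDE_PARAMS MODIFY PARAM_VALUE MEDIUMTEXT;

-- List the longest stored values to see whether any sit at the old limit.
SELECT SERDE_ID, PARAM_KEY, CHAR_LENGTH(PARAM_VALUE) AS value_length
FROM SERDE_PARAMS
ORDER BY value_length DESC
LIMIT 10;
```

Values that were already truncated before the column was widened stay truncated, so the Hive table would need to be dropped and recreated after the change.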
Could you please give me any hints?
Created 07-14-2016 11:25 AM
Have you added the following property to TBLPROPERTIES?
"hbase.struct.autogenerate"= "true"
This allows you to avoid manually creating the columns and types for Avro schemas.
Also, I think hive.serialization.extend.nesting.levels may not be in effect here, as it is used by LazySimpleSerDe.
Created 07-14-2016 01:57 PM
Thanks a lot for your answer, @Ankit Singhal! Yes, I have added the property
"hbase.struct.autogenerate" = "true"
in the setup.hql file. As for "hive.serialization.extend.nesting.levels", I had to add it because I was receiving an error message like this:
ERROR ql.Driver: FAILED: SemanticException org.apache.hadoop.hive.serde2.SerDeException: Number of levels of nesting supported for LazySimpleSerde is 8 Unable to work with 9 levels
or something similar. After I added that line, the nesting error went away, only for the statement to fail with the IllegalArgumentException instead. I am not sure whether the two are related.