Created 07-12-2018 08:55 PM
I am trying to create an external Hive table that points to an Avro schema file (.avsc) that lives on the local file system. I know this is possible on Cloudera, but I am not sure about Hortonworks. Most 'avro.schema.url' examples point to 'hdfs:///', but that is not what I am hoping to accomplish. I am attempting to use 'file:///'.
The functionality I am attempting to mimic can be found here.
Any help would be greatly appreciated!
Created 07-12-2018 09:30 PM
Hey @Shane B!
It should work with Avro schemas located on your local file system. Both distributions use the same SerDe to deal with Avro types.
Here's an example:
[hive@node3 ~]$ cat user.avsc
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}

0: jdbc:hive2://node3:10000/default> CREATE TABLE test
0: jdbc:hive2://node3:10000/default> ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default> STORED as AVRO
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'avro.schema.url'='file:///home/hive/user.avsc');
No rows affected (1.492 seconds)

0: jdbc:hive2://node3:10000/default> show create table test;
+------------------------------------------------------------------+--+
|                          createtab_stmt                          |
+------------------------------------------------------------------+--+
| CREATE TABLE `test`(                                             |
|   `name` string COMMENT '',                                      |
|   `favorite_number` int COMMENT '',                              |
|   `favorite_color` string COMMENT '')                            |
| ROW FORMAT SERDE                                                 |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'                 |
| STORED AS INPUTFORMAT                                            |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'   |
| OUTPUTFORMAT                                                     |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  |
| LOCATION                                                         |
|   'hdfs://Admin-TrainingNS/apps/hive/warehouse/test'             |
| TBLPROPERTIES (                                                  |
|   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',          |
|   'avro.schema.url'='file:///home/hive/user.avsc',               |
|   'numFiles'='0',                                                |
|   'numRows'='0',                                                 |
|   'rawDataSize'='0',                                             |
|   'totalSize'='0',                                               |
|   'transient_lastDdlTime'='1531430559')                          |
+------------------------------------------------------------------+--+
20 rows selected (0.775 seconds)

Hope this helps!
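As a side note, the column list in the `show create table` output is derived from the schema file. As far as I understand the AvroSerDe, a union of one type with `null` becomes a plain nullable column, which is why `["int", "null"]` shows up as `int`. Here is a minimal Python sketch of that mapping, for primitive types only. The names `AVRO_TO_HIVE`, `hive_type` and `hive_columns` are made up for illustration and are not part of Hive; this is just a way to preview the columns a schema would give you, not how the SerDe is actually implemented.

```python
import json

# Primitive Avro types and the Hive column types they are expected to map to.
# Complex types (records, arrays, maps, enums) are deliberately left out of this sketch.
AVRO_TO_HIVE = {
    "string": "string",
    "int": "int",
    "long": "bigint",
    "float": "float",
    "double": "double",
    "boolean": "boolean",
    "bytes": "binary",
}


def hive_type(avro_type):
    """Map a primitive or nullable-primitive Avro type to a Hive type name."""
    if isinstance(avro_type, list):
        # A union such as ["int", "null"] is treated as a nullable "int".
        non_null = [t for t in avro_type if t != "null"]
        if len(non_null) != 1:
            raise ValueError(f"only [T, null] unions are handled here: {avro_type}")
        return hive_type(non_null[0])
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_HIVE:
        raise ValueError(f"type not covered by this sketch: {avro_type}")
    return AVRO_TO_HIVE[avro_type]


def hive_columns(schema):
    """Return (column name, Hive type) pairs for an Avro record schema."""
    return [(field["name"], hive_type(field["type"])) for field in schema["fields"]]


# Same content as the user.avsc file from the example.
USER_AVSC = """
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number", "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
"""

schema = json.loads(USER_AVSC)
columns = hive_columns(schema)
for name, col_type in columns:
    print(f"`{name}` {col_type}")
```

Run against the user.avsc from the example, it lists the same three columns and types that `show create table` reported.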
Created 07-18-2018 05:02 AM
Hello @Shane B!
Thanks for the kind words! I'm just a humble guy trying to help here, but I appreciate it 😄
I'm not a Hive specialist, but looking at the Apache Hive source code, you may find something in these links:
https://github.com/apache/hive/blob/cacb1c09574c89ac07fcffc0b8c3fad18e283aec/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/serde/src/java/org/apac...
https://github.com/apache/hive/blob/c2940a07cf0891e922672782b73ec22551a7eedd/ql/src/java/org/apache/...
Hope this helps!