Created 07-12-2018 08:55 PM
I am trying to create an external Hive table that points to an Avro schema file (.avsc) on the local file system. I know this is possible on Cloudera, but I am not sure about Hortonworks. Most 'avro.schema.url' examples point to 'hdfs:///', but that is not what I want here. I am attempting to use 'file:///'.
The functionality I am attempting to mimic can be found here.
Any help would be greatly appreciated!
Created 07-12-2018 09:30 PM
Hey @Shane B!
It should work with Avro schemas located on your local filesystem. Cloudera and Hortonworks both use the same SerDe to handle Avro types.
Here's an example:
[hive@node3 ~]$ cat user.avsc
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
0: jdbc:hive2://node3:10000/default> CREATE TABLE test
0: jdbc:hive2://node3:10000/default> ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default> 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default> STORED as AVRO
0: jdbc:hive2://node3:10000/default> TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default> 'avro.schema.url'='file:///home/hive/user.avsc');
No rows affected (1.492 seconds)
0: jdbc:hive2://node3:10000/default> show create table test;
+------------------------------------------------------------------+--+
| createtab_stmt |
+------------------------------------------------------------------+--+
| CREATE TABLE `test`( |
| `name` string COMMENT '', |
| `favorite_number` int COMMENT '', |
| `favorite_color` string COMMENT '') |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' |
| LOCATION |
| 'hdfs://Admin-TrainingNS/apps/hive/warehouse/test' |
| TBLPROPERTIES ( |
| 'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}', |
| 'avro.schema.url'='file:///home/hive/user.avsc', |
| 'numFiles'='0', |
| 'numRows'='0', |
| 'rawDataSize'='0', |
| 'totalSize'='0', |
| 'transient_lastDdlTime'='1531430559') |
+------------------------------------------------------------------+--+
20 rows selected (0.775 seconds)
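For the external-table case asked about in the question, a minimal sketch might look like the statement below. The table name test_ext and the LOCATION path are hypothetical placeholders, not taken from the session above; only the schema path is reused from it. One assumption worth checking on your cluster: a 'file:///' URL is resolved on whichever host reads it, so the .avsc file probably needs to exist at the same path on the HiveServer2 host, and possibly on the worker nodes too.

```sql
-- Sketch only: test_ext and /data/test_ext are hypothetical placeholders.
-- STORED AS AVRO selects the Avro SerDe and container input/output formats,
-- so no explicit ROW FORMAT SERDE clause is needed.
CREATE EXTERNAL TABLE test_ext
STORED AS AVRO
LOCATION '/data/test_ext'
TBLPROPERTIES (
  -- Columns are derived from this schema file instead of a column list.
  'avro.schema.url'='file:///home/hive/user.avsc');
```

Because the table is external, dropping it should remove only the metastore entry and leave the Avro files under LOCATION in place.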
Hope this helps!
Created 07-18-2018 05:02 AM
Hello @Shane B!
Thanks for the kind words! I'm just a humble guy trying to help here, but I appreciate it 😄
I'm not a Hive specialist, but looking at the Apache Hive source, you may find something in these links:
https://github.com/apache/hive/blob/cacb1c09574c89ac07fcffc0b8c3fad18e283aec/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ae008b79b5d52ed6a38875b73025a505725828eb/serde/src/java/org/apac...
https://github.com/apache/hive/blob/ff67cdda1c538dc65087878eeba3e165cf3230f4/serde/src/java/org/apac...
https://github.com/apache/hive/blob/c2940a07cf0891e922672782b73ec22551a7eedd/ql/src/java/org/apache/...
Hope this helps!