Support Questions

Find answers, ask questions, and share your expertise

Hive table pointing to Avro schema on local file system

avatar

I am trying to create an external Hive table that points to an avro schema file (.avsc) that lives on the local file system. I know this is possible on Cloudera, but not so sure about Hortonworks. Typically, most 'avro.schema.url' examples point to 'hdfs:///', but this is not what I am hoping to accomplish. I am attempting to use 'file:///'.

The functionality I am attempting to mimic can be found here.

Any help would be greatly appreciated!

1 ACCEPTED SOLUTION

avatar

Hey @Shane B!
It should work with avro schemas located in your local fs. Both use the same serde to deal with avro typos.
Here's an example:

[hive@node3 ~]$ cat  user.avsc 
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
0: jdbc:hive2://node3:10000/default>  CREATE TABLE test
0: jdbc:hive2://node3:10000/default>   ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default>   STORED as AVRO
0: jdbc:hive2://node3:10000/default>   TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default>     'avro.schema.url'='file:///home/hive/user.avsc');
No rows affected (1.492 seconds)
0: jdbc:hive2://node3:10000/default> show create table test;
+------------------------------------------------------------------+--+
|                          createtab_stmt                          |
+------------------------------------------------------------------+--+
| CREATE TABLE `test`(                                             |
|   `name` string COMMENT '',                                      |
|   `favorite_number` int COMMENT '',                              |
|   `favorite_color` string COMMENT '')                            |
| ROW FORMAT SERDE                                                 |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'                 |
| STORED AS INPUTFORMAT                                            |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'   |
| OUTPUTFORMAT                                                     |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  |
| LOCATION                                                         |
|   'hdfs://Admin-TrainingNS/apps/hive/warehouse/test'             |
| TBLPROPERTIES (                                                  |
|   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',          |
|   'avro.schema.url'='file:///home/hive/user.avsc',               |
|   'numFiles'='0',                                                |
|   'numRows'='0',                                                 |
|   'rawDataSize'='0',                                             |
|   'totalSize'='0',                                               |
|   'transient_lastDdlTime'='1531430559')                          |
+------------------------------------------------------------------+--+
20 rows selected (0.775 seconds)
Hope this helps!

View solution in original post

10 REPLIES 10