Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive table pointing to Avro schema on local file system

avatar

I am trying to create an external Hive table that points to an avro schema file (.avsc) that lives on the local file system. I know this is possible on Cloudera, but not so sure about Hortonworks. Typically, most 'avro.schema.url' examples point to 'hdfs:///', but this is not what I am hoping to accomplish. I am attempting to use 'file:///'.

The functionality I am attempting to mimic can be found here.

Any help would be greatly appreciated!

1 ACCEPTED SOLUTION

avatar

Hey @Shane B!
It should work with avro schemas located in your local fs. Both use the same serde to deal with avro typos.
Here's an example:

[hive@node3 ~]$ cat  user.avsc 
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
0: jdbc:hive2://node3:10000/default>  CREATE TABLE test
0: jdbc:hive2://node3:10000/default>   ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default>   STORED as AVRO
0: jdbc:hive2://node3:10000/default>   TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default>     'avro.schema.url'='file:///home/hive/user.avsc');
No rows affected (1.492 seconds)
0: jdbc:hive2://node3:10000/default> show create table test;
+------------------------------------------------------------------+--+
|                          createtab_stmt                          |
+------------------------------------------------------------------+--+
| CREATE TABLE `test`(                                             |
|   `name` string COMMENT '',                                      |
|   `favorite_number` int COMMENT '',                              |
|   `favorite_color` string COMMENT '')                            |
| ROW FORMAT SERDE                                                 |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'                 |
| STORED AS INPUTFORMAT                                            |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'   |
| OUTPUTFORMAT                                                     |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  |
| LOCATION                                                         |
|   'hdfs://Admin-TrainingNS/apps/hive/warehouse/test'             |
| TBLPROPERTIES (                                                  |
|   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',          |
|   'avro.schema.url'='file:///home/hive/user.avsc',               |
|   'numFiles'='0',                                                |
|   'numRows'='0',                                                 |
|   'rawDataSize'='0',                                             |
|   'totalSize'='0',                                               |
|   'transient_lastDdlTime'='1531430559')                          |
+------------------------------------------------------------------+--+
20 rows selected (0.775 seconds)
Hope this helps!

View solution in original post

10 REPLIES 10