Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive table pointing to Avro schema on local file system

avatar

I am trying to create an external Hive table that points to an avro schema file (.avsc) that lives on the local file system. I know this is possible on Cloudera, but not so sure about Hortonworks. Typically, most 'avro.schema.url' examples point to 'hdfs:///', but this is not what I am hoping to accomplish. I am attempting to use 'file:///'.

The functionality I am attempting to mimic can be found here.

Any help would be greatly appreciated!

1 ACCEPTED SOLUTION

avatar

Hey @Shane B!
It should work with avro schemas located in your local fs. Both use the same serde to deal with avro typos.
Here's an example:

[hive@node3 ~]$ cat  user.avsc 
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
0: jdbc:hive2://node3:10000/default>  CREATE TABLE test
0: jdbc:hive2://node3:10000/default>   ROW FORMAT SERDE
0: jdbc:hive2://node3:10000/default>   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
0: jdbc:hive2://node3:10000/default>   STORED as AVRO
0: jdbc:hive2://node3:10000/default>   TBLPROPERTIES (
0: jdbc:hive2://node3:10000/default>     'avro.schema.url'='file:///home/hive/user.avsc');
No rows affected (1.492 seconds)
0: jdbc:hive2://node3:10000/default> show create table test;
+------------------------------------------------------------------+--+
|                          createtab_stmt                          |
+------------------------------------------------------------------+--+
| CREATE TABLE `test`(                                             |
|   `name` string COMMENT '',                                      |
|   `favorite_number` int COMMENT '',                              |
|   `favorite_color` string COMMENT '')                            |
| ROW FORMAT SERDE                                                 |
|   'org.apache.hadoop.hive.serde2.avro.AvroSerDe'                 |
| STORED AS INPUTFORMAT                                            |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'   |
| OUTPUTFORMAT                                                     |
|   'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'  |
| LOCATION                                                         |
|   'hdfs://Admin-TrainingNS/apps/hive/warehouse/test'             |
| TBLPROPERTIES (                                                  |
|   'COLUMN_STATS_ACCURATE'='{\"BASIC_STATS\":\"true\"}',          |
|   'avro.schema.url'='file:///home/hive/user.avsc',               |
|   'numFiles'='0',                                                |
|   'numRows'='0',                                                 |
|   'rawDataSize'='0',                                             |
|   'totalSize'='0',                                               |
|   'transient_lastDdlTime'='1531430559')                          |
+------------------------------------------------------------------+--+
20 rows selected (0.775 seconds)
Hope this helps!

View solution in original post

10 REPLIES 10