Support Questions

Find answers, ask questions, and share your expertise

How to build Avro Hive table based on avro files with schema inside?

avatar
Explorer

Hi All,

We have a dataset in Avro format with schema inside each Avro file.

I want to build Hive table on top of these files,

I got the below recommendation from an old question asked here in the community:

CREATE EXTERNAL TABLE sampe_table STORED AS AVRO LOCATION 'hdfs:///user/hive/;

But whenever I try it I get always the error:

Any Suggestions? or do you know any online tool that can split Schema from the file?

      java.lang.Exception: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema)
1 ACCEPTED SOLUTION

avatar
@Kareem Amin

There are two types of files when we talk about avro.

  • Avro files - which have the data
  • avsc files - avro schema files

Looks like you have the avro data files but not the avro schema. Follows the steps which will help you to get the avro schema, the avsc files, from your data files and create table on top of them.

//Take a few lines from your avro file
hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE 

//Extract the avro schema from your avro data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE 

//Upload the schema to hdfs
hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR

//Create the hive table using avro schema
CREATE EXTERNAL TABLE sampe_table 
STORED AS AVRO 
LOCATION 'hdfs:///user/hive/' 
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');

PS - If you already have the avro schema files, you can skip all the schema creation and steps and simply use the last step to create your table.

Let know if that works for you.

View solution in original post

5 REPLIES 5

avatar
@Kareem Amin

There are two types of files when we talk about avro.

  • Avro files - which have the data
  • avsc files - avro schema files

Looks like you have the avro data files but not the avro schema. Follows the steps which will help you to get the avro schema, the avsc files, from your data files and create table on top of them.

//Take a few lines from your avro file
hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE 

//Extract the avro schema from your avro data file
java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE 

//Upload the schema to hdfs
hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR

//Create the hive table using avro schema
CREATE EXTERNAL TABLE sampe_table 
STORED AS AVRO 
LOCATION 'hdfs:///user/hive/' 
TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');

PS - If you already have the avro schema files, you can skip all the schema creation and steps and simply use the last step to create your table.

Let know if that works for you.

avatar
Contributor

@Rahul Soni , I am creating one avro file using flume regex interceptor and multiplexing. But that file contains value something like below and when I am trying to generate schema using avro-tools getschema option it is giving only "headers" and "body" as 2 fields. Please advise how to resolve this.

Objavro.codenullavro.schema▒{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}▒LA▒▒;ڍ▒(▒▒▒=▒YBigDatJava▒Y{"created_at":"Thu Sep 27 11:40:44 +0000 2018","id":1045277052822269952,"id_str":"1045277052822269952","text":"RT @SebasthSeppel: #Jugh \ud83d\udce3 heute ist wieder JUGH !\nHeute haben wir @gschmutz bei uns mit dem spannenden Thema: Streaming Data Ingestion in\u2026","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":716199874157547520,"id_str":"716199874157547520","name":"Alexander Gomes","screen_name":"nEinsKnull","location":"Kassel, Hessen","url":null,"description":"CTO of family, work at @Micromata, loves #tec #rc #mountainbikes and #paragliding","trans.........

Thanking you

Santanu

avatar
Explorer

@Rahul Soni

Thank you for your answer,

I will try your suggestions, but overall, to my knowledge, it is also possible that you have a schema inside the Avro Data files as header, and you can build Hive tables directly on top of these files.That wasn't available in the first releases of Avro but I think it is now available.


But I'll try your solution anyway 🙂

Thanks,

avatar

Yes, the schema is available in the avro data file and you have to extract it to pass to the hive ddl. Hive ddl expects either the schema path or the schema literal to establish the schema in metastore

avatar
Explorer

Yes Rahul, you are right, I have tested it that way and it worked properly.

Thanks,