Created 03-25-2018 05:48 PM
Hi All,
We have a dataset in Avro format with schema inside each Avro file.
I want to build Hive table on top of these files,
I got the below recommendation from an old question asked here in the community:
CREATE EXTERNAL TABLE sampe_table STORED AS AVRO LOCATION 'hdfs:///user/hive/;
But whenever I try it I get always the error:
Any Suggestions? or do you know any online tool that can split Schema from the file?
java.lang.Exception: java.sql.SQLException: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. java.lang.RuntimeException: MetaException(message:org.apache.hadoop.hive.serde2.SerDeException Encountered AvroSerdeException determining schema. Returning signal schema to indicate problem: Neither avro.schema.literal nor avro.schema.url specified, can't determine table schema)
Created 03-25-2018 06:19 PM
There are two types of files when we talk about avro.
Looks like you have the avro data files but not the avro schema. Follows the steps which will help you to get the avro schema, the avsc files, from your data files and create table on top of them.
//Take a few lines from your avro file hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE //Extract the avro schema from your avro data file java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE //Upload the schema to hdfs hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR //Create the hive table using avro schema CREATE EXTERNAL TABLE sampe_table STORED AS AVRO LOCATION 'hdfs:///user/hive/' TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
PS - If you already have the avro schema files, you can skip all the schema creation and steps and simply use the last step to create your table.
Let know if that works for you.
Created 03-25-2018 06:19 PM
There are two types of files when we talk about avro.
Looks like you have the avro data files but not the avro schema. Follows the steps which will help you to get the avro schema, the avsc files, from your data files and create table on top of them.
//Take a few lines from your avro file hdfs dfs -cat <your avro file name> | head --bytes 10K > $SAMPLE_FILE //Extract the avro schema from your avro data file java -jar $AVRO_TOOLS_PATH/avro-tools-1.7.7.jar getschema $SAMPLE_FILE > $AVRO_SCHEMA_FILE //Upload the schema to hdfs hdfs dfs -put $AVRO_SCHEMA_FILE $AVRO_SCHEMA_DIR //Create the hive table using avro schema CREATE EXTERNAL TABLE sampe_table STORED AS AVRO LOCATION 'hdfs:///user/hive/' TBLPROPERTIES ('avro.schema.url'='<your avro schema path here>');
PS - If you already have the avro schema files, you can skip all the schema creation and steps and simply use the last step to create your table.
Let know if that works for you.
Created 09-30-2018 10:01 AM
@Rahul Soni , I am creating one avro file using flume regex interceptor and multiplexing. But that file contains value something like below and when I am trying to generate schema using avro-tools getschema option it is giving only "headers" and "body" as 2 fields. Please advise how to resolve this.
Objavro.codenullavro.schema▒{"type":"record","name":"Event","fields":[{"name":"headers","type":{"type":"map","values":"string"}},{"name":"body","type":"bytes"}]}▒LA▒▒;ڍ▒(▒▒▒=▒YBigDatJava▒Y{"created_at":"Thu Sep 27 11:40:44 +0000 2018","id":1045277052822269952,"id_str":"1045277052822269952","text":"RT @SebasthSeppel: #Jugh \ud83d\udce3 heute ist wieder JUGH !\nHeute haben wir @gschmutz bei uns mit dem spannenden Thema: Streaming Data Ingestion in\u2026","source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e","truncated":false,"in_reply_to_status_id":null,"in_reply_to_status_id_str":null,"in_reply_to_user_id":null,"in_reply_to_user_id_str":null,"in_reply_to_screen_name":null,"user":{"id":716199874157547520,"id_str":"716199874157547520","name":"Alexander Gomes","screen_name":"nEinsKnull","location":"Kassel, Hessen","url":null,"description":"CTO of family, work at @Micromata, loves #tec #rc #mountainbikes and #paragliding","trans.........
Thanking you
Santanu
Created 03-25-2018 06:24 PM
Thank you for your answer,
I will try your suggestions, but overall, to my knowledge, it is also possible that you have a schema inside the Avro Data files as header, and you can build Hive tables directly on top of these files.That wasn't available in the first releases of Avro but I think it is now available.
But I'll try your solution anyway 🙂
Thanks,
Created 03-25-2018 07:05 PM
Yes, the schema is available in the avro data file and you have to extract it to pass to the hive ddl. Hive ddl expects either the schema path or the schema literal to establish the schema in metastore
Created 03-26-2018 07:58 PM
Yes Rahul, you are right, I have tested it that way and it worked properly.
Thanks,