Avro file creation from a Fixed width format file - approach

Explorer

Hi,

I would like to load fixed-width records (delimited by \n) into an Avro-backed Hive table.

I have a few questions about Avro.

1. I have seen setups where the data (.avro) and the schema (.avsc) are in separate files, and others where the schema and data are in the same file. Which is the better approach for loading into an Avro Hive table?

2. I would also like to understand schema evolution better. I understand from other posts that using SERDEPROPERTIES is better than TBLPROPERTIES. Does schema evolution cover only adding columns at the end, or does it also cover changing data types and adding columns in between?

3. I am writing a mapper to convert the fixed-width file to a delimited file, and then converting the delimited file to .avro. This is required because we have to filter out a few records. For the delimited-to-Avro conversion, would it be better to have a separate Java application or to do it inside the mapper?

4. Is there any tool or utility to generate an .avsc file from a record format (text file)?

I would appreciate any details.

Thank you.
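On question 4: if no ready-made utility fits the record-format file, the .avsc can be generated with a short script. Below is a minimal sketch in Python (standard library only). The layout list, field names, record name, and namespace are all hypothetical placeholders, and it assumes each field maps to a single primitive Avro type.

```python
import json

# Hypothetical fixed-width layout: (field name, start offset, width, Avro type).
LAYOUT = [
    ("customer_id", 0, 6, "long"),
    ("name", 6, 10, "string"),
    ("balance", 16, 8, "double"),
]


def layout_to_avsc(record_name, layout, namespace="example.fixedwidth"):
    """Build an Avro record schema (as a dict) from a fixed-width layout."""
    fields = [
        {"name": name, "type": avro_type}
        for name, _start, _width, avro_type in layout
    ]
    return {
        "type": "record",
        "name": record_name,
        "namespace": namespace,
        "fields": fields,
    }


schema = layout_to_avsc("Customer", LAYOUT)
avsc_text = json.dumps(schema, indent=2)  # this text is what goes into the .avsc file
print(avsc_text)
```

The offsets and widths are not part of the Avro schema; they are only needed later when slicing each line, so keeping them in the same layout list avoids maintaining two descriptions of the record.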

Re: Avro file creation from a Fixed width format file - approach

Contributor

You can load the file into Spark, apply the filters, and write the rest of the DataFrame to Avro.

Re: Avro file creation from a Fixed width format file - approach

Explorer

Thank you. We do not want to use Spark for the transformation. Are there any details on the Avro file format?
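On the schema evolution part of the question, Avro's own schema resolution matches record fields by name rather than by position, so a new field does not have to go at the end as long as it carries a default. A few type promotions are also allowed (for example int to long). The sketch below is a simplified check of those rules for flat records with primitive types only. The function name and the sample schemas are hypothetical, it ignores aliases, unions, and nested types, and how Hive itself treats such changes should be verified separately.

```python
# Promotions permitted by Avro schema resolution (writer type -> reader types).
PROMOTIONS = {
    "int": {"long", "float", "double"},
    "long": {"float", "double"},
    "float": {"double"},
    "string": {"bytes"},
    "bytes": {"string"},
}


def resolution_problems(writer_fields, reader_fields):
    """Return a list of reasons why the reader schema cannot read the writer's data."""
    writer_types = {f["name"]: f["type"] for f in writer_fields}
    problems = []
    for field in reader_fields:
        name, reader_type = field["name"], field["type"]
        if name not in writer_types:
            # A field unknown to the writer is acceptable only with a default.
            if "default" not in field:
                problems.append(f"{name}: missing in writer and no default")
            continue
        writer_type = writer_types[name]
        if writer_type != reader_type and reader_type not in PROMOTIONS.get(writer_type, set()):
            problems.append(f"{name}: cannot promote {writer_type} to {reader_type}")
    return problems


old = [{"name": "id", "type": "int"}, {"name": "name", "type": "string"}]

# New field added in the middle with a default, and id widened from int to long.
new_ok = [
    {"name": "id", "type": "long"},
    {"name": "region", "type": "string", "default": ""},
    {"name": "name", "type": "string"},
]

# Narrowing long to int and adding a field without a default are both rejected.
new_bad = [{"name": "id", "type": "int"}, {"name": "region", "type": "string"}]
old_long = [{"name": "id", "type": "long"}]

print(resolution_problems(old, new_ok))
print(resolution_problems(old_long, new_bad))
```

Fields present in the writer schema but absent from the reader schema are simply ignored on read, which is why the check only iterates over the reader's fields.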

Re: Avro file creation from a Fixed width format file - approach

Explorer

I used TextInputFormat for the input and a multi-Avro output format for the output. This worked for me. Thank you.
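For anyone following the same route, the per-line work in the mapper comes down to slicing each line by fixed offsets and dropping the records that fail the filter. Here is a rough sketch of that logic in plain Python, used only for illustration; in the actual job the same steps would sit inside the mapper's map method, and each kept record would be written out as an Avro record. The layout, field names, and filter condition are hypothetical.

```python
# Hypothetical layout: (field name, start offset, end offset).
LAYOUT = [("customer_id", 0, 6), ("name", 6, 16), ("status", 16, 17)]
RECORD_LENGTH = LAYOUT[-1][2]


def parse_line(line):
    """Slice one fixed-width line into a dict of trimmed field values."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def keep(record):
    """Hypothetical filter: keep only records with status 'A'."""
    return record["status"] == "A"


def map_lines(lines):
    """Yield parsed records that pass the filter, skipping short lines."""
    for line in lines:
        line = line.rstrip("\n")
        if len(line) < RECORD_LENGTH:
            continue  # malformed or truncated record
        record = parse_line(line)
        if keep(record):
            yield record


# Sample input built with padding so the column widths are exact.
sample = [
    f"{'000042'}{'Alice':<10}A\n",
    f"{'000043'}{'Bob':<10}I\n",
    "short\n",
]
kept = list(map_lines(sample))
print(kept)
```

Doing the slicing and filtering in one pass avoids the intermediate delimited file altogether, since the parsed fields can go straight into the output record.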