NiFi ConvertRecord: AvroRecordSetWriter Producing Invalid Avro

Master Guru

java -jar avro-tools-1.8.2.jar getschema test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)

There are no errors until I try to convert that file to ORC, or if I download it and look at it in avro-tools. The file contents look like this:

1 Jungnickel Rd Ganado���&2017-08-04 15:26:54MovementBFAK@WE�=@�"ƖX�&2017-08-04 15:06:561243142403�TX&2017-08-04 14:15:31 77962%

1 ACCEPTED SOLUTION

Master Guru

Can you share the configuration of AvroRecordSetWriter? That file doesn't look like it has a schema embedded in it (you can usually see the schema as JSON near the beginning of the file contents). You may need to configure the writer to embed the schema for use by ConvertAvroToORC or avro-tools (if you don't separately provide the schema to the latter).


4 REPLIES

Master Guru

Can you share the configuration of AvroRecordSetWriter? That file doesn't look like it has a schema embedded in it (you can usually see the schema as JSON near the beginning of the file contents). You may need to configure the writer to embed the schema for use by ConvertAvroToORC or avro-tools (if you don't separately provide the schema to the latter).
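As a quick way to confirm that diagnosis (this sketch is mine, not from the original thread): an Avro data file always begins with the 4-byte container magic 'O', 'b', 'j', 0x01, with the JSON schema stored in the metadata right after it. A file that fails this check has no embedded schema, which is consistent with avro-tools reporting "Not a data file". The class name is a placeholder and the file path is taken as the first command-line argument.

import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class AvroMagicCheck {
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            // An Avro "object container file" starts with the magic bytes
            // 'O', 'b', 'j', 0x01; the schema follows in the file metadata.
            byte[] magic = new byte[4];
            int read = in.read(magic);
            boolean isContainer = read == 4
                    && magic[0] == 'O' && magic[1] == 'b'
                    && magic[2] == 'j' && magic[3] == 1;
            System.out.println(isContainer
                    ? "Avro data file: schema is embedded."
                    : "Not an Avro data file: no container magic, so no embedded schema.");
        }
    }
}

Running it against the downloaded flow file content (for example, java AvroMagicCheck test.avro) should tell you right away whether the writer embedded the schema.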

Master Guru

The files are really small too, as seen above, and don't seem complete.

I have done that conversion before with no issues.

I am wondering if this is related to having a bunch of null fields.

Screenshots attached: 27421-convertrecord.png, 27422-jsontreereader.png, 27423-avrorecordsetwriter.png, 27424-schema1.png

Master Guru

I think the issue is with the HWX Content-Encoded Schema Reference. This is a special "header" written at the front of the Avro content that makes it easy to integrate with the HWX Schema Registry serializers and deserializers, but it likely prevents the content from being understood by plain Apache Avro readers such as the one in ConvertAvroToORC or avro-tools. If you can, try setting the Schema Write Strategy to Embed Avro Schema; this will result in larger flow files but should work in downstream processors. If/when there is an OrcRecordSetWriter, you should be able to reuse the HWX schema reference option there.
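If you need to read one of these flow files outside NiFi before switching the write strategy, here is a rough sketch (mine, not from the thread) of decoding the records with a separately supplied schema. It assumes the HWX content-encoded reference is a small fixed-size header (1-byte protocol version, 8-byte schema id, 4-byte schema version) that can simply be skipped, and it uses placeholder file names "schema.avsc" and "flowfile.avro"; verify the header layout against your NiFi and Schema Registry versions before relying on it.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class ReadHwxEncodedAvro {
    public static void main(String[] args) throws Exception {
        // The schema must come from somewhere else (e.g. exported from the
        // Schema Registry), because the content only carries a reference to it.
        Schema schema = new Schema.Parser().parse(new File("schema.avsc"));

        try (DataInputStream in = new DataInputStream(new FileInputStream("flowfile.avro"))) {
            // ASSUMPTION: a 13-byte HWX header (1-byte protocol version,
            // 8-byte schema id, 4-byte schema version) precedes the raw
            // binary-encoded Avro records. Skip it before decoding.
            in.readFully(new byte[13]);

            // Decode the remaining bytes as plain binary-encoded Avro records.
            GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
            BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(in, null);
            while (!decoder.isEnd()) {
                GenericRecord record = reader.read(null, decoder);
                System.out.println(record);
            }
        }
    }
}

With Schema Write Strategy set to Embed Avro Schema none of this is necessary: the content becomes a standard Avro data file that ConvertAvroToORC and avro-tools getschema can read directly.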

Master Guru

+1 for an OrcRecordSetWriter