Created 08-11-2017 03:05 PM
java -jar avro-tools-1.8.2.jar getschema test.avro
log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
    at org.apache.avro.tool.DataFileGetSchemaTool.run(DataFileGetSchemaTool.java:47)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)
There are no errors until I try to convert that file to ORC, or download it and inspect it with avro-tools.
1 Jungnickel Rd Ganado���&2017-08-04 15:26:54MovementBFAK@WE�=@�"ƖX�&2017-08-04 15:06:561243142403�TX&2017-08-04 14:15:31 77962%
Created 08-11-2017 03:15 PM
Can you share the configuration of AvroRecordSetWriter? That file doesn't look like it has a schema embedded in it (you can usually see the schema as JSON near the beginning of the file contents). You may need to configure the writer to embed the schema for use by ConvertAvroToORC or avro-tools (if you don't separately provide the schema to the latter).
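A quick way to check this without avro-tools: an Avro object container file (the kind with an embedded schema) starts with the four bytes `Obj` followed by `0x01`. As far as I know, that is the check behind the "Not a data file" error. Here is a minimal Python sketch; the function names are made up for illustration, and `test.avro` is just the file name from the command above.

```python
# The 4-byte magic that opens every Avro object container file.
AVRO_MAGIC = b"Obj\x01"


def has_avro_container_header(data: bytes) -> bool:
    """Return True if the bytes start with the Avro container magic."""
    return data[:4] == AVRO_MAGIC


def looks_like_avro_data_file(path: str) -> bool:
    """Read only the first four bytes of a file and test them."""
    with open(path, "rb") as f:
        return has_avro_container_header(f.read(4))


if __name__ == "__main__":
    import sys

    # Usage: python check_avro.py test.avro
    for name in sys.argv[1:]:
        verdict = "container file" if looks_like_avro_data_file(name) else "no Avro header"
        print(f"{name}: {verdict}")
```

If this reports no Avro header, the flow file holds bare records (or some other header) and readers that expect an embedded schema will reject it.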
Created on 08-11-2017 03:26 PM - edited 08-17-2019 08:03 PM
The files are also really small, as seen above, and don't seem complete.
I have done that conversion before with no issues.
I am wondering if this is related to having a bunch of null fields.
Created 08-11-2017 03:48 PM
I think the issue is with the HWX Content-Encoded Schema Reference. This is a special "header" in an Avro file that makes it easy to integrate with the HWX Schema Registry serializers and deserializers, but it likely prevents the file from being understood by Apache Avro readers such as the one in ConvertAvroToORC or avro-tools. If you can, try setting the Schema Write Strategy to Embed Avro Schema; this will result in larger flow files but should work in downstream processors. If/when there is an OrcRecordSetWriter, you should be able to reuse the HWX schema reference option there.
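To confirm that the file starts with a schema reference rather than an Avro container header, you could peek at the leading bytes. The sketch below assumes the content-encoded reference is 13 bytes: a 1-byte protocol version, an 8-byte schema id and a 4-byte schema version, all big-endian. That layout is an assumption on my part and is not stated in this thread, so verify it against the NiFi source for your version. The function name is hypothetical.

```python
import struct

# ASSUMED layout (not confirmed in this thread):
# 1-byte protocol version, 8-byte schema id, 4-byte schema version, big-endian.
HEADER_FORMAT = ">Bqi"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 13 bytes under this assumption


def split_schema_reference(data: bytes):
    """Split assumed header fields from the remaining record bytes."""
    if len(data) < HEADER_SIZE:
        raise ValueError("too short to contain the assumed 13-byte header")
    protocol_version, schema_id, schema_version = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    return protocol_version, schema_id, schema_version, data[HEADER_SIZE:]
```

If the decoded values match a schema id and version that exist in your Schema Registry, that supports the theory above. Switching the writer to Embed Avro Schema should then produce a file that avro-tools can read.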
Created 08-11-2017 06:17 PM
+1 for an OrcRecordSetWriter