I created a data flow that puts Avro files into HDFS. Everything works unless I set the compression to SNAPPY. With Snappy, it creates an avro.snappy file, but when I try to read it with avro-tools, I get the following exception:
java -jar avro-tools-1.8.1.jar tojson hour\=21/000000_0.avro.snappy
Exception in thread "main" java.io.IOException: Not a data file.
    at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
    at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
    at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:71)
    at org.apache.avro.tool.Main.run(Main.java:87)
    at org.apache.avro.tool.Main.main(Main.java:76)
Any idea what could be wrong?
Thanks and regards,
There should be a flag, I think --codec, where you specify that it's a Snappy file; otherwise it assumes the file is regular uncompressed Avro (the file extension does not mean anything).
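Since the extension tells you nothing, one quick check is to look at the first four bytes of the file. As far as I know, avro-tools reports "Not a data file" when the file does not start with the Avro container magic bytes (the ASCII letters Obj followed by the byte 0x01). A minimal Python sketch of that check; the function name is made up for illustration:

```python
# Magic bytes defined by the Avro object container file format.
AVRO_MAGIC = b"Obj\x01"


def looks_like_avro_container(path):
    """Return True if the file at `path` starts with the Avro magic bytes."""
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC
```

If this returns False for the .avro.snappy file, the whole file was probably compressed or written in some other framing, rather than being an Avro container with Snappy-compressed blocks inside it.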
The only option for tojson is --pretty. When I compress an Avro file with avro-tools, I can read it without specifying a codec.