I have been using AvroSequenceFileOutputFormat to write data to HDFS. It works really well with MapReduce, but support elsewhere seems poor. In particular, I have not found an easy way to integrate AvroSequenceFiles into Hive, Pig or Flume.
Before I go down the route of writing extensions for these tools to read AvroSequenceFiles, it occurs to me that there might be a reason support isn't great. Is this an 'unloved' format, where the general consensus is that plain Avro data files should be used instead (AvroKeyOutputFormat in MR)?
I like it because it allows me, in theory, to store Avro in the key and a regular Writable (e.g. BytesWritable) in the value. This is useful if I'm doing image processing and need to store the original file path data in the key. Avro makes sense here because its schema versioning support will be useful in future. Perhaps there's a better approach though; I'm happy to take recommendations!
Many thanks in advance,