AvroSequenceFile support in Pig, Hive, Flume




I have been using AvroSequenceFileOutputFormat to write data to HDFS. It works really well with MapReduce, but support elsewhere seems a bit poor. In particular, I've not found an easy way of integrating AvroSequenceFiles into Hive, Pig or Flume.


Before I go down the route of writing extensions to these tools to read AvroSequenceFiles, it occurs to me there might be a reason support isn't great. Is this an 'unloved' format, where the general consensus is that plain Avro data files should be used instead (AvroKeyOutputFormat in MR)?


I like it because it allows me, in theory, to store Avro in the key and a regular Writable (e.g. BytesWritable) in the value. That's useful if I'm doing image processing and need to store the original file path data in the key. Avro makes sense here as the schema versioning support will be useful in future. Perhaps there's a better approach though; happy to take recommendations!


Many thanks in advance,



Re: AvroSequenceFile support in Pig, Hive, Flume



Think I may have posted this on the wrong forum - feel free to move it for me.
