AvroSequenceFile support in Pig, Hive, Flume



I have been using AvroSequenceFileOutputFormat to write data to HDFS. It works really well with MapReduce, but support elsewhere seems a bit poor. In particular, I've not found an easy way of integrating AvroSequenceFiles into Hive, Pig or Flume.

 

Before I go down the route of writing extensions to these tools to read AvroSequenceFiles, it occurs to me there might be a reason support isn't great. Is this an 'unloved' format, where the general consensus is that plain Avro datafiles should be used instead (AvroKeyOutputFormat in MR)?

 

I like it because it allows me, in theory, to store Avro in the key and a regular Writable (e.g. BytesWritable) in the value. That is useful if I'm doing image processing and need to store the original file path data in the key. Avro makes sense here as the schema versioning support will be useful in future. Perhaps there's a better approach though; happy to take recommendations!

 

Many thanks in advance,

Tom


Re: AvroSequenceFile support in Pig, Hive, Flume


 

Think I may have posted this on the wrong forum - feel free to move it for me.