Created 09-25-2013 02:01 AM
Guys,
I see there's already a Flume Morphline Solr sink, but how can I use a morphline with Flume to write to HDFS without Solr? I'm currently using Flume to partition Avro data (AvroFlumeEventSerializer), but I'd like to use the extractAvroPaths morphline command to flatten the complex types so Impala can query them. Is this possible?
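To make "flatten" concrete, here is a rough sketch of the transformation I'm after. It uses plain Java Maps as a stand-in for Avro records, so it is not morphline or Avro API code; the class name RecordFlattener and the underscore-joined column names are just assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: turns nested maps into one flat map of column-like keys. */
class RecordFlattener {

    /** Returns a new map in which nested keys are joined with "_" (e.g. user_address_city). */
    static Map<String, Object> flatten(Map<String, Object> record) {
        Map<String, Object> flat = new LinkedHashMap<>();
        flattenInto("", record, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<?, ?> node, Map<String, Object> flat) {
        for (Map.Entry<?, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty()
                    ? String.valueOf(entry.getKey())
                    : prefix + "_" + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                // Nested record: recurse, carrying the path so far as the prefix.
                flattenInto(key, (Map<?, ?>) value, flat);
            } else {
                // Leaf value: becomes one flat column.
                flat.put(key, value);
            }
        }
    }
}
```

A record like {id, user: {name, address: {city}}} would come out with the keys id, user_name and user_address_city, which is the kind of flat layout I want Impala to see.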
Thanks
Andrew
Created 09-25-2013 10:21 AM
Created 09-26-2013 03:09 AM
Created 09-26-2013 09:05 AM
Created 09-26-2013 12:15 PM
OK, I see: I can embed Java in the config file to open a file and write the records. But won't that open lots of files? How can I batch the writes and roll the file the way the Flume HDFS sink does?
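For what it's worth, this is roughly the behaviour I mean by batching and rolling, sketched in plain Java against the local filesystem rather than HDFS. The class RollingBatchWriter, its parameters and the file naming are all hypothetical; it is not Flume or morphline API, only an illustration of keeping one file open, writing in batches, and rolling after a fixed number of records.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: buffers records and rolls to a new file every rollCount records. */
class RollingBatchWriter implements AutoCloseable {
    private final Path dir;
    private final String prefix;
    private final int batchSize;   // records buffered in memory before a write
    private final int rollCount;   // records per file before rolling to the next one
    private final List<String> buffer = new ArrayList<>();
    private final List<Path> files = new ArrayList<>();
    private BufferedWriter out;    // the single file currently open, or null
    private int recordsInFile;

    RollingBatchWriter(Path dir, String prefix, int batchSize, int rollCount) {
        this.dir = dir;
        this.prefix = prefix;
        this.batchSize = batchSize;
        this.rollCount = rollCount;
    }

    /** Buffers one record; writes the whole batch once batchSize is reached. */
    void append(String record) throws IOException {
        buffer.add(record);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Writes all buffered records, opening and rolling files as needed. */
    void flush() throws IOException {
        for (String record : buffer) {
            if (out == null) {
                openNext();
            }
            out.write(record);
            out.newLine();
            if (++recordsInFile >= rollCount) {
                roll();
            }
        }
        buffer.clear();
        if (out != null) {
            out.flush();
        }
    }

    private void openNext() throws IOException {
        Path next = dir.resolve(prefix + "." + files.size() + ".txt");
        files.add(next);
        out = Files.newBufferedWriter(next, StandardCharsets.UTF_8);
        recordsInFile = 0;
    }

    private void roll() throws IOException {
        out.close();
        out = null;
    }

    /** Paths of every file opened so far, in creation order. */
    List<Path> files() {
        return files;
    }

    @Override
    public void close() throws IOException {
        flush();
        if (out != null) {
            roll();
        }
    }
}
```

With batchSize 2 and rollCount 3, seven records end up in three files (3, 3 and 1 records), and only one file is ever open at a time. The HDFS sink also rolls on time and size as far as I know; this sketch only rolls on record count.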
Created 09-26-2013 12:43 PM
Created 09-27-2013 07:07 AM
Thanks, I'll have a look.
I do, however, think you missed a trick by not having this as a stock command. Some of my colleagues dismissed morphlines as only applicable to Solr. They wanted to do the same as me (have Flume flatten Avro when ingesting the data) but bypassed morphlines and have Pig scripts running instead.
HDFS is the backbone of Hadoop, so a standard way for morphlines to write to it without Solr would have been great!
Created 09-27-2013 09:01 AM
Created 09-29-2013 11:28 PM
In the meantime, could I write a custom morphline command that posts the flattened Avro to another Flume agent?
Created 09-29-2013 11:35 PM