Because I don't think it works the way my supervisor thinks it works.
We're taking in a series of about 8 CSV files from an FTP server, and these files are rather small (under 1MB each). He's (rightfully) concerned that cluster size on HDFS is going to be wasted, so he wants to use the Merge Content processor to resolve this. He seems to believe that the Merge Content processor will 'collate' files with the same name into a single bigger file.
To clarify: the way he wants it to work is that if today's "sales_report.csv" comes in and a "sales_report.csv" already exists in the directory, the new data from today's "sales_report.csv" gets added as new rows to the existing file. I hope that makes sense.
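To make that concrete, here is a rough Python sketch of the behaviour he's expecting. It only illustrates the idea on ordinary local files; it is not how NiFi or HDFS does anything. The function name `append_rows` is made up, and I'm assuming both files share the same single header row and that the existing file ends with a newline.

```python
import csv
from pathlib import Path


def append_rows(existing: Path, incoming: Path) -> int:
    """Append the data rows of `incoming` to `existing`; return rows added.

    Hypothetical helper. Assumes both CSVs have the same one-line header
    and that `existing`, if present, ends with a line terminator.
    """
    with incoming.open(newline="") as src:
        rows = list(csv.reader(src))
    if not rows:
        return 0  # nothing to add

    header, data = rows[0], rows[1:]

    # Only write the header when the target is missing or empty,
    # so it is not repeated in the middle of the merged file.
    is_new = not existing.exists() or existing.stat().st_size == 0

    with existing.open("a", newline="") as dst:
        writer = csv.writer(dst)
        if is_new:
            writer.writerow(header)
        writer.writerows(data)
    return len(data)
```

So calling `append_rows(Path("sales_report.csv"), Path("todays_sales_report.csv"))` would leave one "sales_report.csv" containing the old rows followed by today's rows, with a single header at the top. That is the result he has in mind.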
Instead, I'm getting very different results. I have the flow set up so that it picks the files up from the FTP, creates a directory on HDFS based on the folder, and then a subfolder based on the year. When I leave the MC processor out, this all works perfectly. When I put the MC processor in, I get three files: one has its original name, and the other two are named with a long string of random characters. We're using the default settings for the Merge Content processor.
Sorry, I've written a bit of an essay. But based on what I've described above, does it sound like the MC processor is what he's looking for?