PutHDFS merge Avro corrupts file


Hi, I am working with a large customer on a NiFi flow that reads from Kafka and is supposed to merge messages into an Avro file on HDFS. To do so, we use the PutHDFS processor with the append setting. Unfortunately, the append breaks the Avro files, and reading them with avro-tools throws an "Invalid sync!" exception. I looked at the code of the PutHDFS processor, which uses FileSystem.append; there is no logic in there to merge Avro files. How should I proceed?

Regards


Re: PutHDFS merge Avro corrupts file

You should use the MergeContent processor with a Merge Format of Avro before PutHDFS.
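To illustrate why a byte-level append cannot produce a valid file while a format-aware merge can, here is a minimal sketch in Python. It is a simplified model, not the real Avro library or NiFi code: the magic bytes and the 16-byte per-file sync marker match the Avro container format, but the block layout (a 4-byte length prefix) and the function names `write_container`, `read_container`, and `merge_containers` are assumptions made for illustration.

```python
import os
import struct

MAGIC = b"Obj\x01"   # magic bytes that open a real Avro container file
SYNC_SIZE = 16       # Avro sync markers are 16 random bytes, one per file


def write_container(records):
    """Write records (bytes) into a toy container with its own random sync marker."""
    sync = os.urandom(SYNC_SIZE)
    out = bytearray(MAGIC + sync)
    for rec in records:
        # Toy block layout (assumption): 4-byte length, payload, sync marker.
        out += struct.pack(">I", len(rec)) + rec + sync
    return bytes(out)


def read_container(data):
    """Read all records, checking the sync marker after every block."""
    if data[:4] != MAGIC:
        raise OSError("Not a data file")
    sync = data[4:4 + SYNC_SIZE]
    pos = 4 + SYNC_SIZE
    records = []
    while pos < len(data):
        header = data[pos:pos + 4]
        if len(header) < 4:
            raise ValueError("Invalid sync!")
        (length,) = struct.unpack(">I", header)
        payload = data[pos + 4:pos + 4 + length]
        marker = data[pos + 4 + length:pos + 4 + length + SYNC_SIZE]
        if marker != sync:
            # A second header in the middle of the file ends up here.
            raise ValueError("Invalid sync!")
        records.append(payload)
        pos += 4 + length + SYNC_SIZE
    return records


def merge_containers(files):
    """Format-aware merge: decode every input, then rewrite under ONE header."""
    records = [rec for f in files for rec in read_container(f)]
    return write_container(records)


if __name__ == "__main__":
    a = write_container([b"r1", b"r2"])
    b = write_container([b"r3"])
    try:
        read_container(a + b)          # what a raw append produces
    except ValueError as exc:
        print("append:", exc)
    print("merge:", read_container(merge_containers([a, b])))
```

In this model the appended file fails because the second file's header and sync marker land where the reader expects a block that ends with the first file's marker. The merge succeeds because all records are rewritten under a single header, which is the general idea behind merging with an Avro-aware format before writing to HDFS.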

Re: PutHDFS merge Avro corrupts file


Hi Byran,

I have Avro data in Kafka that I need to consume and write to HDFS. I'm able to consume the data from Kafka, but when I use the MergeContent processor to combine the Avro data and write it to HDFS in large files, I get errors.

With these settings:

Merge Strategy - Defragment

Merge Format - Avro

Error: Failed to process bundle of 0 files due to not open; rolling back sessions: org.apache.avro.AvroRuntimeException: not open


With these settings:

Merge Strategy - Bin Packing Algorithm

Merge Format - Avro

Error: java.io.IOException: Not a data file


Could you please tell me whether I'm missing any configuration needed to make this work?

Thanks in advance.
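One possible cause of "Not a data file", which this thread does not confirm, is that the Kafka messages are bare Avro-encoded records rather than complete Avro container files; a container file starts with the magic bytes `Obj` followed by the byte 0x01. The sketch below is a hypothetical diagnostic with assumed helper names (`is_avro_container`, `count_non_containers`) that checks sample message payloads for that magic.

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of an Avro container file


def is_avro_container(payload: bytes) -> bool:
    """Return True if the payload starts with the Avro container magic bytes."""
    return payload[:4] == AVRO_MAGIC


def count_non_containers(payloads) -> int:
    """Count payloads that an Avro container reader would reject up front."""
    return sum(1 for p in payloads if not is_avro_container(p))


if __name__ == "__main__":
    # Hypothetical sample payloads, for illustration only.
    samples = [AVRO_MAGIC + b"\x00" * 20, b"\x02\x06foo"]
    print(count_non_containers(samples), "of", len(samples), "lack the magic")
```

If the sampled messages lack the magic bytes, they would need to be wrapped into container files with their schema before an Avro-format merge could read them.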
