NiFi: Avro message processing from Kafka

Each incoming Avro message received from Kafka contains its own schema. When persisting, I want to group the messages into sizeable chunks (say 250 MB per file) and store them in HDFS. However, if we combine the messages along with their schemas, the entire file becomes unparsable because the schema repeats. Is it possible to strip the schema and keep it as a static reference, writing only the data content from each Avro message?

Can the SplitContent processor be used to strip the schema part?

1 ACCEPTED SOLUTION

Re: NiFi: Avro message processing from Kafka

The MergeContent processor has a "Merge Format" of "Avro", which merges Avro messages that share the same schema. If you are sending in Avro messages with different schemas, use the "Correlation Attribute Name" property so that only messages with the same schema are merged together.
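
To illustrate the idea of correlating by schema and flushing at a size threshold, here is a minimal Python sketch. It is only a conceptual model, not NiFi's implementation: the function `bin_by_attribute`, the attribute name `schema.name`, and the tiny byte threshold are all assumptions made up for the example.

```python
from collections import defaultdict


def bin_by_attribute(flowfiles, attribute, min_group_size):
    """Group (attributes, content) pairs by one attribute value.

    A bin is emitted as soon as the bytes collected for its key reach
    min_group_size. Returns (ready_bins, pending_bins).
    Hypothetical helper: this only models the idea of a correlation attribute.
    """
    bins = defaultdict(list)
    sizes = defaultdict(int)
    ready = []
    for attrs, content in flowfiles:
        key = attrs.get(attribute)
        bins[key].append(content)
        sizes[key] += len(content)
        if sizes[key] >= min_group_size:
            # Bin is big enough: hand it off and start a fresh one for this key.
            ready.append((key, bins.pop(key)))
            sizes.pop(key)
    return ready, dict(bins)


# Two schemas interleaved; the threshold is 10 bytes instead of 250 MB.
incoming = [
    ({"schema.name": "A"}, b"x" * 6),
    ({"schema.name": "B"}, b"y" * 3),
    ({"schema.name": "A"}, b"z" * 5),
]
ready, pending = bin_by_attribute(incoming, "schema.name", 10)
print(ready)
print(pending)
```

Messages with schema `A` never share a bin with schema `B`, so each merged output can carry a single schema.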

Re: NiFi: Avro message processing from Kafka

As MergeContent is just concatenating the binary content of the files, the resulting Avro file can no longer be parsed, because it would contain more than one header with the schema defined. Is there an option or workaround to strip the schema header from each message's binary content before merging? We have the schema (assume it is the same for all messages) in a static file that can be referenced at any time, and we want the final merged file to contain only the data, not the header with the schema.

Re: NiFi: Avro message processing from Kafka

When you choose a "Merge Format" of "Avro", it does not do binary concatenation. It merges all of the Avro files into a new, valid Avro file with a single header/schema entry. What you described would be a "Merge Format" of "Binary Concatenation".

You can see the possible values for "Merge Format" here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.MergeContent/i...

There are also unit tests that show it merging Avro records and then parsing the resulting Avro:

https://github.com/apache/nifi/blob/master/nifi-nar-bundles/nifi-standard-bundle/nifi-standard-proce...
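
The difference between the two merge formats can be sketched in plain Python. This is a hand-rolled, minimal subset of the Avro object container layout (magic bytes, metadata map, sync marker, one uncompressed data block) for one hypothetical record schema `Event`; it is not NiFi's code and not a substitute for a real Avro library. All function names here are assumptions for illustration.

```python
import io
import json
import os

MAGIC = b"Obj\x01"
# Hypothetical schema used only for this example.
SCHEMA = {"type": "record", "name": "Event",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "msg", "type": "string"}]}


def write_long(out, n):
    """Avro long: zigzag encoding, then base-128 varint."""
    z = (n << 1) ^ (n >> 63)
    while z & ~0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(inp):
    shift = result = 0
    while True:
        raw = inp.read(1)
        if not raw:
            raise EOFError("unexpected end of data")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def write_bytes(out, data):
    write_long(out, len(data))
    out.write(data)


def read_bytes(inp):
    return inp.read(read_long(inp))


def write_container(schema, records):
    """Write one header followed by one data block holding all records."""
    out, body, sync = io.BytesIO(), io.BytesIO(), os.urandom(16)
    out.write(MAGIC)
    meta = {"avro.schema": json.dumps(schema).encode(), "avro.codec": b"null"}
    write_long(out, len(meta))
    for key, value in meta.items():
        write_bytes(out, key.encode())
        write_bytes(out, value)
    write_long(out, 0)  # end of metadata map
    out.write(sync)
    for rec in records:  # field order follows SCHEMA: id, then msg
        write_long(body, rec["id"])
        write_bytes(body, rec["msg"].encode())
    write_long(out, len(records))
    write_bytes(out, body.getvalue())
    out.write(sync)
    return out.getvalue()


def read_container(data):
    inp = io.BytesIO(data)
    if inp.read(4) != MAGIC:
        raise ValueError("missing Avro magic bytes")
    meta = {}
    while (count := read_long(inp)) != 0:
        if count < 0:  # negative count is followed by a byte size
            count = -count
            read_long(inp)
        for _ in range(count):
            key = read_bytes(inp).decode()
            meta[key] = read_bytes(inp)
    sync = inp.read(16)
    records = []
    while inp.tell() < len(data):
        count, size = read_long(inp), read_long(inp)
        if count < 0 or size < 0:
            raise ValueError("corrupt block header")
        block = io.BytesIO(inp.read(size))
        for _ in range(count):
            records.append({"id": read_long(block),
                            "msg": read_bytes(block).decode()})
        if inp.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return json.loads(meta["avro.schema"]), records


def merge_avro(files):
    """Decode every input and rewrite all records under a single header."""
    schema, merged = None, []
    for data in files:
        file_schema, records = read_container(data)
        if schema is not None and file_schema != schema:
            raise ValueError("schemas differ; merge them separately")
        schema = file_schema
        merged.extend(records)
    return write_container(schema, merged)


a = write_container(SCHEMA, [{"id": 1, "msg": "first"}])
b = write_container(SCHEMA, [{"id": 2, "msg": "second"}])
print(read_container(merge_avro([a, b]))[1])  # single header, both records
try:
    read_container(a + b)  # second header sits where a data block is expected
except ValueError as err:
    print("binary concatenation is not parseable:", err)
```

In this sketch the concatenated bytes fail to parse because the reader runs into the second file's header where it expects a data block, while the merged output parses cleanly because it has one header and one schema.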

Re: NiFi: Avro message processing from Kafka

Thank you. The "Avro" merge format matches our requirement!