Support Questions

don_diego70 · ‎05-05-2019

I am having trouble with my NiFi dataflow, most likely because I have not understood how Wait/Notify works. Here is my basic dataflow:
- I list my S3 bucket containing zip files.
- I extract the JSON file from each zip. Each JSON file is between 2 GB and 3 GB.
- I then split the JSON several times, to avoid 'Out of memory' errors, until I obtain single-record FlowFiles (this is where the trouble starts).
- I want to flatten each FlowFile.
- I want to merge the flattened FlowFiles to rebuild the original JSON. I know I cannot do this with 'MergeContent', because my JSON goes through several split processors.

Can you please explain how to use Wait/Notify?

Does Wait automatically merge the FlowFiles?

I ask because I did not understand this. I also looked at this post, but it is still not clear to me.
Thanks

MattWho · ‎05-08-2019

@3nomis

MergeContent should be using the Defragment merge strategy.
Max Bin Age has no default value, so I am not sure what you set there. If it is left blank, the processor will wait forever to merge a bin unless it runs out of bins.
Also make sure you adjust the back pressure object and size thresholds on the connections feeding the MergeContent processors, so that they are large enough to hold the number of splits that need to be merged.
Considering the size of the FlowFiles being merged, it may take some time to merge all of them.
As for bins, try setting 21 of them.
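
To make the Defragment advice above more concrete, here is a rough Python sketch of the idea. It is a toy model, not NiFi code: FlowFiles are plain dicts, and `make_fragment`, `defragment`, and the `doc-a`/`doc-b` identifiers are hypothetical names invented for illustration. It only assumes that each split carries the `fragment.identifier`, `fragment.index` and `fragment.count` attributes that Defragment relies on, and that the header, demarcator and footer are set to `[`, `,` and `]` to rebuild a JSON array.

```python
import json
from collections import defaultdict


def make_fragment(identifier, index, count, content):
    """Build a toy FlowFile: a dict holding attributes and string content."""
    return {
        "attributes": {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
        "content": content,
    }


def defragment(flowfiles, header="[", demarcator=",", footer="]"):
    """Merge complete bins in fragment.index order; return (merged, incomplete)."""
    # One bin per fragment.identifier, i.e. one bin per original file.
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["attributes"]["fragment.identifier"]].append(ff)

    merged, incomplete = {}, {}
    for identifier, frags in bins.items():
        expected = int(frags[0]["attributes"]["fragment.count"])
        if len(frags) < expected:
            # Bin is still missing fragments, so it is held back and not merged.
            incomplete[identifier] = len(frags)
            continue
        # Restore the original order before concatenating the contents.
        frags.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
        body = demarcator.join(ff["content"] for ff in frags)
        merged[identifier] = header + body + footer
    return merged, incomplete


queue = [
    make_fragment("doc-a", 1, 3, '{"id": 1}'),
    make_fragment("doc-a", 0, 3, '{"id": 0}'),
    make_fragment("doc-b", 0, 2, '{"id": 9}'),  # its second fragment never arrives
    make_fragment("doc-a", 2, 3, '{"id": 2}'),
]
merged, incomplete = defragment(queue)
print(json.loads(merged["doc-a"]))
print(incomplete)
```

In this model `doc-a` is merged once all three fragments are present, while `doc-b` stays unmerged because one fragment is missing. That is the situation where an empty Max Bin Age keeps a bin waiting. If the flow splits more than once, each split level most likely needs its own merge step, applied in reverse order of the splits, because a later split replaces the fragment attributes written by an earlier one.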

Thanks,

Matt

Cloudera Community

Merge big Json using Wait/Notify in Apache Nifi