Member since
04-16-2020
05-05-2020
03:56 AM
Hi Matt,

Thanks for your response!

1. We have not connected the success relationship twice. I know that dragging it twice would create a duplicate copy of the flow file. You are right that in my case I am using multiple ValidateCsv processors to apply different validations to the CSV data. The ValidateCsv processors are connected one after another, and each one is responsible for a different kind of validation (e.g. Null Check, Empty Check, Unique, Equals).

2. If a flow file contains a few invalid records, it is split into two flow files (one with the invalid records and one with the valid records). Both of these are input to the next ValidateCsv processor, where each can again be split in two if it contains invalid records, and so on. My problem is that every flow file produced by a split carries the same attributes, such as fragment.count, fragment.identifier, and fragment.index. I want to avoid these duplicate flow files once all validation is done.

3. I am not trying to reassemble the exact original flow file. I simply want to create a single flow file that triggers the downstream processor only once.

4. MergeContent processor configuration:
   Merge Strategy: Defragment
   Attribute Strategy: Keep All Unique Attributes
   (All other properties are left at their defaults.)
   Important: we have a clustered NiFi instance.

5. I have tried the Bin-Packing Algorithm with fragment.identifier as the "Correlation Attribute Name" and with "Max Bin Age" set, and that works. However, we have no idea how much data we will process or how long processing will take, because we create these flows dynamically through the NiFi REST API in different environments. So we cannot set an approximate "Max Bin Age" value. Instead, we want to trigger downstream processing immediately after the first processing is complete.

I am looking for logic that gives me a single flow file out of these multiple flow files.

Thank you,
Prashant
@MattWho @sahoo_samarendr
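To make the goal concrete, here is a minimal sketch in plain Python (not NiFi code) of the grouping I am after: every flow file that shares a fragment.identifier ends up in one merged result, regardless of duplicate fragment.index values. The function name `merge_by_identifier` and the sample data are hypothetical. The sketch also assumes all flow files for an identifier are already queued, which is exactly the part I cannot guarantee without a "Max Bin Age".

```python
from collections import defaultdict


def merge_by_identifier(flow_files, key="fragment.identifier"):
    """Group (attributes, content) pairs by `key` and join each group's content."""
    bins = defaultdict(list)
    for attributes, content in flow_files:
        bins[attributes[key]].append(content)
    # One merged entry per identifier, contents joined in arrival order.
    return {identifier: "\n".join(parts) for identifier, parts in bins.items()}


# Hypothetical sample: three flow files left over after validation,
# all carrying the same fragment.* attributes.
queued = [
    ({"fragment.identifier": "abc", "fragment.index": "0"}, "a,1\nd,4"),
    ({"fragment.identifier": "abc", "fragment.index": "0"}, "c,"),
    ({"fragment.identifier": "abc", "fragment.index": "0"}, ",2"),
]
merged = merge_by_identifier(queued)
print(len(merged))
```

Grouping on the identifier alone ignores fragment.index and fragment.count, which is why it tolerates the duplicates. It does not tell me when the last flow file for an identifier has arrived.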
04-29-2020
06:48 AM
Hi,

I have a NiFi flow in which I validate data line by line using the ValidateCsv processor. I have N of these processors connected serially, one after another. Both the valid and invalid relationships of each processor are connected to the next ValidateCsv processor, and so on. So each ValidateCsv processor can produce two flow files (one valid and one invalid), or only a valid one, or only an invalid one, and all of them share common attributes.

After validation I need to trigger the next processor only once (e.g. ExecuteSQL to ingest data from another table). To achieve this I am using a MergeContent processor (Defragment strategy) to combine all the flow files into one. But the flow files are not all merged together, because some of them have the same attribute values, such as fragment.index.

Can you please suggest how I can wait for all the flow files and combine them into one, so that the next processor is triggered only once?

@sahoo_samarendr
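To illustrate the problem, here is a small simulation in plain Python (not NiFi code). It assumes, as I observe in my flow, that both outputs of a validation step inherit the attributes of the incoming flow file. The `FlowFile` class, the two checks, and the attribute values are hypothetical stand-ins.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi flow file: records plus attributes."""
    records: list
    attributes: dict = field(default_factory=dict)


def validate(flow_file, is_valid):
    """Mimic one validation step with 'valid' and 'invalid' outputs.

    Both outputs copy the parent's attributes (assumption based on what
    I see in the flow). Empty outputs are not emitted.
    """
    valid = [r for r in flow_file.records if is_valid(r)]
    invalid = [r for r in flow_file.records if not is_valid(r)]
    return [FlowFile(part, dict(flow_file.attributes)) for part in (valid, invalid) if part]


def run_chain(flow_file, checks):
    """Feed every output of one step into the next step, as in the serial flow."""
    current = [flow_file]
    for check in checks:
        current = [child for ff in current for child in validate(ff, check)]
    return current


source = FlowFile(
    records=["a,1", ",2", "c,", "d,4"],
    attributes={"fragment.identifier": "abc", "fragment.index": "0", "fragment.count": "1"},
)
checks = [
    lambda r: r.split(",")[0] != "",  # first column must not be empty
    lambda r: r.split(",")[1] != "",  # second column must not be empty
]
outputs = run_chain(source, checks)
index_counts = Counter(ff.attributes["fragment.index"] for ff in outputs)
print(len(outputs), dict(index_counts))
```

With this sample the chain ends with three flow files that all carry the same fragment.identifier and fragment.index, while fragment.count still says 1. That mismatch is what I believe stops the Defragment strategy from merging them.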
Labels: Apache NiFi