
NiFi split text processor behaviour


New Contributor

Hi,

I am using the SplitText processor to split files based on line count. Does this processor always create the split files in the order of the records in the source file? Here is an example of what I mean:

Say I have a file with 100 records and I have set the line count to 10.

1. Will the first 10 records always go to the first split file, and will the downstream processor get this file immediately for processing?
2. In this example, 10 split files will be created. Will their order be maintained in subsequent processing (assuming concurrency is set to '1'), i.e. split-0, split-1, ... split-9?


Thanks & Regards,
R.Rohit

1 REPLY

Re: NiFi split text processor behaviour

Master Guru

@Rohitravi 

None of NiFi's processors release any FlowFiles to a downstream connection until the end of the thread operation. This protects users from data loss, and in some cases data duplication, in the event of a failure.

Say you have configured a SplitText processor to split on every 10 lines. The processor streams the content of the first 10 lines into a content claim in the content_repository and creates a new FlowFile record pointing at that claim. The next 10 lines may or may not go into that same content claim, and another FlowFile record is created. This process continues until all splits have been created. The processor then releases all of the FlowFiles it created to the downstream connection at the same time, so the first split does not reach the downstream processor ahead of the others.
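
The batching behaviour described above can be sketched in a few lines of Python. This is only a simulation, not NiFi code: `split_text`, `downstream_queue` and the dictionary layout are hypothetical names chosen for illustration, and the real processor writes to content claims rather than to in-memory lists.

```python
def split_text(lines, line_count):
    """Simulate one SplitText thread operation on a single input file."""
    pending = []  # splits created so far; nothing is visible downstream yet
    for start in range(0, len(lines), line_count):
        chunk = lines[start:start + line_count]
        # "index" is a hypothetical position marker for this sketch only
        pending.append({"index": len(pending), "lines": chunk})
    # Only once every split exists is the whole batch handed over together.
    return pending


downstream_queue = []  # hypothetical stand-in for the downstream connection
records = [f"record-{i}" for i in range(100)]
downstream_queue.extend(split_text(records, 10))

print(len(downstream_queue))            # how many splits arrived in one batch
print(downstream_queue[0]["lines"][0])  # first record of the first split
```

In this sketch the downstream queue is empty until `split_text` returns, which mirrors the point that no split is available for processing before all of them have been created.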

NiFi does not guarantee FlowFile processing order. You can add the FirstInFirstOutPrioritizer to the downstream connections to help with ordering to some extent.
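
If strict ordering matters downstream, one option is to reorder on an attribute instead of relying on arrival order. The sketch below is a simulation under assumptions, not NiFi code: it assumes each split carries a `fragment.index` attribute (SplitText writes fragment attributes on its splits), and the `flowfiles` list, the `content` key and the `restore_order` helper are hypothetical stand-ins for whatever the downstream step actually receives.

```python
import random


def restore_order(flowfiles):
    """Return the splits sorted by their numeric fragment.index attribute."""
    # Attribute values are strings, so convert to int before comparing;
    # a plain string sort would place "10" before "2".
    return sorted(flowfiles, key=lambda ff: int(ff["fragment.index"]))


# Hypothetical splits arriving in arbitrary order.
flowfiles = [
    {"fragment.index": str(i), "content": f"split-{i}"} for i in range(1, 11)
]
random.shuffle(flowfiles)

ordered = restore_order(flowfiles)
print([ff["content"] for ff in ordered])
```

Sorting on the numeric value rather than the raw string is the design choice that matters here, since attribute values are text.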

Hope this helps,

Matt
