Divide file in nifi

Contributor

I have a data file that needs to be split.
Inside the text file, sections are delimited as follows:

  • # @ marks the beginning of a section with data,
  • # $ marks the start of the data block,
  • the next # marks the end of the section.

How can I do this?

1 ACCEPTED SOLUTION

Super Collaborator

Is your data on HDFS? If so, use the GetHDFS processor to load the file into a FlowFile. If the data is on your local NiFi node, use the GetFile processor instead.

Next, if you want to split by newline, you can use the SplitText processor to split the file into multiple FlowFiles. If you only want to split on your '#@' and '#$' markers, use the SplitContent processor. It splits on a sequence of bytes; set 'Byte Sequence Format' to 'Text' so that you can enter '#@' as the sequence to split on. I'm not sure exactly how you'd like to divide your data, but that should give you a starting point. You can chain several SplitContent processors to split on more than one character sequence. Ultimately, your one file on disk becomes multiple FlowFiles in NiFi.

Take a look at the SplitContent processor for more info: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.6.0/org.apache...
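
To make the idea of splitting on a text sequence more concrete, here is a rough emulation in plain Python. It is not NiFi code and does not reproduce every SplitContent option; the sample content and the `split_on_sequence` helper are made up for illustration, since the real file layout is not shown in this thread.

```python
def split_on_sequence(content: bytes, sequence: bytes) -> list:
    """Split content on every occurrence of sequence, dropping blank parts."""
    return [part for part in content.split(sequence) if part.strip()]


# Hypothetical sample file contents (assumption: markers are written as '#@' and '#$').
sample = b"#@ header A\n#$\n1;2;3\n#\n#@ header B\n#$\n4;5;6\n#\n"

# First split: one piece per section, similar to one FlowFile per '#@'.
sections = split_on_sequence(sample, b"#@")

# Second split, chained on the first: separate each header from its data block.
blocks = [split_on_sequence(section, b"#$") for section in sections]

for header, data in blocks:
    print(header.strip(), data.strip())
```

Each element of `sections` plays the role of one FlowFile after the first split, and each pair in `blocks` shows what a second, chained split on '#$' would leave you with.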

4 REPLIES


Hey @Vladislav Shcherbakov!
You can try the ExtractText processor: add a property for each value you want to extract, using a regular expression.
https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.5.0/org.apache...
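
As a rough illustration of the "one regex per value" idea, here is a plain Python sketch. It is not NiFi configuration: the sample text, the names `section.header` and `section.data`, and the patterns themselves are assumptions made for the example, and NiFi uses Java regular expressions, which can differ from Python's in details.

```python
import re

# Hypothetical content of one section (the real layout is not shown in the thread).
section = "#@ header A\n#$\n1;2;3\n#\n"

# One pattern per value to extract; the names are made up for this example.
patterns = {
    # Text after '#@' on the header line.
    "section.header": re.compile(r"^#@[ \t]*(.*)$", re.MULTILINE),
    # Everything between the '#$' line and the next line that holds only '#'.
    "section.data": re.compile(r"^#\$[ \t]*\n(.*?)^#[ \t]*$", re.MULTILINE | re.DOTALL),
}

# Collect the first capture group of each pattern, keyed by its name.
attributes = {}
for name, pattern in patterns.items():
    match = pattern.search(section)
    if match:
        attributes[name] = match.group(1)

print(attributes)
```

The dictionary stands in for the values you would get out of the processor, one entry per regex you define.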

Hope this helps

Contributor

If you could show me an example, that would be really nice, because I don't quite understand how it works. Thanks!

Contributor

The files are on an FTP server. I get them with ListFTP and FetchFTP.
Then I use RouteText (I think) to filter them by name.
After that I need to parse the data and divide it into parts (I'll try SplitContent).
Finally, I load the data into a SQL server with PutDatabaseRecord.
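
For the parsing step, the section rules from the question can be sketched as a small line-by-line parser in Python. This is only an illustration of the logic, not a NiFi processor: the `Section` class, the `parse_sections` function, the sample text and the ';' field separator are all assumptions, and the markers are assumed to be written as '#@' and '#$' without a space.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    header: str
    rows: list = field(default_factory=list)


def parse_sections(text: str) -> list:
    """Collect sections: '#@' opens one, '#$' starts its data, the next '#' closes it."""
    sections = []
    current = None
    in_data = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#@"):
            # A new header also closes a section that is still open.
            if current is not None:
                sections.append(current)
            current = Section(header=stripped[2:].strip())
            in_data = False
        elif stripped.startswith("#$") and current is not None:
            in_data = True
        elif stripped.startswith("#"):
            # Any other '#' line ends the current section.
            if current is not None:
                sections.append(current)
            current = None
            in_data = False
        elif in_data and current is not None and stripped:
            # Assumption: fields inside the data block are separated by ';'.
            current.rows.append(stripped.split(";"))
    # Treat end of file as the end of an open section.
    if current is not None:
        sections.append(current)
    return sections


# Hypothetical sample input.
sample = "#@ header A\n#$\n1;2;3\n4;5;6\n#\n#@ header B\n#$\n7;8;9\n"
parsed = parse_sections(sample)
for section in parsed:
    print(section.header, section.rows)
```

Each row in `rows` would correspond to one record that could then be written to the database.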