Created 07-03-2018 10:56 AM
Created on 07-03-2018 11:27 AM - edited 08-18-2019 02:41 AM
To parse a fixed-width file you can use the ReplaceText processor: write a regex that extracts each field into its own capture group, then put a delimiter between the groups when replacing the data.
If you know how wide each field is going to be, capture each field in one capture group and replace the content with the groups separated by a delimiter. If your file is space-delimited instead, use the regex (.*)\s(.*) and replace the whitespace with a delimiter (note that the first (.*) is greedy, so this splits on the last whitespace character in the line).
Change the ReplaceText Evaluation Mode to Line-by-Line. The processor then reads the fixed-width file one line at a time and replaces the contents of the flowfile with the delimited version.
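To make the capture-group idea concrete, here is a minimal Python sketch of the same line-by-line replacement. The field widths (5, 10, 3) and the function names are made up for illustration; in ReplaceText the equivalent would be a Search Value like ^(.{5})(.{10})(.{3})$ with a Replacement Value like $1,$2,$3.

```python
import re

# Hypothetical layout: id (5 chars), name (10 chars), code (3 chars).
FIELD_WIDTHS = [5, 10, 3]

# One capture group per field, i.e. ^(.{5})(.{10})(.{3})$
PATTERN = re.compile("^" + "".join(f"(.{{{w}}})" for w in FIELD_WIDTHS) + "$")


def to_delimited(line: str, delimiter: str = ",") -> str:
    """Replace one fixed-width line with its delimited form."""
    match = PATTERN.match(line)
    if match is None:
        # Lines that do not fit the layout are left untouched,
        # the same way a regex replace leaves non-matching text alone.
        return line
    # Padding inside each field is kept as-is; only the delimiter is added.
    return delimiter.join(match.groups())


def convert(text: str, delimiter: str = ",") -> str:
    """Apply the replacement line by line."""
    return "\n".join(to_delimited(line, delimiter) for line in text.splitlines())
```

Testing the regex on a few sample lines like this before pasting it into the processor makes it easier to catch an off-by-one in the widths.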
Once the fields are delimited, you can use the ConvertRecord processor to read the data and write it in your required format.
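As a rough illustration of what the read-then-write step does, the sketch below reads delimited lines and writes them out as JSON in plain Python. This is not NiFi code; the field names and the function name are assumptions picked for the example, and in NiFi the reader and writer controller services would do this work.

```python
import csv
import io
import json
from typing import List


def delimited_to_json(text: str, fieldnames: List[str], delimiter: str = ",") -> str:
    """Read delimited lines and write them back out as a JSON array of records."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames, delimiter=delimiter)
    records = []
    for row in reader:
        # Trim the padding left over from the fixed-width layout;
        # a missing trailing field becomes an empty string.
        records.append({name: (row.get(name) or "").strip() for name in fieldnames})
    return json.dumps(records)
```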
If the file is a couple of gigabytes, it's better to split it into smaller chunks first and then feed the split files to the ReplaceText processor.
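The chunking idea can be sketched in Python as below: the input is consumed lazily and emitted in groups of a fixed number of lines, so the whole file never has to sit in memory. The function name and the chunk size are illustrative assumptions; inside NiFi a processor such as SplitText (with its Line Split Count property) plays this role.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def split_lines(lines: Iterable[str], lines_per_chunk: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most lines_per_chunk lines."""
    if lines_per_chunk < 1:
        raise ValueError("lines_per_chunk must be at least 1")
    iterator = iter(lines)
    while True:
        chunk = list(islice(iterator, lines_per_chunk))
        if not chunk:
            return
        yield chunk
```

Because a file object is itself an iterable of lines, it can be passed in directly, for example `for chunk in split_lines(open_file, 10000): ...`.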
In addition, the ConvertRecord processor can use the scripted reader/writer controller services: the scripted reader parses the incoming flowfile with the script you provide, and the scripted writer writes the flowfile contents according to how that controller service is configured.
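The parsing logic such a reader script would carry is simple: slice each line by offset instead of going through a delimiter first. The Python sketch below shows only that logic; it does not use the NiFi scripting API, and the layout and function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical layout: (field name, width in characters).
LAYOUT: List[Tuple[str, int]] = [("id", 5), ("name", 10), ("code", 3)]


def parse_line(line: str, layout: List[Tuple[str, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-width line into a record keyed by field name."""
    record = {}
    offset = 0
    for name, width in layout:
        # Take exactly `width` characters and drop the padding.
        record[name] = line[offset:offset + width].strip()
        offset += width
    return record


def read_records(lines: Iterable[str],
                 layout: List[Tuple[str, int]] = LAYOUT) -> Iterator[Dict[str, str]]:
    """Yield one record per non-empty input line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield parse_line(line, layout)
```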
Some references on parsing fixed-width files are here and here.
References on splitting a big CSV file into smaller chunks are here and here.
Created 07-04-2018 07:07 AM
Thanks @shu for the timely help. Can you please point me to some docs on implementing NiFi at the enterprise level, especially clustered NiFi setup, performance parameters, and repository setup on Oracle BDA?
Created 07-04-2018 08:41 PM
Please refer to this and this; these links describe how to install NiFi as a service. See this link for setting up a high-performance NiFi.