Created 09-04-2018 03:25 PM
I am new to NiFi. I would like to parse a fixed-width file. The format of the file is something like the sample below. There can be four record types.
The first 3 digits indicate the record type.
100 - File header
101 - Detail record
102 - Summary record
103 - File trailer
Sample content of the file:
100
101John Doe0 001 N Front St, Pittsburgh, PA-12345 Engineer 111 W Encanto Blvd, Pitssburgh, PA-54321
101John Doe1 002 N Front St, Pittsburgh, PA-12345 Engineer 222 W Encanto Blvd, Pitssburgh, PA-54321
101John Doe2 003 N Front St, Pittsburgh, PA-12345 Engineer 333 W Encanto Blvd, Pitssburgh, PA-54321
101John Doe3 004 N Front St, Pittsburgh, PA-12345 Engineer 444 W Encanto Blvd, Pitssburgh, PA-54321
102Pittsburgh 4 200
101Chris Doe0 111 N Front St, Pittsburgh, PA-12345 Engineer 111 W Encanto Blvd, Pitssburgh, PA-54321
101Chris Doe1 222 N Front St, Pittsburgh, PA-12345 Engineer 222 W Encanto Blvd, Pitssburgh, PA-54321
102Pittsburgh 2 200
103 6 600
Format of the detail record:
Column 4 to 17 - Name
Column 18 to 60 - Home Address
Column 61 to 79 - Role
Column 80 to 128 - Work Address
How do I identify the record type?
How do I perform substring logic on the above using the "ReplaceText" processor or any other processor?
I would like to convert this to JSON format.
Created 09-05-2018 01:10 AM
Use the QueryRecord processor and read the incoming file with a CSV reader whose delimiter does not exist in your data; the reader will then treat each whole line as one field.
You can then use the substring function to build each field value from that line. Add a new query like:
select substring(<column>, <start_position>, <length>) col1, ..., substring(<column>, <start_position>, <length>) coln from FLOWFILE
You can also add CASE statements to derive a record type value (100 -> FileHeader, etc.).
Configure the Record Writer controller service as JsonRecordSetWriter, and the processor will write the output flow file in JSON format.
Refer to this and this for more details on QueryRecord processor usage.
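To make the substring and record-type logic above concrete, here is a minimal Python sketch of the same idea outside NiFi. It is only an illustration, not the QueryRecord configuration itself: the names `RECORD_TYPES`, `DETAIL_FIELDS`, and `parse_line` are hypothetical, the column positions are taken from the question (1-based, inclusive), and the sample line is padded by hand to the stated widths.

```python
import json

# Record type codes from the question, mapped to illustrative labels.
RECORD_TYPES = {
    "100": "FileHeader",
    "101": "Detail",
    "102": "Summary",
    "103": "FileTrailer",
}

# Detail record layout from the question: (field, first column, last column),
# using 1-based inclusive column numbers.
DETAIL_FIELDS = [
    ("name", 4, 17),
    ("home_address", 18, 60),
    ("role", 61, 79),
    ("work_address", 80, 128),
]


def parse_line(line):
    """Identify the record type and, for detail records, slice out each field."""
    code = line[:3]
    record = {"record_type": RECORD_TYPES.get(code, "Unknown")}
    if code == "101":
        for field, start, end in DETAIL_FIELDS:
            # Convert 1-based inclusive columns to a Python slice.
            record[field] = line[start - 1:end].strip()
    return record


# A detail line padded to the widths given in the question (assumed layout).
sample = (
    "101"
    + "John Doe0".ljust(14)
    + "001 N Front St, Pittsburgh, PA-12345".ljust(43)
    + "Engineer".ljust(19)
    + "111 W Encanto Blvd, Pitssburgh, PA-54321".ljust(49)
)

print(json.dumps(parse_line(sample), indent=2))
print(json.dumps(parse_line("100")))
```

Only the detail record layout is given in the question, so the sketch leaves the other record types with just their type label.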
(or)
Extract only the first line of the file using the ExtractText processor and add it as an attribute to the flow file; you can then use the attribute value to identify the record type.
To parse the fixed-width file, add a regex in ReplaceText that captures the characters for each field, and replace the match with the captured groups separated by some delimiter.
Then use the ConvertRecord processor to convert the result to JSON format.
Refer to this for more details on ReplaceText configuration.
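The regex idea above can be sketched in Python as follows. This is a rough illustration under assumptions, not a tested ReplaceText configuration: the group widths (3, 14, 43, 19, and up to 49 characters) are derived from the column layout in the question, and `DETAIL_PATTERN`, `to_delimited`, and the `|` delimiter are hypothetical choices. A pipe is used because the addresses already contain commas.

```python
import re

# One capture group per fixed-width field of the detail record:
# record type (3), name (14), home address (43), role (19), work address (up to 49).
DETAIL_PATTERN = re.compile(r"^(.{3})(.{14})(.{43})(.{19})(.{0,49})$")


def to_delimited(line, delimiter="|"):
    """Rewrite a fixed-width detail line as a delimited line; leave other lines unchanged."""
    match = DETAIL_PATTERN.match(line)
    if match is None:
        return line
    # Trim the padding from each captured field before joining.
    return delimiter.join(group.strip() for group in match.groups())


# A detail line padded by hand to the widths given in the question.
sample = (
    "101"
    + "John Doe0".ljust(14)
    + "001 N Front St, Pittsburgh, PA-12345".ljust(43)
    + "Engineer".ljust(19)
    + "111 W Encanto Blvd, Pitssburgh, PA-54321".ljust(49)
)

print(to_delimited(sample))
```

In ReplaceText, the equivalent would presumably be a search regex with the same capture groups and a replacement value built from back-references such as `$1|$2|$3|$4|$5`. Lines too short to match, such as the header line, pass through unchanged in this sketch.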
If the answer helped resolve your issue, click the Accept button below to accept it. That would be a great help to community users looking for solutions to these kinds of issues.
Created 01-06-2020 08:07 AM
@Shu_ashu
I have different incoming files, and I have each file's schema in an attribute, but I cannot set the schema manually in the schema registry because it is generated after the file is received.
Is there any way I can add the schema to the schema registry dynamically, or on the fly, instead of adding it manually?
Or is there any other way I can leverage the schema and convert the file to Avro,
assuming the files may or may not have a header?
Created 01-06-2020 04:40 PM
@J1TEN Please open a new question rather than responding to an old topic.
Also, take a look at the articles section. I just posted how to use the Schema Registry API, along with another example of how to do it in NiFi.