Support Questions

satya_h2004 · ‎09-04-2018

I am new to NiFi. I would like to parse a fixed-width file. The format of the file will be something like the sample below. There can be four record types.

First 3 digits will indicate the record type.

100 - File header

101 - Detail record

102 - Summary record

103 - File trailer

Sample content of the file:

100

101John Doe0 001 N Front St, Pittsburgh, PA-12345 Engineer 111 W Encanto Blvd, Pitssburgh, PA-54321

101John Doe1 002 N Front St, Pittsburgh, PA-12345 Engineer 222 W Encanto Blvd, Pitssburgh, PA-54321

101John Doe2 003 N Front St, Pittsburgh, PA-12345 Engineer 333 W Encanto Blvd, Pitssburgh, PA-54321

101John Doe3 004 N Front St, Pittsburgh, PA-12345 Engineer 444 W Encanto Blvd, Pitssburgh, PA-54321

102Pittsburgh 4 200

101Chris Doe0 111 N Front St, Pittsburgh, PA-12345 Engineer 111 W Encanto Blvd, Pitssburgh, PA-54321

101Chris Doe1 222 N Front St, Pittsburgh, PA-12345 Engineer 222 W Encanto Blvd, Pitssburgh, PA-54321

102Pittsburgh 2 200

103 6 600

Format of Detail record

Column 4 to 17 - Name

Column 18 to 60 - Home Address

Column 61 to 79 - Role

Column 80 to 128 - Work Address

How do I identify the record type?

How do I perform substring logic on the above using the "ReplaceText" processor or any other processor?

I would like to convert this to JSON format.

Shu_ashu · ‎09-05-2018

@Satya H

Use the QueryRecord processor and read the incoming file as CSV, with a delimiter that doesn't exist in your data. The reader will then treat each whole line as one field.

Using the substring function on that single field, we can prepare each field value. Add a new query like:

select substring(<field>, <start_position>, <length>) col1, ..., substring(<field>, <start_position>, <length>) coln from flowfile

We can also add CASE statements to derive a record type value, e.g. 100 -> FileHeader, etc.

Configure the Record Writer controller service as JsonRecordSetWriter; the processor will then write the output flow file in JSON format.

Refer to this and this for more details regarding QueryRecord processor usage.
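To make the substring and record-type logic above concrete, here is a minimal sketch in plain Python, outside NiFi. It is not the QueryRecord configuration itself; it only shows how the column ranges from the question map to substrings and how the first three characters select the record type. The field names, the type labels other than FileHeader, and the padded sample line are assumptions made for illustration.

```python
import json

# Column ranges from the question (1-based, inclusive) for the 101 detail record.
# The field names are illustrative, not taken from any NiFi schema.
DETAIL_COLUMNS = {
    "name": (4, 17),
    "home_address": (18, 60),
    "role": (61, 79),
    "work_address": (80, 128),
}

# Record type codes from the question; the label strings are assumptions.
RECORD_TYPES = {
    "100": "FileHeader",
    "101": "DetailRecord",
    "102": "SummaryRecord",
    "103": "FileTrailer",
}


def parse_line(line):
    """Return a dict for one fixed-width line, keyed by field name."""
    code = line[:3]
    record = {"record_type": RECORD_TYPES.get(code, "Unknown")}
    if code == "101":
        for field, (start, end) in DETAIL_COLUMNS.items():
            # 1-based inclusive columns map to the Python slice [start - 1:end].
            record[field] = line[start - 1:end].strip()
    return record


# Hypothetical sample line, padded to the widths listed in the question.
sample = (
    "101"
    + "John Doe0".ljust(14)
    + "001 N Front St, Pittsburgh, PA-12345".ljust(43)
    + "Engineer".ljust(19)
    + "111 W Encanto Blvd, Pittsburgh, PA-54321".ljust(49)
)

print(json.dumps(parse_line(sample), indent=2))
print(json.dumps(parse_line("100")))
```

In a QueryRecord query the same idea would be one substring per field plus a CASE on the first three characters, with the length of each field taken as end column minus start column plus one.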

(or)

We can extract only the first line of the file using the ExtractText processor and add it as an attribute to the flow file; by using the attribute value you can identify the record type.

To parse the fixed-width file, add a regex that captures the characters for each field, and replace the match with the captured fields separated by some delimiter.

Then, by using the ConvertRecord processor, we can convert the delimited output to JSON format.

Refer to this for more details regarding ReplaceText configs.
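The regex idea above can be sketched as follows, again in plain Python rather than as an actual ReplaceText configuration. The group widths (3, 14, 43, 19, 49) are derived from the column ranges in the question; the "|" delimiter, the function name, and the padded sample line are assumptions, and the delimiter must be a character that never occurs in the data.

```python
import re

# One capture group per field: record type (3 chars), then name (14),
# home address (43), role (19) and work address (up to 49, in case
# trailing padding was trimmed from the end of the line).
DETAIL_PATTERN = re.compile(r"^(.{3})(.{14})(.{43})(.{19})(.{1,49})$")


def to_delimited(line, delimiter="|"):
    """Turn a 101 detail line into a delimited line; leave other lines unchanged."""
    match = DETAIL_PATTERN.match(line)
    if match is None or match.group(1) != "101":
        return line
    # Strip the fixed-width padding from each captured field before joining.
    return delimiter.join(group.strip() for group in match.groups())


# Hypothetical sample line, padded to the widths listed in the question.
sample = (
    "101"
    + "John Doe0".ljust(14)
    + "001 N Front St, Pittsburgh, PA-12345".ljust(43)
    + "Engineer".ljust(19)
    + "111 W Encanto Blvd, Pittsburgh, PA-54321".ljust(49)
)

print(to_delimited(sample))
print(to_delimited("102Pittsburgh 4 200"))
```

In ReplaceText, the corresponding replacement value would presumably use back-references to the capture groups (for example $1|$2|$3|$4|$5); trimming the padding would then have to happen in a later step.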

If the answer helped to resolve your issue, click the Accept button below to accept it. That would be a great help to Community users looking for a quick solution to this kind of issue.

J1TEN · ‎01-06-2020

@Shu_ashu
I have different incoming files, and I have the schema of each file in an attribute, but I cannot set the schema manually in the schema registry because it is generated after the file is received.

Is there any way I can add the schema to the schema registry dynamically, on the fly, instead of adding it manually?

Or is there any other way I can leverage the schema and convert the file to Avro, assuming the files may or may not have a header?

stevenmatison · ‎01-06-2020

@J1TEN Please open a new case/question rather than responding to an old topic.

Also, take a look at the articles section. I just posted how to use the Schema Registry API, and another example of how to do it in NiFi.

Cloudera Community

nifi - Fixed length file parse based on record type