How to parse w/ fixed width instead of char delimited contents?

Contributor

I am trying to parse file contents that use fixed-width fields instead of a delimiter. As a simplified example, the value for attribute 1 is in positions 1-2, attribute 2 is in positions 3-6, and attribute 3 is in positions 7-8 of each line. The file contents should then be transformed as below.

Before

AABBBBCC

DDEEEEFF

After

AA;BBBB;CC

DD;EEEE;FF

I assume there may be a way to capture substrings per line? Please assist.

1 ACCEPTED SOLUTION

Expert Contributor

Kausha,

You can use ReplaceText to do this. In your example above, you can use a Replacement Strategy of "Regex Replace".

Set Evaluation Mode to "Line-by-Line"

The Search Value would then be:

(.{2})(.{4})(.{2})

And the Replacement Value would be:

$1;$2;$3

Does that help?
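
For anyone who wants to check the pattern outside NiFi, here is a minimal Python sketch of the same line-by-line regex replace. It is only an approximation: NiFi uses Java regular expressions, where the replacement is written $1;$2;$3, while Python's re module writes the back-references as \1;\2;\3. The function name to_delimited is made up for illustration.

```python
import re

# Same pattern as the Search Value: fields of 2, 4, and 2 characters.
FIXED_WIDTH = re.compile(r"(.{2})(.{4})(.{2})")

def to_delimited(text: str) -> str:
    """Apply the replacement to each line independently (similar to Line-by-Line mode)."""
    return "\n".join(FIXED_WIDTH.sub(r"\1;\2;\3", line) for line in text.splitlines())

print(to_delimited("AABBBBCC\nDDEEEEFF"))
```

With the two sample lines from the question, this prints AA;BBBB;CC and DD;EEEE;FF.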

Contributor

Yes. This works well, but is there a way to store the values as attributes? Ultimately, I want to use the AttributesToJSON processor.

Expert Contributor

You can use the ExtractText processor and provide it a regex also in order to pull the values into attributes. For example, you could have:

field1: (.{2}).{6}

field2: .{2}(.{4}).{2}

field3: .{6}(.{2})

This assumes, though, that each FlowFile contains only a single line. You could use SplitText, for example, to split the content so that each FlowFile holds one line. I think we may need more context about what you're trying to accomplish to provide a more detailed answer.
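
To illustrate the idea of one capture group per field, here is a rough Python sketch. It is not NiFi configuration and does not reproduce ExtractText's exact behavior; it only shows what each of the three patterns captures from a single line, and how the result might look once serialized to JSON. The names FIELD_PATTERNS and extract_fields are hypothetical.

```python
import json
import re

# The three patterns from the reply above, keyed by attribute name.
FIELD_PATTERNS = {
    "field1": r"(.{2}).{6}",
    "field2": r".{2}(.{4}).{2}",
    "field3": r".{6}(.{2})",
}

def extract_fields(line: str) -> dict:
    """Return the first capture group of every pattern that matches the line."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.match(pattern, line)
        if match:
            fields[name] = match.group(1)
    return fields

# One dictionary per line, mirroring the "one line per FlowFile" assumption.
for line in "AABBBBCC\nDDEEEEFF".splitlines():
    print(json.dumps(extract_fields(line)))
```

Each pattern skips over the characters it does not need and captures only its own field, which is why a line shorter than 8 characters yields no values at all.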

Contributor

I have assumed the following flow: GetFile --> ExtractText --> SplitText --> UpdateAttribute --> AttributesToJSON --> PutFile

I receive an error in PutFile. Below are my modified configurations:

ExtractText - Enable Multiline Mode = True

SplitText - Line Split Count = 1; Header Line Count = 1

UpdateAttribute - Properties as suggested: Att1 = (.{2}).{6}; Att2 = .{2}(.{4}).{2}; Att3 = .{6}(.{2})

AttributesToJSON Attributes List = Att1, Att2, Att3

What am I missing here?

Expert Contributor

What error do you see in PutFile?

Contributor

I am able to run the flow when I route both the matched and unmatched relationships from ExtractText to SplitText, but the output is incorrect: {"Att3":" .{6}(.{2})","Att2":".{2}(.{4}).{2}","Att1":"(.{2}).{6}"}.

Would it be more efficient to use the ReplaceTextWithMapping processor? I am unable to find a template with this processor and a relevant mapping file.

Explorer

Hi,

I am not able to replicate this example. I get $1;$2;$3 as the output every time. I am new to NiFi and am not able to find what I am missing. I think I have not set the properties correctly.

New Contributor

Is it possible to output some part of the input text? 

For Example:

Input: AABBBBCC

Output: AA
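
One possible approach, sketched in Python rather than as NiFi configuration: match the whole line but capture only the part you want, then replace the line with that group alone. In ReplaceText terms this would presumably correspond to a Search Value such as (.{2}).* with a Replacement Value of $1, though that mapping is an assumption and has not been tested here. The function name keep_prefix is made up for illustration.

```python
import re

def keep_prefix(line: str, width: int = 2) -> str:
    """Keep only the first `width` characters of a line via a regex replace."""
    # Match the entire line, but capture just the leading field.
    return re.sub(r"^(.{%d}).*$" % width, r"\1", line)

print(keep_prefix("AABBBBCC"))
```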

New Contributor

This is working fine. Can we provide the Search Value and Replacement Value as a variable or FlowFile attribute? I want to use the same ReplaceText processor to convert different input files with different numbers of columns. Basically, I want to parameterize the Search Value and Replacement Value in the ReplaceText processor. @mpayne @ltsimps1 @kpulagam @jpercivall @other
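
As a sketch of what such parameterization could look like, the Python below generates both the search pattern and the replacement from a list of column widths. It only illustrates the idea and is not NiFi configuration; whether the ReplaceText properties accept attribute values depends on Expression Language support for those properties, which should be checked in the processor documentation for your NiFi version. The names build_rule and convert are hypothetical.

```python
import re

def build_rule(widths, delimiter=";"):
    """Build a (search, replacement) pair for the given column widths."""
    search = "".join("(.{%d})" % w for w in widths)
    # Python back-reference syntax; a Java-style engine would use $1, $2, ...
    replacement = delimiter.join(r"\g<%d>" % i for i in range(1, len(widths) + 1))
    return search, replacement

def convert(line, widths, delimiter=";"):
    """Convert one fixed-width line into a delimited line."""
    search, replacement = build_rule(widths, delimiter)
    return re.sub(search, replacement, line)

print(build_rule([2, 4, 2])[0])
print(convert("AABBBBCC", [2, 4, 2]))
print(convert("AAABBC", [3, 2, 1], delimiter=","))
```

Keeping the widths as the single input means a file with a different layout needs only a different list, not a hand-written regex.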