Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to parse fixed-width instead of character-delimited contents?

New Member

I am trying to parse file contents that are laid out in fixed-width columns rather than separated by a delimiter. As a simplified example, in each line the value for attribute 1 is in positions 1-2, attribute 2 is in positions 3-6, and attribute 3 is in positions 7-8. The file contents should then be transformed as below.

Before

AABBBBCC

DDEEEEFF

After

AA;BBBB;CC

DD;EEEE;FF

I assume there may be a way to capture substrings per line? Please assist.

1 ACCEPTED SOLUTION

Expert Contributor

Kausha,

You can use ReplaceText to do this. In your example above, you can use a Replacement Strategy of "Regex Replace".

Set Evaluation Mode to "Line-by-Line"

The Search Value would then be:

(.{2})(.{4})(.{2})

And the Replacement Value would be:

$1;$2;$3

Does that help?
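
For anyone who wants to check the pattern outside NiFi, here is a minimal Python sketch of the same line-by-line replacement. The helper name `to_delimited` is made up for illustration and is not part of NiFi. Python's `re` module writes backreferences as `\1`, where the Replacement Value above uses `$1`.

```python
import re

# Same widths as the Search Value above: 2, 4, and 2 characters.
FIXED_WIDTH = re.compile(r"(.{2})(.{4})(.{2})")


def to_delimited(text: str) -> str:
    """Apply the replacement to each line separately (hypothetical helper)."""
    # Python uses \1 for backreferences; the NiFi property uses $1.
    return "\n".join(
        FIXED_WIDTH.sub(r"\1;\2;\3", line) for line in text.splitlines()
    )


print(to_delimited("AABBBBCC\nDDEEEEFF"))
```

Processing each line on its own mirrors the "Line-by-Line" Evaluation Mode, so a match can never span two records.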

19 REPLIES

New Member

Yes. This works well, but is there a way to store the values as attributes? Ultimately, I want to use the AttributesToJSON processor.

Expert Contributor

You can also use the ExtractText processor with a regex to pull the values into attributes. For example, you could add these properties:

field1: (.{2}).{6}

field2: .{2}(.{4}).{2}

field3: .{6}(.{2})

This assumes, though, that each FlowFile contains only a single line. You could use SplitText, for example, to split the content so that each line becomes its own FlowFile. We may need more context about what you're trying to accomplish to provide a more detailed answer.
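
To see what the three patterns capture, here is a rough Python sketch that applies them to a single line and prints the result as JSON. It only imitates the idea of ExtractText followed by AttributesToJSON; the helper `extract_fields` is hypothetical, and anchoring each pattern at the start of the line with `re.match` is an assumption made for this sketch.

```python
import json
import re

# One pattern per attribute, copied from the properties listed above.
PATTERNS = {
    "field1": r"(.{2}).{6}",
    "field2": r".{2}(.{4}).{2}",
    "field3": r".{6}(.{2})",
}


def extract_fields(line: str) -> dict:
    """Return the first capture group of each pattern (hypothetical helper)."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = re.match(pattern, line)  # assumption: match from line start
        if match:
            fields[name] = match.group(1)
    return fields


record = extract_fields("AABBBBCC")
print(json.dumps(record))
```

Each pattern skips the characters before its field and captures only the field itself, which is why one line per input is assumed.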

New Member

I have assumed the following flow: GetFile --> ExtractText --> SplitText --> UpdateAttribute --> AttributesToJSON --> PutFile

I receive an error in PutFile. Below are my modified configurations:

ExtractText - Enable Multiline Mode = True

SplitText - Line Split Count = 1; Header Line Count = 1

UpdateAttribute - Properties as suggested: Att1 = (.{2}).{6}; Att2 = .{2}(.{4}).{2}; Att3 = .{6}(.{2})

AttributesToJSON - Attributes List = Att1, Att2, Att3

What am I missing here?

Expert Contributor

What error do you see in PutFile?

New Member

I am able to run the flow when I route both the matched and unmatched relationships from ExtractText to SplitText, but the output is incorrect: {"Att3":" .{6}(.{2})","Att2":".{2}(.{4}).{2}","Att1":"(.{2}).{6}"}.

Would it be more efficient to use the ReplaceTextWithMapping processor? I am unable to find a template with this processor and a relevant mapping file.

New Member

Hi,

I am not able to replicate this example. I get $1;$2;$3 as the output every time. I am new to NiFi and cannot find what I am missing. I think I have not set the properties correctly.

New Member

Is it possible to output only part of the input text?

For Example:

Input: AABBBBCC

Output: AA
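
One possible approach, sketched in Python rather than as NiFi configuration: reuse the fixed-width pattern from the accepted solution and keep only the first capture group in the replacement. In ReplaceText this would presumably correspond to a Replacement Value of `$1`, which is an untested assumption here.

```python
import re

line = "AABBBBCC"
# Keep only the first capture group; the rest of the match is dropped.
first_field = re.sub(r"(.{2})(.{4})(.{2})", r"\1", line)
print(first_field)
```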

Visitor

This is working fine. Can the Search Value and Replacement Value be provided as a variable or FlowFile attribute? I want to use the same ReplaceText processor to convert different input files with different numbers of columns, so I want to parameterize the Search Value and Replacement Value. @mpayne @ltsimps1 @kpulagam @jpercivall @other
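
As a rough illustration of the parameterization idea, this Python sketch builds both values from a list of column widths. The helper `build_rule` is hypothetical. Whether ReplaceText accepts attribute references in these two properties depends on their Expression Language support, which should be checked in the processor documentation.

```python
import re


def build_rule(widths):
    """Build a search pattern and replacement from column widths (hypothetical helper)."""
    # One capture group per column, e.g. widths [2, 4, 2] -> (.{2})(.{4})(.{2})
    search = "".join(f"(.{{{w}}})" for w in widths)
    # Python backreference syntax; the NiFi Replacement Value would use $1;$2;...
    replacement = ";".join(f"\\g<{i}>" for i in range(1, len(widths) + 1))
    return search, replacement


search, replacement = build_rule([2, 4, 2])
print(search)
print(re.sub(search, replacement, "AABBBBCC"))
```

With the widths stored per file type, the same replacement step can handle inputs with different numbers of columns.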