Created 01-14-2016 04:49 PM
I am trying to parse file contents that are laid out in fixed-width columns rather than separated by a delimiter. As a simplified example, in each line the value for attribute 1 is in positions 1-2, attribute 2 is in positions 3-6, and attribute 3 is in positions 7-8. The file contents should then be transformed as below.
Before
AABBBBCC
DDEEEEFF
After
AA;BBBB;CC
DD;EEEE;FF
I assume there may be a way to capture substrings per line? Please assist.
Created 01-14-2016 04:54 PM
Kausha,
You can use ReplaceText to do this. In your example above, you can use a Replacement Strategy of "Regex Replace".
Set Evaluation Mode to "Line-by-Line"
The Search Value would then be:
(.{2})(.{4})(.{2})
And the Replacement Value would be:
$1;$2;$3
Does that help?
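For anyone who wants to check the pattern outside NiFi, here is a minimal Python sketch of the same line-by-line regex replacement. It is only an illustration, not how ReplaceText is implemented: NiFi uses Java regex, where back-references in the replacement are written $1, while Python's re module writes them \1. The function name to_delimited is made up for this example.

```python
import re

# Same pattern as the Search Value: 2, 4, and 2 characters per field.
FIXED_WIDTH = re.compile(r"(.{2})(.{4})(.{2})")


def to_delimited(text: str) -> str:
    """Apply the replacement to each line separately, like Line-by-Line mode."""
    # Python writes back-references as \1; NiFi (Java regex) writes $1.
    return "\n".join(
        FIXED_WIDTH.sub(r"\1;\2;\3", line) for line in text.splitlines()
    )


print(to_delimited("AABBBBCC\nDDEEEEFF"))
```

With the two sample lines from the question, this prints AA;BBBB;CC and DD;EEEE;FF.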
Created 01-14-2016 05:13 PM
Yes, this works well, but is there a way to store the values as attributes? Ultimately, I want to use the AttributesToJSON processor.
Created 01-14-2016 06:04 PM
You can also use the ExtractText processor and give it a regex to pull the values into attributes. For example, you could add these properties:
field1: (.{2}).{6}
field2: .{2}(.{4}).{2}
field3: .{6}(.{2})
This assumes, though, that each FlowFile has only a single line. You could use SplitText, for example, to split the content so that each FlowFile holds a single line. I think we may need more context about what you're trying to accomplish to provide a more detailed answer.
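To make the idea concrete, here is a rough Python sketch of what the three patterns do to a single line: each property name becomes an attribute whose value is the first capture group. This is a simplification for illustration, not NiFi code. The names FIELD_PATTERNS and extract_fields are hypothetical, and the sketch ignores any additional attributes the real processor may add.

```python
import json
import re

# Property name -> regex, as in the ExtractText example above.
FIELD_PATTERNS = {
    "field1": r"(.{2}).{6}",
    "field2": r".{2}(.{4}).{2}",
    "field3": r".{6}(.{2})",
}


def extract_fields(line: str) -> dict:
    """Return capture group 1 of each pattern that matches the line."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.match(pattern, line)
        if match:
            fields[name] = match.group(1)
    return fields


# One JSON object per line, roughly what splitting the file into
# single-line FlowFiles and then extracting attributes is meant to achieve.
for line in "AABBBBCC\nDDEEEEFF".splitlines():
    print(json.dumps(extract_fields(line)))
```

Note that the regexes are applied to the line content. Simply storing the regex text as an attribute value would not extract anything.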
Created 01-14-2016 06:36 PM
I have assumed the following flow: GetFile --> ExtractText --> SplitText --> UpdateAttribute --> AttributesToJSON --> PutFile
I receive an error in PutFile. Below are my modified configurations:
ExtractText - Enable Multiline Mode = True
SplitText - Line Split Count = 1; Header Line Count = 1
UpdateAttribute - Properties as suggested: Att1 = (.{2}).{6}; Att2 = .{2}(.{4}).{2}; Att3 = .{6}(.{2})
AttributesToJSON - Attributes List = Att1, Att2, Att3
What am I missing here?
Created 01-14-2016 08:45 PM
What error do you see in PutFile?
Created 01-15-2016 02:29 AM
I am able to run the flow when I route both the matched and unmatched relationships from ExtractText to SplitText, but the output is incorrect: {"Att3":" .{6}(.{2})","Att2":".{2}(.{4}).{2}","Att1":"(.{2}).{6}"}.
Would it be more efficient to use the ReplaceTextWithMapping processor? I am unable to find a template with this processor and a relevant mapping file.
Created 05-08-2017 08:08 PM
Hi,
I am not able to replicate the same example. I get $1;$2;$3 as the output every time. I am new to NiFi and cannot find what I am missing. I think I have not set the properties correctly.
Created 10-28-2022 02:58 PM
Is it possible to output only part of the input text?
For Example:
Input: AABBBBCC
Output: AA
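One way to think about this, sketched in Python rather than NiFi: let the pattern match the whole line but capture only the part you want, then use just that group as the replacement. In ReplaceText terms this would presumably be a Search Value of (.{2}).* with a Replacement Value of $1, though that configuration is an assumption and has not been tested in this thread. The function name keep_prefix is made up for the example.

```python
import re


def keep_prefix(line: str, width: int = 2) -> str:
    """Keep the first `width` characters and drop the rest of the line."""
    # The pattern consumes the whole line; only group 1 survives the replacement.
    # Lines shorter than `width` do not match and are returned unchanged.
    return re.sub(r"^(.{%d}).*$" % width, r"\1", line)


print(keep_prefix("AABBBBCC"))
```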
Created 12-15-2022 05:24 AM
This is working fine. Can we provide the Search Value and Replacement Value as a variable or FlowFile attribute? I want to use the same ReplaceText processor to convert different input files with different numbers of columns. Basically, I want to parameterise the Search Value and Replacement Value in the ReplaceText processor. @mpayne @ltsimps1 @kpulagam @jpercivall @other