How to parse w/ fixed width instead of char delimited contents?
Labels: Apache NiFi
Created ‎01-14-2016 04:49 PM
I am trying to parse file contents that are laid out by fixed width rather than separated by a delimiter. As a simplified example, in each line the value for data attribute 1 is in positions 1-2, attribute 2 is in positions 3-6, and attribute 3 is in positions 7-8. The file contents should then be transformed as below.
Before
AABBBBCC
DDEEEEFF
After
AA;BBBB;CC
DD;EEEE;FF
I assume there may be a way to capture substrings per line? Please assist.
Created ‎01-14-2016 04:54 PM
Kausha,
You can use ReplaceText to do this. In your example above, you can use a Replacement Strategy of "Regex Replace".
Set Evaluation Mode to "Line-by-Line"
The Search Value would then be:
(.{2})(.{4})(.{2})
And the Replacement Value would be:
$1;$2;$3
Does that help?
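To sanity-check the pattern outside NiFi, here is a minimal Python sketch of the same line-by-line substitution. It is only an illustration, not how ReplaceText is implemented; the helper name `delimit_fixed_width` is made up for this example, and Python writes backreferences as `\1` where the Replacement Value uses `$1`.

```python
import re

# Same search value as in the ReplaceText configuration above:
# three capture groups of 2, 4 and 2 characters.
PATTERN = re.compile(r"(.{2})(.{4})(.{2})")


def delimit_fixed_width(text):
    """Apply the substitution to each line separately (hypothetical helper)."""
    # Python uses \1 where the NiFi Replacement Value uses $1.
    return "\n".join(PATTERN.sub(r"\1;\2;\3", line) for line in text.splitlines())


print(delimit_fixed_width("AABBBBCC\nDDEEEEFF"))
```

With the two sample lines from the question this should print `AA;BBBB;CC` and `DD;EEEE;FF`, regardless of how many lines the input has.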
Created ‎01-14-2016 04:57 PM
This is a bit tricky and will require a bit of regex magic. You will want to capture each length of data attribute (2, 4 and 2 respectively) into capture groups then use those capture groups to replace the content.
You'll use the ReplaceText processor with a search value of "(.{2})(.{4})(.{2})" and a replacement value of "$1;$2;$3" and configure it to evaluate line by line. This will go through the contents of the flowfile line by line and replace the contents like you want.
Comment below if you run into any problems!
Created ‎01-14-2016 05:14 PM
I have also posted this to @mpayne: "Yes, this works well, but is there a way to store the values as attributes?" Ultimately, I want to use the AttributesToJSON processor.
Created ‎01-14-2016 05:19 PM
Ah, so you also want to extract the text into attributes of the flowfile. Are the contents only ever two lines? And do you want to create JSON from both of those lines together, or split them into separate flowfiles?
Created ‎01-14-2016 05:59 PM
The number of rows per flowfile will vary. Each line of data represents one record/item. Also, I want the data output in its original file, not in separate flowfiles.
Created ‎01-14-2016 06:09 PM
What does your end-goal JSON schema look like? What identifiers will each of the values use (for every line)? As a note, AttributesToJSON only creates flat JSON objects (no nested fields).
Created ‎01-15-2016 02:07 AM
The output, based on this example, would be a JSON array:
[{"field1":"AA","field2":"BBBB","field3":"CC"},{"field1":"DD","field2":"EEEE","field3":"FF"}]
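For reference, that target shape can be sketched in plain Python. This is not a NiFi flow, only an illustration of the transformation being asked for; the names `field1` to `field3` and the widths come from the example in this thread, while `FIELDS` and `lines_to_json` are hypothetical.

```python
import json

# (name, width) pairs in order; widths 2, 4, 2 are from the original example.
FIELDS = [("field1", 2), ("field2", 4), ("field3", 2)]


def lines_to_json(text):
    """Turn fixed-width lines into a JSON array, one object per line."""
    records = []
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines
        record, start = {}, 0
        for name, width in FIELDS:
            record[name] = line[start:start + width]
            start += width
        records.append(record)
    # Compact separators give the no-whitespace layout shown above.
    return json.dumps(records, separators=(",", ":"))


print(lines_to_json("AABBBBCC\nDDEEEEFF"))
```

Because the loop builds one object per input line, the array grows with the file rather than assuming a fixed number of rows.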
Created ‎01-14-2016 05:11 PM
Assuming you are okay with using Hive for this, you would just create a table with one column (named something like row) and then load the whole file into that table. Then run a query to split each row into columns and insert the result into another table.
Here are more details and a code snippet: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
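The linked page has the actual Hive code, which is not reproduced here. Purely to illustrate the split-by-position step, here is a Python sketch (not HiveQL) using the positions from the question; `POSITIONS` and `split_row` are hypothetical names.

```python
# 1-based, inclusive (start, end) positions from the original question.
POSITIONS = [(1, 2), (3, 6), (7, 8)]


def split_row(row):
    """Cut one fixed-width row into its column values (hypothetical helper)."""
    # Convert each 1-based inclusive range to a 0-based Python slice.
    return [row[start - 1:end] for start, end in POSITIONS]


print(split_row("AABBBBCC"))
```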
Created ‎01-14-2016 05:16 PM
Didn't realize the question was about NiFi. My bad.
Created ‎01-14-2016 05:14 PM
I agree with @jpercivall and @mpayne: ReplaceText is the best way to go. I created a quick workflow that you can reference. It assumes the input AABBBBCC, as you suggested. You can change the GetFile path, the PutFile path, and the regex in ReplaceText to test with your real data.
Attachment: fixedwidthexample.xml
Created ‎01-15-2016 02:35 AM
I have tried using the ReplaceText processor, without success. The method works only if my file has a small, fixed number of lines.
Do you have any guidance on the ReplaceTextWithMapping processor and how the mapping file should be formatted?
