Ah, so you also want to extract the text into attributes of the flowfile. Is the content only ever two lines, and do you want to create JSON using both of those lines or split them into separate flowfiles?
The number of rows for the flow files will vary. Each line of data will represent a record/item. Also, I want the data output in its original file, not in separate flowfiles.
What does your end-goal JSON schema look like? What identifiers will each of the values use (for every line)? As a note, AttributesToJSON only creates flat JSON objects (no nested fields).
The output, based on this example, would be a JSON array.
Assuming you are okay with using Hive for this, you would just create a table with one column (column name something like row) and then load the whole file into that table. Run a query to then split the columns and insert in another table.
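The split step in that approach amounts to taking fixed substrings of each staged row. Below is a minimal Python sketch of that logic only (it is not Hive syntax), assuming the 2/4/2 layout implied by the AABBBBCC example mentioned in this thread. The column names `col_a`, `col_b`, and `col_c` are hypothetical.

```python
# Hypothetical layout: (column name, start offset, width).
# The widths 2/4/2 are an assumption based on the AABBBBCC example.
LAYOUT = [("col_a", 0, 2), ("col_b", 2, 4), ("col_c", 6, 2)]


def split_row(row):
    """Slice one fixed-width row into named columns."""
    return {name: row[start:start + width] for name, start, width in LAYOUT}


def split_rows(text):
    """Split every non-empty line, like a query over the one-column staging table."""
    return [split_row(line) for line in text.splitlines() if line.strip()]
```

Because each row is handled independently, the number of rows in the file does not matter.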
Here are more details and a code snippet: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205
I agree with @jpercivall and @mpayne: ReplaceText is the best way to go. I created a quick workflow that you can reference. It assumes an input of AABBBBCC, as you suggested. You can change the GetFile path, the PutFile path, and the regex in ReplaceText to test with your real data. Attached: fixedwidthexample.xml
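To illustrate the kind of regex such a flow relies on, here is a small Python sketch, not the attached workflow itself. It assumes each line follows the AABBBBCC layout (2 + 4 + 2 characters). The field names `field1` to `field3` and the function name are hypothetical, and Python's `re` stands in for the Java regex engine NiFi uses.

```python
import json
import re

# Assumed layout: 2 + 4 + 2 characters per line; the group names are hypothetical.
LINE_PATTERN = re.compile(r"^(?P<field1>.{2})(?P<field2>.{4})(?P<field3>.{2})$")


def fixed_width_to_json(text):
    """Turn fixed-width lines into a JSON array with one object per line."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.match(line)
        if match:  # lines that do not fit the layout are skipped
            records.append(match.groupdict())
    return json.dumps(records)
```

The pattern is applied line by line, so the same expression covers a file with any number of records and yields a single JSON array.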
I have tried the ReplaceText processor without success. The method works only if my file has a small, fixed number of lines.
Do you have any guidance on the ReplaceTextWithMapping processor and how the mapping file should be formatted?