How to parse w/ fixed width instead of char delimited contents?

Re: How to parse w/ fixed width instead of char delimited contents?

Ah, so you also want to extract the text into attributes of the flowfile. Is the content only ever two lines long, and do you want to create JSON from both lines together or split them into separate flowfiles?

Re: How to parse w/ fixed width instead of char delimited contents?

New Contributor

The number of rows for the flow files will vary. Each line of data will represent a record/item. Also, I want the data output in its original file, not in separate flowfiles.

Re: How to parse w/ fixed width instead of char delimited contents?

What does your end-goal JSON schema look like? What identifiers will each of the values use (for every line)? As a note, AttributesToJSON only creates flat JSON objects (no nested fields).

Re: How to parse w/ fixed width instead of char delimited contents?

New Contributor

The output, based on this example, would be a JSON array:

[{"field1":"AA","field2":"BBBB","field3":"CC"},{"field1":"DD","field2":"EEEE","field3":"FF"}]
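For reference, the mapping from fixed-width lines to that array can be sketched outside NiFi in a few lines of Python. This is only an illustration, not a NiFi processor configuration: the widths (2, 4, 2) are an assumption inferred from the AABBBBCC example, and `FIELDS` and `parse_fixed_width` are hypothetical names.

```python
import json

# Assumed layout, inferred from the AABBBBCC example: widths 2, 4, 2.
FIELDS = [("field1", 2), ("field2", 4), ("field3", 2)]

def parse_fixed_width(text, fields=FIELDS):
    """Turn each non-empty line into a dict by slicing at fixed offsets."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record, pos = {}, 0
        for name, width in fields:
            record[name] = line[pos:pos + width]
            pos += width
        records.append(record)
    return records

sample = "AABBBBCC\nDDEEEEFF\n"
print(json.dumps(parse_fixed_width(sample), separators=(",", ":")))
```

Because each line is handled independently, the number of rows in the file does not matter.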

Re: How to parse w/ fixed width instead of char delimited contents?

Assuming you are okay with using Hive for this, you would just create a table with one column (named something like row) and load the whole file into that table. Then run a query to split the columns and insert the results into another table.

Here are more details and a code snippet: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=21299205

Re: How to parse w/ fixed width instead of char delimited contents?

Didn't realize the question was about NiFi. My bad.

Re: How to parse w/ fixed width instead of char delimited contents?

Guru

I agree with @jpercivall and @mpayne that ReplaceText is the best way to go. I created a quick workflow that you can reference. It assumes an input of AABBBBCC, as you suggested. You can change the GetFile path, the PutFile path, and the regex in ReplaceText to test with your real data.

Attachment: fixedwidthexample.xml
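The attached template is not reproduced here, so the following is only a rough Python sketch of the regex-with-capture-groups idea, not the actual ReplaceText configuration. The pattern, the template, and the name `lines_to_json_array` are assumptions based on the AABBBBCC example (widths 2, 4, 2); Python writes backreferences as `\1`, and the syntax in a NiFi replacement value may differ.

```python
import re

# Assumed pattern: one capture group per fixed-width field (2, 4, 2 chars).
LINE_PATTERN = re.compile(r"^(.{2})(.{4})(.{2})$", re.MULTILINE)

# Replacement template; \1, \2, \3 refer to the capture groups above.
TEMPLATE = r'{"field1":"\1","field2":"\2","field3":"\3"}'

def lines_to_json_array(text):
    """Rewrite every matching line as a JSON object and wrap them in an array."""
    objects = [m.expand(TEMPLATE) for m in LINE_PATTERN.finditer(text)]
    return "[" + ",".join(objects) + "]"

print(lines_to_json_array("AABBBBCC\nDDEEEEFF\n"))
```

The sketch does no JSON escaping, so values containing quotes or backslashes would need extra handling. Lines that are not exactly eight characters long are skipped silently.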

Re: How to parse w/ fixed width instead of char delimited contents?

New Contributor

I have tried the ReplaceText processor without success. The method only works if my file has a small, fixed number of lines.

Do you have any guidance on the ReplaceTextWithMapping processor and how the mapping file should be formatted?
