Support Questions

Find answers, ask questions, and share your expertise

NiFi - JSON to MongoDB - problems with array

avatar
Explorer

I am a new NiFi and MongoDB user. I'm attempting to create an ingest process that first accepts a JSON file that will provide the information about a project (project submission id, project name, project submittor, project attachments (array of documents). I am able to parse the JSON information until I get to the array of project attachments. The project attachments field is an array with the name of one or more project documents (ex: attachment1.docx, attachment2.ppt, etc). The end goal is to put each project submission in MongoDB as a document. The project attachments field must be an array so I can reference them later when parsing the metadata from each of the attachment files. I want to be able to nest the metadata for each attachment to correspond to each attachment file in the array. 

 

So far, I am the problem I am running into is that the final project attachments array somehow becomes a string instead of an array after passing through the AttributesToJSON processor. I can see that the JSON looks fine until it gets to this processor then after AttributesToJSON, suddenly there are quotes around the array like this: "file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"

Because of this, the PutMongoRecord processor enters the array in MongoDB as a string instead of array.

 

I'm using the following processors:

GetFile - read in  json file 

SplitJson 

EvaluateJsonPath - destination = flowfile-attribute, return type=json, renaming some json fields as they are converted to flowfile attributes (submission id = $.id, abstract = $.abstract, file_attachment = $.data.[*] <---COULD this be part of the problem?)

 

NOTE: after passing through this processor, file_attachment shows up like this in the attributes: 

["attachment1.docx","attachment2.docx","attachment3.docx","attachment4.docx"]"
 

AttributesToJSON - specify the flowfile attributes to write to flowfile content. When viewing the flow-file content the fields have converted correctly EXCEPT for tile attachment:

"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"

 

Any idea how to get around this issue? Thank you for any suggestions!

 

1 ACCEPTED SOLUTION

avatar
Super Guru

Regarding:

"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"

 

The processor translates the object as a string.  You will need to take some action on it to prepare it as you want it to be downstream.  Assuming you are working with content of FlowFile, the processor you need is ReplaceText and the syntax is:

 

${'$1':unescapeJson()}

 

To dial it even further so its not a string at all:

 

${'$1':unescapeJson():replace('"[','['):replace(']"',']')}

 

Here is same in the ReplaceText processor properties:

Screen Shot 2019-12-29 at 9.51.48 AM.png

 

If you are operating on an attribute, you can do the similar action with UpdateAttribute:

 

${attributeName:unescapeJson()}

 

 

If this reply answers your question please mark it as a Solution.

 

View solution in original post

4 REPLIES 4

avatar

The first thing that comes to mind is not select $.data.* but something like $.data.file_attachment or $.data.file_attachment.*

 

Does this bring you (closer) to the answer?

 

If there are still simple things you want to change in the text, you could use this workaround:

In the update Attribute processor use something like replaceall. 

 

Hope this helps, but also curious if there are other things relevant here. 


- Dennis Jaheruddin

If this answer helped, please mark it as 'solved' and/or if it is valuable for future readers please apply 'kudos'.

avatar
Explorer

Thank you for the input, this was helpful advice!

avatar
Super Guru

Regarding:

"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"

 

The processor translates the object as a string.  You will need to take some action on it to prepare it as you want it to be downstream.  Assuming you are working with content of FlowFile, the processor you need is ReplaceText and the syntax is:

 

${'$1':unescapeJson()}

 

To dial it even further so its not a string at all:

 

${'$1':unescapeJson():replace('"[','['):replace(']"',']')}

 

Here is same in the ReplaceText processor properties:

Screen Shot 2019-12-29 at 9.51.48 AM.png

 

If you are operating on an attribute, you can do the similar action with UpdateAttribute:

 

${attributeName:unescapeJson()}

 

 

If this reply answers your question please mark it as a Solution.

 

avatar
Explorer

Thank you, this worked perfectly!