Created on 12-27-2019 05:20 PM - last edited on 12-27-2019 06:28 PM by ask_bill_brooks
I am a new NiFi and MongoDB user. I'm attempting to create an ingest process that first accepts a JSON file that will provide the information about a project (project submission id, project name, project submittor, project attachments (array of documents). I am able to parse the JSON information until I get to the array of project attachments. The project attachments field is an array with the name of one or more project documents (ex: attachment1.docx, attachment2.ppt, etc). The end goal is to put each project submission in MongoDB as a document. The project attachments field must be an array so I can reference them later when parsing the metadata from each of the attachment files. I want to be able to nest the metadata for each attachment to correspond to each attachment file in the array.
So far, I am the problem I am running into is that the final project attachments array somehow becomes a string instead of an array after passing through the AttributesToJSON processor. I can see that the JSON looks fine until it gets to this processor then after AttributesToJSON, suddenly there are quotes around the array like this: "file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
Because of this, the PutMongoRecord processor enters the array in MongoDB as a string instead of array.
I'm using the following processors:
GetFile - read in json file
SplitJson
EvaluateJsonPath - destination = flowfile-attribute, return type=json, renaming some json fields as they are converted to flowfile attributes (submission id = $.id, abstract = $.abstract, file_attachment = $.data.[*] <---COULD this be part of the problem?)
NOTE: after passing through this processor, file_attachment shows up like this in the attributes:
AttributesToJSON - specify the flowfile attributes to write to flowfile content. When viewing the flow-file content the fields have converted correctly EXCEPT for tile attachment:
"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
Any idea how to get around this issue? Thank you for any suggestions!
Created 12-29-2019 06:56 AM
Regarding:
"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
The processor translates the object as a string. You will need to take some action on it to prepare it as you want it to be downstream. Assuming you are working with content of FlowFile, the processor you need is ReplaceText and the syntax is:
${'$1':unescapeJson()}
To dial it even further so its not a string at all:
${'$1':unescapeJson():replace('"[','['):replace(']"',']')}
Here is same in the ReplaceText processor properties:
If you are operating on an attribute, you can do the similar action with UpdateAttribute:
${attributeName:unescapeJson()}
If this reply answers your question please mark it as a Solution.
Created 12-28-2019 02:15 AM
The first thing that comes to mind is not select $.data.* but something like $.data.file_attachment or $.data.file_attachment.*
Does this bring you (closer) to the answer?
If there are still simple things you want to change in the text, you could use this workaround:
In the update Attribute processor use something like replaceall.
Hope this helps, but also curious if there are other things relevant here.
Created 12-30-2019 05:53 AM
Thank you for the input, this was helpful advice!
Created 12-29-2019 06:56 AM
Regarding:
"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
The processor translates the object as a string. You will need to take some action on it to prepare it as you want it to be downstream. Assuming you are working with content of FlowFile, the processor you need is ReplaceText and the syntax is:
${'$1':unescapeJson()}
To dial it even further so its not a string at all:
${'$1':unescapeJson():replace('"[','['):replace(']"',']')}
Here is same in the ReplaceText processor properties:
If you are operating on an attribute, you can do the similar action with UpdateAttribute:
${attributeName:unescapeJson()}
If this reply answers your question please mark it as a Solution.
Created 12-30-2019 05:51 AM
Thank you, this worked perfectly!