Newbie question: I'm working on an ingest process that accepts a group of files, routes one to MongoDB, and inserts it. This creates a basic structure for the document. The fields are: _id (ObjectId automatically added by MongoDB), id, author, submission_type, submission_name, abstract, and data (array). The process assumes a JSON file for the basic structure and attachments for the submission. I am able to use the ListFile, FetchFile, and RouteOnAttribute processors to grab the files. I use RouteOnAttribute to separate the JSON structure file from the attachment files. The JSON file is inserted into MongoDB using PutMongoRecord. Everything is inserted fine.
The issue comes with the attachment files. They are routed to ExtractMediaMetadata to grab metadata attributes. Using the SplitJson, EvaluateJsonPath, and AttributesToJSON processors, I gather all the attributes to insert in a MongoDB record. Now I want to match the filename attributes from the attachment files to the MongoDB document I just inserted, matching on db.collection.data, which is an array.
Example of JSON MongoDB insert:
author: "John Q Sample"
submission_name : "Classical Compilation"
abstract : "This submission.... ..... .... more abstract info"
data : Array
0 : "File1.docx"
1 : "File2.docx"
2 : "File3.ppt"
3 : "File4.xls"
The filename attributes of the attachment files will match the filenames stored in the data array that was inserted into MongoDB. Is there a way to use NiFi to query the MongoDB document (look at the db.collection.data array and match the filenames) and then insert the JSON attributes from the attachment files into the data array? Any advice would be appreciated!
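To make the goal concrete, here is a rough sketch in Python of the kind of query and update I think is needed. It is only an assumption about one possible approach: it uses MongoDB's positional `$` operator to replace the matching filename string in `data` with a subdocument. The function names and the `filename`/`metadata` keys are hypothetical, and `apply_locally` only imitates the update on a plain dict so the result can be inspected without a database. How to wire this into a NiFi processor is not shown.

```python
def build_attachment_update(filename, metadata):
    """Build a MongoDB filter and update for one attachment.

    The filter matches a document whose `data` array contains `filename`.
    The positional `$` in the update refers to that matched array element.
    """
    query = {"data": filename}
    update = {
        "$set": {
            # Hypothetical shape for the nested entry: keep the filename
            # and attach the extracted metadata beside it.
            "data.$": {"filename": filename, "metadata": metadata}
        }
    }
    return query, update


def apply_locally(document, query, update):
    """Imitate the positional update on a plain dict (illustration only)."""
    target = query["data"]
    replacement = update["$set"]["data.$"]
    data = list(document["data"])
    if target in data:
        # Like the positional operator, only the first match is replaced.
        data[data.index(target)] = replacement
    return {**document, "data": data}


if __name__ == "__main__":
    doc = {
        "author": "John Q Sample",
        "data": ["File1.docx", "File2.docx", "File3.ppt", "File4.xls"],
    }
    # The metadata values here are made up for the example.
    query, update = build_attachment_update("File2.docx", {"content_type": "docx"})
    print(apply_locally(doc, query, update)["data"])
```

With a Python driver such as pymongo, the same pair would be passed as `collection.update_one(query, update)`, once per attachment file.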
I am a new NiFi and MongoDB user. I'm attempting to create an ingest process that first accepts a JSON file providing the information about a project: project submission id, project name, project submitter, and project attachments (an array of documents). I am able to parse the JSON information until I get to the array of project attachments. The project attachments field is an array with the names of one or more project documents (e.g., attachment1.docx, attachment2.ppt). The end goal is to put each project submission in MongoDB as a document. The project attachments field must be an array so I can reference its entries later when parsing the metadata from each of the attachment files. I want to be able to nest the metadata for each attachment under the corresponding attachment file in the array.
The problem I am running into is that the final project attachments array somehow becomes a string instead of an array after passing through the AttributesToJSON processor. The JSON looks fine until it reaches this processor; after AttributesToJSON, there are suddenly quotes around the array, like this: "file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
Because of this, the PutMongoRecord processor enters the array in MongoDB as a string instead of an array.
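Here is a small Python sketch of what I believe is happening, outside of NiFi. It assumes that flowfile attributes are plain strings, so the array is carried as JSON text and gets quoted again when the attributes are serialized. The function names and the sample `id` value are hypothetical, and this is not how AttributesToJSON is implemented; it only reproduces the symptom and shows that parsing the attribute value before serializing gives a real array.

```python
import json

# Attributes modeled as a string-to-string map (an assumption about NiFi).
attributes = {
    "id": "12345",  # hypothetical submission id
    "file_attachment": '["attachment1.docx","attachment2.docx"]',
}


def attributes_to_json_naive(attrs):
    """Serialize every attribute value as a JSON string, array text included."""
    return json.dumps(attrs)


def attributes_to_json_parsed(attrs, json_keys):
    """Parse the named attributes as JSON first, so arrays stay arrays."""
    doc = dict(attrs)
    for key in json_keys:
        doc[key] = json.loads(doc[key])
    return json.dumps(doc)


if __name__ == "__main__":
    # The array is emitted as an escaped string, as in my flowfile content.
    print(attributes_to_json_naive(attributes))
    # The array is emitted as a real JSON array.
    print(attributes_to_json_parsed(attributes, ["file_attachment"]))
```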
I'm using the following processors:
GetFile - read in json file
EvaluateJsonPath - destination = flowfile-attribute, return type=json, renaming some json fields as they are converted to flowfile attributes (submission id = $.id, abstract = $.abstract, file_attachment = $.data.[*] <---COULD this be part of the problem?)
NOTE: after passing through this processor, file_attachment shows up like this in the attributes:
AttributesToJSON - specify the flowfile attributes to write to flowfile content. When viewing the flowfile content, the fields have converted correctly EXCEPT for file_attachment:
"file_attachment" : "[\"attachment1.docx\",\"attachment2.docx\",\"attachment3.docx\",\"attachment4.docx\"]"
Any idea how to get around this issue? Thank you for any suggestions!