Created 10-04-2018 12:16 AM
Hi,
Using NiFi 1.7.1
I have one JSON file containing a batch of 20 JSON records, separated by newlines, that needs to be upserted into MongoDB.
My flow is like: GetFile-->PutMongo-->PutFile
GetFile pulls the JSON file, PutMongo upserts into MongoDB based on two keys, and PutFile archives the JSON file.
It looks simple, but the issue I am having is that the PutMongo processor only writes the first record from this JSON file into MongoDB.
My JSON file is a multiline file containing 20 records separated by newlines.
Sample from this input JSON file is as below,
{"key1":"abcd","key2":1234,"data1":12500.0000000000,"data2":true}
{"key1":"efgh","key2":6541,"data1":19999.0000000000,"data2":true}
PutMongo properties as below,
SSL Context Service: No value set
Client Auth: NONE
Mode: update
Upsert: true
Update Query Key: key1,key2
Update Query: No value set
Update Mode: With whole document
Write Concern: ACKNOWLEDGED
Character Set: UTF-8
Please suggest.
Thanks,
Rajesh
Created 10-11-2018 06:23 PM
Yeah, the problem is that PutMongo cannot recognize that there is more than one JSON document in the file. What ends up happening is that it reads the first line, which is valid JSON, but when it hits the next line it treats it as part of the first document and likely fails with a parsing error. To resolve this, I would add a SplitText processor so you can split the input file on the newline character. That will give you the 20 JSON records as individual flowfiles, which you can then feed to PutMongo, and it should work fine. From the split, you can take the "original" relationship and use that for the archiving with PutFile.
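To illustrate the behavior described above, here is a sketch in Python using the standard `json` module (PutMongo's internal parsing may differ in detail): parsing the whole file as one JSON document fails once the parser reaches the second record, while splitting on newlines first yields one valid document per line, which is what SplitText emits as individual flowfiles.

```python
import json

# Two records in one payload, newline-separated, as in the question
# (string values quoted so each line is valid JSON on its own).
payload = (
    '{"key1":"abcd","key2":1234,"data1":12500.0,"data2":true}\n'
    '{"key1":"efgh","key2":6541,"data1":19999.0,"data2":true}'
)

# Parsing the whole payload as a single JSON document fails: the parser
# finishes the first object and then chokes on the trailing content.
try:
    json.loads(payload)
except json.JSONDecodeError as e:
    print("whole-file parse failed:", e.msg)

# Splitting on newlines first (what SplitText does) gives one valid
# JSON document per line:
records = [json.loads(line) for line in payload.splitlines() if line.strip()]
print(len(records))  # 2 in this sample; 20 in the real file
```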
Created 10-11-2018 08:23 PM
Thanks for your response. I will definitely try this.
Few points I would like to understand.
1. Are there any alternatives to splitting this JSON file (using SplitText)? It sounds like a definite performance hit when ingesting a large number of records (> 100,000).
2. Is this a limitation of the PutMongo processor, that it cannot read this JSON file? What format does PutMongo expect the JSON to be in so that it can ingest multiple records from a single JSON file?
Thanks,
Rajesh
Created 10-12-2018 03:32 AM
You can use PutMongoRecord for this; the JsonTreeReader can accept "one JSON per line" (as of NiFi 1.7.0 via NIFI-4456, also in HDF 3.2).
Created 10-12-2018 02:26 PM
Thanks for your response Matt.
I have used PutMongoRecord before for bulk-ingestion use cases where the requirement was only to insert records, and it worked fine for me.
For this use case I need to do upserts, and I am not sure whether PutMongoRecord can upsert. If it can, could you please suggest the PutMongoRecord configuration for upserting on two keys?
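For reference, the upsert wanted here, matching on the composite (key1, key2) and replacing the whole document, corresponds to a MongoDB update with upsert enabled. A minimal pure-Python sketch of those semantics (the `upsert` helper and in-memory `store` are hypothetical, purely to show the matching behavior; this is not the NiFi or MongoDB driver API):

```python
def upsert(store, doc):
    """Replace the document matching (key1, key2), or insert it if absent --
    the same semantics as an update with upsert=true keyed on two fields."""
    store[(doc["key1"], doc["key2"])] = doc

store = {}
upsert(store, {"key1": "abcd", "key2": 1234, "data1": 12500.0, "data2": True})
upsert(store, {"key1": "abcd", "key2": 1234, "data1": 99999.0, "data2": True})  # same keys: update
upsert(store, {"key1": "efgh", "key2": 6541, "data1": 19999.0, "data2": True})  # new keys: insert
print(len(store))  # 2 -- the second call updated the first document
```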
Thanks,
Rajesh