
NiFi PutMongo creating only first record from Multiline JSON

New Contributor

Hi,

Using NiFi 1.7.1

I am using a single JSON file containing a batch of 20 JSON records, separated by newlines, to be upserted into MongoDB.

My flow is: GetFile --> PutMongo --> PutFile

GetFile picks up the JSON file, PutMongo upserts the records into MongoDB based on two keys, and PutFile archives the JSON file.

It looks simple, but the issue I am having is that the PutMongo processor only writes the first record from this JSON file into MongoDB.

My JSON file is a multiline file containing 20 records separated by newlines.

Here is a sample from this input JSON file:

{"key1":"abcd","key2":1234,"data1":12500.0000000000,"data2":true}
{"key1":"efgh","key2":6541,"data1":19999.0000000000,"data2":true}
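The symptom (only the first record is written) is what you would expect from any reader that decodes a single JSON value from the content. The Python sketch below is an analogy only and does not reproduce PutMongo's actual parser: the standard library's `json.JSONDecoder.raw_decode` parses the first JSON value and reports where it stopped, leaving the second record unread, while `json.loads` rejects the same text outright.

```python
import json

# Two records in newline-delimited form, modeled on the sample above
# (string values are quoted here so that each line is valid JSON).
content = (
    '{"key1":"abcd","key2":1234,"data1":12500.0,"data2":true}\n'
    '{"key1":"efgh","key2":6541,"data1":19999.0,"data2":true}\n'
)

decoder = json.JSONDecoder()

# raw_decode parses one JSON value from the start of the string and returns
# (value, index where parsing stopped); it ignores whatever text follows.
first, end = decoder.raw_decode(content)
print(first["key1"])  # abcd
print(content[end:].strip().count("\n") + 1, "record(s) left unread")  # 1 record(s) left unread

# json.loads, by contrast, rejects the whole string because of the extra data.
try:
    json.loads(content)
except json.JSONDecodeError as err:
    print("json.loads failed:", err.msg)  # json.loads failed: Extra data
```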

My PutMongo properties are as follows:

SSL Context Service: No value set

Client Auth: NONE

Mode: update

Upsert: true

Update Query Key: key1,key2

Update Query: No value set

Update Mode: With whole document

Write Concern: ACKNOWLEDGED

Character Set: UTF-8
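As a rough model of what this configuration asks for (an assumption about PutMongo's behavior, not its source code): the values of the fields named in Update Query Key form the match filter, and with Upsert true and Update Mode "With whole document" the matching document is replaced, or inserted if nothing matches. The sketch simulates that against an in-memory list; `build_query` and `upsert_whole_document` are hypothetical helpers, not NiFi or MongoDB APIs.

```python
from typing import Any

Doc = dict[str, Any]

def build_query(doc: Doc, update_query_key: str) -> Doc:
    """Build a match filter from a comma-separated key list such as "key1,key2"."""
    keys = [k.strip() for k in update_query_key.split(",") if k.strip()]
    return {k: doc[k] for k in keys}

def upsert_whole_document(collection: list[Doc], doc: Doc, update_query_key: str) -> str:
    """Replace the first document matching the filter, or insert doc if none matches."""
    query = build_query(doc, update_query_key)
    for i, existing in enumerate(collection):
        if all(existing.get(k) == v for k, v in query.items()):
            collection[i] = dict(doc)  # whole-document replacement
            return "replaced"
    collection.append(dict(doc))  # upsert: no match, so insert
    return "inserted"

collection: list[Doc] = []
record = {"key1": "abcd", "key2": 1234, "data1": 12500.0, "data2": True}
print(build_query(record, "key1,key2"))  # {'key1': 'abcd', 'key2': 1234}
print(upsert_whole_document(collection, record, "key1,key2"))  # inserted
updated = {**record, "data1": 13000.0}
print(upsert_whole_document(collection, updated, "key1,key2"))  # replaced
print(len(collection))  # 1
```

Note that this model handles one document per call, which matches the one-record-per-FlowFile behavior described in this thread.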

Any suggestions would be appreciated.

Thanks,

Rajesh


Re: NiFi PutMongo creating only first record from Multiline JSON

Expert Contributor
@Rajesh Ghosh

Yeah, the problem is that PutMongo cannot recognize that there is more than one JSON document in the file. What ends up happening is that it reads the first line, which is valid JSON, but when it hits the next line it treats that as part of the JSON from the first line and probably fails with a parsing error. To resolve this, I would add a SplitText processor (with Line Split Count set to 1) to split the input file on the newline character. This will give you the 20 JSON records as individual flowfiles, which you can then feed to PutMongo, and it should work fine. From the split, you can take the "original" relationship and use it to archive the file with PutFile.
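The split-then-write approach can be modeled in a few lines of Python. This is a simplified simulation of what SplitText with Line Split Count = 1 produces (one piece per line on "splits", plus the untouched input on "original"); `split_lines` is a hypothetical helper, not a NiFi API, and real SplitText handling of blank lines depends on its own settings.

```python
import json

def split_lines(content: str) -> dict[str, list[str]]:
    """Simulate a one-line-per-split SplitText: skip blank lines, keep the original."""
    splits = [line for line in content.splitlines() if line.strip()]
    return {"splits": splits, "original": [content]}

content = (
    '{"key1":"abcd","key2":1234,"data1":12500.0,"data2":true}\n'
    '{"key1":"efgh","key2":6541,"data1":19999.0,"data2":true}\n'
)

routed = split_lines(content)
# Each split now holds exactly one JSON document, so it parses on its own
# and can be sent to PutMongo individually.
records = [json.loads(piece) for piece in routed["splits"]]
print([r["key1"] for r in records])  # ['abcd', 'efgh']
# The "original" relationship still carries the unsplit file for archiving (PutFile).
print(routed["original"][0] == content)  # True
```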


Re: NiFi PutMongo creating only first record from Multiline JSON

New Contributor

@Karthik Narayanan

Thanks for your response. I will definitely try this.

There are a few points I would like to understand:

1. Are there any alternatives to splitting this JSON file (using SplitText)? It sounds like a definite performance hit when ingesting a large number of records (> 100,000).

2. Is this a limitation of the PutMongo processor, that it is not able to read this JSON file? How does PutMongo expect the JSON to be structured so that it can ingest multiple records from a single JSON file?

Thanks,

Rajesh


Re: NiFi PutMongo creating only first record from Multiline JSON

Super Guru

You can use PutMongoRecord for this with a JsonTreeReader as its record reader; JsonTreeReader can accept "one JSON per line" input (as of NiFi 1.7.0 via NIFI-4456, also HDF 3.2).
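The point of the record-based route is that one FlowFile holding many newline-delimited JSON objects is read as a stream of records, so no per-record FlowFiles are created. The sketch below is a plain-Python stand-in for that reading step, not JsonTreeReader's implementation; `read_json_lines` is a hypothetical name.

```python
import io
import json
from typing import Any, Iterator

def read_json_lines(stream: io.TextIOBase) -> Iterator[dict[str, Any]]:
    """Yield one record per non-blank line, like a 'one JSON per line' reader."""
    for line_no, line in enumerate(stream, start=1):
        if not line.strip():
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_no} is not valid JSON: {err.msg}") from err

content = (
    '{"key1":"abcd","key2":1234,"data1":12500.0,"data2":true}\n'
    '{"key1":"efgh","key2":6541,"data1":19999.0,"data2":true}\n'
)
# All records come out of one pass over a single input, so a writer can
# send them to MongoDB as a batch instead of one FlowFile per record.
records = list(read_json_lines(io.StringIO(content)))
print(len(records))  # 2
```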


Re: NiFi PutMongo creating only first record from Multiline JSON

New Contributor

@Matt Burgess

Thanks for your response Matt.

I have used PutMongoRecord before for bulk ingestion use cases where the requirement was only to insert records, and it worked fine for me.

In this use case I need to do upserts, and I am not sure whether PutMongoRecord supports upserts. If it does, could you please suggest a PutMongoRecord configuration for upserting based on two keys?

Thanks,

Rajesh
