I need advice on a use case. I have data being streamed from NiFi into three different MongoDB collections, say A, B and C. I need to perform a lookup on a particular field, say emp ID, which is present in A. If the emp ID of a document in B or C matches the emp ID of a document in A, I need to attach the document from B or C to the matching document in A (I create a new key/value pair in A, the key being a random UID and the value being the payload from B or C, and then perform an update). If the value does not exist in A, I need to insert the document from B or C into A as it is. Currently I am using Spark to do this, but I believe it is overkill for this job. Are there any other big data technologies you would suggest?
P.S. I need to perform the updates in real time.
Thanks in advance.
You can handle this use case in NiFi itself by using the LookupRecord processor with the MongoDBLookupService controller service.
Once you have the empID from A, perform a series of lookups to check whether the same empID exists in the B and C collections, and define your Record Writer controller service with an Avro schema that matches the result record, so that the new key/value pair can be created.
The Routing Strategy property tells you whether the empID is present in the B and C collections. Use the matched/unmatched connections to make the decision, i.e. either create the new key/value record and then update the A collection, or insert the document into the A collection as it is.
Refer to this link for more details regarding the LookupRecord processor with MongoDBLookupService.
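To make the matched/unmatched decision concrete, here is a minimal sketch in plain Python. It is not NiFi or MongoDB code: collection A is modelled as an in-memory dict keyed by empID, and the function name `merge_into_a`, the field name `empID` and the sample documents are all hypothetical, chosen only for illustration.

```python
import uuid


def merge_into_a(collection_a, incoming, key="empID"):
    """Attach `incoming` to the matching document in A, or insert it as-is.

    `collection_a` is a plain dict standing in for collection A (assumption),
    keyed by the value of the lookup field.
    """
    emp_id = incoming[key]
    target = collection_a.get(emp_id)
    if target is None:
        # Unmatched: no document in A has this empID, so insert the payload unchanged.
        collection_a[emp_id] = dict(incoming)
        return "unmatched"
    # Matched: store the payload from B or C under a freshly generated random UID key.
    target[uuid.uuid4().hex] = dict(incoming)
    return "matched"


# Hypothetical sample data: one existing document in A, two payloads from B/C.
a = {101: {"empID": 101, "name": "Ann"}}
r1 = merge_into_a(a, {"empID": 101, "dept": "HR"})  # empID exists in A
r2 = merge_into_a(a, {"empID": 202, "dept": "IT"})  # empID not in A
```

After these two calls, the document for 101 carries the HR payload under a random key, and 202 has been inserted as a new document. The two return values correspond to the matched and unmatched relationships you would route on in the flow.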
Thank you for your reply. However, I don't see the LookupRecord processor in NiFi under the Add Processor menu. The NiFi version that I am using is 18.104.22.168.0.1.1-5. Is there anything that I am missing?
Thank you very much. I'll check with Hortonworks on how to get these processors into the NiFi version that I am using. Meanwhile, is there any other tool that I can use other than Spark or NiFi?
Thanks in advance.