We want to ingest Salesforce data in real time. Does anyone know how other companies are doing it? We have the Hortonworks stack. Would it be a REST API call from NiFi, or should we use Kafka directly? Can it be event-driven? For example, an order record is updated in Salesforce, and that triggers an event to the big data platform?
Last year I wrote an article on integrating with the Salesforce Bulk API (which uses SOAP, not REST): https://community.hortonworks.com/articles/146016/integrating-with-salesforce-using-the-sfdc-soap-ap.... That should give you a starting point for how the interaction works. In that case, communication is done using XML documents: NiFi initiates a job in Salesforce, waits until the job is done, and then queries for the results to write to HDFS.
In the real-time scenario, I believe it is a JSON-based REST API you'll need to use. You will want to use NiFi to make a REST call to the SFDC server to retrieve the data you're interested in. If you'd like to make it event-driven, you'll have to explore the custom functionality that SFDC offers to enable that. I think Apex triggers would be a good place to start, though my knowledge in that area is limited.
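As a rough illustration of the polling variant, here is a minimal Python sketch of the kind of request NiFi would send to the Salesforce REST query resource. The instance URL, API version, access token, and the choice of the Order object with a LastModifiedDate filter are all assumptions for the example. The request is only built, not sent, and obtaining the OAuth token is left out.

```python
from urllib.parse import urlencode
from urllib.request import Request


def changed_since_soql(sobject, fields, since_iso):
    """Build a SOQL query for records modified after a timestamp.

    SOQL datetime literals are written without quotes.
    """
    return (
        f"SELECT {', '.join(fields)} FROM {sobject} "
        f"WHERE LastModifiedDate > {since_iso}"
    )


def build_query_request(instance_url, api_version, access_token, soql):
    """Build (but do not send) a GET request for the REST query resource."""
    url = (
        f"{instance_url}/services/data/v{api_version}/query?"
        f"{urlencode({'q': soql})}"
    )
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Hypothetical values: replace with your own org, API version, and token.
soql = changed_since_soql("Order", ["Id", "Status"], "2024-01-01T00:00:00Z")
req = build_query_request(
    "https://example.my.salesforce.com", "58.0", "TOKEN", soql
)
```

In NiFi, the equivalent would be an InvokeHTTP processor configured with the same URL and Authorization header, scheduled to run at whatever interval you can tolerate.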
To enable your HDF platform to receive these events, however, you can either use the ListenHTTP processor to turn NiFi into an HTTP server that SFDC can communicate with, or use Kafka and expose your host/port publicly to SFDC. SFDC would then be able to communicate with NiFi or Kafka to send events/records.
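To test the ListenHTTP side before wiring up Salesforce, you could stand in for the sender with something like the Python sketch below. The host, port, and event fields are made up for the example. The base path defaults to contentListener, which is the ListenHTTP default unless you have changed it. The request is only constructed here; pass it to urllib.request.urlopen to actually send it.

```python
import json
from urllib.request import Request


def build_listenhttp_request(host, port, event, base_path="contentListener"):
    """Build a POST request carrying one JSON event for a NiFi ListenHTTP processor."""
    body = json.dumps(event).encode("utf-8")
    return Request(
        f"http://{host}:{port}/{base_path}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical event: the field names are placeholders, not a Salesforce schema.
sample_event = {"object": "Order", "id": "ORDER-001", "change": "updated"}
post = build_listenhttp_request("nifi.example.com", 8081, sample_event)
```

Each POST body arrives in NiFi as the content of one flowfile, so the sender decides whether a flowfile holds a single event or a batch.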
Thank you for the response. But since our main goal is to capture changes in Salesforce, such as new customer information being entered, shouldn't we use the Streaming API? It can notify us about the changes.
Correct - I linked the article on Bulk API to give you an example of one way you can do it, so you can reference it whether you're doing batch or real-time. If you are interested in receiving real-time event data, then you will want to stay away from the Bulk API.
By all means follow your linked guide if you'd like to start by sending it to Kafka - it looks comprehensive and straightforward to implement. NiFi can then pick it up from there and you can apply whichever operations you need after that. NiFi has processors for searching JSON data, so it is right at home with that kind of payload.
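To give a feel for the JSON extraction step (in NiFi this would typically be EvaluateJsonPath), here is the same logic as a small Python sketch. The message shape is an assumption: a "data" envelope containing an "event" block and an "sobject" block, as I understand Streaming API notifications to look. Check it against what actually lands in your Kafka topic, since the guide you follow may reshape it.

```python
import json


def flatten_event(raw):
    """Flatten one change notification into a single flat dict.

    Assumes {"data": {"event": {...}, "sobject": {...}}}; falls back to the
    top level if there is no "data" envelope.
    """
    msg = json.loads(raw)
    data = msg.get("data", msg)
    event = data.get("event", {})
    record = data.get("sobject", {})
    flat = {"event_type": event.get("type"), "replay_id": event.get("replayId")}
    flat.update(record)  # record fields sit alongside the event metadata
    return flat


# Hypothetical message for illustration only.
sample = json.dumps({
    "channel": "/topic/OrderUpdates",
    "data": {
        "event": {"type": "updated", "replayId": 7},
        "sobject": {"Id": "ORDER-001", "Status": "Activated"},
    },
})
flat = flatten_event(sample)
```

In EvaluateJsonPath the corresponding expressions would be along the lines of $.data.event.type and $.data.sobject.Id, again assuming that message shape.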