I have been using Kafka Connect heavily, and for my current project I need to use Apache NiFi in cluster mode.
I am wondering what the best approach to data ingestion for a data lake would be. (For now, let's consider NiFi.)
1. Use Kafka Connect to poll the files and put the data into a topic, then process it with Apache NiFi.
2. Use the Apache NiFi site-to-site push model.
Hi @Pankaj Singh
You can use NiFi directly to pull the files and store them in your data lake (I assume you mean HDFS). The list/fetch file processors (for example ListFile and FetchFile) together with the PutHDFS processor will do this. Site-to-site (S2S) can be used to distribute the load across the NiFi cluster.
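To make the list / distribute / fetch-and-store idea concrete, here is a small plain-Python sketch of the pattern. It is not NiFi code and does not use any NiFi API: the function names, the round-robin assignment, and the local target directory standing in for HDFS are all assumptions for illustration only. In a real flow the processors and S2S would do this work for you.

```python
import shutil
import tempfile
from pathlib import Path


def list_new_files(source_dir, last_seen_mtime):
    """Listing step: return files modified after last_seen_mtime, oldest first."""
    candidates = [p for p in Path(source_dir).iterdir() if p.is_file()]
    fresh = [p for p in candidates if p.stat().st_mtime > last_seen_mtime]
    return sorted(fresh, key=lambda p: (p.stat().st_mtime, p.name))


def distribute(items, node_count):
    """Distribution step: assign listed items to nodes round-robin
    (a hypothetical stand-in for spreading the listing across the cluster)."""
    assignments = [[] for _ in range(node_count)]
    for index, item in enumerate(items):
        assignments[index % node_count].append(item)
    return assignments


def fetch_and_store(paths, target_dir):
    """Fetch-and-put step: copy each assigned file into the target directory
    (a local directory here, standing in for HDFS)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [Path(shutil.copy2(p, target / p.name)) for p in paths]


def run_demo():
    """Create five sample files, list them once, and let three 'nodes' store them."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        for i in range(5):
            (Path(src) / f"file_{i}.csv").write_text(f"row {i}\n")
        listed = list_new_files(src, last_seen_mtime=0.0)
        stored = []
        for node_paths in distribute(listed, node_count=3):
            stored.extend(fetch_and_store(node_paths, dst))
        return sorted(p.name for p in stored)


if __name__ == "__main__":
    print(run_demo())
```

The point of the split is that listing is cheap and happens in one place, while the heavier fetch and write work is spread over all nodes, so each file is picked up exactly once.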