Hi,
I am very new to NiFi. We are trying to load data that we receive from an external system on a daily basis. Once the data ingestion is complete, the data is processed using Hive tables through an Oozie workflow. I have explained below the steps we follow for data ingestion. Kindly let us know whether this is the right approach to keep NiFi stable and to prevent it from duplicating any data.
1. We receive data from an external system in four different feeds. This is a daily batch process.
e.g. table1yyyymmdd, table2yyyymmdd, table3yyyymmdd and table4yyyymmdd.
2. The four feeds mentioned above are dropped into a folder named source on the Linux box where NiFi is also installed.
3. We created a process group named Data_Load in NiFi.
4. We move the data from the Linux folder to an HDFS folder named unprocessed using the following NiFi flow inside the process group Data_Load:
ListFile -> FetchFile -> PutHDFS
5. We then move the data from the HDFS folder unprocessed to another HDFS folder named processed using the following NiFi flow inside the same process group Data_Load:
ListHDFS -> FetchHDFS -> PutHDFS
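To make the naming convention from step 1 concrete, here is a minimal Python sketch of how one day's feeds could be checked for completeness before they are moved. It is only an illustration and not part of our NiFi flow: the prefixes table1 to table4 come from the example file names above, while the helper names `parse_feed_name` and `missing_feeds` are hypothetical.

```python
import re
from datetime import datetime

# Feed prefixes taken from the example file names in step 1.
FEEDS = ("table1", "table2", "table3", "table4")

# A feed file name is a known prefix followed by an 8-digit date (yyyymmdd).
NAME_RE = re.compile(r"^(table[1-4])(\d{8})$")


def parse_feed_name(name):
    """Return (feed, date) for a valid feed file name, else None."""
    match = NAME_RE.match(name)
    if not match:
        return None
    try:
        day = datetime.strptime(match.group(2), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None
    return match.group(1), day


def missing_feeds(names, day):
    """List the feeds that have not arrived yet for the given day."""
    parsed = (parse_feed_name(n) for n in names)
    seen = {p[0] for p in parsed if p is not None and p[1] == day}
    return [feed for feed in FEEDS if feed not in seen]
```

An empty list from `missing_feeds` would mean all four feeds for that day are present in the listing.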
Note: The HDFS folder unprocessed keeps the complete data from every day. The HDFS folder processed holds only one day's data; the previous day's data is deleted by the Oozie workflow once the data processing is completed.
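The retention rule in the note can be sketched as follows. This is a rough Python illustration under our own assumptions, not the actual Oozie action: it assumes every file name ends with an 8-digit yyyymmdd date as in step 1, and the helper names `feed_date` and `split_processed` are hypothetical.

```python
from datetime import datetime


def feed_date(name):
    """Parse the trailing yyyymmdd of a feed file name into a date."""
    return datetime.strptime(name[-8:], "%Y%m%d").date()


def split_processed(names, current_day):
    """Split a listing of the processed folder into (keep, delete).

    Files dated current_day are kept; files with an older date are the
    ones the cleanup would remove.
    """
    keep = [n for n in names if feed_date(n) == current_day]
    delete = [n for n in names if feed_date(n) < current_day]
    return keep, delete
```

After the cleanup has run, the `delete` list for the current day should be empty, which is the state we expect the processed folder to be in.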
Kindly let us know whether the data ingestion process described above has any design flaw that could make NiFi unstable or cause it to duplicate data.
Thanks