Support Questions

I need input on processing data with NiFi for a particular use case.


I have the following files in the input directory, named in this pattern:

cntl_table_1.xml  table_1.dat
cntl_table_2.xml  table_2.dat
cntl_table_3.xml  table_3.dat
cntl_table_4.xml  table_4_1.dat  table_4_2.dat

The files with the .xml extension hold the structure of a table, and the files with the .dat extension hold the actual data for that table. If the data file for a table is larger than 500 MB, it is split into two or more files; in our case that is table_4. In total we will be receiving files for 15 different tables.

Once we receive the control files (*.xml) for all 15 tables (we have to wait until all 15 have arrived), we have to trigger a shell script, which looks for these 15 .xml files in the input directory and creates the Hive tables. Once the shell script completes successfully, we have to start processing the data files from the input directory.
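For reference, the "wait until all 15 control files are present" gate I have in mind looks roughly like the sketch below (this is an illustration, not my actual script — the function name, the `cntl_*.xml` pattern, and the demo directory are assumptions; in the real flow NiFi would invoke something like this, e.g. from an ExecuteStreamCommand processor):

```shell
#!/bin/sh
# Hypothetical gate: succeed only when all 15 control (.xml) files
# have arrived in the given input directory.

all_controls_received() {
    dir="$1"
    expected=15
    # Count control files matching the cntl_*.xml naming pattern
    count=$(find "$dir" -maxdepth 1 -name 'cntl_*.xml' | wc -l)
    [ "$count" -eq "$expected" ]
}

# Demo: simulate the 15 control files arriving in a temp directory
demo=$(mktemp -d)
i=1
while [ "$i" -le 15 ]; do
    touch "$demo/cntl_table_$i.xml"
    i=$((i + 1))
done

if all_controls_received "$demo"; then
    # This is where the existing Hive-table-creation script would run
    echo "all control files received"
fi
rm -rf "$demo"
```

The key point is that the check is idempotent: it can be run on every incoming file, and it only succeeds (exit status 0) once the 15th control file has landed, so whatever triggers it downstream fires exactly when the set is complete.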

I have tried to implement the above logic in NiFi but have not been successful so far. Could anyone please let me know whether this logic can be achieved with NiFi? Also, please let me know if more information is required.