Created 09-27-2017 07:11 AM
Created on 09-27-2017 01:16 PM - edited 08-17-2019 10:22 PM
Hi @Xtr yarhid, In Apache NiFi, Controller Services are shared services that can be used by multiple processors. For example, if you want to read data from or write data to HBase or Hive tables, the corresponding processors need HBase or Hive controller services. You first need to enable these services and then reference them in the processors.
Coming back to your question: use the ListHDFS processor. It stores state, and you can schedule it to run every 15 minutes using either CRON driven or Timer driven scheduling.
ListHDFS does not fetch any files; it only lists the files available in the directory. The FetchHDFS processor does the actual fetching.
ListHDFS Configs:-
In this processor I have given /user/yashu/del_test as the Directory property, and the processor runs every 900 sec on Timer driven scheduling. On the first run it lists all the files in the del_test directory and emits one flowfile per file, with filename and path attributes on each flowfile (if you have 2 files in the directory, there will be 2 flowfiles, each carrying its own filename and path attributes). On later runs it uses the stored state to list only files that are new or modified since the previous listing.
To see the state in the ListHDFS processor, right-click the processor and click View state.
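To make the idea of a stateful listing more concrete, here is a small Python sketch. It is not NiFi code and not how ListHDFS is actually implemented; it only imitates the behaviour described above under simplifying assumptions. The names FileEntry and list_new_files, the file names and the timestamps are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FileEntry:
    """Hypothetical stand-in for one file in the HDFS directory."""
    path: str       # parent directory
    filename: str
    modified: int   # modification time, epoch seconds (made-up values below)


def list_new_files(entries: List[FileEntry], last_seen: int) -> Tuple[List[Dict[str, str]], int]:
    """Return (flowfile attributes, new state) for files newer than last_seen.

    Simplified assumption: the state is just the newest modification
    time that has already been listed.
    """
    fresh = sorted((e for e in entries if e.modified > last_seen),
                   key=lambda e: e.modified)
    flowfiles = [{"path": e.path, "filename": e.filename} for e in fresh]
    new_state = max((e.modified for e in fresh), default=last_seen)
    return flowfiles, new_state


directory = [
    FileEntry("/user/yashu/del_test", "a.csv", 100),
    FileEntry("/user/yashu/del_test", "b.csv", 200),
]

# First run: nothing in state yet, so both files are listed.
first_run, state = list_new_files(directory, last_seen=0)

# A new file arrives before the next scheduled run.
directory.append(FileEntry("/user/yashu/del_test", "c.csv", 300))

# Second run: only the file newer than the stored state is listed.
second_run, state = list_new_files(directory, last_seen=state)
```

In this sketch first_run holds two attribute sets (a.csv and b.csv) and second_run holds only c.csv, which mirrors 2 flowfiles on the first listing and 1 flowfile on the next one.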
FetchHDFS:-
Then add a FetchHDFS processor after it and leave it with the default configs, since its HDFS Filename property defaults to ${path}/${filename}, which are the attributes set by the ListHDFS processor.
This processor fetches the actual data from HDFS. Because ListHDFS keeps state, each 15-minute run only lists files added or changed since the previous run, so only those files get fetched.
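If it helps, this is roughly how ${path}/${filename} turns into a full HDFS path for each flowfile. The Python below is only an illustration: it uses string.Template to mimic plain attribute substitution, which is a tiny subset of what NiFi Expression Language does. The function name resolve_hdfs_filename and the file name part-0001.txt are hypothetical.

```python
from string import Template
from typing import Dict


def resolve_hdfs_filename(attributes: Dict[str, str],
                          expression: str = "${path}/${filename}") -> str:
    """Substitute flowfile attributes into a ${name}-style expression.

    Only simple ${attribute} references are handled here; real NiFi
    Expression Language supports functions and much more.
    """
    return Template(expression).substitute(attributes)


# Attributes as ListHDFS would set them on one flowfile (example values).
flowfile_attributes = {
    "path": "/user/yashu/del_test",
    "filename": "part-0001.txt",  # hypothetical file name
}

print(resolve_hdfs_filename(flowfile_attributes))
# prints /user/yashu/del_test/part-0001.txt
```

So each flowfile coming out of ListHDFS already carries everything FetchHDFS needs to locate its file, which is why the default configs are enough here.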
In addition, in a cluster you can send the ListHDFS output through a Remote Process Group (Site-to-Site). S2S distributes the listed flowfiles across the cluster nodes, and FetchHDFS on each node then does the actual fetching of the data.
Hope this helps..!!
Created 09-28-2017 06:54 AM
ok .... thank you
Created 10-02-2017 05:38 AM
Hi @Shu, yes, your answer helped resolve my problem. Thank you very much.