Support Questions

Find answers, ask questions, and share your expertise

Hello all, what is the difference between a Controller Service and a Processor? Which one should I create if I want to check a directory every 15 minutes?

1 ACCEPTED SOLUTION

Master Guru

Hi @Xtr yarhid, in Apache NiFi, Controller Services are shared services that Processors can use. For example, if you want to read data from or write data to HBase or Hive tables, the corresponding processors need HBase or Hive controller services: you first enable those services and then reference them in the processors.
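
If it helps, here is a minimal sketch of enabling a controller service through the NiFi REST API (assuming an unsecured NiFi at http://localhost:8080 and a placeholder service id; in the UI this is simply the enable button under Controller Services, and older NiFi versions update the service entity directly instead of the run-status endpoint):

  import requests

  NIFI = "http://localhost:8080/nifi-api"        # assumed unsecured NiFi
  service_id = "replace-with-your-service-id"    # placeholder id

  # Fetch the current revision; NiFi requires it for every update.
  svc = requests.get(f"{NIFI}/controller-services/{service_id}").json()

  # Enable the service so processors can reference it.
  requests.put(
      f"{NIFI}/controller-services/{service_id}/run-status",
      json={"revision": svc["revision"], "state": "ENABLED"},
  )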

Coming back to your question: use the ListHDFS processor. This processor stores state, and you can schedule it to run every 15 minutes using CRON-driven or timer-driven scheduling.

  1. If no changes have been made to the directory or its files, it won't emit any flowfiles.
  2. If a file in the directory has changed or a new file has arrived, the processor lists only the new files and updates its state with the newest file's creation timestamp. Configure the Directory property accordingly.
  3. In this way, ListHDFS emits a flowfile with path and filename attributes, which the FetchHDFS processor uses to fetch the data from the HDFS directory.

This processor doesn't fetch any file contents; it only lists the files available in the directory, and the FetchHDFS processor does the actual fetching.
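
To make the check run every 15 minutes, either scheduling strategy works. Here is a sketch of the two options, expressed as the scheduling part of the processor's REST configuration (the UI exposes the same two fields on the processor's Scheduling tab):

  # Timer driven: run again 15 minutes after the previous run.
  timer_driven = {
      "schedulingStrategy": "TIMER_DRIVEN",
      "schedulingPeriod": "15 min",
  }

  # CRON driven: Quartz-style expression, here at minutes 0, 15, 30 and 45.
  cron_driven = {
      "schedulingStrategy": "CRON_DRIVEN",
      "schedulingPeriod": "0 0/15 * * * ?",
  }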

ListHDFS Configs:-

40560-listhdfs-config.png

In this processor I set /user/yashu/del_test as the Directory property, and the processor runs every 900 sec on timer-driven scheduling. On the first run it lists all the files in the del_test directory and adds filename and path attributes to each flowfile (if there are 2 files in the directory, there will be 2 flowfiles, each with its own filename and path attributes).
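
If you prefer to script the same setup instead of using the canvas, here is a hedged sketch that creates this ListHDFS processor through the NiFi REST API with the same Directory and 900 sec timer-driven schedule (assuming an unsecured NiFi at http://localhost:8080, the root process group, and that the Hadoop Configuration Resources property is filled in separately for your cluster):

  import requests

  NIFI = "http://localhost:8080/nifi-api"   # assumed unsecured NiFi

  list_hdfs = {
      "revision": {"version": 0},
      "component": {
          "type": "org.apache.nifi.processors.hadoop.ListHDFS",
          "name": "ListHDFS - del_test",
          "position": {"x": 0.0, "y": 0.0},
          "config": {
              "schedulingStrategy": "TIMER_DRIVEN",
              "schedulingPeriod": "900 sec",            # i.e. every 15 minutes
              "properties": {"Directory": "/user/yashu/del_test"},
          },
      },
  }

  resp = requests.post(f"{NIFI}/process-groups/root/processors", json=list_hdfs)
  list_hdfs_id = resp.json()["id"]
  print("Created ListHDFS:", list_hdfs_id)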

If you want to see the state of the ListHDFS processor, right-click on the processor and click the View state button.
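
The same state is also readable over the REST API, which can be handy for checking the stored listing timestamp from a script (the processor id is a placeholder and the response field names are my assumption about the component-state endpoint):

  import requests

  NIFI = "http://localhost:8080/nifi-api"
  list_hdfs_id = "replace-with-processor-id"   # id returned when the processor was created

  state = requests.get(f"{NIFI}/processors/{list_hdfs_id}/state").json()
  local = state["componentState"]["localState"]   # key/value pairs the processor stored
  for entry in local["state"]:
      print(entry["key"], "=", entry["value"])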

FetchHDFS:-
Then use the FetchHDFS processor and leave it with the default configuration; its HDFS Filename property defaults to ${path}/${filename}, so it picks up the path and filename attributes written by ListHDFS.

40559-fetch-hdfs.png

This processor fetches the actual data from HDFS, since ListHDFS only lists the files that changed in the last 15 minutes.
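
Continuing the scripted sketch from above, FetchHDFS can be added and wired to ListHDFS the same way; no properties are set because HDFS Filename already defaults to ${path}/${filename} (the ids, host and positions are placeholders):

  import requests

  NIFI = "http://localhost:8080/nifi-api"
  list_hdfs_id = "replace-with-listhdfs-id"   # from the previous step

  # Look up the root process group id for use inside the connection.
  root_id = requests.get(f"{NIFI}/process-groups/root").json()["id"]

  fetch_hdfs = {
      "revision": {"version": 0},
      "component": {
          "type": "org.apache.nifi.processors.hadoop.FetchHDFS",
          "name": "FetchHDFS",
          "position": {"x": 0.0, "y": 200.0},
      },
  }
  fetch_hdfs_id = requests.post(
      f"{NIFI}/process-groups/{root_id}/processors", json=fetch_hdfs
  ).json()["id"]

  # Route ListHDFS's success relationship into FetchHDFS.
  connection = {
      "revision": {"version": 0},
      "component": {
          "source": {"id": list_hdfs_id, "groupId": root_id, "type": "PROCESSOR"},
          "destination": {"id": fetch_hdfs_id, "groupId": root_id, "type": "PROCESSOR"},
          "selectedRelationships": ["success"],
      },
  }
  requests.post(f"{NIFI}/process-groups/{root_id}/connections", json=connection)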

In addition, after the ListHDFS processor you can use Site-to-Site (a Remote Process Group); S2S distributes the work across the cluster, and then FetchHDFS does the actual fetching of the data.
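
A rough, heavily hedged sketch of that last idea: create a Remote Process Group pointing back at the cluster so the listings from ListHDFS are load-balanced via Site-to-Site, and run FetchHDFS behind an input port on every node. The call below only creates the Remote Process Group; the target URL and position are placeholders, and the input port and downstream wiring still have to be added:

  import requests

  NIFI = "http://localhost:8080/nifi-api"

  rpg = {
      "revision": {"version": 0},
      "component": {
          # Point Site-to-Site back at this cluster (placeholder URL).
          "targetUris": "http://localhost:8080/nifi",
          "position": {"x": 0.0, "y": 400.0},
      },
  }
  requests.post(f"{NIFI}/process-groups/root/remote-process-groups", json=rpg)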

Hope this helps..!!


3 REPLIES


Ok, thank you.


Hi @Shu, yes, your answer helped resolve my problem. Thank you very much.