Created 09-25-2017 10:38 AM
I have two folders on HDFS, for example folder 1 and folder 2, with the same data in both. If I delete or update a file in folder 1, can the GetHDFS processor catch the change? (I mean, an update or delete should leave an info log on HDFS; can a NiFi processor catch such changes?) Or can any other NiFi processor do this?
Created on 09-25-2017 01:02 PM - edited 08-17-2019 10:35 PM
Hi @sally sally, the GetHDFS processor does not store state. That means every time you stop and start the processor it fetches the files from the directory and deletes them from HDFS; this is the default behaviour of GetHDFS. If you don't want the files deleted, set the Keep Source File property to true, so the processor fetches each file and leaves the source in the HDFS directory. With that setting, when GetHDFS runs again it fetches the same files, because the processor does not remember what it has already fetched.
Use the ListHDFS processor instead. This processor stores state:
if no changes have been made to the directory or its files, it won't list any flowfiles;
if something in the directory has changed, it lists only the new or changed files and updates its state with the timestamp of the newest file. Configure the Directory property.
In this way ListHDFS emits a flowfile with path and filename attributes, which the FetchHDFS processor uses to fetch the data from the HDFS directory.
ListHDFS does not fetch any files itself; it only lists the files available in the directory, and FetchHDFS does the actual fetching.
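To make the state idea concrete, here is a minimal Python sketch of timestamp-based listing. It is only an illustration of the behaviour described above, not NiFi's actual implementation; the function name `list_new_files`, the file names, and the integer timestamps are all made up for the example.

```python
from typing import Dict, List, Tuple


def list_new_files(listing: Dict[str, int], last_seen: int) -> Tuple[List[str], int]:
    """Return the paths modified after last_seen, plus the updated state.

    listing maps a file path to its modification timestamp (hypothetical values).
    """
    changed = sorted(path for path, mtime in listing.items() if mtime > last_seen)
    new_state = max([last_seen, *listing.values()])
    return changed, new_state


# Hypothetical directory snapshot: path -> modification timestamp.
snapshot = {"/data/folder1/a.txt": 100, "/data/folder1/b.txt": 120}

state = 0
first_run, state = list_new_files(snapshot, state)   # both files are new
second_run, state = list_new_files(snapshot, state)  # nothing changed

snapshot["/data/folder1/a.txt"] = 150                # a.txt gets updated
third_run, state = list_new_files(snapshot, state)   # only a.txt is listed

del snapshot["/data/folder1/b.txt"]                  # b.txt gets deleted
fourth_run, state = list_new_files(snapshot, state)  # nothing is listed
```

Note the last step: in this sketch a deleted file simply drops out of the listing, so a timestamp check alone produces nothing for it.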
FetchHDFS:-
Then add a FetchHDFS processor and leave it with the default configuration, because it reads the file location as ${path}/${filename} from the attributes set by ListHDFS.
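As a rough illustration of how ${path}/${filename} turns into a full file location, the sketch below uses Python's `string.Template` as a stand-in for attribute substitution. NiFi Expression Language is much richer than this; the helper name `resolve` is hypothetical, and the attribute values are just sample paths.

```python
from string import Template
from typing import Dict


def resolve(expression: str, attributes: Dict[str, str]) -> str:
    """Substitute ${name} placeholders with flowfile attribute values."""
    return Template(expression).substitute(attributes)


# Attributes as a listing step might set them (sample values).
attributes = {"path": "/user/yashu/folder2", "filename": "part1.txt"}

hdfs_filename = resolve("${path}/${filename}", attributes)
print(hdfs_filename)
```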
Flow:-
In addition, with this approach you can put a site-to-site (S2S) connection (Remote Process Group) after the ListHDFS processor. S2S distributes the work across the cluster, and FetchHDFS then does the actual fetching of the data.
Created 09-25-2017 01:40 PM
If I change something in a flowfile, for example the name of my response data, will ListHDFS find this and update the flowfile? And what about deleting flowfiles? For example, if I have flowfile 1 in my first directory and I have fetched it into my second directory, and then I delete flowfile 1 from folder 1, what should I do to delete it from folder 2 too? Everyone tells me syncing directories is impossible in NiFi. Can you recommend anything that can help me solve this problem?
Thank you in advance.
Created on 09-25-2017 08:12 PM - edited 08-17-2019 10:34 PM
If you want to delete the same file from both folder1 and folder2, you can do that in NiFi by connecting the success relationship of one DeleteHDFS processor to another DeleteHDFS processor, so the fetched file is deleted from both directories.
Here is the example that I tried:-
Both folder1 and folder2 contain the same 2 files, as listed below.
/user/yashu/folder2/part1.txt
/user/yashu/folder2/part1_sed.txt
So in my flow I'm fetching from folder2.
DeleteHDFS config for folder1:-
Once the files are deleted from folder1, I connect the success relationship to another DeleteHDFS processor to delete the same files from folder2.
DeleteHDFS config for folder2:-
So the first DeleteHDFS processor deletes the folder1 files and the second DeleteHDFS processor deletes the folder2 files.
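The chaining idea can be sketched outside NiFi as well. The Python below uses local temporary directories as a stand-in for HDFS and deletes the same file name from each folder in turn, only moving on to the next folder when the previous delete worked. That stop-on-failure rule is this sketch's assumption for "following the success relationship", and `delete_in_order` is a hypothetical helper, not a NiFi or HDFS API.

```python
import tempfile
from pathlib import Path
from typing import List


def delete_in_order(filename: str, directories: List[Path]) -> List[Path]:
    """Delete filename from each directory in turn; stop at the first miss."""
    deleted = []
    for directory in directories:
        target = directory / filename
        if not target.exists():
            break  # assumption: only a successful delete continues the chain
        target.unlink()
        deleted.append(target)
    return deleted


# Build two local folders holding the same two files (stand-in for HDFS).
root = Path(tempfile.mkdtemp())
folder1, folder2 = root / "folder1", root / "folder2"
for folder in (folder1, folder2):
    folder.mkdir()
    for name in ("part1.txt", "part1_sed.txt"):
        (folder / name).write_text("sample")

removed = delete_in_order("part1.txt", [folder1, folder2])
print([str(p.relative_to(root)) for p in removed])
```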
Flow:-
Created 09-26-2017 05:24 AM
@Yash thank you for your answer. Do you know whether there is any way to manage deleting and updating files in one HDFS directory after I delete or update them in my second HDFS directory? I mean, I want the file in directory 1 to be changed or deleted appropriately when the file with the same name is changed in the second directory.
Created 09-26-2017 10:20 PM
We can do that with the PutHDFS processor, but first can you give me more details about how you are going to delete or update flowfiles in the directories?
Let's assume a file already exists in directory1 with the same file name as the flowfile.
My question is: how are you detecting which flowfile to delete and which flowfile to update?
Give me your logic for deleting or updating flowfiles in directories 1 and 2 so that I can help you.
Created 09-28-2017 07:45 AM
@Shu I want to delete the oldest flowfiles. For example, if I have flowfiles named 1 to 100, I want to delete the first 10 flowfiles and update the newer ones, in this case the last 10 flowfiles.
Created 09-29-2017 01:57 AM
@sally sally, that looks like complicated logic, and I don't think there is a way to delete only the first 10 flowfiles unless you name them so that each one can be identified uniquely and then filter them in RouteOnAttribute.
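If the names really are plain numbers like 1 to 100, the selection itself is simple to express outside NiFi. Here is a small Python sketch under that assumption; `split_by_age` is a hypothetical helper, and it treats a lower number as an older file, which is only a guess at the naming scheme described above.

```python
from typing import List, Tuple


def split_by_age(names: List[str], count: int = 10) -> Tuple[List[str], List[str]]:
    """Return (oldest, newest) names, assuming lower numbers mean older files."""
    ordered = sorted(names, key=int)  # numeric sort, so "9" comes before "10"
    return ordered[:count], ordered[-count:]


names = [str(n) for n in range(1, 101)]
to_delete, to_update = split_by_age(names)

print(to_delete)  # names "1" through "10"
print(to_update)  # names "91" through "100"
```

With fewer than 20 names the two lists would overlap, so a real flow would need a rule for that case.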