Can GetHDFS processor catch changes ?


I have two folders on HDFS, for example folder 1 and folder 2, holding the same data. If I delete or update a file in folder 1, can the GetHDFS processor catch that change? (I mean, an update or delete should leave an info log entry on HDFS; can a NiFi processor pick up such changes?) If not, can any other NiFi processor do this?


Re: Can GetHDFS processor catch changes ?

Super Guru

Hi @sally sally, the GetHDFS processor does not store state. Each time it runs, it fetches the files from the directory and, by default, deletes them from HDFS. If you don't want the files deleted, set the Keep Source File property to true, so that the processor fetches each file and leaves the source in the HDFS directory. In that case, when GetHDFS runs again it fetches the same files again, because the processor does not remember what it has already fetched.

Use the ListHDFS processor instead. This processor stores state:

if no changes have been made to the directory or its files, it won't list any flowfiles;

if something in the directory has changed, it lists only the new or changed files and updates its state with the timestamp of the newest file. Configure the Directory property.

In this way ListHDFS emits a flowfile with path and filename attributes, which the FetchHDFS processor uses to fetch the data from the HDFS directory.

ListHDFS does not fetch any file content; it only lists the files available in the directory. FetchHDFS does the actual fetching.
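To make the difference between a stateless fetch and a stateful listing concrete, here is a minimal Python sketch of the idea. It is only a conceptual model, not NiFi's actual implementation: `FileEntry`, `StatefulLister`, and `list_new` are hypothetical names, and the sketch assumes the stored state is simply the newest modification timestamp seen so far.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """One file in a directory listing (hypothetical stand-in for a listed file)."""
    path: str       # directory that holds the file
    filename: str   # name of the file
    modified: int   # modification time, e.g. epoch milliseconds


@dataclass
class StatefulLister:
    """Remembers the newest timestamp it has already reported."""
    last_seen: int = -1

    def list_new(self, entries):
        # Keep only entries modified after the stored timestamp.
        fresh = sorted(
            (e for e in entries if e.modified > self.last_seen),
            key=lambda e: e.modified,
        )
        # Advance the state so the same entries are not reported again.
        if fresh:
            self.last_seen = fresh[-1].modified
        return fresh


lister = StatefulLister()
snapshot = [
    FileEntry("/user/yashu/folder2", "part1.txt", 100),
    FileEntry("/user/yashu/folder2", "part1_sed.txt", 200),
]
first_run = lister.list_new(snapshot)    # both files are new
second_run = lister.list_new(snapshot)   # nothing changed, nothing listed
```

Note that in this model a deleted file produces no entry at all: it is simply missing from the next listing, so a timestamp-based lister by itself has nothing to report for deletions.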

40511-listhdfs-config.png

FetchHDFS:-

Then add a FetchHDFS processor and leave it at its default configuration, since it reads the file location as ${path}/${filename} from the attributes set by the ListHDFS processor.

40512-fetch-hdfs.png

Flow:-

40513-flow-hdfs.png

In addition, you can use site-to-site (S2S) after the ListHDFS processor. S2S distributes the work across the cluster, and FetchHDFS then does the actual fetching of the data.


Re: Can GetHDFS processor catch changes ?

If I change something in a flowfile, for example the name of my response data, will the ListHDFS processor find this and update the flowfile? What about deleting flowfiles? For example, if I have flowfile 1 in my first directory and have fetched it into my second directory, and I then delete flowfile 1 from folder 1, what should I do to delete it from folder 2 too? Everyone tells me that syncing directories is impossible in NiFi. Can you recommend anything that could help me solve this problem?

Thank you in advance.


Re: Can GetHDFS processor catch changes ?

Super Guru

@sally sally

If you want to delete the same file from both folder1 and folder2, you can do that in NiFi by connecting the success relationship of one DeleteHDFS processor to another DeleteHDFS processor, so that the fetched file is deleted from both directories.

Here is the example that I tried:-

Both folder1 and folder2 contain the same 2 files, as listed below.

/user/yashu/folder2/part1.txt 
/user/yashu/folder2/part1_sed.txt

So in my flow I'm fetching from folder2.

40514-listhdfs-config.png

  1. First I list the files present in the folder2 directory.
  2. Then I use a FetchHDFS processor to fetch the files from folder2.
  3. Once the files are fetched from folder2, I connect the success relationship to a DeleteHDFS processor to delete the same files from folder1.
  4. Because I fetched from the folder2 directory, each ff carries attribute values for the directory and filename, so I use an expression to read the filename attribute from the ff.
    Attributes in ff:-

    40521-attr-hdfs.png
    DeleteHDFS config for folder1:-

40515-delete-hdfs-folder1.png

Once the files are deleted from folder1, I connect the success relationship to another DeleteHDFS processor to delete the same files from folder2.

DeleteHDFS config for folder2:-

40519-delete-hdfs-folder2.png

So the first DeleteHDFS processor deletes the folder1 files and the second DeleteHDFS processor deletes the folder2 files.

Flow:-

40520-flow-delete-hdfs.png
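As a rough illustration of what the two chained DeleteHDFS processors work out, the Python sketch below builds the paths to delete from a flowfile's filename attribute. It is not NiFi code: `delete_targets` is a hypothetical helper, and the folder1 path `/user/yashu/folder1` is an assumption made by analogy with the folder2 paths shown earlier in this reply.

```python
import posixpath


def delete_targets(attributes, directories):
    """Return the full path of the flowfile's file in each directory.

    `attributes` models the flowfile attributes (path, filename);
    only `filename` is used, because each directory supplies its own path.
    """
    filename = attributes["filename"]
    return [posixpath.join(directory, filename) for directory in directories]


# Attributes as they would look after listing and fetching from folder2.
flowfile_attributes = {"path": "/user/yashu/folder2", "filename": "part1.txt"}

targets = delete_targets(
    flowfile_attributes,
    ["/user/yashu/folder1", "/user/yashu/folder2"],  # folder1 path is assumed
)
```

The point of the design is that only the filename travels with the flowfile, while each delete step supplies its own directory, which is why the same flowfile can drive deletes in two different folders.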


Re: Can GetHDFS processor catch changes ?

@Yash thank you for your answer. Do you know of any way I can manage deleting and updating flowfiles in my HDFS directory after I delete or update them in my second HDFS directory? I mean, I want the flowfile in directory 1 to be changed or deleted accordingly when the flowfile with the same name is changed or deleted in the second directory.


Re: Can GetHDFS processor catch changes ?

Super Guru

@sally sally,

We can do that by using the PutHDFS processor. Before that, can you give me more details about how you are going to delete (or) update ff in the directories?

Let's assume a file already exists in directory1 with the same file name as the ff:

  1. Are you going to delete that file? (or)
  2. Are you going to update the file with the new contents of the ff?

My question is: how are you detecting which flowfile to delete and which flowfile to update?
Give me your logic for deleting (or) updating flowfiles in directories 1 and 2 so that I can help you.


Re: Can GetHDFS processor catch changes ?

@Shu I want to delete the oldest flowfiles. For example, if I have flowfiles named 1 to 100, I want to delete the first 10 flowfiles and update the newer ones, in this case the last 10 flowfiles.


Re: Can GetHDFS processor catch changes ?

Super Guru

@sally sally, that looks like complicated logic. I don't think there is a way to delete only the first 10 flowfiles unless you name them so that they can be identified uniquely and then filter them in RouteOnAttribute.
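If the files really are named so that the name encodes their age, the selection itself is simple outside of NiFi. The Python sketch below shows one possible reading of the requirement; it assumes the names are plain integers ("1" to "100") where a lower number means an older file, and `pick_oldest_and_newest` is a hypothetical helper, not a NiFi feature.

```python
def pick_oldest_and_newest(filenames, count=10):
    """Split numerically named files into (to_delete, to_update).

    Assumes every name parses as an integer and that a lower number
    means an older file.
    """
    ordered = sorted(filenames, key=int)
    to_delete = ordered[:count]        # the oldest `count` files
    remaining = ordered[count:]        # never update a file marked for deletion
    to_update = remaining[-count:] if count > 0 else []
    return to_delete, to_update


names = [str(n) for n in range(1, 101)]
to_delete, to_update = pick_oldest_and_newest(names)
```

A rule like this depends entirely on the naming scheme, which is why the files need to be named in a way that identifies them uniquely before they can be filtered.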
