nifi sync 2 different directories

New Contributor

Hello There,

I have a simple workflow that looks like:

ListFile -> FetchFile -> PutHDFS

This workflow keeps a directory on HDFS in sync with a directory on a Windows share.

ListFile seems to get the job done when a new file is added or an existing file is updated, but it doesn't seem to handle the case where a file is removed from the source directory. In other words, I'm trying to delete files on HDFS based on what gets deleted in the Windows share directory. Is there an existing processor that does this, or do I need to write a custom processor?

Thanks for your help.

1 ACCEPTED SOLUTION

Master Guru

AFAIK, there is no current capability for this: GetFile/ListFile only detect files that exist, and GetFile/FetchFile only deal with deletes when they are the ones deleting the files. Perhaps a custom processor called WatchFile (hopefully shared with the Apache NiFi community?) would be prudent. It could use Java's WatchService API (java.nio.file.WatchService) and generate flow files, perhaps empty ones, whose attributes reflect the file and its change in state.
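
To make the WatchService idea a bit more concrete, here is a minimal standalone sketch in plain Java, not a NiFi processor. It watches a directory for create, modify and delete events and turns each event into a map of attributes, roughly what a WatchFile processor might attach to an empty flow file. The class name WatchFileSketch, the helper methods, and the attribute keys ("filename", "absolute.path", "watch.event") are all made up for illustration; none of them come from NiFi.

```java
import static java.nio.file.StandardWatchEventKinds.ENTRY_CREATE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_DELETE;
import static java.nio.file.StandardWatchEventKinds.ENTRY_MODIFY;
import static java.nio.file.StandardWatchEventKinds.OVERFLOW;

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: not part of NiFi, names and attribute keys are assumptions.
final class WatchFileSketch {

    // Convert one watch event into the attributes an (empty) flow file might carry.
    public static Map<String, String> toAttributes(Path dir, WatchEvent<?> event) {
        Path relative = (Path) event.context(); // for ENTRY_* kinds the context is a relative Path
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("filename", relative.toString());
        attrs.put("absolute.path", dir.resolve(relative).toString());
        attrs.put("watch.event", event.kind().name()); // e.g. ENTRY_DELETE
        return attrs;
    }

    // Wait up to timeoutMillis for the next batch of events and convert them.
    public static List<Map<String, String>> pollOnce(WatchService watcher, Path dir, long timeoutMillis)
            throws InterruptedException {
        List<Map<String, String>> result = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (key == null) {
            return result; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == OVERFLOW) {
                continue; // events were lost; a real processor would need to re-list the directory
            }
            result.add(toAttributes(dir, event));
        }
        key.reset(); // re-arm the key so later events are queued
        return result;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("watchfile-sketch");
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, ENTRY_CREATE, ENTRY_MODIFY, ENTRY_DELETE);

            // Simulate a file appearing in and then disappearing from the source directory.
            Path file = Files.createFile(dir.resolve("example.txt"));
            Files.delete(file);

            // Print events until the delete shows up or 15 seconds pass.
            long deadline = System.currentTimeMillis() + 15_000;
            boolean sawDelete = false;
            while (!sawDelete && System.currentTimeMillis() < deadline) {
                for (Map<String, String> attrs : pollOnce(watcher, dir, 1_000)) {
                    System.out.println(attrs);
                    sawDelete |= "ENTRY_DELETE".equals(attrs.get("watch.event"));
                }
            }
        }
        Files.deleteIfExists(dir);
    }
}
```

A downstream step could route on the "watch.event" attribute and issue the HDFS delete when it equals ENTRY_DELETE. One caveat worth testing early: the WatchService Javadoc says that detecting changes to files that are not on local storage is implementation specific, so events from a Windows share may not arrive reliably.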

New Contributor

Thanks for the advice, Matt. Let me preface this by saying I'm completely new to NiFi. After looking at the source code for the ListFile processor, I've got some questions about implementing WatchFile:

Would it be a good idea to model it on ListFile? That is, I would extend AbstractListProcessor<FileInfo> instead of extending AbstractProcessor directly, implement my own version of performListing using the WatchService API, override whatever other AbstractListProcessor methods I need, and let AbstractListProcessor generate the flow files?

Thanks for your help.