How do you use GetFile to read the latest file exactly once and push it to KafkaProducer?
- Labels: Apache NiFi
Created ‎01-26-2017 03:07 PM
Hi
I'm trying to read the most recent file from a directory exactly once using the NiFi GetFile processor and push it to KafkaProducer. The Kafka consumer should receive the messages on the topic only once.
Current config:
GetFile is connected to KafkaProducer in NiFi. I'm using a pykafka consumer to read the messages for further processing.
I've been experimenting with GetFile's MinFileAge and MaxFileAge settings and with its scheduling to ensure I see only one set of messages in the Kafka consumer. In every experiment so far, the same content has appeared twice.
Thanks
Chetan
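While tracking down where the duplicates come from, it can help to filter them on the consumer side. The following is only a sketch of that idea, not a fix for the NiFi flow itself: `drop_duplicates` is a hypothetical helper, and it assumes that two messages with identical bytes are the same message.

```python
import hashlib


def drop_duplicates(payloads):
    """Yield each distinct payload once, comparing by SHA-256 digest.

    Hypothetical helper: `payloads` is any iterable of bytes objects,
    for example the values of the messages read from a topic.
    """
    seen = set()
    for payload in payloads:
        digest = hashlib.sha256(payload).digest()
        if digest in seen:
            # Same content was already delivered; skip it.
            continue
        seen.add(digest)
        yield payload
```

With pykafka, the iterable could presumably be something like `(m.value for m in consumer if m is not None)`. Note that the `seen` set grows with every distinct message, so this suits short diagnostic runs rather than a long-lived consumer.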
Created ‎01-26-2017 04:32 PM
You may want to use ListFile -> FetchFile rather than GetFile. ListFile will keep track of the files it has found and will not list them again unless they have been updated (and still satisfy the other filters you specify in the properties).
Can you describe your use case a bit more? Is it the case that many files may be placed in the directory "at once" but you only want the latest one? Also, do the files need to remain in that directory? If so, I think ListFile -> FetchFile is your best bet; if not, you can set GetFile to remove the file on read. Then only "new" files will be found by GetFile (because any processed files will have been removed).
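To illustrate why a listing step that remembers what it has seen avoids re-reading files, here is a rough Python sketch of the idea. It is not NiFi's implementation (ListFile's actual state tracking differs in detail); `list_new_files` and the `seen` dictionary are hypothetical names used only for this example.

```python
import os


def list_new_files(directory, seen):
    """Return paths that are new or modified since the previous call.

    `seen` maps path -> modification time already reported. It stands in
    for the state a listing processor keeps between runs (an assumption
    made for this sketch, not NiFi's real data structure).
    """
    fresh = []
    with os.scandir(directory) as entries:
        for entry in sorted(entries, key=lambda e: e.name):
            if not entry.is_file():
                continue
            mtime = entry.stat().st_mtime_ns
            # Report the file only if it is unknown or has been updated.
            if seen.get(entry.path) != mtime:
                seen[entry.path] = mtime
                fresh.append(entry.path)
    return fresh
```

Calling the function twice on an unchanged directory returns the files the first time and an empty list the second time, while the files themselves stay where they are.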
Created ‎01-29-2017 02:39 PM
Thanks Matt. That worked. Yes, I need to ensure the files remain in the directory. I could do a PutFile to a temp/backup dir and do a GetFile with remove-on-read. We won't have many files placed in the directory at once. By default we would need to process only the latest file.
Cheers!
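For reference, "only the latest file" can be expressed outside NiFi as picking the regular file with the newest modification time. This is a minimal sketch under that assumption; `latest_file` is a hypothetical helper, and it treats "latest" as "most recently modified", which may not match every setup.

```python
import os


def latest_file(directory):
    """Return the path of the most recently modified regular file, or None.

    Hypothetical helper: "latest" is taken to mean the newest
    modification time among the regular files directly in `directory`.
    """
    with os.scandir(directory) as entries:
        files = [e for e in entries if e.is_file()]
        if not files:
            return None
        return max(files, key=lambda e: e.stat().st_mtime_ns).path
```

Subdirectories are ignored, and an empty directory yields `None` rather than raising an error.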
