NiFi to HDFS filename leaving off minutes

Rising Star

We have a recurring issue we haven't been able to solve.

Take a look at our flow. We're taking log files, extracting date and time attributes, and using them as the filename when writing the files to HDFS.

It mostly works but we keep intermittently getting filenames that are missing the minutes field.




2017_05_16_13__fozziesplunkr.log <---this is the file. It contains entries from minutes 55, 56 and 57. It feels like a "catch-all" file.
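To make the symptom concrete, here is a minimal Python sketch of how a filename pattern like ours would produce exactly that name if the minute attribute came through empty. The attribute names (year, month, day, hour, minute), the `build_filename` helper, and the shape of a "good" filename are assumptions for illustration; they are not copied from our UpdateAttribute config.

```python
def build_filename(attrs, suffix="fozziesplunkr.log"):
    """Join the date/time attributes with underscores and append the suffix.

    A missing attribute is rendered as an empty string here. That is an
    assumption about how an unset attribute shows up in the filename.
    """
    keys = ("year", "month", "day", "hour", "minute")
    parts = [attrs.get(key, "") for key in keys]
    return "_".join(parts) + "_" + suffix


# Hypothetical attribute sets: one complete, one with the minute missing.
complete = {"year": "2017", "month": "05", "day": "16", "hour": "13", "minute": "55"}
no_minute = {k: v for k, v in complete.items() if k != "minute"}

print(build_filename(complete))   # 2017_05_16_13_55_fozziesplunkr.log
print(build_filename(no_minute))  # 2017_05_16_13__fozziesplunkr.log
```

The double underscore after the hour is where the empty minute value lands, which matches the odd filename we are seeing.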

Our setup: 6 hosts sending to 3 topics, 2 hosts per topic. They generate about 10k messages per second from a set of 1 million events. This is just our test data.

On our sending side, we are doing TailFile -> ControlRate (1 MB) -> PublishKafka (this seems to work well).

For the receiving side, the screenshots show the flow for one of our topics, with every processor and its tabs.

We used ConsumeKafka -> ExtractText -> UpdateAttribute (Regex for timestamp from log) -> MergeContent -> UpdateAttribute (Create filename) -> PutHDFS
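One way a merged file could end up without a minute value, sketched in Python: if MergeContent keeps only the attributes that are identical on every FlowFile in a bin (its "Keep Only Common Attributes" attribute strategy), then a bin spanning minutes 55, 56 and 57 would keep the hour but drop the minute. The regex, the log line format, and the function names below are made up for illustration, and whether our flow is actually configured this way would need to be checked against the MergeContent screenshot.

```python
import re

# Hypothetical timestamp pattern; the real regex lives in our ExtractText config.
TIMESTAMP = re.compile(
    r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})[T ]"
    r"(?P<hour>\d{2}):(?P<minute>\d{2})"
)


def extract_attrs(line):
    """Return the date/time parts of a log line as a dict, or {} on no match."""
    match = TIMESTAMP.search(line)
    return match.groupdict() if match else {}


def merge_common(bundle):
    """Keep only attributes that have the same value on every entry."""
    if not bundle:
        return {}
    first, rest = bundle[0], bundle[1:]
    return {k: v for k, v in first.items() if all(o.get(k) == v for o in rest)}


# Made-up log lines covering minutes 55, 56 and 57 of the same hour.
lines = [
    "2017-05-16 13:55:02 host1 app: event a",
    "2017-05-16 13:56:10 host1 app: event b",
    "2017-05-16 13:57:44 host2 app: event c",
]
merged = merge_common([extract_attrs(line) for line in lines])
print(merged)
```

In this simulation the merged result has year, month, day and hour but no minute, because the three entries disagree on it. If that is what is happening in our flow, binning on the timestamp attribute (for example via MergeContent's Correlation Attribute Name) might be worth trying.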

All of these have screenshots as shown. If anyone has had this problem and has any idea on a solution, that'd be welcome. We've tried all kinds of performance tweaks without success. The NiFi logs show no warnings or errors.


Flow Overview and Odd missing minute in Filenames


ConsumeKafka Processor and Update Attribute (Create Filename) Processor


ExtractText (Extract from Syslog -Regex) Processor and Update Attribute (assign to attributes) processor


PutHDFS & Merge Content



Rising Star

Sorry, I was having problems uploading. It kept timing out, so I guess all three ended up posting. How do I close it? The delete option is not available (and I think it's because this question has a comment on it, this comment here...)


Rising Star

I reported it to get it removed.