
Nifi filename in HDFS granularity off


Rising Star

Currently, we are trying to separate files going through NiFi into HDFS by minute, using the PutHDFS property Conflict Resolution Strategy set to Append. We use ExtractText to retrieve the minute from the event's timestamp, save it into an attribute, and build the filename from that attribute (among others).

We find that, within a margin of 2-6 seconds, events from the next minute end up in the previous minute's file. This will cause problems down the road when we search the data by minute.

Has anyone run into this issue using the same method? Is there a configurable property in ExtractText or UpdateAttribute that would let us deposit the events into the correct per-minute file?

Thanks

4 REPLIES

Re: Nifi filename in HDFS granularity off

Rising Star

We also keep getting odd filenames with parts missing, for example:

Correct file:

2017_05_11_13_25_topic.log

Incorrect:

2017_05_11_13__topics.log

Although I will say I notice the incorrect version of the filename more often when the Kafka topic already contains data, rather than when it is a new, empty topic... weird.
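
One possible explanation for the double underscore, offered only as a guess: the minute attribute is empty when the filename is built, for instance because the extraction pattern did not match that event. The Python sketch below imitates that situation outside NiFi. The timestamp format, the regex, and the helper names `extract_minute` and `build_filename` are all hypothetical and are not taken from the actual flow.

```python
import re

# Assumed event timestamp format "YYYY-MM-DD HH:MM:SS"; the real flow may differ.
MINUTE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:(\d{2}):\d{2}")


def extract_minute(event: str) -> str:
    """Return the minute field of the first timestamp found, or "" if none matches."""
    match = MINUTE_PATTERN.search(event)
    return match.group(1) if match else ""


def build_filename(date_hour: str, minute: str, topic: str) -> str:
    """Join the parts the same way the filenames in this thread are laid out."""
    return f"{date_hour}_{minute}_{topic}.log"


# An event with a timestamp yields a complete filename.
good = build_filename("2017_05_11_13", extract_minute("2017-05-11 13:25:07 login ok"), "topic")

# An event the pattern cannot match yields an empty minute and a double underscore.
bad = build_filename("2017_05_11_13", extract_minute("login ok, no timestamp here"), "topic")

print(good)
print(bad)
```

If this is what is happening, routing events with an empty minute attribute to a separate path before the filename is built would at least make the failing events visible.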

Re: Nifi filename in HDFS granularity off

New Contributor

Without seeing the full data flow, my initial thought would be to try a MergeContent processor and use a variant of your timestamp attribute as the correlation attribute. All flow files with the same correlation attribute value should be grouped together; then just write the resulting merged flow files out to HDFS.
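
To illustrate the idea outside NiFi, here is a minimal Python sketch of grouping by a correlation key. It is not NiFi code: the timestamp format, the sample events, and the function names `minute_key` and `group_by_minute` are assumptions made for the example. The point is that each event is binned by the minute in its own timestamp, so arrival order does not matter.

```python
from collections import defaultdict
from datetime import datetime


def minute_key(timestamp: str) -> str:
    """Truncate an event timestamp to its minute, e.g. '2017_05_11_13_25'."""
    # Assumed input format; adjust to whatever the events actually carry.
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return parsed.strftime("%Y_%m_%d_%H_%M")


def group_by_minute(events):
    """Bin (timestamp, payload) pairs by minute key, regardless of arrival order."""
    bins = defaultdict(list)
    for timestamp, payload in events:
        bins[minute_key(timestamp)].append(payload)
    return dict(bins)


# Hypothetical events in arrival order; the third one arrives late.
events = [
    ("2017-05-11 13:25:58", "a"),
    ("2017-05-11 13:26:01", "b"),
    ("2017-05-11 13:25:59", "c"),
]

merged = group_by_minute(events)
for key, payloads in sorted(merged.items()):
    print(f"{key}_topic.log", payloads)
```

Each key here plays the role of the correlation attribute value, and each list stands in for one merged flow file written to HDFS.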

Re: Nifi filename in HDFS granularity off

Rising Star

Thanks, I can try something like this. You mean in place of the PutHDFS Conflict Resolution Strategy of Append?

Re: Nifi filename in HDFS granularity off

New Contributor

Correct; the correlation would happen in the merge processor, and the merged result would then be written out. You may still be able to use both together if your batch sizes are going to be pretty large.
