
NiFi - Ingest files from local system using ListFile, irrespective of timestamp


We are trying to ingest files from the local file system into HDFS. The NiFi flow is:

 

ListFile => FetchFile => UpdateAttribute => PutHDFS

 

 

The ListFile Listing Strategy is set to Tracking Entities, and Execution is set to Primary node only.

Scenario:

On our local system, each month's data has its own directory:

 

2019/201901
2019/201902
  .
  .
2019/201911
2019/201912

 

The file timestamps correspond to the month of the directory the files are in. All of these monthly folders continuously receive new data.

While the ListFile processor is ingesting files from the 201911 directory, new files that are added to other folders (folders older than 201911, say 201903) are not picked up by ListFile. I tried different values for the Entity Tracking Time Window property, but with no luck. The Tracking Entities Listing Strategy appears to behave like Tracking Timestamps: it caches the latest timestamp among the ingested files and does not ingest any files with older timestamps.

As far as I understand, when ListFile uses the Tracking Entities Listing Strategy, it caches the name, size, and last-modified timestamp of each listed file, and then keeps listing any files that are not already in the cache according to these attributes.
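To make the expected difference between the two strategies concrete, here is a rough Python sketch of the behavior described above. It is only a simplified mental model, not NiFi's actual implementation: the names `FileEntry`, `list_by_timestamp`, and `list_by_entity` are made up for illustration, the file names and timestamps are invented sample values, and the sketch ignores the Entity Tracking Time Window property entirely.

```python
from typing import NamedTuple


class FileEntry(NamedTuple):
    # Hypothetical stand-in for a listed file; not a NiFi type.
    path: str
    size: int
    mtime: int  # last-modified time (illustrative integer values)


def list_by_timestamp(files, last_seen_mtime):
    """Timestamp tracking: list only files newer than the latest timestamp seen."""
    new = [f for f in files if f.mtime > last_seen_mtime]
    latest = max([last_seen_mtime] + [f.mtime for f in new])
    return new, latest


def list_by_entity(files, seen):
    """Entity tracking: list any file whose (path, size, mtime) is not cached yet."""
    new = [f for f in files if f not in seen]
    seen.update(new)
    return new


# First listing sees a file in 201911; later a file with an older
# timestamp shows up in 201903.
first_pass = [FileEntry("2019/201911/a.csv", 10, 1911)]
second_pass = first_pass + [FileEntry("2019/201903/b.csv", 20, 1903)]

# Timestamp tracking: the older file is below the cached timestamp and is skipped.
_, latest = list_by_timestamp(first_pass, last_seen_mtime=0)
missed, _ = list_by_timestamp(second_pass, latest)

# Entity tracking (as I expect it to work): the older file is not in the cache,
# so it is listed regardless of its timestamp.
seen = set()
list_by_entity(first_pass, seen)
picked = list_by_entity(second_pass, seen)
```

In this model, `missed` is empty while `picked` contains the 201903 file. The second result is what I expected from Tracking Entities, but the first is what I actually observe.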

  1. Why is the ListFile processor not picking up these files with my current configuration?
  2. I want to continuously ingest new files based on filename (irrespective of timestamp) and skip files that have already been ingested. Is there a workaround to achieve this?

 

Attached screenshots: lf1.PNG, lf2.PNG

 

P.S. I've also asked the same question on Stack Overflow:

https://stackoverflow.com/questions/60189067/nifi-ingest-files-from-local-system-using-listfile-inge...

 
