Member since: 12-21-2016
Posts: 16
Kudos Received: 0
Solutions: 0
05-25-2017
03:24 PM
As mentioned above under Matt's comment, yes, the file left behind always has the latest timestamp.
05-25-2017
03:23 PM
Yes, the one that is left behind is the most recently generated file. That last file gets picked up on the second run. My use case is getting a listing of all the files in an HDFS directory at a given moment. GetHDFS provides that, but with the inefficient overhead of bringing the actual files into NiFi. I was hoping to get just the list of files with ListHDFS. I'm thinking I might look into ExecuteStreamCommand to generate the list with hdfs dfs -ls and then parse that output.
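For the parsing step, something like the following might work. It is only a rough sketch in Python: it assumes the usual eight-column layout that hdfs dfs -ls prints (permissions, replication, owner, group, size, date, time, path) preceded by a "Found N items" header. The names HdfsEntry, parse_hdfs_ls and SAMPLE are made up for illustration, and the sample listing is invented, not output from my cluster.

```python
from typing import List, NamedTuple


class HdfsEntry(NamedTuple):
    """One file row from an `hdfs dfs -ls` listing (hypothetical helper type)."""
    permissions: str
    size: int
    modified: str  # "YYYY-MM-DD HH:MM", kept as printed by the CLI
    path: str


def parse_hdfs_ls(output: str) -> List[HdfsEntry]:
    """Parse the text of `hdfs dfs -ls <dir>` into file entries.

    Assumes the standard eight-column layout; directories are skipped.
    """
    entries = []
    for line in output.splitlines():
        # Split into at most 8 fields so a path containing spaces stays whole.
        parts = line.split(None, 7)
        if len(parts) < 8:
            continue  # "Found N items" header, blank lines, anything malformed
        perms, _replication, _owner, _group, size, date, time, path = parts
        if perms.startswith("d"):
            continue  # directory row, not a file
        entries.append(HdfsEntry(perms, int(size), f"{date} {time}", path))
    return entries


# Invented sample listing, for illustration only.
SAMPLE = """
Found 3 items
-rw-r--r--   3 nifi hdfs       1024 2017-05-25 14:01 /data/in/file_a.csv
-rw-r--r--   3 nifi hdfs       2048 2017-05-25 14:20 /data/in/file b.csv
drwxr-xr-x   - nifi hdfs          0 2017-05-25 13:00 /data/in/archive
"""

if __name__ == "__main__":
    for entry in parse_hdfs_ls(SAMPLE):
        print(entry.modified, entry.size, entry.path)
```

Since this reads the listing as plain text, it would return every file present at that moment, including the newest one, without pulling file contents into NiFi.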
05-25-2017
02:26 PM
I am running a ListHDFS processor pointed at a directory on HDFS, on a timer-driven schedule set to execute once per hour. After making sure the processor's state is cleared, I run it and see that it creates a flowfile for all but one file in the directory: there are 5 files in the directory, and only 4 flowfiles are created. If I add more files, clear the state, and run again, the pattern repeats: one fewer flowfile is created than there are files, so one file is missed. It is not the same file that is missed on each run.
Why is the processor missing 1 file each time? Is this by design?
This is on HDF 2.1.0.1, Apache NiFi version 1.1.0.2.1.0.1-1.
Labels: Apache NiFi