ListFile processor is not detecting all new files

Explorer

The ListFile processor detects all files in a directory on startup, but additional files are not always detected. Anyone know why this would happen? The new files have different names. All files are owned by the same user so it's not a privilege issue.

To provide more details: initially there were 255 files, all of which were detected. I added 220 new files, but only 45 of those were detected. I then added another 230 files, but only 18 of those were detected. Files added after that were not detected at all.

Accepted Solution

Re: ListFile processor is not detecting all new files

Master Guru
@Fawn Nguyen

How are the files being moved into this directory? To understand what is happening here, let me explain how ListFile decides which files to list in a source directory.

The List-type processors all keep "state". It would be very expensive to record information about every file listed and then compare each new listing against that record every time a list-based processor runs. Instead, what is recorded in state management is the latest timestamp from the last batch of files listed. On the next run, the list-based processor lists only files with newer timestamps and then updates state again.
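
The behavior described above can be sketched roughly as follows. This is a simplified model for illustration only, not NiFi's actual implementation; the names list_new_files, make_file, and demo are hypothetical, and the timestamps are arbitrary values set by hand.

```python
import os
import tempfile


def list_new_files(directory, last_listed_mtime):
    """Return names of files modified after the stored timestamp,
    plus the updated timestamp to store as state."""
    new_files = []
    newest = last_listed_mtime
    for entry in os.scandir(directory):
        if not entry.is_file():
            continue
        mtime = entry.stat().st_mtime
        # Only files strictly newer than the stored state are listed.
        if mtime > last_listed_mtime:
            new_files.append(entry.name)
            newest = max(newest, mtime)
    return sorted(new_files), newest


def make_file(directory, name, mtime):
    """Create a file and force its modification time (illustration only)."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("data")
    os.utime(path, (mtime, mtime))


def demo():
    with tempfile.TemporaryDirectory() as directory:
        make_file(directory, "a.txt", 1000)
        make_file(directory, "b.txt", 2000)
        first, state = list_new_files(directory, 0)

        # old.txt arrives later but carries a timestamp older than the state.
        make_file(directory, "old.txt", 1500)
        make_file(directory, "new.txt", 3000)
        second, state = list_new_files(directory, state)
        return first, second, state
```

In this sketch, old.txt is never returned by the second listing even though it was added after the first one, because its modification time is older than the timestamp held in state.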

My guess is that the method you are using to "move" these new sets of files into this directory does not update the files' timestamps. So only the few files that actually have timestamps newer than what was last recorded in state are being listed. You will need to change how you move files into this directory to make sure every file's timestamp is updated. A move operation typically does not update the timestamp, but a copy will.
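
To check the difference between a move and a copy on your own system, something like the following could be used. It is a minimal sketch that assumes the source and target directories are on the same filesystem; the function name compare_move_and_copy and the file names are hypothetical. Note that copy tools which preserve metadata (for example shutil.copy2 in Python) keep the old timestamp, so they behave like a move for this purpose.

```python
import os
import shutil
import tempfile


def compare_move_and_copy(old_mtime=1000):
    """Return (mtime after move, mtime after copy) for two files
    that both start with the same old modification time."""
    with tempfile.TemporaryDirectory() as root:
        incoming = os.path.join(root, "incoming")
        os.mkdir(incoming)

        moved_src = os.path.join(root, "moved.txt")
        copied_src = os.path.join(root, "copied.txt")
        for path in (moved_src, copied_src):
            with open(path, "w") as handle:
                handle.write("data")
            # Give both source files an old modification time.
            os.utime(path, (old_mtime, old_mtime))

        # A rename within one filesystem keeps the original timestamp.
        moved_dst = os.path.join(incoming, "moved.txt")
        os.rename(moved_src, moved_dst)

        # shutil.copy writes a new file and does not copy timestamps,
        # so the copy gets the current time as its modification time.
        copied_dst = shutil.copy(copied_src, incoming)

        return os.stat(moved_dst).st_mtime, os.stat(copied_dst).st_mtime
```

The moved file comes back with its old modification time, while the copied file carries the time of the copy, which is why only copied files would look "new" to a timestamp-based listing.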

Thanks,

Matt

Replies

Re: ListFile processor is not detecting all new files

Explorer

I am using touch to modify the timestamps of the files (stat <filename> shows the new modified timestamp), but the files are still not picked up by the ListFile processor.
