LISTFTP : number of listed files limitation ?

Hello,

I have a problem fetching the listing of an FTP directory.

It contains around 9500 files, and when I try to retrieve the list, I get this error:

2017-04-13 14:33:41,961 ERROR [Timer-Driven Process Thread-3] o.a.nifi.processors.standard.ListFTP ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] failed to process session due to java.lang.NullPointerException: java.lang.NullPointerException

2017-04-13 14:33:41,961 WARN [Timer-Driven Process Thread-3] o.a.nifi.processors.standard.ListFTP ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] Processor Administratively Yielded for 1 sec due to processing failure

2017-04-13 14:33:41,961 WARN [Timer-Driven Process Thread-3] o.a.n.c.t.ContinuallyRunProcessorTask Administratively Yielding ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] due to uncaught Exception: java.lang.NullPointerException

2017-04-13 14:33:46,952 ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.ListFTP ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] failed to process due to java.lang.NullPointerException; rolling back session: java.lang.NullPointerException

2017-04-13 14:33:46,952 ERROR [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.ListFTP ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] failed to process session due to java.lang.NullPointerException: java.lang.NullPointerException

2017-04-13 14:33:46,952 WARN [Timer-Driven Process Thread-9] o.a.nifi.processors.standard.ListFTP ListFTP[id=b99a71fa-9919-1a87-ffff-ffffd56fde68] Processor Administratively Yielded for 1 sec due to processing failure

If I set a regex in File Filter Regex to limit the number of files, it works.

But my use case requires listing all the files in the directory.

I changed the Remote Poll Batch Size from 5000 to 10000 (and tried 15000): same error.

The dataflow runs on a cluster, but I have set Execution to Primary Node (if I filter files in the ListFTP processor, both the listing and the file fetch work).

Is it a bug in the ListFTP processor? Or do I have to set something else to fetch a large list of files?

Thanks for your answer

10 REPLIES

Re: LISTFTP : number of listed files limitation ?

@Maxime Lézier

Which version of NiFi are you using? How many concurrent tasks does the ListFTP processor have configured?

Re: LISTFTP : number of listed files limitation ?

Hello,

It's on HDF 2.1 / NiFi 1.1.

Thanks

Re: LISTFTP : number of listed files limitation ?

Hello,

The Remote Path property contains only the directory's name, like "BIG_DATA_XXXX".

It's not the user's root directory.

Re: LISTFTP : number of listed files limitation ?

@Maxime Lézier

Check your nifi-app.log on the primary node for the NPE error shown above and provide the full stack trace that should follow it.

Re: LISTFTP : number of listed files limitation ?

@Maxime Lézier

The way the ListFTP processor works is that the first listing is created from the files with the oldest time stamp, up to the Remote Poll Batch Size. The next time the processor runs, it creates another listing of files with a time stamp later than the previous time stamp, and so on.

So, for example, if the Remote Poll Batch Size is 5000 and there are 6000 files all with the same time stamp, it will only create a listing of 5000 files. The next time it runs, it will only list files with a time stamp later than that of those 5000 files, so 1000 files would never be listed because the batch size was not large enough to get them into the first listing.

So, I set my Remote Poll Batch Size to a value large enough to cover all of the files with the same time stamp. In my case I set it to 15000, so all 6000 files were in the first listing, and the rest of my files, all with a later time stamp, were added to the next listing.
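The behavior described above can be illustrated with a small Python simulation. This is only a toy model of the explanation in this reply, not NiFi's actual implementation; the function names, file names, and time stamp values are all made up for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of a remote file: (name, modification time stamp).
RemoteFile = Tuple[str, int]


def list_once(files: List[RemoteFile], batch_size: int,
              last_ts: Optional[int]) -> Tuple[List[RemoteFile], Optional[int]]:
    """One simulated run: take the oldest unseen files, up to batch_size."""
    # Only files strictly newer than the stored time stamp are candidates.
    candidates = sorted(
        (f for f in files if last_ts is None or f[1] > last_ts),
        key=lambda f: f[1],
    )
    listing = candidates[:batch_size]
    # The stored state becomes the newest time stamp that was listed.
    new_ts = max((ts for _, ts in listing), default=last_ts)
    return listing, new_ts


def run_until_empty(files: List[RemoteFile], batch_size: int) -> List[RemoteFile]:
    """Repeat simulated runs until a run lists nothing new."""
    last_ts: Optional[int] = None
    listed: List[RemoteFile] = []
    while True:
        batch, last_ts = list_once(files, batch_size, last_ts)
        if not batch:
            return listed
        listed.extend(batch)


# 6000 files share one time stamp; 10 more files are newer.
files = [(f"a{i}.csv", 100) for i in range(6000)] + \
        [(f"b{i}.csv", 200) for i in range(10)]

small = run_until_empty(files, batch_size=5000)
large = run_until_empty(files, batch_size=15000)
print(len(small))  # 5010: 1000 files with time stamp 100 are never listed
print(len(large))  # 6010: every file is listed
```

In this model, the files are only lost when more files share a time stamp than fit in one batch, which is why raising the batch size above that count makes the problem disappear.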

The reason it didn't work for you after you changed the Remote Poll Batch Size is that the processor retains state about the previous listing. So you would have to clear the state of the processor, and then it should work for you.

To clear the state of the processor, do the following:

Right-click while hovering over the processor.

14794-screen-shot-2017-04-20-at-44425-pm.png

Then select View state; another window will pop up.

14795-screen-shot-2017-04-20-at-44526-pm.png

Then click Clear state, and that should be all you need to do. The next time you run the ListFTP processor, it should get a listing of all of your files.

Re: LISTFTP : number of listed files limitation ?

Hello,

First, thanks for your answer.

The FTP root directory contains 4 directories.

I have to fetch the files in one of them, named "BIG_DATA_TYPE".

The directory contains 9328 files, with names like:

matriceTYPE_YYYYMMDDHHmm.csv.gz.

If I don't set a regex filter, or use a filter like .*, I get the error.

If I set a regex like 'matriceTYPE_201704.*' (which returns 6759 files), it works.

My problem is that I can't use a regex like that, because I can't change my processor's configuration every month.

Between each try, I clear the state.

Re: LISTFTP : number of listed files limitation ?

@Maxime Lézier

Will you post a snapshot of the ListFTP processor's config?

Re: LISTFTP : number of listed files limitation ?

Yes, of course:

14854-conf-listftp-meteo-nifi-bl.png

With this config it works:

14855-2017-04-25-08-58-24-nifi-bl2.png

But with this one:

14859-2017-04-25-08-58-24-nifi-bl5.png

I get this error:

14860-2017-04-25-08-58-24-nifi-bl6.png

Thanks for your help.


2017-04-25-08-58-24-nifi-bl4.png

Re: LISTFTP : number of listed files limitation ?

@Maxime Lézier

Have you tried using a file filter regex like "matrice.*" ?
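To see how the two filters discussed in this thread differ, here is a minimal Python sketch. It assumes the filter has to match the whole file name, and it uses Python's `re` module as a stand-in for the Java regular expressions NiFi uses; the sample file names are invented to follow the matriceTYPE_YYYYMMDDHHmm.csv.gz pattern, and `filter_names` is a hypothetical helper.

```python
import re
from typing import List

# Invented sample names following the naming pattern given in the thread.
names = [
    "matriceTYPE_201703151200.csv.gz",
    "matriceTYPE_201704131430.csv.gz",
    "matriceTYPE_201705010000.csv.gz",
    "readme.txt",
]


def filter_names(names: List[str], pattern: str) -> List[str]:
    """Keep only the names that the pattern matches in full (assumption)."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(n)]


# Month-specific filter: has to be edited every month.
monthly = filter_names(names, r"matriceTYPE_201704.*")

# Prefix-only filter: keeps every matrice file regardless of date.
all_matrices = filter_names(names, r"matrice.*")

print(monthly)
print(all_matrices)
```

With these sample names, the month-specific pattern keeps only the April 2017 file, while "matrice.*" keeps all three matrice files and drops readme.txt, so it does not need to change from month to month.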