Nifi how to cron scrollelasticsearchhttp processor ?

Hello,


I have a batch job that must run every day at 2:00. It needs to process a large volume of data (the previous day's data: about 100,000 Elasticsearch documents).

For these two reasons, I use:
- a "ScrollElasticSearchHttp" processor with a query that filters on the previous day's data (see below) and a page size of 1000

- a cron schedule that triggers the processor above: */5 * 2 * * ?

[Screenshot: 108603-sans-titre.png]
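
The exact query is not reproduced in the text of this post. As a rough sketch only, a "previous day" filter can be expressed either with explicit bounds or with Elasticsearch date math. The field name `@timestamp` and both helper functions are hypothetical, and the filter is shown in the JSON query DSL for illustration; the processor's query property may expect a different syntax.

```python
import json
from datetime import datetime, timedelta


def previous_day_bounds(now):
    """Return (start, end) of the calendar day before `now`; end is exclusive."""
    today = now.replace(hour=0, minute=0, second=0, microsecond=0)
    return today - timedelta(days=1), today


def previous_day_range_query(field="@timestamp"):
    """Build a range filter using date math.

    `now-1d/d` is yesterday rounded down to midnight and `now/d` is today
    rounded down to midnight, so [gte, lt) covers exactly the previous day.
    The field name is an assumption, not taken from the original post.
    """
    return {"range": {field: {"gte": "now-1d/d", "lt": "now/d"}}}


# Example: a run triggered at 2:00 on 10 March 2020.
start, end = previous_day_bounds(datetime(2020, 3, 10, 2, 0))
print(start.isoformat(), end.isoformat())
print(json.dumps(previous_day_range_query()))
```

Because date math is resolved when the search is first executed, a query like this only picks up a new "previous day" when a new search is started.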

I have no problem on the first day: the scroll iterates according to the cron schedule (one call every 5 s) and retrieves all the pages.
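
As a sanity check on that schedule, here is a small sketch that splits the expression into its Quartz-style fields (seconds first, which is the cron syntax NiFi uses) and estimates how many documents one nightly window can cover. It assumes one page per trigger, as described above; the helper names are hypothetical.

```python
QUARTZ_FIELDS = ("seconds", "minutes", "hours", "day_of_month", "month", "day_of_week")


def describe_cron(expr):
    """Map each field of a six-field Quartz-style cron expression to its name."""
    parts = expr.split()
    if len(parts) != len(QUARTZ_FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(QUARTZ_FIELDS), len(parts)))
    return dict(zip(QUARTZ_FIELDS, parts))


def triggers_in_window(step_seconds, window_minutes=60):
    """Count firings when the seconds field is */step and every minute matches."""
    per_minute = len(range(0, 60, step_seconds))
    return per_minute * window_minutes


fields = describe_cron("*/5 * 2 * * ?")
triggers = triggers_in_window(5)   # firings between 2:00:00 and 2:59:55
capacity = triggers * 1000         # assuming one 1000-document page per firing
print(fields)
print(triggers, capacity)
```

Under these assumptions the one-hour window leaves ample room for about 100,000 documents, so the schedule itself is not the limiting factor.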

The problem occurs on the following day: I get a 404.


I think the 404 is caused by the scroll context being removed after 1 minute of inactivity.


I tried increasing the scroll duration (e.g. to 1 day). I no longer get a 404, but I can't retrieve the new values (the query seems to be based on the initial state).


My question: is there something wrong in my configuration? Is there a way to do this job, i.e. to schedule a cron batch that uses a scroll to retrieve the previous day's data?


Thanks in advance.

Re: Nifi how to cron scrollelasticsearchhttp processor ?

Did you get the issue resolved?

 

I would keep the scroll time short, just long enough to process the one page of hits returned by the query. AFAIK, the scroll duration is reset every time ES is accessed for that query context.
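
To illustrate the keep-alive behaviour, here is a minimal sketch of the Elasticsearch scroll protocol outside NiFi. It is not the processor's implementation. The `post` callable, the index name `my-index` and the in-memory fake server are hypothetical stand-ins so the example runs without a cluster; only the two endpoints (`_search?scroll=...` and `_search/scroll`) belong to the scroll API.

```python
def scroll_pages(post, index, query, page_size=1000, keep_alive="1m"):
    """Yield pages of hits, renewing the scroll keep-alive on every request."""
    # The initial search opens the scroll context and returns the first page.
    resp = post("/%s/_search?scroll=%s" % (index, keep_alive),
                {"size": page_size, "query": query})
    while True:
        hits = resp["hits"]["hits"]
        if not hits:
            break  # an empty page means the scroll is exhausted
        yield hits
        # Each follow-up passes `scroll` again, which restarts the timer, so
        # keep_alive only has to cover the gap between two consecutive calls.
        resp = post("/_search/scroll",
                    {"scroll": keep_alive, "scroll_id": resp["_scroll_id"]})


def make_fake_post(docs, page_size):
    """Hypothetical in-memory stand-in for an HTTP client, for demonstration."""
    state = {"pos": 0}
    calls = []

    def post(path, body):
        calls.append((path, body))
        start = state["pos"]
        state["pos"] += page_size
        return {"_scroll_id": "fake-scroll-id",
                "hits": {"hits": docs[start:start + page_size]}}

    return post, calls


docs = [{"_id": str(i)} for i in range(2500)]
post, calls = make_fake_post(docs, page_size=1000)
pages = list(scroll_pages(post, "my-index", {"match_all": {}}, page_size=1000))
print([len(p) for p in pages], len(calls))
```

With calls 5 s apart, a keep-alive of about a minute is enough within a single run. The results of a scroll reflect the index as it was when the initial search ran, so the next day's data needs a fresh initial search rather than a longer keep-alive.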
