I have a batch job that must run every day at 2:00. It has to process a large volume of data (the previous day's data: about 100,000 Elasticsearch documents).
For these two reasons, I use:
- a "ScrollElasticSearchHttp" processor with a query that filters the data of the day before (see below) and a page size of 1000
- a cron that launches the processor above: */5 * 2 * * ?
I have no problem on the first day: the scroll iterates according to the cron (one call every 5 s) and retrieves all the pages.
The problem occurs on the following day: I get a 404.
I think the 404 is caused by the scroll context being removed after 1 min of inactivity.
I tried increasing the scroll duration (e.g. to 1 day). I no longer get the 404, but I can't retrieve the new values (because the query seems to be based on the initial state).
My question: is there something wrong in my configuration? Is there a way to do the job, i.e. run a cron-scheduled batch that uses a scroll to retrieve the previous day's data?
Thanks in advance.
Did you get the issue resolved?
I would keep the "scroll" time short, just long enough to process the one page of hits returned from the query. AFAIK, the "scroll" duration is reset every time ES is accessed for that query context.
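To illustrate what I mean outside of NiFi, here is a rough Python sketch of the Elasticsearch scroll calls themselves. It is only a sketch under assumptions: the index name "my-index" and the timestamp field "@timestamp" are made up (I don't know your mapping or your actual query), and `http_post` is just a small helper I wrote for the example. The point is that each daily run starts a new search, so the date math for "yesterday" is evaluated again, and every follow-up request passes a short keep-alive that only has to cover one page.

```python
import json
import urllib.request
from typing import Callable, Iterator, List

# Hypothetical names for illustration only; replace with your own.
INDEX = "my-index"
TIMESTAMP_FIELD = "@timestamp"

Post = Callable[[str, dict], dict]


def previous_day_query(page_size: int = 1000) -> dict:
    """Range query covering the whole previous day, using ES date math."""
    return {
        "size": page_size,
        "query": {
            "range": {TIMESTAMP_FIELD: {"gte": "now-1d/d", "lt": "now/d"}}
        },
    }


def scroll_pages(post: Post, keep_alive: str = "1m",
                 page_size: int = 1000) -> Iterator[List[dict]]:
    """Yield one page of hits at a time until the scroll is exhausted."""
    # A new search per run: the date math is evaluated at this moment.
    response = post(f"/{INDEX}/_search?scroll={keep_alive}",
                    previous_day_query(page_size))
    while response["hits"]["hits"]:
        yield response["hits"]["hits"]
        # Each follow-up call sends the keep-alive again, so it only
        # needs to be long enough to process a single page.
        response = post("/_search/scroll",
                        {"scroll": keep_alive,
                         "scroll_id": response["_scroll_id"]})


def http_post(base_url: str) -> Post:
    """Build a JSON POST function for a given Elasticsearch base URL."""
    def post(path: str, body: dict) -> dict:
        request = urllib.request.Request(
            base_url + path,
            data=json.dumps(body).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    return post
```

You would call it with something like `for page in scroll_pages(http_post("http://localhost:9200")): ...` (the URL is again an assumption). With this pattern a scroll id from the day before is never reused, which, if I understand your setup correctly, is what leads to the 404 once the old context has expired.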