Hi all, I have this File Action (part of an Oozie Workflow) that runs every 15 minutes, and moves all the files in a "receiving" HDFS directory, putting them into an "in_process" HDFS directory.
Everything is OK unless for whatever reason the number of files in the "receiving" HDFS directory grows too much. If, let's say, that number gets to be > 30000 (not exactly, but around that number) the File Action fails, without meaningful errors in the logs
I'd need help to sort out a few possible options:
- Is it possible to manually specify the maximum number of files a File Action would be able to handle? E.g. setting more resources in whatever parameter, or specifying an explicit value somewhere?
- If I use a Shell Action instead, would it possibly be a better choice? I'm a bit hesitant because I'm afraid that, given that the "live files" getting copied in the "receiving" HDFS directory are coming in at a very high rate, the Shell Action could not be able to keep up with the changes inside the directory...
Looking forward to receiving your insights, thanks a lot!