PROBLEM: Oozie jobs get stuck in the PREP state
ROOT CAUSE: Possible reasons include:
1. Wrong NameNode host/port in job.properties.
2. Wrong ResourceManager host/port in the configuration.
If a lot of jobs are stuck in the PREP state:
1. Stop the Oozie server from Ambari.
2. Back up the Oozie DB if the cluster is production.
3. Remove the entries for the stuck jobs from the tables below:
   WF_JOBS
   COORD_JOBS
   WF_ACTIONS
4. Start the Oozie server.
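Before touching the database, the first two causes (wrong NameNode or ResourceManager host/port) can be checked quickly. The following is only a minimal sketch: it assumes job.properties uses the conventional key names nameNode and jobTracker, which Oozie does not enforce, and the helper names get_prop and show_endpoints are made up for illustration.

```shell
#!/bin/sh
# Sketch: print the NameNode and ResourceManager endpoints declared in a
# job.properties file so they can be compared with the cluster configuration.
# ASSUMPTION: the file uses the keys "nameNode" and "jobTracker".

# get_prop FILE KEY -> prints the value of KEY from a Java-style properties file
get_prop() {
  # Drop comment lines, strip "KEY = " (spaces around '=' are optional),
  # and keep the last definition if the key appears more than once.
  grep -v '^[[:space:]]*#' "$1" \
    | sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" \
    | tail -n 1
}

# show_endpoints FILE -> prints both endpoints for a manual comparison
show_endpoints() {
  echo "nameNode=$(get_prop "$1" nameNode)"
  echo "jobTracker=$(get_prop "$1" jobTracker)"
}
```

Running `show_endpoints job.properties` prints the two values; the nameNode value can then be compared with the output of `hdfs getconf -confKey fs.defaultFS`, and the jobTracker value with yarn.resourcemanager.address in yarn-site.xml.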
I tried the above steps, but they didn't work.
If you run your workflow, use the "-run" command:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
When you submit the coord-job, use:
oozie job -oozie http://localhost:11000/oozie -config job.properties -submit
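To see whether a job actually left PREP after one of the commands above, it helps to capture the job ID and ask Oozie for its status. This is a rough sketch only: it assumes the CLI prints the new ID on a line of the form "job: <id>", and the function names extract_job_id and submit_and_check are hypothetical.

```shell
#!/bin/sh
# Sketch: run a job, capture its ID from the CLI output, and print its status.
# ASSUMPTION: the Oozie CLI reports the new job as "job: <id>".
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# extract_job_id: reads CLI output on stdin, prints the ID from the first "job:" line
extract_job_id() {
  sed -n 's/^job:[[:space:]]*//p' | head -n 1
}

# submit_and_check FILE: submit and start the job, then show its current state
submit_and_check() {
  job_id=$(oozie job -oozie "$OOZIE_URL" -config "$1" -run | extract_job_id)
  echo "Submitted job: $job_id"
  oozie job -oozie "$OOZIE_URL" -info "$job_id"
}
```

Calling `submit_and_check job.properties` prints the job info, including the status. As far as I know, a workflow created with -submit stays in PREP until it is started with `oozie job -start <job-id>`, so PREP right after a -submit is expected for a workflow and is not by itself a sign of a problem.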
I wanted to interject that while both of the above are valid possible causes of Oozie jobs stuck in the PREP state, there are several other possible causes that may need to be resolved, such as:
1. Issues with the YARN ResourceManager / MR JobTracker, or a lack of resources, either for the RM or in the queues for the user running the job.
2. Problems with the Oozie server reaching the Oozie database server, with the database server itself, or with locks on tables.
3. Lack of resources for Oozie, such as callable queues, Java heap, GC thrashing, etc.
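A first pass over the causes listed above can be made from the command line. The sketch below is one possible way to do it, not a complete diagnosis: the function names run and check_prep_causes are made up, the queue name defaults to "default" as an assumption, and it does not cover the database side, which needs the Oozie server log and the database's own tools.

```shell
#!/bin/sh
# Sketch: quick command-line checks for jobs stuck in PREP.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# run CMD...: print the command, and execute it unless DRY_RUN=1
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != "1" ]; then "$@"; fi
}

# check_prep_causes [QUEUE]: QUEUE is the YARN queue of the submitting user
check_prep_causes() {
  # YARN side: applications still waiting for resources, and the queue's capacity
  run yarn application -list -appStates ACCEPTED
  run yarn queue -status "${1:-default}"
  # Oozie side: system mode of the server and a dump of its callable queue
  run oozie admin -oozie "$OOZIE_URL" -status
  run oozie admin -oozie "$OOZIE_URL" -queuedump
  # Workflows currently reported in PREP (first 100)
  run oozie jobs -oozie "$OOZIE_URL" -jobtype wf -filter status=PREP -len 100
}
```

Setting DRY_RUN=1 before calling `check_prep_causes myqueue` only prints the commands, which is handy for reviewing them before running anything against a production cluster.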
The above is a brief shortlist from a review of support cases relating to Oozie jobs stuck in PREP. I want to emphasize that deleting records from the Oozie database should be done ONLY as a last resort, and is only needed if you have a very large number of Oozie workflows that cannot be killed in a timely fashion by an Oozie CLI script. It should only be done at the direction of support, by people knowledgeable in SQL and in the relationships between tables, columns, and rows in the Oozie database, because referential integrity and constraints are lacking in the schema design. The above post from 2017 also missed one key table, COORD_ACTIONS; if its data is not properly cleaned up, your Oozie purge will break, and other serious problems may follow.
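For reference, the kind of Oozie CLI script mentioned above could look roughly like this. It is a sketch under assumptions: the ID pattern (digits, digits, "-oozie-", ending in "-W") is my reading of typical workflow IDs, the `oozie jobs` listing is assumed to put the job ID in the first column, and the function names extract_wf_ids and kill_prep_workflows are hypothetical.

```shell
#!/bin/sh
# Sketch: kill workflows stuck in PREP through the Oozie CLI instead of the database.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# extract_wf_ids: reads an `oozie jobs` listing on stdin and prints the first
# column of every line that looks like a workflow job ID.
# ASSUMPTION: workflow IDs look like 0000001-170101000000000-oozie-oozi-W.
extract_wf_ids() {
  awk '$1 ~ /^[0-9]+-[0-9]+-oozie-.*-W$/ { print $1 }'
}

# kill_prep_workflows [LEN]: list up to LEN (default 100) PREP workflows, kill each one
kill_prep_workflows() {
  oozie jobs -oozie "$OOZIE_URL" -jobtype wf -filter status=PREP -len "${1:-100}" \
    | extract_wf_ids \
    | while read -r id; do
        echo "Killing $id"
        oozie job -oozie "$OOZIE_URL" -kill "$id"
      done
}
```

Running `kill_prep_workflows 500` would process up to 500 workflows per call. It is worth piping the listing through extract_wf_ids on its own first and reviewing the IDs before killing anything.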