09-26-2019 11:15 AM
I wanted to interject that while both of the causes above are valid possible causes of Oozie jobs stuck in PREP state, there are several other possible causes that may need to be resolved, such as:
1. Issues with the YARN ResourceManager / MR JobTracker, or a lack of resources either for the RM or in the queues for the user running the job.
2. Problems with the Oozie server reaching the Oozie database server, problems with the database server itself, or locks on tables.
3. Resource problems within Oozie itself, such as exhausted callable queues, insufficient Java heap, GC thrashing, etc.
The above is a brief shortlist drawn from a review of support cases relating to Oozie jobs stuck in PREP. I want to emphasize that deleting records from the Oozie database should ONLY be done as a last resort, and is only needed if you have a very large mass of Oozie workflows that cannot be killed in a timely fashion by an Oozie CLI script. It should only be done at the direction of support, by people knowledgeable in SQL and in the relationships between tables, columns, and rows in the Oozie database, because the schema design lacks referential integrity and constraints. The above post from 2017 also missed one key table, COORD_ACTIONS; if its data is not properly cleaned up, that would break your Oozie purge and possibly cause other serious problems.
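For reference, here is a rough sketch of what an "oozie CLI script" for killing PREP workflows might look like, so the database never has to be touched. It is only an illustration, not a supported tool. The function names are hypothetical, the server URL is a placeholder, and the job ID pattern (something like 0000001-190926111500123-oozie-oozi-W) is an assumption about how your `oozie jobs` listing looks. It defaults to a dry run that only prints the kill commands.

```python
import re
import subprocess

# Assumption: workflow job IDs look like 0000001-190926111500123-oozie-oozi-W
# and appear at the start of each row printed by `oozie jobs`.
WORKFLOW_ID = re.compile(r"^\d+-\d+-oozie-\w+-W\b")


def parse_workflow_ids(jobs_output):
    """Pick workflow job IDs out of the text printed by `oozie jobs`."""
    ids = []
    for line in jobs_output.splitlines():
        match = WORKFLOW_ID.match(line.strip())
        if match:
            ids.append(match.group(0))
    return ids


def kill_commands(oozie_url, job_ids):
    """Build one `oozie job -kill` command per workflow ID."""
    return [["oozie", "job", "-oozie", oozie_url, "-kill", job_id]
            for job_id in job_ids]


def kill_prep_workflows(oozie_url, limit=1000, dry_run=True):
    """List workflows in PREP and kill them (or just print the commands)."""
    listing = subprocess.run(
        ["oozie", "jobs", "-oozie", oozie_url, "-jobtype", "wf",
         "-filter", "status=PREP", "-len", str(limit)],
        capture_output=True, text=True, check=True,
    ).stdout
    commands = kill_commands(oozie_url, parse_workflow_ids(listing))
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))  # review before running for real
        else:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `kill_prep_workflows("http://oozie-host:11000/oozie")` with your own server URL prints the commands it would run; pass `dry_run=False` only after reviewing that list. If jobs go straight back to PREP or cannot be killed, look at the causes listed above before considering anything in the database.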