SYMPTOM: After upgrading Ambari from 2.1.1 to 2.2.2.2, restarting the Oozie service failed with the error "su: cannot set user id: Resource temporarily unavailable".

ERROR: Below are the error logs:

Execution, [[0000002-160227115902137-oozie-oozi-C@4]::CoordActionInputCheck:: Ignoring action. Coordinator job is not in RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUSPENDED], Error Code: E1100 
2016-07-02 13:04:42,457 WARN CoordActionInputCheckXCommand:523 - SERVER[hdmlup000a.machine.group] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-160227115902137-oozie-oozi-C] ACTION[0000002-160227115902137-oozie-oozi-C@5] E1100: Command precondition does not hold before execution, [[0000002-160227115902137-oozie-oozi-C@5]::CoordActionInputCheck:: Ignoring action. Coordinator job is not in RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUSPENDED], Error Code: E1100 
2016-07-02 13:04:42,459 WARN CoordActionInputCheckXCommand:523 - SERVER[hdmlup000a.machine.group] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000002-160227115902137-oozie-oozi-C] ACTION[0000002-160227115902137-oozie-oozi-C@6] E1100: Command precondition does not hold before execution, [[0000002-160227115902137-oozie-oozi-C@6]::CoordActionInputCheck:: Ignoring action. Coordinator job is not in RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUSPENDED], Error Code: E1100 
2016-07-02 13:04:42,460 WARN CoordActionReadyXCommand:523 - SERVER[hdmlup000a.machine.group] USER[falcon] GROUP[-] TOKEN[] APP[FALCON_PROCESS_DEFAULT_Push03to04run03] JOB[0000002-160227115902137-oozie-oozi-C] ACTION[] E1100: Command precondition does not hold before execution, [[0000002-160227115902137-oozie-oozi-C]::CoordActionReady:: Ignoring job. Coordinator job is not in RUNNING state, but state=SUSPENDED], Error Code: E1100 
2016-07-02 13:04:53,076 INFO PauseTransitService:520 - SERVER[hdmlup000a.machine.group] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Acquired lock for [org.apache.oozie.service.PauseTransitService] 
2016-07-02 13:04:53,086 INFO PauseTransitService:520 - SERVER[hdmlup000a.machine.group] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[-] ACTION[-] Released lock for [org.apache.oozie.service.PauseTransitService]

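ROOT CAUSE: The issue is most likely caused by the nproc (maximum number of user processes) limit. su reports "Resource temporarily unavailable" when it cannot switch to the target user, which typically happens when that user has already reached its nproc limit. The fix is to raise the nproc limit for the affected service user.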
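To see how close a service user is to its nproc limit, step 1 of the resolution counts the user's threads with ps. The Python sketch below does the same count by reading /proc directly. It assumes a Linux host, sums the Threads: field of /proc/<pid>/status for every process whose real UID matches (per setrlimit(2), RLIMIT_NPROC is charged to the real user ID), and takes the limit as an argument instead of reading Ambari's configuration. The function names count_user_threads and nproc_headroom are hypothetical.

```python
import os
import pwd


def count_user_threads(uid, proc_root="/proc"):
    """Sum the Threads: field of /proc/<pid>/status for every process
    whose real UID equals `uid` (RLIMIT_NPROC is charged to the real UID)."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        status_path = os.path.join(proc_root, entry, "status")
        try:
            with open(status_path) as fh:
                fields = dict(line.split(":", 1) for line in fh if ":" in line)
        except OSError:
            continue  # process exited mid-scan or file unreadable
        if "Uid" not in fields or "Threads" not in fields:
            continue
        real_uid = int(fields["Uid"].split()[0])  # first column is the real UID
        if real_uid == uid:
            total += int(fields["Threads"].strip())
    return total


def nproc_headroom(used, limit):
    """Return (threads still available, percent of the limit in use)."""
    return limit - used, 100.0 * used / limit


if __name__ == "__main__":
    NPROC_LIMIT = 16000  # the value configured in Ambari before the fix
    try:
        oozie_uid = pwd.getpwnam("oozie").pw_uid
    except KeyError:
        print("no 'oozie' user on this host")
    else:
        used = count_user_threads(oozie_uid)
        left, pct = nproc_headroom(used, NPROC_LIMIT)
        print(f"oozie: {used} threads, {left} left ({pct:.1f}% of {NPROC_LIMIT})")
```

A count approaching the configured limit (16000 here) is the signal to raise nproc, as done in step 2.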

RESOLUTION: The following steps resolved the issue:

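1. Checked the number of processes and threads owned by the oozie user with "ps -u oozie -L | wc -l". The -L option lists every thread on its own line, which matters because on Linux each thread counts against the nproc limit (the count also includes one header line).
The nproc limit for oozie was set to 16000 in the Ambari Oozie configuration.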
2. Raised the nproc limit from 16000 to 32000 under Ambari -> Services -> Oozie -> Configs.
3. Restarted Oozie. The Ambari UI showed the Oozie server as stopped, but the ps command showed the process still running.
4. The problem was that the process was stale: ps showed it had been running for X days.
5. Restarting the Oozie server again still did not restart the process, as confirmed from the CLI.
6. Killed the Oozie server process from the CLI and cleared the Ambari agent cache with the following command:
mv /var/lib/ambari-agent/data/structured-out-status.json /var/lib/ambari-agent/data/structured-out-status.json.bak
7. Restarted the Ambari agent.
8. Restarted the Oozie server, which succeeded; the Oozie process now shows the correct status in the ps output.
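Steps 3 to 5 hinge on noticing that the Oozie process had been running for days even though Ambari reported it as stopped. "ps -o pid,etime -u oozie" shows that elapsed time directly. The Python sketch below, which assumes a Linux /proc layout, shows where the number comes from: field 22 of /proc/<pid>/stat (starttime, in clock ticks since boot) converted to seconds and subtracted from the first value in /proc/uptime. The helper names process_age_seconds and live_process_age_days are hypothetical.

```python
import os
import sys


def process_age_seconds(stat_text, uptime_seconds, clk_tck):
    """Seconds since a process started, given the text of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may itself contain spaces
    # or ')', so only split the text that follows the last ')'.
    after_comm = stat_text[stat_text.rfind(")") + 1:].split()
    # after_comm[0] is field 3 (state); starttime is field 22, i.e. index 19.
    start_ticks = int(after_comm[19])
    return uptime_seconds - start_ticks / clk_tck


def live_process_age_days(pid):
    """Age in days of a running process on this (Linux) host."""
    with open(f"/proc/{pid}/stat") as fh:
        stat_text = fh.read()
    with open("/proc/uptime") as fh:
        uptime_seconds = float(fh.read().split()[0])
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return process_age_seconds(stat_text, uptime_seconds, clk_tck) / 86400


if __name__ == "__main__":
    # Pass one or more PIDs, e.g. the Oozie server PID found with ps.
    for arg in sys.argv[1:]:
        print(f"pid {arg}: running for {live_process_age_days(int(arg)):.1f} days")
```

Saved as a hypothetical proc_age.py, "python3 proc_age.py <pid>" prints the age of that process in days; an age far older than the last restart attempt points to the stale process described in step 4.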
Last update: 12-23-2016 05:58 AM