Job stuck in Accepted state under a specific pool

Re: Job stuck in Accepted state under a specific pool

Rising Star

Do you still have the RM log from when the job was stuck in qatest? There may be an indication there.

Re: Job stuck in Accepted state under a specific pool

Explorer
Sorry for the late reply.
Unfortunately, I don't have the logs now. I will share them if I see the same situation again.

Re: Job stuck in Accepted state under a specific pool

Rising Star

Sorry, I was not able to come back to the forum for a while. According to your latest screenshot (08-23), the total amount of memory in the cluster is 312 GB, unless you have changed the cluster since 08-03. Therefore, I think what you experienced on Aug 3rd is expected: the resources were not available in the cluster. Even if you did have 16 GB left in the cluster, that 16 GB could be fragmented across the nodes, so that no single node could run a 6 GB container.
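To illustrate the fragmentation point, here is a minimal sketch. The node names and per-node free memory values are made up for illustration (they are not taken from the screenshots); they are chosen only so that the free memory adds up to 16 GB while no node has 6 GB free.

```python
def nodes_that_fit(free_gb_per_node, container_gb):
    """Return the names of nodes with enough free memory for one container."""
    return [node for node, free in free_gb_per_node.items() if free >= container_gb]


# Hypothetical free memory per node, in GB (assumed values, sum is 16).
free_gb_per_node = {"node1": 4, "node2": 4, "node3": 4, "node4": 4}

total_free = sum(free_gb_per_node.values())
candidates = nodes_that_fit(free_gb_per_node, container_gb=6)

# A container must fit on a single node, so the cluster-wide total
# can exceed the request while no node is able to host it.
print(f"total free: {total_free} GB, nodes that fit a 6 GB container: {candidates}")
```

With these assumed numbers the total is larger than the request, yet the list of candidate nodes is empty, which is the situation where the application would keep waiting.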

Re: Job stuck in Accepted state under a specific pool

Explorer

I am experiencing exactly the same issue. I am attaching 3 images to show what's happening. There is nothing unusual in the RM logs.

 

Using CDH 5.4.5

Attachments: cm-resource-pools.png, rm-gui-scheduler.png, accepted resources.png

Re: Job stuck in Accepted state under a specific pool

Increasing the ZooKeeper jute.maxbuffer setting fixed the problem for me. I increased it from 12 MB to 48 MB and did a rolling restart.
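For anyone setting this by hand: jute.maxbuffer is a Java system property whose value is given in bytes, so the sizes above need converting. A small sketch of the conversion, assuming "MB" here means 1024 * 1024 bytes; the helper names are just for illustration, and where the resulting flag goes depends on how your services are managed.

```python
def mb_to_bytes(mb):
    """Convert megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes."""
    return mb * 1024 * 1024


def jute_maxbuffer_flag(mb):
    """Build the JVM option string for a given buffer size in MB."""
    return f"-Djute.maxbuffer={mb_to_bytes(mb)}"


# Old and new sizes mentioned in the post above.
print(jute_maxbuffer_flag(12))
print(jute_maxbuffer_flag(48))
```

As far as I know, the same value should be applied on the ZooKeeper servers and on the clients that talk to them, otherwise the two sides disagree about the limit.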

 

Thanks

Surya