YuniKorn UI

We recently upgraded to CDE 1.20.3. After the upgrade we encountered issues related to resource allocation. I suspect that YuniKorn is not functioning properly, because jobs sit in a waiting state even when there are resources available to allocate to the Spark jobs.

Is there a YuniKorn UI available from CDE that we can access to easily monitor the YuniKorn pods?

Accepted Solution

Hello @Ging 

Thanks for engaging the Cloudera Community. Based on your post, your team is seeing resource allocation issues, with jobs remaining in a Waiting state even though resources are available, and you believe the issue is linked to YuniKorn because it has been observed since the upgrade to CDE v1.20.3.

To your question, please find the requested details below:

(I) YuniKorn UI: The YuniKorn UI is available via CDE UI > Administration > CDE Service "Service Details" > Resource Scheduler.

(II) For any suspected issue with YuniKorn, always capture the YuniKorn state dump [1] while the issue is being observed. This is extremely important: the state dump is a real-time snapshot of the scheduler, so a dump captured while the issue isn't occurring doesn't help.

(III) Collect the YuniKorn Scheduler and YuniKorn Admission Controller pod logs from the YuniKorn namespace while the issue is happening.

(IV) After capturing all of the above information, restart the YuniKorn Scheduler to check whether the issue persists. If it does, engage Cloudera Support with the state dump, the pod logs, and the job event log showing the job stuck in Waiting.
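
For step (II), the state dump referenced in [1] is served by the YuniKorn REST API at `/ws/v1/fullstatedump`. The Python sketch below fetches it and saves it under a UTC timestamp so it can be matched against the pod logs later. It assumes the scheduler REST API has already been port-forwarded to `http://localhost:9080`; that address and the helper names are illustrative assumptions, so adjust them to how YuniKorn is exposed in your CDE cluster.

```python
import datetime
import json
import pathlib
import urllib.request

# Assumption: the YuniKorn scheduler REST API has been port-forwarded to this
# address. The host and port are illustrative; adjust them for your cluster.
DEFAULT_BASE_URL = "http://localhost:9080"


def state_dump_url(base_url: str = DEFAULT_BASE_URL) -> str:
    """Return the full-state-dump endpoint for a given scheduler base URL."""
    return base_url.rstrip("/") + "/ws/v1/fullstatedump"


def dump_filename(when_utc: datetime.datetime) -> str:
    """Build a timestamped file name (expects a UTC time) to match dumps with logs."""
    return when_utc.strftime("yunikorn-statedump-%Y%m%dT%H%M%SZ.json")


def capture_state_dump(base_url: str = DEFAULT_BASE_URL, out_dir: str = ".") -> pathlib.Path:
    """Fetch the state dump while the issue is occurring and write it to disk."""
    request = urllib.request.Request(
        state_dump_url(base_url), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        payload = json.load(response)
    path = pathlib.Path(out_dir) / dump_filename(
        datetime.datetime.now(datetime.timezone.utc)
    )
    # Pretty-print so the dump is easy to read and attach to a support case.
    path.write_text(json.dumps(payload, indent=2))
    return path
```

Call `capture_state_dump()` while jobs are actually stuck in Waiting; a dump taken after the situation clears will not show the problem.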
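
For step (III), the logs can be saved with `kubectl logs`. The sketch below assumes upstream YuniKorn Helm chart defaults: a `yunikorn` namespace with Deployments named `yunikorn-scheduler` and `yunikorn-admission-controller`. These names are assumptions and may differ in CDE, so confirm them with `kubectl get deployments -n <namespace>` before running it.

```python
import pathlib
import subprocess

# Assumptions: namespace and Deployment names follow upstream YuniKorn Helm
# chart defaults; verify them in your CDE cluster first.
NAMESPACE = "yunikorn"
DEPLOYMENTS = ("yunikorn-scheduler", "yunikorn-admission-controller")


def log_command(namespace: str, deployment: str) -> list[str]:
    """kubectl command printing timestamped logs from all containers of one pod."""
    return [
        "kubectl", "logs", f"deployment/{deployment}",
        "-n", namespace,
        "--all-containers=true",
        "--timestamps=true",
    ]


def collect_logs(namespace: str = NAMESPACE, out_dir: str = ".") -> list[pathlib.Path]:
    """Save scheduler and admission-controller logs while the issue is occurring."""
    saved = []
    for deployment in DEPLOYMENTS:
        result = subprocess.run(
            log_command(namespace, deployment),
            capture_output=True, text=True, check=True,
        )
        path = pathlib.Path(out_dir) / f"{deployment}.log"
        path.write_text(result.stdout)
        saved.append(path)
    return saved
```

Note that `kubectl logs deployment/<name>` reads from a single pod of the Deployment; if a component runs several replicas, fetch each pod's logs separately.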
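
For step (IV), one way to restart the scheduler is `kubectl rollout restart`, followed by `kubectl rollout status` to wait until the rollout completes. The namespace and Deployment name below are assumptions based on upstream YuniKorn Helm chart defaults, not confirmed CDE names.

```python
import subprocess

# Assumptions: namespace and Deployment name follow upstream YuniKorn Helm
# chart defaults; confirm both in your CDE cluster before running.
NAMESPACE = "yunikorn"
SCHEDULER_DEPLOYMENT = "yunikorn-scheduler"


def restart_commands(namespace: str, deployment: str) -> list[list[str]]:
    """Restart the Deployment, then wait (up to 5 minutes) for the rollout."""
    target = f"deployment/{deployment}"
    return [
        ["kubectl", "rollout", "restart", target, "-n", namespace],
        ["kubectl", "rollout", "status", target, "-n", namespace, "--timeout=300s"],
    ]


def restart_scheduler(namespace: str = NAMESPACE,
                      deployment: str = SCHEDULER_DEPLOYMENT) -> None:
    # Run only after the state dump and pod logs are saved: restarting resets
    # the scheduler's in-memory state, which is what the dump captures.
    for cmd in restart_commands(namespace, deployment):
        subprocess.run(cmd, check=True)
```

If jobs still wait for resources after the restart, that result, together with the saved dump and logs, is what to hand to Cloudera Support.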

Hope the above answers your question.

- Smarak

[1] https://yunikorn.apache.org/docs/1.3.0/user_guide/troubleshooting/#obtain-full-state-dump 

Hello @Ging 

We hope the above post answers your question. We shall mark the post as resolved now. If you have any further questions, feel free to comment and we shall get back to you.

- Smarak