Yunikorn UI

Explorer

We recently upgraded to CDE 1.20.3. After the upgrade we encountered issues related to resource allocation. I suspect that YuniKorn is not functioning properly, because jobs stay in a waiting state even when there are resources available to allocate to the Spark jobs.

Is there a YuniKorn UI available from CDE that we can use to easily monitor the YuniKorn pods?

1 ACCEPTED SOLUTION

Super Collaborator

Hello @Ging 

Thanks for engaging the Cloudera Community. Based on your post, your team is seeing resource allocation issues, with jobs remaining in a Waiting state even though resources are available, and you believe the issue is linked to YuniKorn because it appeared after the upgrade to CDE v1.20.3.

To answer your question, please find the requested details below:

(I) YuniKorn UI: YuniKorn UI is available via CDE UI > Administration > CDE Service "Service Details" > Resource Scheduler

(II) For any suspected issue with YuniKorn, always capture the YuniKorn state dump [1] while the issue is being observed. This is extremely important: the state dump reflects the scheduler's state in real time, so a dump captured when the issue is not occurring does not help.

(III) Collect the YuniKorn Scheduler and YuniKorn Admission Controller pod logs from the YuniKorn namespace while the issue is happening.

(IV) After capturing all of the above, restart the YuniKorn Scheduler and confirm whether the issue persists. If it does, engage Cloudera Support with the state dump, the pod logs, and the job event log showing the job stuck in Waiting.
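
Steps (II) to (IV) could be scripted roughly as below. This is only a sketch under assumptions, not a Cloudera-provided tool: the namespace `yunikorn`, the deployment names `yunikorn-scheduler` and `yunikorn-admission-controller`, and the REST address `http://localhost:9080` (reached through a `kubectl port-forward` to the scheduler) follow upstream YuniKorn defaults and may differ in your CDE cluster. The `/ws/v1/fullstatedump` path is the one described in [1]. The helper names are hypothetical.

```python
import datetime
import subprocess
import urllib.request

# Assumptions (upstream YuniKorn defaults, verify against your CDE cluster).
NAMESPACE = "yunikorn"
SCHEDULER_DEPLOYMENT = "yunikorn-scheduler"
ADMISSION_DEPLOYMENT = "yunikorn-admission-controller"
REST_BASE = "http://localhost:9080"  # assumes a kubectl port-forward is running


def state_dump_url(base=REST_BASE):
    """Build the URL of the full state dump endpoint described in [1]."""
    return base.rstrip("/") + "/ws/v1/fullstatedump"


def logs_command(deployment, namespace=NAMESPACE):
    """kubectl command that prints the logs of one deployment's pod."""
    return ["kubectl", "logs", "deployment/" + deployment,
            "-n", namespace, "--all-containers", "--timestamps"]


def restart_command(deployment=SCHEDULER_DEPLOYMENT, namespace=NAMESPACE):
    """kubectl command that restarts the scheduler (step IV)."""
    return ["kubectl", "rollout", "restart", "deployment/" + deployment,
            "-n", namespace]


def capture_diagnostics(prefix):
    """Save the state dump and both pod logs while the issue is occurring."""
    stamp = datetime.datetime.now().strftime("%Y%m%d-%H%M%S")

    # Step (II): state dump, captured first because it is a real-time snapshot.
    with urllib.request.urlopen(state_dump_url(), timeout=30) as resp:
        dump = resp.read()
    with open(f"{prefix}-statedump-{stamp}.json", "wb") as fh:
        fh.write(dump)

    # Step (III): scheduler and admission controller logs.
    for deployment in (SCHEDULER_DEPLOYMENT, ADMISSION_DEPLOYMENT):
        result = subprocess.run(logs_command(deployment),
                                capture_output=True, text=True, check=True)
        with open(f"{prefix}-{deployment}-{stamp}.log", "w") as fh:
            fh.write(result.stdout)


# Example (run only while jobs are stuck in Waiting):
#   capture_diagnostics("yunikorn-issue")
#   subprocess.run(restart_command(), check=True)   # step (IV), after capture
```

The restart is deliberately kept out of `capture_diagnostics` so that it only happens after the state dump and logs are safely on disk.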

Hope this answers your question.

- Smarak

[1] https://yunikorn.apache.org/docs/1.3.0/user_guide/troubleshooting/#obtain-full-state-dump 

Super Collaborator

Hello @Ging 

We hope the above post answers your question. We will mark the post as resolved now. If you have any further questions, feel free to comment and we will get back to you.

- Smarak