What should I check if a job takes 15-20 minutes longer than it did the previous day?

1 ACCEPTED SOLUTION

Hello Rahul

Your question is fairly generic, so it is hard to help much without details such as the service used and the data being read. That said, since we are in the YARN thread, I assume it is a service running on YARN, such as Hive or Spark. In your shoes, I would go to the YARN UI and the job logs to understand where the latency occurs:

Is it in the init phase? If YARN is waiting to get containers, then the queue's resources or the max AM setting per queue are possible configurations to look at.
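To put numbers on the init phase, here is a minimal Python sketch that compares two runs of the same job using application records from the YARN ResourceManager REST API (`/ws/v1/cluster/apps`). It assumes the records carry `startedTime` and `finishedTime` in epoch milliseconds; `launchTime` is only reported by newer Hadoop releases, so the code treats it as optional. The ResourceManager URL and the helper names are placeholders of mine, not something from your cluster.

```python
import json
from urllib.request import urlopen


def fetch_apps(rm_url, name):
    """Fetch finished applications from the ResourceManager, filtered by name.

    rm_url is a placeholder, e.g. "http://resourcemanager.example.com:8088".
    """
    with urlopen(f"{rm_url}/ws/v1/cluster/apps?states=FINISHED") as resp:
        payload = json.load(resp)
    # "apps" is null in the response when nothing matches.
    apps = (payload.get("apps") or {}).get("app", [])
    return [a for a in apps if a.get("name") == name]


def phase_breakdown(app):
    """Split one application record into wait time and run time, in seconds."""
    started = app["startedTime"]
    finished = app["finishedTime"]
    # Assumption: fall back to startedTime when launchTime is not reported.
    launched = app.get("launchTime") or started
    return {
        "wait_s": (launched - started) / 1000,
        "run_s": (finished - launched) / 1000,
        "total_s": (finished - started) / 1000,
    }


def compare_runs(previous, current):
    """Return the per-phase increase (current minus previous) in seconds."""
    prev, cur = phase_breakdown(previous), phase_breakdown(current)
    return {key: cur[key] - prev[key] for key in prev}
```

If most of the increase shows up in `wait_s`, the job was probably waiting on the queue; if it shows up in `run_s`, the slowdown is more likely in the job itself.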

Is it in the compute phase itself? If some "mappers" run much longer than before, look at things like container errors and restarts, I/O throughput, or data spill. The Tez UI has a very good tool, the Tez swimlane, which gives a high-level view of the DAG and a sense of where to look. The Spark UI offers the same on the Spark side.
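Once you have per-task durations (copied from the Tez UI or the Spark UI, for example), a rough way to spot the slow "mappers" is to compare each task against the median. This is only a sketch: the task IDs below are made up, and the factor of 2 is an arbitrary threshold you would tune for your job.

```python
from statistics import median


def find_stragglers(durations, factor=2.0):
    """Return (task_id, seconds) pairs slower than factor x the median, slowest first.

    durations maps a task ID to its run time in seconds.
    """
    if not durations:
        return []
    typical = median(durations.values())
    slow = [(task, secs) for task, secs in durations.items()
            if secs > factor * typical]
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical durations for four map tasks, in seconds.
tasks = {"map_0": 60, "map_1": 62, "map_2": 58, "map_3": 400}
print(find_stragglers(tasks))
```

Any task this flags is a good candidate for checking its container logs for errors, restarts, or spills.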

Hope some of this helps.
