Does Cloudera provide estimated completion times for jobs?

Explorer

Problem Summary
---------------------------------------------------
Does Cloudera provide estimated completion times for jobs?

Problem Description
---------------------------------------------------
Hi Support,

I would like to know whether there is any way to get an estimated completion time for jobs running in Hadoop, whether they are MapReduce or Spark jobs.

1 ACCEPTED SOLUTION

Master Collaborator

Hello @Shivakuk, G'day!

Thank you for bringing this to our community.

Additionally, may I suggest Cloudera Observability, our own monitoring offering, where you can monitor, analyse, and compare your workloads?

Here is a brilliant blog post introducing Cloudera Observability [0a]:
[0a] https://blog.cloudera.com/beyond-monitoring-introducing-cloudera-observability/

We also have well-written documentation on Observability covering:

Release Notes
Overview
Configuration
How To - This section may help you manage, analyse, and troubleshoot your workloads with Cloudera Observability.
Reference

Hope this helps!

If my answer helps with your question, please click on 'Accept as Solution', and if you are satisfied with my reply, you can also hit the thumbs-up button :)

Cheers!


5 REPLIES 5

Master Collaborator

Hadoop itself does not provide real-time estimates of job completion time out of the box. However, it does have features and tools that can help you monitor progress and estimate when a job will complete:

JobTracker/ResourceManager Web UI: Hadoop's JobTracker (in Hadoop 1.x) or ResourceManager Web UI (in Hadoop 2.x and later) provides information about the status and progress of running jobs. While it doesn't give you an exact completion time estimate, it does show the map and reduce progress, number of tasks completed, and other relevant details that can help you gauge the progress.

MapReduce Counters: Hadoop MapReduce jobs expose counters that provide insight into the progress of various phases of the job. You can use these counters to estimate how much work has been completed and how much is remaining.
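As a rough illustration of the counter-based approach, the sketch below takes two samples of a record counter (for example, map input records) and extrapolates the remaining time from the observed throughput. The expected total is an assumption: counters report what has been processed, not what remains, so the total has to come from somewhere else, such as a previous run over similar input. The function name and the sample values are hypothetical.

```python
from typing import Optional


def eta_from_counter(
    count_then: int,
    count_now: int,
    seconds_between: float,
    expected_total: int,
) -> Optional[float]:
    """Estimate seconds remaining from two samples of an increasing counter.

    Returns None when no progress was observed between the samples.
    """
    processed = count_now - count_then
    if seconds_between <= 0 or processed <= 0:
        return None  # no measurable throughput, so no estimate
    rate = processed / seconds_between  # records per second
    remaining = max(expected_total - count_now, 0)
    return remaining / rate


# Hypothetical samples taken 60 seconds apart.
eta = eta_from_counter(
    count_then=2_000_000,
    count_now=3_200_000,
    seconds_between=60.0,
    expected_total=10_000_000,
)
print(f"Estimated seconds remaining: {eta:.0f}")
```

This only covers the phase the counter belongs to. A job with a long reduce phase needs a separate estimate for that phase.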

Hadoop Job History Logs: Hadoop maintains detailed logs of job executions. By analyzing these logs, you can gain insights into the historical performance of jobs and potentially use this information to estimate completion times for similar jobs in the future.
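One simple way to use historical runs, sketched below, is to summarize past durations of the same job as a typical value (median) and a pessimistic value (90th percentile). It assumes you have already extracted the durations from the job history yourself; the function name and the sample durations are hypothetical.

```python
import statistics
from typing import Dict, Sequence


def summarize_runtimes(durations_sec: Sequence[float]) -> Dict[str, float]:
    """Summarize past runtimes of a recurring job (needs at least 2 runs)."""
    if len(durations_sec) < 2:
        raise ValueError("need at least two historical runs")
    ordered = sorted(durations_sec)
    # quantiles(n=10) returns 9 cut points; index 8 is the 90th percentile.
    p90 = statistics.quantiles(ordered, n=10, method="inclusive")[8]
    return {"typical": statistics.median(ordered), "pessimistic": p90}


# Hypothetical durations (in seconds) of five earlier runs.
summary = summarize_runtimes([600, 620, 640, 700, 900])
print(summary)
```

Quoting the pessimistic figure alongside the typical one makes it clearer how much run-to-run variation to expect.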

Custom Scripting: You can also write custom scripts or applications that monitor the progress of jobs by querying Hadoop's APIs and estimating completion times based on historical data and current progress.
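A minimal sketch of such a script, assuming an unsecured cluster where the YARN ResourceManager REST API is reachable over plain HTTP: it reads the `progress` (percent) and `elapsedTime` (milliseconds) fields of an application record and extrapolates linearly. The host, port, and application ID in the usage comment are hypothetical, and a Kerberos-secured cluster would need authentication that is not shown here.

```python
import json
import urllib.request
from typing import Optional


def fetch_app(rm_url: str, app_id: str) -> dict:
    """Fetch one application record from the ResourceManager REST API."""
    url = f"{rm_url}/ws/v1/cluster/apps/{app_id}"
    req = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["app"]


def estimate_remaining_ms(app: dict) -> Optional[float]:
    """Linear extrapolation: remaining = elapsed * (100 - p) / p."""
    progress = float(app.get("progress", 0.0))  # percent, 0 to 100
    elapsed = float(app.get("elapsedTime", 0))  # milliseconds
    if app.get("state") != "RUNNING" or progress <= 0 or progress >= 100:
        return None  # nothing sensible to extrapolate from
    return elapsed * (100.0 - progress) / progress


# Offline example with a hand-written record, so no cluster is needed.
sample = {"state": "RUNNING", "progress": 40.0, "elapsedTime": 120_000}
print(estimate_remaining_ms(sample))

# Against a live cluster (hypothetical host and application ID):
# app = fetch_app("http://rm-host:8088", "application_1700000000000_0001")
# print(estimate_remaining_ms(app))
```

The progress value is reported by the application itself and is not guaranteed to advance linearly with time, so treat the result as a rough indication only.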

Remember that estimating job completion time in distributed systems like Hadoop can be challenging due to the dynamic nature of the environment and the potential variability in task execution times. It's important to understand that these estimates might not always be accurate and can be affected by various factors such as cluster load, data distribution, and hardware performance.





Community Manager

@Shivakuk Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.  



Regards,

Vidya Sargur,
Community Manager



avatar

Yes, Cloudera's management tools, especially Cloudera Manager, do provide insights and metrics about jobs running on Hadoop, including both MapReduce and Spark jobs.

For job completion time estimation:

  1. Cloudera Manager: Within the Cloudera Manager interface, you can navigate to the specific service (like YARN or Spark) to view details about running or completed jobs. For each job, there's an estimated time of completion based on the progress and resources available. However, it's worth noting that these estimations can vary based on data skew, resource contention, and other factors.

  2. Resource Manager UI: For YARN based jobs, the YARN Resource Manager UI provides information about running applications, including their progress. The percentage completion might give a rough idea, but it doesn't directly estimate the completion time.

  3. Spark UI: For Spark jobs, the Spark UI provides insights into job stages, tasks, and their durations. While it doesn’t give a direct "time remaining" estimate, you can use the information about completed stages/tasks to infer how long the remaining stages/tasks might take.
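The inference described for the Spark UI can be sketched as follows: take the durations of the tasks already completed in a stage, assume the remaining tasks look similar, and assume they run in "waves" limited by the number of task slots. Both assumptions are simplifications (data skew breaks the first, dynamic allocation the second), and the function and parameter names are hypothetical.

```python
import math
from typing import Optional, Sequence


def estimate_stage_remaining_sec(
    completed_task_durations_sec: Sequence[float],
    total_tasks: int,
    parallel_slots: int,
) -> Optional[float]:
    """Estimate time left in one stage from its completed tasks."""
    done = len(completed_task_durations_sec)
    if done == 0 or parallel_slots <= 0:
        return None  # nothing observed yet, or no capacity to run tasks
    avg = sum(completed_task_durations_sec) / done
    remaining_tasks = max(total_tasks - done, 0)
    # Remaining tasks are scheduled in waves of at most `parallel_slots`.
    waves = math.ceil(remaining_tasks / parallel_slots)
    return waves * avg


# Hypothetical stage: 24 tasks, 4 finished so far, 4 tasks run at a time.
remaining = estimate_stage_remaining_sec([10, 12, 14, 12], 24, 4)
print(remaining)
```

This covers a single stage only. Stages that have not started yet would need their own estimate, for example from a previous run of the same job.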

That being said, while these tools can provide some insights, predicting the exact completion time for distributed computing jobs can be challenging due to the dynamic nature of distributed resources, data imbalances, etc.

To get more accurate estimates, it's recommended to:

  1. Monitor Resource Usage: Ensure you have enough resources (memory, CPU, etc.) for your jobs.
  2. Optimize Your Jobs: Depending on the nature of your job, consider optimizing your code or the configuration.
  3. Use Historical Data: Look at the historical runtimes of similar jobs to get a ballpark figure for future runs.
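When input size varies between runs, one way to turn historical runtimes into a ballpark figure is to fit a straight line through (input size, runtime) pairs and read off the prediction for the new input size. The sketch below assumes runtime grows roughly linearly with input size, which will not hold for every job; the function names and sample figures are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (size, runtime) pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("input sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_runtime(size: float, slope: float, intercept: float) -> float:
    """Predicted runtime for a given input size."""
    return slope * size + intercept


# Hypothetical history: input size in GB and runtime in seconds.
sizes_gb = [100, 200, 400]
runtimes_sec = [260, 460, 860]
slope, intercept = fit_line(sizes_gb, runtimes_sec)
estimate = predict_runtime(300, slope, intercept)
print(f"Ballpark runtime for 300 GB: {estimate:.0f} s")
```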

I hope this provides clarity on your query. If you have any more questions or need further insights, please let us know.

 

Source: Cloudera

Master Collaborator

Hey @Shivakuk,
Circling back to see if my response was helpful. I am happy to help if you have follow-up questions. Thanks!