Created 08-02-2023 11:09 PM
Problem Summary
---------------------------------------------------
Does Cloudera provide estimated completion times for jobs?
Problem Description
---------------------------------------------------
Hi Support,
I would like to know whether there is any way to estimate the completion time of jobs running in Hadoop, whether they are MapReduce or Spark jobs.
Created on 09-04-2023 12:40 PM - edited 09-04-2023 12:58 PM
Hello @Shivakuk G'Day!
Thank you for bringing this to our community.
Additionally, may I also suggest our very own monitoring line-up, Cloudera Observability, where you can monitor, analyse, and compare your workloads?
Here is a brilliant blog post introducing Cloudera Observability [0a]
[0a] https://blog.cloudera.com/beyond-monitoring-introducing-cloudera-observability/
With that, we also have well-written documentation on Observability covering:
Release Notes
Overview
Configuration
How To - This section may help you manage, analyse, determine, and troubleshoot issues with Cloudera Observability.
Reference
Hope this helps!
If my answer helps with your question, please click on 'Accept as Solution', and if you are satisfied with my reply, you can also hit the thumbs-up button :)
Cheers!
Created 08-09-2023 05:23 AM
Hadoop itself does not provide real-time estimates of job completion time out of the box. However, it does have features and tools that can help you monitor progress and estimate when a job will complete:
JobTracker/ResourceManager Web UI: Hadoop's JobTracker (in Hadoop 1.x) or ResourceManager Web UI (in Hadoop 2.x and later) provides information about the status and progress of running jobs. While it doesn't give you an exact completion time estimate, it does show the map and reduce progress, number of tasks completed, and other relevant details that can help you gauge the progress.
MapReduce Counters: Hadoop MapReduce jobs expose counters that provide insight into the progress of various phases of the job. You can use these counters to estimate how much work has been completed and how much is remaining.
Hadoop Job History Logs: Hadoop maintains detailed logs of job executions. By analyzing these logs, you can gain insights into the historical performance of jobs and potentially use this information to estimate completion times for similar jobs in the future.
Custom Scripting: You can also write custom scripts or applications that monitor the progress of jobs by querying Hadoop's APIs and estimating completion times based on historical data and current progress.
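As a rough illustration of the custom scripting option, here is a minimal Python sketch that reads one application from the YARN ResourceManager REST API and extrapolates linearly from its `progress` and `elapsedTime` fields. The helper names (`estimate_remaining_ms`, `fetch_app`, `describe`) are my own, and the linear model is an assumption: progress is reported by the application itself and some frameworks report it only coarsely, so treat the result as a hint rather than a promise.

```python
import json
from urllib.request import urlopen


def estimate_remaining_ms(progress_pct, elapsed_ms):
    """Linearly extrapolate remaining time from progress (0-100) and elapsed ms.

    Returns None when no progress has been reported, since there is
    nothing to extrapolate from.
    """
    if progress_pct <= 0:
        return None
    if progress_pct >= 100:
        return 0.0
    return elapsed_ms * (100.0 - progress_pct) / progress_pct


def fetch_app(rm_url, app_id):
    """Fetch one application's record from the ResourceManager REST API."""
    with urlopen(f"{rm_url}/ws/v1/cluster/apps/{app_id}", timeout=10) as resp:
        return json.load(resp)["app"]


def describe(app):
    """Format a one-line status from the 'progress' and 'elapsedTime' fields."""
    remaining = estimate_remaining_ms(app["progress"], app["elapsedTime"])
    if remaining is None:
        return f"{app['id']}: no progress reported yet"
    return f"{app['id']}: {app['progress']:.0f}% done, ~{remaining / 60000:.1f} min left"
```

You would call it as `describe(fetch_app("http://rm-host:8088", app_id))`, where `rm-host` is a placeholder for your own ResourceManager address and `app_id` is the application ID shown in the ResourceManager UI. A secured cluster will also need authentication, which this sketch leaves out.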
Remember that estimating job completion time in distributed systems like Hadoop can be challenging due to the dynamic nature of the environment and the potential variability in task execution times. It's important to understand that these estimates might not always be accurate and can be affected by various factors such as cluster load, data distribution, and hardware performance.
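Because of that variability, it may be more honest to report a range than a single number. The sketch below assumes you have already collected the durations (in seconds) of earlier runs of a similar job, for example from the job history, and summarises them with quartiles. The function name `duration_range` and the sample numbers are purely illustrative.

```python
import statistics


def duration_range(past_durations_s):
    """Return (low, typical, high) in seconds from past runs of a similar job.

    Uses the quartiles of the history, so the spread between low and high
    reflects how much the job has varied from run to run.
    """
    if len(past_durations_s) < 2:
        raise ValueError("need at least two past runs")
    q1, median, q3 = statistics.quantiles(
        past_durations_s, n=4, method="inclusive"
    )
    return q1, median, q3


# Hypothetical history of five earlier runs, in seconds.
low, typical, high = duration_range([180, 100, 140, 160, 120])
print(f"expect roughly {typical:.0f}s (usually between {low:.0f}s and {high:.0f}s)")
```

A wide gap between the low and high values is itself useful information: it tells you the job is sensitive to cluster load or input size, and that any single estimate should be taken loosely.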
Created 08-16-2023 10:13 PM
@Shivakuk Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur
Created 08-18-2023 02:44 AM
Yes, Cloudera's management tools, especially Cloudera Manager, do provide insights and metrics about jobs running on Hadoop, including both MapReduce and Spark jobs.
For job completion time estimation:
Cloudera Manager: Within the Cloudera Manager interface, you can navigate to the specific service (like YARN or Spark) to view details about running or completed jobs. For each job, there's an estimated time of completion based on the progress and resources available. However, it's worth noting that these estimations can vary based on data skew, resource contention, and other factors.
Resource Manager UI: For YARN-based jobs, the YARN Resource Manager UI provides information about running applications, including their progress. The percentage completion might give a rough idea, but it doesn't directly estimate the completion time.
Spark UI: For Spark jobs, the Spark UI provides insights into job stages, tasks, and their durations. While it doesn’t give a direct "time remaining" estimate, you can use the information about completed stages/tasks to infer how long the remaining stages/tasks might take.
That being said, while these tools can provide some insights, predicting the exact completion time for distributed computing jobs can be challenging due to the dynamic nature of distributed resources, data imbalances, etc.
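To make the Spark UI idea concrete, here is a small Python sketch that estimates the time left in one active stage from its completed tasks. It takes a dictionary shaped like an entry from Spark's monitoring REST API (`/api/v1/applications/[app-id]/stages`); the field names `numTasks`, `numCompleteTasks`, `numActiveTasks` and `executorRunTime` are what I expect there, but please check them against your Spark version. The function name and the model are my own assumptions: `executorRunTime` is treated as the total run time of the finished tasks, remaining tasks are assumed to cost about the same, and parallelism is assumed to stay constant.

```python
def estimate_stage_remaining_ms(stage):
    """Rough remaining time (ms) for one active stage from its completed tasks.

    Assumes remaining tasks cost about as much as completed ones and that
    the current number of active tasks stays constant.
    """
    done = stage["numCompleteTasks"]
    left = stage["numTasks"] - done
    if left <= 0:
        return 0.0
    if done == 0:
        return None  # nothing finished yet, so no per-task average exists
    avg_task_ms = stage["executorRunTime"] / done
    parallelism = max(stage["numActiveTasks"], 1)
    return avg_task_ms * left / parallelism


# Hypothetical stage: 50 of 200 tasks done, 10 running, 100 s of task time so far.
stage = {
    "numTasks": 200,
    "numCompleteTasks": 50,
    "numActiveTasks": 10,
    "executorRunTime": 100_000,
}
print(estimate_stage_remaining_ms(stage))
```

This only covers the stage that is currently running. Later stages of the same job are not included, and skewed tasks can make the last few tasks take far longer than the average suggests.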
To have more accurate estimations, it's recommended to:
I hope this provides clarity on your query. If you have any more questions or need further insights, please let us know.
Resource- Cloudera
Created 09-05-2023 01:50 PM
Hey @Shivakuk
Circling back to see if my response was helpful. I am happy to help if you have follow-up questions. Thanks!