Member since
08-18-2023
08-18-2023
03:02 AM
Hello,
Recently, I've been entrusted with a significant project at my organization that requires a deeper understanding and implementation of MLOps principles and practices. The primary objective outlined for me is to leverage DVC (Data Version Control) to automate our ML pipelines. Furthermore, there's an emphasis on mastering pipelines & experiment automation, ensuring that our ML workflows are efficient, reproducible, and scalable.
Given the above context, I'm keen on comprehensively understanding and implementing these processes. However, I'm currently at an impasse, trying to ascertain the most logical and efficient path forward. Specifically, I'm seeking guidance on how to methodically and effectively approach each of these tasks.
Could you provide me with a structured breakdown or a roadmap on how to proceed with automating pipelines using DVC, handling experiment automation, and building automated pipelines in general?
Additionally, if there are any foundational prerequisites or best practices that I should be aware of before diving in, I'd greatly appreciate that insight.
I followed this resource but didn't get much out of it: https://www.cloudera.com/tutorials/building-automated-ml-pipelines-in-cml.html
Thank you for your time, and I eagerly await your guidance on this matter.
Labels: Data Visualization
08-18-2023
02:44 AM
Yes, Cloudera's management tools, especially Cloudera Manager, do provide insights and metrics about jobs running on Hadoop, including both MapReduce and Spark jobs.
For job completion time estimation:
Cloudera Manager: Within the Cloudera Manager interface, you can navigate to the specific service (like YARN or Spark) to view details about running or completed jobs. For each job, there's an estimated time of completion based on the progress and resources available. However, it's worth noting that these estimations can vary based on data skew, resource contention, and other factors.
Resource Manager UI: For YARN-based jobs, the YARN Resource Manager UI provides information about running applications, including their progress. The percentage completion might give a rough idea, but it doesn't directly estimate the completion time.
Spark UI: For Spark jobs, the Spark UI provides insights into job stages, tasks, and their durations. While it doesn't give a direct "time remaining" estimate, you can use the information about completed stages/tasks to infer how long the remaining stages/tasks might take.
That being said, while these tools can provide some insights, predicting the exact completion time for distributed computing jobs can be challenging due to the dynamic nature of distributed resources, data imbalances, etc. To get more accurate estimations, it's recommended to:
Monitor Resource Usage: Ensure you have enough resources (memory, CPU, etc.) for your jobs.
Optimize Your Jobs: Depending on the nature of your job, consider optimizing your code or the configuration.
Use Historical Data: Look at the historical runtimes of similar jobs to provide a ballpark figure for future runs.
I hope this provides clarity on your query. If you have any more questions or need further insights, please let us know.
Resource: Cloudera
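To make the Spark UI point concrete, here is a rough sketch of how one might extrapolate remaining time from the durations of tasks that have already finished. The function name, the parallel_slots parameter, and the sample numbers are all hypothetical; the calculation assumes the remaining tasks behave like the completed ones and run in equally sized waves, which data skew and resource contention can easily break.

```python
import math


def estimate_remaining_seconds(completed_task_durations, total_tasks, parallel_slots=1):
    """Naive linear extrapolation of remaining runtime (illustrative only).

    completed_task_durations: durations in seconds of finished tasks,
        e.g. read manually from the Spark UI (assumed input).
    total_tasks: total number of tasks in the stage.
    parallel_slots: how many tasks run at once (hypothetical parameter).
    """
    done = len(completed_task_durations)
    if done == 0:
        return None  # nothing to extrapolate from yet
    if total_tasks < done or parallel_slots < 1:
        raise ValueError("inconsistent task counts or slot count")

    mean_duration = sum(completed_task_durations) / done
    remaining_tasks = total_tasks - done
    # Remaining tasks run in "waves" of at most parallel_slots tasks each.
    waves = math.ceil(remaining_tasks / parallel_slots)
    return waves * mean_duration


if __name__ == "__main__":
    # 3 of 9 tasks finished, 2 tasks can run in parallel.
    print(estimate_remaining_seconds([10, 12, 14], total_tasks=9, parallel_slots=2))
```

Treat the result as a ballpark figure; comparing it against historical runtimes of similar jobs is usually a better sanity check than trusting it on its own.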
08-18-2023
02:36 AM
1 Kudo
Hi,
If you're using Apache NiFi and the token you're trying to capture with the InvokeHTTP processor is too large to be stored as an attribute, you can work around this limitation as follows:
Keep the token in the content of the FlowFile if it's returned by the InvokeHTTP processor.
Use a processor like ReplaceText to wrap the token in the header format you need. For instance, if you need the header to be Authorization: Bearer {token}, configure a ReplaceText processor to rewrite the content (i.e., the token) so that it matches this format.
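To illustrate the transformation only, here is a minimal Python sketch of what the ReplaceText step does to the FlowFile content. This is not NiFi code, and build_auth_header is a hypothetical helper name. In ReplaceText itself this would roughly correspond to evaluating the entire text and using a replacement value along the lines of Authorization: Bearer $1; please treat those settings as an assumption to verify against your NiFi version.

```python
def build_auth_header(flowfile_content: bytes) -> bytes:
    """Wrap a raw token (the whole FlowFile content) in a Bearer header line.

    Hypothetical helper that mirrors the ReplaceText idea: the token never
    becomes an attribute, it stays in the content and is rewritten in place.
    """
    # Strip surrounding whitespace/newlines the HTTP response may include.
    token = flowfile_content.decode("utf-8").strip()
    if not token:
        raise ValueError("FlowFile content is empty; no token to wrap")
    return f"Authorization: Bearer {token}".encode("utf-8")


if __name__ == "__main__":
    print(build_auth_header(b"abc123\n").decode("utf-8"))
```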
08-18-2023
02:29 AM
Hi, I agree with the solution. In addition, run tail -f /var/log/ambari-server/ambari-server.log to watch the logs while you try to start the service from the UI. This will give you real-time feedback.