Created on 04-27-2023 11:34 AM - edited on 05-05-2023 05:41 AM by VidyaSargur
… are platform instability, downtime, hardware failure, poor performance, cluster resource contention, repeated process failures, runaway live queries, critical service alarms, no visibility through the alarm cacophony… the list goes on. If those are ailments you would like to remedy …
Welcome to this six-part series, where we’ll look at how to get control of the health of your Cloudera Data Platform (CDP) environment. Out of the box, CDP performs superbly, but over time, if data architecture, data engineering, and DevOps best practices are not maintained, the Data City you’ve erected atop a solid CDP bedrock can become the wild, wild West. Perhaps it’s time for some law and order to prevent further crimes against the tech.
This series is more than a case study: we’ve interwoven best practices gleaned from multiple configurations and client sites into a comprehensive, easy-to-understand set of instructions for diagnosing and resolving many of the issues that adversely impact CDP environmental health.
With each blog we’ll outline the symptoms and root causes of common environmental health challenges and prescribe solutions. Where we can, we’ll include valuable links to step-by-step instructions to guide you through successful implementation. When we conclude the series, we’ll share a homegrown tool, an environmental health scorecard, to monitor and manage the health of your environment.
There are many, many reasons that an environment may perform poorly, and certainly some resolutions take time and effort, but there is also some succulent low-hanging fruit. Our great hope is that you find impactful quick wins that inspire you to pursue multiple avenues of health improvement. You may also decide to partner with our Cloudera Professional Services team, which more than doubled a customer’s health score in two short quarters.
For this series, we’ve grouped the aspects of environmental health into the following categories.
Visibility into the cluster, platform, services, and processes. We won’t be able to make much progress if we do not have proper visibility into the problems. That’s observability. In this blog, we provide instructions and tools on how to gain visibility, suppress alarm noise, find and analyze the root causes of the most significant opportunities, and proactively notify your users when incidents occur.
Standardization of common datasets, pipelines, processes, and reports. Admittedly, data asset standardization is a multiyear journey; nevertheless, addressing only your most problematic and resource-intensive processes and assets may yield more environmental health improvement than any other category. We’ll share best practices on how to locate and capitalize on those opportunities.
This category includes hardware and service settings and configurations. Cloudera Data Platform (CDP) must be configured properly to perform well. Furthermore, as business needs continually change, so will your use of the platform, and that will necessitate re-tuning. To help you on that journey, we’ll list some common symptoms, link them to root cause analysis steps, provide proper configuration guidelines, and outline the steps to properly tune your environment.
This category covers the proper use of Impala, CDSW, Airflow, NiFi, and CM. You might be surprised at the adverse environmental impact of using CDSW as an ETL pipeline tool or using Impala to write unwieldy queries with an embarrassing number of joins. We’ve done it too. We confess. We’ll highlight the advantages of using Airflow to manage complex data pipelines, with its ability to divide a workflow into small, independent tasks. We’ll list other do’s and don’ts.
The environmental health scorecard brings it all together by demonstrating how to measure, score, monitor, and control environmental health through dashboards that we provide for you, along with instructions to hook them up to your logs.
If you’ve got the symptoms, the doctors are in. Let the healing begin!
Created on 04-27-2023 11:35 AM
Thank you, Raj!
Created on 04-27-2023 11:47 AM
Awesome article.