06-10-2023
03:46 AM
Summary
Cloudera Manager is the best-in-class holistic interface that provides end-to-end system management and key enterprise features, giving you granular visibility into and control over every part of an enterprise data hub. From time to time, you’ll need to increase the logging level when troubleshooting issues within Cloudera Manager.
Change your Cloudera Manager Server logging level
The Cloudera Manager Server itself must be configured from the command line on the Cloudera Manager Server host.
Log in as root to your Cloudera Manager Server:
mbush@mbush-MBP16 CDSW % ssh root@<CM SERVER FQDN>
Check the Cloudera Manager Server log4j file:
[root@<CM SERVER FQDN> ~]# cat /etc/cloudera-scm-server/log4j.properties
# Copyright (c) 2012 Cloudera, Inc. All rights reserved.
#
# !!!!! IMPORTANT !!!!!
# The Cloudera Manager server finds its log file by querying log4j. It
# assumes that the first file appender in this file is the server log.
# See LogUtil.getServerLogFile() for more details.
#
# Define some default values that can be overridden by system properties
cmf.root.logger=INFO,CONSOLE
cmf.log.dir=.
cmf.log.file=cmf-server.log
cmf.perf.log.file=cmf-server-perf.log
cmf.jetty.log.file=cmf-server-nio.log
..
..
..
The key setting that controls the logging level of the Cloudera Manager Server is the cmf.root.logger line shown above.
This can be done by amending the cmf.root.logger parameter and restarting the CM Server:
cmf.root.logger=DEBUG,CONSOLE
#RESTART THE CLOUDERA MANAGER SERVICE TO COMMIT THE AMENDMENT
[root@<CM SERVER FQDN> ~]# systemctl restart cloudera-scm-server
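If you prefer to script the change, here is a minimal sketch (it assumes the stock log4j.properties layout shown above; adjust if your file differs):
#SWITCH THE ROOT LOGGER FROM INFO TO DEBUG IN PLACE (A .bak BACKUP IS KEPT)
[root@<CM SERVER FQDN> ~]# sed -i.bak 's/^cmf.root.logger=INFO,CONSOLE/cmf.root.logger=DEBUG,CONSOLE/' /etc/cloudera-scm-server/log4j.properties
[root@<CM SERVER FQDN> ~]# systemctl restart cloudera-scm-server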
If you would like to know more about controlling further elements, refer to another very helpful Cloudera Community page - How to enable debug logging for Cloudera Manager server. That page also goes into more detail about how to amend the logging level at the CM Server binary level (useful if you cannot get the CM Server to start at all).
06-10-2023
03:41 AM
Summary
Cloudera Manager is the best-in-class holistic interface that provides end-to-end cluster management and key enterprise features, giving you granular visibility into and control over every part of an open data lakehouse. The optimization steps below complement Cloudera’s Optimize the Cloudera Manager Server page.
Investigation & Resolution
Monitor your Cloudera Manager Server Heap
We have provided a useful suite of Cloudera Manager dashboards in the blog Deploy your Cloudera Manager Dashboards. The dashboard called “MB - MGMT Cluster - JVM GC Sizing” includes a set of charts that focus on the Cloudera Manager Server heap, enabling you to easily visualize abnormal heap characteristics over, for example, a 6-hour time-series window.
Tune your Cloudera Manager Server Heap
The Cloudera Manager Server itself must be configured from the command line on the Cloudera Manager Server host.
Log in as root to your Cloudera Manager Server:
mbush@mbush-MBP16 CDSW % ssh root@<CM SERVER FQDN>
Check the Cloudera Manager Server config file:
[root@<CM SERVER FQDN> ~]# cat /etc/default/cloudera-scm-server
#
# Specify any command line arguments for the Cloudera SCM Server here.
#
CMF_SERVER_ARGS=""
#
# Locate the JDBC driver jar file.
#
# The default value is the default system mysql driver on RHEL/CentOS/Ubuntu
# and the standard, documented location for where to put the oracle jar in CM
# deployments.
#
export CMF_JDBC_DRIVER_JAR="/usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar"
#
# You can override JAVA_HOME here if your java is not on the normal search path
# export JAVA_HOME=/usr/java/default
#
# Java Options.
#
# Default value sets Java maximum heap size to 2GB, and Java maximum permanent
# generation size to 256MB.
#
export CMF_JAVA_OPTS="-Xmx4G -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
The key line for tuning the Cloudera Manager Server heap is the CMF_JAVA_OPTS export shown above.
Above, you can see the default settings that we get with any vanilla deployment of the Cloudera Manager service. These parameters will need to be tuned as your Cloudera CDP cluster becomes larger and busier, and that tuning is particularly important when any serious use of the Cloudera Manager API is introduced.
We recommend that you suitably raise the overall heap size based on your cluster size and CM API needs, disable adaptive heap sizing (by setting -Xms equal to -Xmx), and control the JVM heap ratio (-XX:NewRatio).
This can be done by amending the CMF_JAVA_OPTS and restarting the CM Server:
export CMF_JAVA_OPTS="-Xms16G -Xmx16G -XX:NewRatio=2 -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
#RESTART THE CLOUDERA MANAGER SERVICE TO COMMIT THE AMENDMENT
[root@<CM SERVER FQDN> ~]# systemctl restart cloudera-scm-server
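After the restart, it is worth confirming that the new JVM options were picked up. One way to check is sketched below; the grep pattern simply lists the heap flags of every running java process, so identify the CM Server process among them:
#CONFIRM THE CM SERVER IS RUNNING AND THAT THE NEW -Xms/-Xmx VALUES ARE IN EFFECT
[root@<CM SERVER FQDN> ~]# systemctl status cloudera-scm-server
[root@<CM SERVER FQDN> ~]# ps -ef | grep java | grep -o '\-Xm[sx][0-9]*[GgMm]' | sort | uniq -c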
06-10-2023
03:36 AM
Summary
After the upgrade from CDH to CDP, fundamental instability was observed within the Ranger Audits UI (within the Ranger Admin Service), and Infra-Solr Roles were constantly exhibiting API liveness errors.
Total audits daily count: 136,553,808
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
Initial analysis of the number of daily audits within the Ranger service confirmed that there were as many as 1B audits per day. With only 2 Infra-Solr servers, the default configuration produced by a CDH to CDP upgrade needed to be tuned to follow best practices for the Infra-Solr ranger_audits collection:
Reduce Ranger Audits verbosity (see this complementary document: Ranger Audit Verbosity).
Assess the server design. The count of Infra-Solr servers matters when deciding how the ranger_audits collection should be built. A single Solr Server is not recommended, as it is not resilient. Plan for at least 2 replicas per shard for any collection; this allows the ranger_audits collection to be split into 6 shards with 2 replicas each while still following Solr best-practice guidelines.
Resolution
The following public documentation will assist with a deeper understanding of how you might choose to align with best practices given the hardware you have available and the volume of audits being recorded within the service - Calculating Infra Solr resource needs.
Configure the following 3 parameters within the Ranger Service (within CM) according to best practices. The example below is for a cluster of 3 Infra-Solr servers, with 3 shards configured for the ranger_audits collection, 2 replicas per shard, and the maximum number of shards limited to 6 (the product of the first two parameters):
Configure the TTL (Time To Live) for audits that are propagated into the Infra-Solr ranger_audits collection. This retention period should be defined by the business (for instance, 25 days). Note that TTL only affects audit visibility within the Ranger UI; all audits remain accessible within HDFS.
Ranger - Delete ranger_audits collection
Ensure all Solr Servers are healthy and available. Then, in order to restructure it, fully delete the ranger_audits collection and monitor the status (example below).
NOTE - the date of 8Mar2022 in the example below is included for audit-trail purposes - it’s the date on which the full collection deletion occurred.
DELETE RANGER AUDITS COLLECTION
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=DELETE&name=ranger_audits&async=del_ranger_audits8Mar2022
REQUEST THE STATUS OF AN ASYNC CALL
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=REQUESTSTATUS&requestid=del_ranger_audits8Mar2022
An example of a successful delete command issued to the URL:
{
"responseHeader":{
"status":0,
"QTime":9},
"requestid":"del_ranger_audits20Apr2022"}
Restart the Ranger Admin service. The restart will recreate the ranger_audits collection based on the parameters defined earlier.
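As a sanity check after the restart, you can confirm that the collection was recreated with the expected shard and replica layout. These are standard Solr Collections API calls, shown against the same placeholder host and port used above:
VERIFY THE COLLECTION EXISTS
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=LIST
CHECK THE SHARD / REPLICA LAYOUT
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=CLUSTERSTATUS&collection=ranger_audits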
06-10-2023
03:30 AM
Summary
Infra-Solr service exhibits fundamental stability issues after upgrading CDH to CDP.
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
The Infra-Solr service hosts the ranger_audits collection which is used to display cluster audit information within the Ranger Admin UI. Perform preliminary analysis using Ranger Admin UI - Audits for a single day as demonstrated below. [NOTE: these sample screenshots were taken after resolving the issues; your audit counts will likely be much higher].
Total audits daily count: 136,553,808
Total Impala audits daily count: 5,901,146
Total hbaseregional audits daily count: 1,178,831
Total hbaseregional (access type scanneropen) audits daily count: 0
(due to the complete exclusion of these events)
Total hdfs audits daily count: 128,681,418
Total hdfs (access type liststatus) audits daily count: 0
(due to the complete exclusion of these events)
Assemble and analyze audit counts. The actual pre-resolution values for this case study were:
Total number of Ranger audits - 705,875,710
Application - Impala - 6,719,878
Application - hbaseRegional - 389,896,166
Application - hbaseRegional; Access Type - scannerOpen - 261,735,436
Application - hdfs - 308,644,209
Application - hdfs; Access Type - listStatus - 212,728,345
The total count of Ranger audits (~700M per day) is excessively voluminous. Audit verbosity is a primary contributing factor to Infra-Solr service instability because Ranger audits are stored within an Infra-Solr collection (ranger_audits) and presented within the Ranger Admin UI. The ranger_audits collection was overwhelming the Infra-Solr Servers, leading to Web Server Status Unknown / API Liveness check failures.
To reduce audit verbosity, identify meaningful and meaningless events using the Infra-Solr API.
URL examples for reference only:
Query by date/time range
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-16T00:00:00.000Z+TO+2022-02-16T11:59:59.000Z]&sort=evtTime+desc
select all: oldest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000
select all: newest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000
Curl examples for reference only:
> Query by date/time range && number of rows to capture (important)
> -g required to disable globbing of the date range
> This is verbose
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, narrowed down to fewer fields (only those you want to see)
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Cenforcer%2Cagent%2Crepo%2CreqUser%2Cresource%2Caction&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, narrowed down to even fewer fields
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> Select all: oldest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000" > RangerAuditSolrOutput17Feb22.text
> Select all: newest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000" > RangerAuditSolrOutput17Feb22.text
In this case study, 48 curl commands were executed to get a balanced picture over a 24-hour period, pulling 100,000 audit events every 30 minutes (a scripted version of this loop is sketched after the examples below).
NOTE: The Infra-Solr server must render the output, and 100,000+ events can easily crash a 30GB Infra-Solr Server; do not pull more than that for a single time interval.
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:30:00.000Z+TO+2022-02-17T23:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2330-2359.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:00:00.000Z+TO+2022-02-17T23:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2300-2329.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:30:00.000Z+TO+2022-02-17T22:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2230-2259.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:00:00.000Z+TO+2022-02-17T22:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2200-2229.text
..
REPEAT THE COMMANDS WITH RELEVANT EXAMPLES
..
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:30:00.000Z+TO+2022-02-17T01:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0130-0159.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:00:00.000Z+TO+2022-02-17T01:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0100-0129.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:30:00.000Z+TO+2022-02-17T00:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0030-0059.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T00:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0000-0029.text
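Rather than typing 48 commands by hand, a small loop can generate the same half-hour windows. This is a minimal bash sketch; it assumes GNU date and reuses the host, field list, and row limit from the examples above, and the output file names are illustrative:
#!/bin/bash
# Pull 100,000 audit events per 30-minute window across 2022-02-17 (48 windows in total)
DAY_START=$(date -u -d "2022-02-17T00:00:00Z" +%s)
for i in $(seq 0 47); do
  WINDOW_START=$(( DAY_START + i * 1800 ))
  START=$(date -u -d "@${WINDOW_START}" +%Y-%m-%dT%H:%M:%S.000Z)
  END=$(date -u -d "@$(( WINDOW_START + 1799 ))" +%Y-%m-%dT%H:%M:%S.999Z)
  curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[${START}+TO+${END}]&rows=100000&sort=evtTime+desc" > "RangerAuditSolrOutput17Feb22_100000Rows_window${i}.text"
done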
The 48 output files were simply parsed to ascertain the most frequent Ranger audit access types (see the example below when creating your own):
grep access RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
egrep "listStatus|scannerOpen" RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
This example groups audit types by category to assist you in selecting what is meaningful and what is not:
grep access RangerAuditSolrOutput17Feb22_100000Rows.text | sort -rn | uniq -c | sort -rn
50517 "access":"listStatus",
23782 "access":"scannerOpen",
14559 "access":"get",
5193 "access":"put",
2081 "access":"open",
1884 "access":"delete",
1394 "access":"WRITE",
336 "access":"rename",
126 "access":"contentSummary",
84 "access":"checkAndPut",
26 "access":"mkdirs",
6 "access":"compactSelection",
5 "access":"flush",
4 "access":"getAclStatus",
3 "access":"compact",
In this case study, up to 1B audit events were being recorded per day, with 65-70% coming from HDFS listStatus and HBase scannerOpen operations. Such pure metadata-operation events were of no value to DevOps; nevertheless, we verified that they were also of no value to the business before excluding them. Retain the ‘get’, ‘put’, ‘open’, ‘delete’, and other key audits.
Assess the Infra-Solr & ranger_audits collection design - the Infra-Solr server count and the shard and replica counts play an important role in stability. This complementary document covers those assessment steps: Ranger - Rebuild ranger_audits.
Resolution
Tune Ranger to exclude unwanted event collection.
Edit the cm_hdfs service configuration:
Exclude the ‘listStatus’ audit type from the ‘Audit Filter’ section:
Edit the cm_hbase service configuration:
Exclude the ‘scannerOpen’ audit type from the ‘Audit Filter’ section:
Excluding these low-value events provided 3 benefits:
The stability and manageability of Infra-Solr and the ranger_audits collection were greatly improved.
Infra-Solr and the ranger_audits collection required only 30-35% of the resources to perform the same tasks.
Ranger audit history required only 30-35% of the HDFS disk space when writing to /ranger/….
06-10-2023
03:15 AM
Summary
It is always a good idea to review your Kudu Rebalancer settings so that all hardware is optimally utilized when Kudu Rebalancing activities are being performed.
Investigation
Kudu Configuration
Balancer configuration properties
Although the general kudu default parameters have not proven to adversely impact Kudu Rebalancing operations, the following property change is recommended to speed up that process.
Property                | Default | Cloudera Chosen Value
------------------------+---------+----------------------
rb_max_moves_per_server | 5       | 10
Avoid Landmines
Some key notes before performing the rebalancing activities after setting up the services/disks:
Never run both the HDFS & Kudu Rebalancers at the same time
The contention between the two may cause issues
Perform the rebalancing activities in the order of Kudu first, HDFS second
This is because the Kudu rebalancer is unable to take disk capacity utilization into account
Performing Kudu Rebalancing Activities
We recommend that you perform these actions from within CM to provide full visibility into the Rebalancer status as well as when the action has started and finished.
Kudu
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool
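If you prefer to run the rebalancer from the command line instead (for example, to pass the tuned move limit explicitly), the sketch below uses the standard kudu CLI; verify the flag name against your Kudu version before relying on it:
#RUN THE KUDU REBALANCER BY HAND WITH THE TUNED MOVE LIMIT (MASTER FQDNs ARE PLACEHOLDERS)
sudo -u kudu kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -max_moves_per_server=10 2>&1 | tee kudu-rebalance.out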
06-10-2023
03:10 AM
Summary
Are you having issues with too many queries being handled by a single Impala Coordinator?
Does this eventually lead to OOM scenarios?
Let’s say you have 3 Impala Coordinators within your cluster and notice that queries skew onto one of them and overwhelm it.
Note how one of the Impala Coordinators in the example above has 73 running queries while the other 2 have relatively few.
Investigation
Source IP Persistence
To ascertain why any Impala Coordinator can skew the number of running queries that are active on it, look at the way the proxy is set up to handle incoming queries.
‘Source IP Persistence’ means configuring the proxy so that sessions from the same IP address always go to the same coordinator. This setting is required when setting up high availability with Hue. It is also required to avoid the Hue message ‘results have expired’, which appears when a query is submitted to the cluster through one coordinator but the results are fetched via a different coordinator/Hue Server.
Example HAProxy Configuration for Source IP Persistence
The public docs for setting up HAProxy for Impala - Configuring Load Balancer for Impala.
Example setup of Hue-Impala connectivity within /etc/haproxy/haproxy.cfg as follows:
listen impala-hue :21052
    mode tcp
    stats enable
    balance source
    timeout connect 5000ms
    timeout queue 5000ms
    timeout client 3600000ms
    timeout server 3600000ms
    # Impala Nodes
    server impala-coordinator-001.fqdn impala-coordinator-001.fqdn:21050 check
    server impala-coordinator-002.fqdn impala-coordinator-002.fqdn:21050 check
    server impala-coordinator-003.fqdn impala-coordinator-003.fqdn:21050 check
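After amending haproxy.cfg, you can validate and apply the change without dropping existing connections. These are standard HAProxy/systemd commands; the configuration path may differ on your distribution:
#VALIDATE THE CONFIGURATION FILE, THEN RELOAD HAPROXY
haproxy -c -f /etc/haproxy/haproxy.cfg
systemctl reload haproxy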
Now let’s review what can impact the overall connection count into an Impala Coordinator: Hue, Hive & Impala timeout settings.
Example Timeout Settings
The following settings might mimic what you have currently set within your Hue, Hive & Impala services.
Hue
Hive
Impala
Proposed Timeout Settings
Whilst the actual settings will vary from cluster to cluster, we recommend moving away from the defaults and setting all of the idle parameters to 2 hours across the board in all 3 services: Hue, Hive & Impala.
This is an initial goal: introduce timeouts whilst monitoring the user experience. The ultimate best practice in this area is to work toward having:
Idle Query Timeouts of 300 seconds (or 5 minutes)
Idle Session Timeouts of 600 seconds (or 10 minutes)
NOTE - all of the parameters discussed here relate to ‘idle’ sessions and queries; in other words, the user has to have left the session or query in an idle state before the idle parameters kick in. No active session or query will be affected by this change in service behavior.
Resolution
Hue
Steps to perform:
Go to CM - Hue - Configuration
Search for “Auto Logout Timeout”
Change to 2 hours
Restart Hue Service
Hive
Steps to perform:
Go to CM - Hive - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive Service
Hive on Tez
Steps to perform:
Go to CM - Hive on Tez - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive on Tez Service
Impala
Steps to perform:
Go to CM - Impala - Configuration
Search for “Idle Query Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Impala Service
06-10-2023
03:04 AM
Summary
After you experience a disk failure on a worker node and then replace the disk, you’ll need to ensure that data is suitably rebalanced within the Kudu service at the local (single-node) level.
Investigation & Resolution
Purging a Tablet Server
There isn’t currently a method to rebalance the replicas on a single Tablet Server disk array. This means that we need to empty the node and reintroduce it so that it can be used again from scratch. We begin by quiescing the Tablet Server.
Quiesce the Tablet Server
Quiesce essentially means to stop the Tablet Server from hosting any leaders in order to:
Make other replicas on live Tablet Servers become the leaders
Prevent this Tablet Server from becoming a leader for any other reason
Allow this Tablet Server to be read from (the replicas that are still present)
Check Quiesce Status
sudo -u kudu kudu tserver quiesce status <Worker-Node-FQDN>
Quiescing | Tablet Leaders | Active Scanners
-----------+----------------+-----------------
true | 0 | 0
Quiesce Start
sudo -u kudu kudu tserver quiesce start <Worker-Node-FQDN>
Put the Tablet Server into Maintenance Mode
Maintenance Mode stops the Tablet Server from being used completely. The maintenance mode commands require you to retrieve the UUID of the Tablet Server first. We can get this information from a tserver list command:
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN>
An example that then targets the server you want to work on
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> | grep <Worker-Node-FQDN>
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050
Put the Tablet Server into Maintenance Mode
sudo -u kudu kudu tserver state enter_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Run ksck to check the status of Kudu Service / TS to be purged
This will confirm the status of both Quiesce and Maintenance Mode for every Tablet Server in the cluster (in our example, <Worker-Node-FQDN>):
sudo -u kudu kudu cluster ksck <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 2>&1 | tee ksck.out
The above command writes the ksck output to both the terminal and a file called ‘ksck.out’, allowing us to review the information live and keep a record in the file. Taking our example of purging <Worker-Node-FQDN>, the following information is key:
Tablet Server Summary
This is a list of all Tablet Servers in the cluster. We’ve focused on just <Worker-Node-FQDN> and the surrounding Tablet Servers for illustrative purposes. Notice the row with UUID 5e103ac84707495e843a4553ac622f20 - <Worker-Node-FQDN> is quiescing and has no leaders running on it.
Tablet Server Summary
UUID | Address | Status | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+---------------------------------+---------+-------------+-----------+----------------+-----------------
…
59e6ca5107754c24b649ee9c9acfccfb | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetE01 | false | 47 | 0
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA08 | true | 0 | 0
5edf82f0516b4897b3a7991a7e67d71c | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA07 | false | 1452 | 0
…
Tablet Server State (maintenance mode)
This section shows that the TS is in maintenance mode.
Tablet Server States
Server | State
----------------------------------+------------------
5e103ac84707495e843a4553ac622f20 | MAINTENANCE_MODE
Purge the Tablet Server
The following command instructs kudu to ignore the <Worker-Node-FQDN> node AND move replicas away from it:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers
Again, importantly, the Tablet Server has to have been successfully quiesced and put into maintenance mode to avoid any issues with the Kudu service.
A simple break in VPN or shell terminal will kill the rebalance command. This won't affect Kudu, but it will stop the process. In order to work around this and retain information during the process, use the following command to output the rebalance status into the active terminal session as well as a file:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers 2>&1 | tee <Worker-Node-FQDN>-rebalance.out &
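An alternative sketch, if you would rather detach the rebalance from the login session entirely, is to launch it with nohup so that a VPN or terminal drop cannot interrupt it (output is still captured in the .out file):
sudo -u kudu nohup /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers > <Worker-Node-FQDN>-rebalance.out 2>&1 &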
Re-introduce the Tablet Server
After the Kudu Tablet Server has been purged, it’s time to reintroduce it into the Kudu service so that it can be used again.
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Unquiesce the Tablet Server
sudo -u kudu kudu tserver quiesce stop <Worker-Node-FQDN>
Rebalance the Kudu Service
We now have a Kudu Tablet Server that has been quiesced and purged. It’s time to rebalance the Kudu service and share the Tablets back onto the recently purged Kudu Tablet Server.
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool:
06-10-2023
03:01 AM
Summary
When you have experienced a disk failure on a worker node and have had the disk replaced, you’ll need to ensure that data is suitably rebalanced within the HDFS service at the local (single-node) level.
Investigation
HDFS Disk Balancer - Explained
This is an area that already has a great Blog written around it:
How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop
Please read through the blog and follow the guidance to verify that you have already set up the HDFS service to be able to perform this necessary action.
Resolution
HDFS Disk Balancer - Execution
Let’s go through the process of performing an HDFS Intra-DataNode Disk Rebalancing process.
Obtain a local HDFS DataNode Kerberos Ticket
cd /var/run/cloudera-scm-agent/process/`ls -larth /var/run/cloudera-scm-agent/process | grep -i hdfs-DATANODE | tail -1 | awk '{print $9}'`
kinit -kt hdfs.keytab hdfs/`hostname -f`@<ClusterDomain>
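A quick klist confirms that the DataNode ticket was obtained before proceeding:
klist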
Create a Disk Balancer Plan
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
Example of a successful creation of a disk balancer plan:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Disk Volume set 76c137f0-5d0c-4de3-b166-5c0ac29b77d1 Type : DISK plan completed.
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 46 ms
INFO command.Command: Writing plan to:
INFO command.Command: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Writing plan to:
/system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Execute a Disk Balancer Plan
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of a successful execution of a disk balancer plan:
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "execute plan" command
Query a running Disk Balancer Plan
hdfs diskbalancer -query `hostname -f`
Example of querying a running disk balancer plan:
hdfs diskbalancer -query `hostname -f`
INFO command.Command: Executing "query plan" command.
Plan File: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Plan ID: 9b0d03edee9d4285cfea5fe13247d8e23cb4557d
Result: PLAN_UNDER_PROGRESS
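If you prefer not to re-run the query by hand, a minimal polling sketch (it simply greps the query output shown above until the plan is no longer reported as in progress) is:
# Poll the disk balancer status every 60 seconds until the plan completes
while hdfs diskbalancer -query `hostname -f` 2>&1 | grep -q PLAN_UNDER_PROGRESS; do sleep 60; done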
Cancel a running Disk Balancer Plan (if required)
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of cancelling a running disk balancer plan:
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "Cancel plan" command.
HDFS Disk Balancer - No Rebalancing Required Example
The following example is what you will see if you attempt to run the HDFS local disk balancer on a node that doesn’t require any rebalancing to occur:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 36 ms
INFO command.Command: No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
06-10-2023
02:59 AM
Summary
It is expected that you will experience worker node data disk failures whilst managing your CDP cluster. This blog takes you through the steps that you should take to gracefully replace the failed worker node disks with the least disruption to your CDP cluster.
Investigation
Cloudera Manager Notification
One easy method to identify that you have experienced a disk failure within your cluster is with the Cloudera Manager UI. You will see the following type of error:
Cloudera Manager will also track multiple disk failures:
HDFS NameNode - DataNode Volume Failures
The failed disks within your cluster can also be observed from within the HDFS NameNode UI:
This is also useful to quickly identify exactly which storage locations have failed.
Confirming from the Command Line
Taking the last example from HDFS NameNode - DataNode Volume Failures, we can see that /data/20 & /data/6 are both failed directories.
The following interaction from the Command Line on the worker node will also confirm the disk issue:
[root@<WorkerNode> ~]# ls -larth /data/20
ls: cannot access /data/20: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/6
ls: cannot access /data/6: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/1
total 0
drwxr-xr-x. 26 root root 237 Sep 30 02:54 ..
drwxr-xr-x. 3 root root 20 Oct 1 06:45 kudu
drwxr-xr-x. 3 root root 16 Oct 1 06:46 dfs
drwxr-xr-x. 3 root root 16 Oct 1 06:47 yarn
drwxr-xr-x. 3 root root 29 Oct 1 06:48 impala
drwxr-xr-x. 2 impala impala 6 Oct 1 06:48 cores
drwxr-xr-x. 7 root root 68 Oct 1 06:48 .
Resolution
Replace a disk on a Worker Node
You will have a number of roles that are running on any single worker node host. This is an example of a worker node that is showing a failed disk:
Decommission the Worker Node
As there are multiple roles running on a worker node, it’s best to use the decommissioning process to gracefully remove the worker node from running services. This can be found by navigating to the host within Cloudera Manager and using “Actions > Begin Maintenance”
It will then take you to the following page:
Click “Begin Maintenance” and wait for the process to complete.
Expect this process to take hours on a busy cluster. The time the process takes to complete is dependent upon:
The number of regions that the HBase RegionServer is hosting
The number of blocks that the HDFS DataNode is hosting
The number of tablets that the Kudu TabletServer is hosting
Replace and Configure the disks
Once the worker node is fully decommissioned, the disks are ready to be replaced and configured physically within your datacenter by your infrastructure team.
Every cluster is going to have its own internal processes to configure the newly replaced disks. Let’s go through an example of how this work can be verified for reference.
List the attached block devices
[root@<WorkerNode> ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.7T 0 disk /data/1
sdb 8:16 0 3.7T 0 disk /data/2
sdc 8:32 0 3.7T 0 disk /data/3
sdd 8:48 0 3.7T 0 disk /data/4
sde 8:64 0 3.7T 0 disk /data/5
sdf 8:80 0 3.7T 0 disk /data/6
sdg 8:96 0 3.7T 0 disk /data/7
sdh 8:112 0 3.7T 0 disk /data/8
sdi 8:128 0 3.7T 0 disk /data/9
sdj 8:144 0 3.7T 0 disk /data/10
sdk 8:160 0 3.7T 0 disk /data/11
sdl 8:176 0 3.7T 0 disk /data/12
sdm 8:192 0 3.7T 0 disk /data/13
sdn 8:208 0 3.7T 0 disk /data/14
sdo 8:224 0 3.7T 0 disk /data/15
sdp 8:240 0 3.7T 0 disk /data/16
sdq 65:0 0 3.7T 0 disk /data/17
sdr 65:16 0 3.7T 0 disk /data/18
sds 65:32 0 3.7T 0 disk /data/19
sdt 65:48 0 3.7T 0 disk /data/20
sdu 65:64 0 3.7T 0 disk /data/21
sdv 65:80 0 3.7T 0 disk /data/22
sdw 65:96 0 3.7T 0 disk /data/23
sdx 65:112 0 3.7T 0 disk /data/24
sdy 65:128 0 1.8T 0 disk
├─sdy1 65:129 0 1G 0 part /boot
├─sdy2 65:130 0 20G 0 part [SWAP]
└─sdy3 65:131 0 1.7T 0 part
├─vg01-root 253:0 0 500G 0 lvm /
├─vg01-kuduwal 253:1 0 100G 0 lvm /kuduwal
├─vg01-home 253:2 0 50G 0 lvm /home
└─vg01-var 253:3 0 100G 0 lvm /var
List the IDs of the block devices
[root@<WorkerNode> ~]# blkid
/dev/sdy1: UUID="4b2f1296-460c-4cbc-8aca-923c9309d4fe" TYPE="xfs"
/dev/sdy2: UUID="af9c4c79-21b9-4d02-9453-ede88b920c1f" TYPE="swap"
/dev/sdy3: UUID="j9n4QD-60xB-rqpQ-Ck3y-s2m0-FdSo-IGWrN9" TYPE="LVM2_member"
/dev/sdb: UUID="4865e719-e77c-4d1e-b1e0-80ae1d0d6e82" TYPE="xfs"
/dev/sdc: UUID="59ae0b91-3cfc-4c53-a02f-e20bdf0ac209" TYPE="xfs"
/dev/sdd: UUID="b80473e0-bce8-413c-9740-934e8ed7006e" TYPE="xfs"
/dev/sda: UUID="684e32c8-eeb2-4215-b861-880543b1f96b" TYPE="xfs"
/dev/sdg: UUID="0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff" TYPE="xfs"
/dev/sde: UUID="06c0e908-dd67-4a42-8615-7b7335a7e0f6" TYPE="xfs"
/dev/sdf: UUID="9346fa04-dc1a-4dcc-8233-a5cb65495998" TYPE="xfs"
/dev/sdn: UUID="8f05d1dd-94d1-4376-9409-d5683ad4c225" TYPE="xfs"
/dev/sdo: UUID="5e0413d1-0b82-4ec1-b3f9-bb072db39071" TYPE="xfs"
/dev/sdh: UUID="08063201-f252-49dd-8402-042afbea78a2" TYPE="xfs"
/dev/sdl: UUID="1e5ace85-f93c-46f7-bf65-353f774cfeaa" TYPE="xfs"
/dev/sdk: UUID="195967b5-a1a0-43bb-9a33-9cf7a36fdcb6" TYPE="xfs"
/dev/sdq: UUID="db81b056-587e-47a6-844e-2d952278324b" TYPE="xfs"
/dev/sdr: UUID="45b4cf68-6f10-4dc7-8128-c2006e7aba5d" TYPE="xfs"
/dev/sds: UUID="a8e591e9-33c8-478a-b580-aeac9ad4cf44" TYPE="xfs"
/dev/sdi: UUID="a0187ae0-7598-44c4-805c-ef253dea6e7a" TYPE="xfs"
/dev/sdm: UUID="720836d8-ddd6-406d-a33f-f1b92f9b40d5" TYPE="xfs"
/dev/sdv: UUID="df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999" TYPE="xfs"
/dev/sdw: UUID="701f3516-03bc-461b-930c-ab34d0b417d7" TYPE="xfs"
/dev/sdu: UUID="5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72" TYPE="xfs"
/dev/sdj: UUID="264b85f8-9740-418b-a811-20666a305caa" TYPE="xfs"
/dev/sdt: UUID="53f2f06e-71e9-4796-86a3-2212c0f652ea" TYPE="xfs"
/dev/sdp: UUID="e6b984c0-6d85-4df2-9a7d-cc1c87238c49" TYPE="xfs"
/dev/mapper/vg01-root: UUID="18bc42fe-dbfd-4005-8e13-6f5d2272d9a7" TYPE="xfs"
/dev/sdx: UUID="53e4023f-583a-4219-bfd2-1a94e15f34ef" TYPE="xfs"
/dev/mapper/vg01-kuduwal: UUID="a1441e2f-718b-42eb-b398-28ce20ee50ad" TYPE="xfs"
/dev/mapper/vg01-home: UUID="fbc8e522-64da-4cc3-87b6-89ea83fb0aa0" TYPE="xfs"
/dev/mapper/vg01-var: UUID="93b1537f-a1a9-4616-b79a-cab9a1e39bf1" TYPE="xfs"
View the /etc/fstab
[root@<WorkerNode> ~]# cat /etc/fstab
/dev/mapper/vg01-root / xfs defaults 0 0
UUID=4b2f1296-460c-4cbc-8aca-923c9309d4fe /boot xfs defaults 0 0
/dev/mapper/vg01-home /home xfs defaults 0 0
/dev/mapper/vg01-kuduwal /kuduwal xfs defaults 0 0
/dev/mapper/vg01-var /var xfs defaults 0 0
UUID=af9c4c79-21b9-4d02-9453-ede88b920c1f swap swap defaults 0 0
UUID=684e32c8-eeb2-4215-b861-880543b1f96b /data/1 xfs noatime,nodiratime 0 0
UUID=4865e719-e77c-4d1e-b1e0-80ae1d0d6e82 /data/2 xfs noatime,nodiratime 0 0
UUID=59ae0b91-3cfc-4c53-a02f-e20bdf0ac209 /data/3 xfs noatime,nodiratime 0 0
UUID=b80473e0-bce8-413c-9740-934e8ed7006e /data/4 xfs noatime,nodiratime 0 0
UUID=06c0e908-dd67-4a42-8615-7b7335a7e0f6 /data/5 xfs noatime,nodiratime 0 0
UUID=9346fa04-dc1a-4dcc-8233-a5cb65495998 /data/6 xfs noatime,nodiratime 0 0
UUID=0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff /data/7 xfs noatime,nodiratime 0 0
UUID=08063201-f252-49dd-8402-042afbea78a2 /data/8 xfs noatime,nodiratime 0 0
UUID=a0187ae0-7598-44c4-805c-ef253dea6e7a /data/9 xfs noatime,nodiratime 0 0
UUID=264b85f8-9740-418b-a811-20666a305caa /data/10 xfs noatime,nodiratime 0 0
UUID=195967b5-a1a0-43bb-9a33-9cf7a36fdcb6 /data/11 xfs noatime,nodiratime 0 0
UUID=1e5ace85-f93c-46f7-bf65-353f774cfeaa /data/12 xfs noatime,nodiratime 0 0
UUID=720836d8-ddd6-406d-a33f-f1b92f9b40d5 /data/13 xfs noatime,nodiratime 0 0
UUID=8f05d1dd-94d1-4376-9409-d5683ad4c225 /data/14 xfs noatime,nodiratime 0 0
UUID=5e0413d1-0b82-4ec1-b3f9-bb072db39071 /data/15 xfs noatime,nodiratime 0 0
UUID=e6b984c0-6d85-4df2-9a7d-cc1c87238c49 /data/16 xfs noatime,nodiratime 0 0
UUID=db81b056-587e-47a6-844e-2d952278324b /data/17 xfs noatime,nodiratime 0 0
UUID=45b4cf68-6f10-4dc7-8128-c2006e7aba5d /data/18 xfs noatime,nodiratime 0 0
UUID=a8e591e9-33c8-478a-b580-aeac9ad4cf44 /data/19 xfs noatime,nodiratime 0 0
UUID=53f2f06e-71e9-4796-86a3-2212c0f652ea /data/20 xfs noatime,nodiratime 0 0
UUID=5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72 /data/21 xfs noatime,nodiratime 0 0
UUID=df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999 /data/22 xfs noatime,nodiratime 0 0
UUID=701f3516-03bc-461b-930c-ab34d0b417d7 /data/23 xfs noatime,nodiratime 0 0
UUID=53e4023f-583a-4219-bfd2-1a94e15f34ef /data/24 xfs noatime,nodiratime 0 0
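For reference, here is a minimal sketch of preparing a single replacement disk so that it matches the xfs / mount-by-UUID convention shown above. The device name and mount point are illustrative (a replacement for /data/20, assumed here to come back as /dev/sdt); follow your own infrastructure standards:
#CREATE THE FILESYSTEM ON THE REPLACEMENT DEVICE
[root@<WorkerNode> ~]# mkfs.xfs /dev/sdt
#CAPTURE ITS NEW UUID AND ADD A MATCHING /etc/fstab ENTRY
[root@<WorkerNode> ~]# blkid /dev/sdt
[root@<WorkerNode> ~]# echo "UUID=<new-uuid> /data/20 xfs noatime,nodiratime 0 0" >> /etc/fstab
#RECREATE THE MOUNT POINT IF NEEDED, MOUNT EVERYTHING DECLARED IN fstab, AND CONFIRM
[root@<WorkerNode> ~]# mkdir -p /data/20
[root@<WorkerNode> ~]# mount -a
[root@<WorkerNode> ~]# df -h /data/20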
Recommission the Worker Node
Once the disk(s) have been suitably replaced, it’s time to use the recommissioning process to gracefully reintroduce the worker node back into the cluster. This can be found by navigating to the host within Cloudera Manager and using “Actions > End Maintenance”.
After the node has completed its recommission cycle, follow the guidance in the next sections to perform local disk rebalancing where appropriate.
Address local disk HDFS Balancing
Most clusters utilize HDFS. This service has a local disk balancer that you can make use of. Please find some helpful guidance within the following - Rebalance your HDFS Disks (single node)
Address local disk Kudu Balancing
If you are running Kudu within your cluster, you will need to rebalance the existing Kudu data on the local disks of the worker node. Please find some helpful guidance within the following - Rebalance your Kudu Disks (single node)
06-10-2023
02:56 AM
Summary
Within the blog Rebalance your mixed HDFS & Kudu Services, we demonstrated how to properly review and set up a mixed HDFS / Kudu shared services cluster.
Now it is time to review a method that allows you to confirm the distribution of data across your HDFS & Kudu services at the disk level of each and every worker node.
Investigation
Commands to check the balance of HDFS & Kudu
Log in as root to each worker node that is part of the HDFS and Kudu service, and perform the following commands.
Check overall disk capacity status:
[root@<Worker-Node> ~]# df -h /data/* | sed 1d | sort
/dev/sdb 1.9T 1.4T 478G 75% /data/1
/dev/sdc 1.9T 1.3T 560G 70% /data/2
/dev/sdd 1.9T 1.4T 513G 73% /data/3
/dev/sde 1.9T 1.4T 489G 74% /data/4
/dev/sdf 1.9T 1.4T 464G 76% /data/5
/dev/sdg 1.9T 1.4T 513G 73% /data/6
/dev/sdh 1.9T 1.4T 525G 72% /data/7
/dev/sdi 1.9T 1.4T 466G 76% /data/8
/dev/sdj 1.9T 1.3T 538G 72% /data/9
/dev/sdk 1.9T 1.5T 418G 78% /data/10
/dev/sdl 1.9T 1.3T 617G 67% /data/11
/dev/sdm 1.9T 1.3T 572G 70% /data/12
/dev/sdn 1.9T 1.4T 474G 75% /data/13
/dev/sdo 1.9T 1.3T 534G 72% /data/14
/dev/sdp 1.9T 1.4T 468G 75% /data/15
/dev/sdq 1.9T 1.4T 470G 75% /data/16
/dev/sdr 1.9T 1.4T 466G 75% /data/17
/dev/sds 1.9T 1.4T 468G 75% /data/18
/dev/sdt 1.9T 1.4T 473G 75% /data/19
/dev/sdu 1.9T 1.4T 474G 75% /data/20
/dev/sdv 1.9T 1.4T 467G 75% /data/21
/dev/sdw 1.9T 1.4T 474G 75% /data/22
/dev/sdx 1.9T 1.4T 473G 75% /data/23
/dev/sdy 1.9T 1.4T 477G 75% /data/24
Check overall HDFS disk capacity status:
[root@<Worker-Node> ~]# du -h --max-depth=0 /data/*/dfs | sort -t/ -k3,3n
606G /data/1/dfs
612G /data/2/dfs
608G /data/3/dfs
609G /data/4/dfs
610G /data/5/dfs
619G /data/6/dfs
613G /data/7/dfs
634G /data/8/dfs
590G /data/9/dfs
681G /data/10/dfs
618G /data/11/dfs
621G /data/12/dfs
1.2T /data/13/dfs
1.1T /data/14/dfs
1.2T /data/15/dfs
1.2T /data/16/dfs
1.2T /data/17/dfs
1.2T /data/18/dfs
1.2T /data/19/dfs
1.2T /data/20/dfs
1.2T /data/21/dfs
1.2T /data/22/dfs
1.2T /data/23/dfs
1.2T /data/24/dfs
Check overall Kudu disk capacity status:
[root@<Worker-Node> ~]# du -h --max-depth=0 /data/*/kudu | sort -t/ -k3,3n
745G /data/1/kudu
691G /data/2/kudu
741G /data/3/kudu
765G /data/4/kudu
788G /data/5/kudu
730G /data/6/kudu
725G /data/7/kudu
763G /data/8/kudu
734G /data/9/kudu
768G /data/10/kudu
628G /data/11/kudu
669G /data/12/kudu
205G /data/13/kudu
204G /data/14/kudu
205G /data/15/kudu
208G /data/16/kudu
209G /data/17/kudu
205G /data/18/kudu
204G /data/19/kudu
204G /data/20/kudu
206G /data/21/kudu
203G /data/22/kudu
194G /data/23/kudu
200G /data/24/kudu
Now collate all of the information retrieved from the Worker Node into an easy-to-read format so that you can readily observe out-of-balance characteristics at the Worker Node layer (a small scripted helper is sketched below).
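Here is a minimal bash sketch that prints the per-disk df usage alongside the dfs and kudu du totals, assuming the /data/N layout shown above:
# Print per-disk usage with the dfs and kudu footprints side by side
for d in /data/*; do
  printf "%-10s use:%-5s dfs:%-6s kudu:%-6s\n" "$d" \
    "$(df -h "$d" | awk 'NR==2 {print $5}')" \
    "$(du -sh "$d/dfs" 2>/dev/null | awk '{print $1}')" \
    "$(du -sh "$d/kudu" 2>/dev/null | awk '{print $1}')"
done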
Worker Node Balance Example
Taking the output from Commands to check the balance of HDFS & Kudu, here is an example of how you might collate the information into a format that makes it easy to notice data balance issues at the Worker Node level.
Worker-Node
Disk      | df -h (Size Used Avail Use%) | du -h dfs | du -h kudu
----------+------------------------------+-----------+-----------
/data/1   | 1.9T 1.7T 164G 92%           | 414G      | 1.3T
/data/2   | 1.9T 1.5T 395G 79%           | 499G      | 970G
/data/3   | 1.9T 1.5T 351G 82%           | 487G      | 1022G
/data/4   | 1.9T 1.5T 338G 82%           | 493G      | 1.1T
/data/5   | 1.9T 1.5T 352G 82%           | 486G      | 1.1T
/data/6   | 1.9T 1.5T 337G 82%           | 498G      | 1.1T
/data/7   | 1.9T 1.5T 337G 82%           | 485G      | 1.1T
/data/8   | 1.9T 1.5T 350G 82%           | 494G      | 1018G
/data/9   | 1.9T 1.5T 339G 82%           | 475G      | 1.1T
/data/10  | 1.9T 1.5T 391G 80%           | 487G      | 985G
/data/11  | 1.9T 1.5T 338G 82%           | 487G      | 1.1T
/data/12  | 1.9T 1.6T 320G 83%           | 475G      | 1.1T
/data/13  | 1.9T 1.2T 688G 64%           | 1.2T      | 353M
/data/14  | 1.9T 1.2T 679G 64%           | 1.2T      | 8.5G
/data/15  | 1.9T 1.2T 674G 64%           | 1.2T      | 13G
/data/16  | 1.9T 1.2T 678G 64%           | 1.2T      | 8.0G
/data/17  | 1.9T 1.2T 686G 64%           | 1.2T      | 8.0K
/data/18  | 1.9T 1.2T 680G 64%           | 1.2T      | 5.4G
/data/19  | 1.9T 1.2T 694G 63%           | 1.2T      | 33M
/data/20  | 1.9T 1.2T 688G 64%           | 1.2T      | 8.0K
/data/21  | 1.9T 1.2T 689G 64%           | 1.2T      | 8.0K
/data/22  | 1.9T 1.2T 686G 64%           | 1.2T      | 129M
/data/23  | 1.9T 1.2T 679G 64%           | 1.2T      | 7.4G
/data/24  | 1.9T 1.2T 684G 64%           | 1.2T      | 33M
If you aligned the mixed HDFS & Kudu configuration some time after the cluster was originally deployed, you are likely to encounter node-level disk capacity issues with the Kudu Rebalance command.
This is because the Kudu Rebalancer is currently unaware of both total disk capacity and currently used disk capacity.
Analyze the Data Distribution
Within the Worker Node Balance Example, we can see 24 disks and a fundamental imbalance between them.
All 24 disks in the example are configured within HDFS and Kudu, but the HDFS & Kudu configuration alignment happened after the cluster had been used for many years.
Note how out of sync they are:
Disks 1-12 are far more utilized than Disks 13-24. This can happen due to:
An extra 12 disks having been added to the node at some point
The Kudu Tablet Server Role Group configuration having been applied later than the node was deployed into HDFS / Kudu
Disk 1 is at 92%:
If left unchecked, every service in the cluster that uses the data disks will be affected when this disk reaches 100%
Depending on how you monitor disk-level utilization per node, the overall capacity of the node will not reflect that this single disk is nearly full
There are other scenarios that can cause a similar imbalance, such as failed disks being replaced while the HDFS and Kudu rebalancing activities remain focused only at the service level.
Resolution
Whether it is due to a later alignment of the HDFS or Kudu disk configuration, or simply a cluster that has had countless disks replaced over time with no local disk balancing applied afterward, it’s time to illustrate how to handle these issues.
There are several blogs that can help you with this:
Replace your failed Worker Node disks
Rebalance your HDFS Disks (single node)
Rebalance your Kudu Disks (single node)