07-23-2024
06:44 AM
Cloudera Manager also has a great way to set class-level debug logging on the fly, without a restart, when you need to troubleshoot. Navigate to: http://<cm-server>:7180/cmf/debug/logLevel Once at this page, choose the class you wish to change, select the radio button for the level you want, and hit the Submit button. Note that these changes will NOT persist across a restart of the server, but logging at the new level begins as soon as you hit Submit on the logLevel page.
07-23-2024
06:33 AM
In JDK 8, the HotSpot JVM stores the representation of class metadata in native memory, in an area called Metaspace. The permanent generation has been removed. The PermSize and MaxPermSize options are ignored, and a warning is issued if they are present on the command line.
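For example, if a service's Java options still carry the old PermGen flags, they can simply be removed; where you genuinely need to cap class-metadata usage, the Metaspace equivalents apply (the 256m/512m sizes and app.jar below are illustrative only):
# JDK 7 style - ignored with a warning on JDK 8
java -XX:PermSize=256m -XX:MaxPermSize=512m -jar app.jar
# JDK 8 style - Metaspace sizing
java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -jar app.jar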
07-17-2024
11:25 PM
Has anyone encountered error 255 before? i.e., "engine exited with error 255"
06-10-2023
03:36 AM
Summary
After the upgrade from CDH to CDP, fundamental instability was observed within the Ranger Audits UI (within the Ranger Admin Service), and Infra-Solr Roles were constantly exhibiting API liveness errors.
Total audits daily count: 136,553,808
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
Initial analysis of the number of daily audits within the Ranger service confirmed that there were as many as 1B audits per day. With only 2 Infra-Solr servers, the default configuration left behind by a CDH to CDP upgrade needed to be tweaked to follow best practices for the Infra-Solr ranger_audits collection:
Reduce Ranger Audits verbosity (see this complementary document: Ranger Audit Verbosity).
Assess the server design. The number of Infra-Solr servers is important when deciding how the ranger_audits collection should be built. A single Solr server is not recommended as it would not be resilient. Allow for at least 2 replicas per shard for any collection; in this case the ranger_audits collection was split into 6 shards with 2 replicas each, while still staying within the best practice guidelines for Solr.
Resolution
The following public documentation will assist with a deeper understanding of how you might choose to align with best practices given the hardware you have available and the volume of audits being recorded within the service - Calculating Infra Solr resource needs.
Configure the following 3 parameters within the Ranger Service (within CM) according to best practices. The example below is for a cluster of 3 Infra-Solr servers, with 3 shards configured for the ranger_audits collection, 2 replicas per shard, and the maximum number of shards limited to 6 (a multiple of the first two parameters):
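The three parameter names are not listed above; as an assumption based on the standard Ranger audit/Solr property names (verify the exact names exposed in your CM version), they are typically:
ranger.audit.solr.no.shards = 3
ranger.audit.solr.no.replica = 2
ranger.audit.solr.max.shards.per.node = 6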
Configure the TTL (Time To Live) for audits that are propagated into the Infra-Solr ranger_audits collection. The retention period should be defined by the business (for instance, 25 days). Note that the TTL only affects audit visibility within the Ranger UI; all audits remain accessible within HDFS.
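In CM this retention is usually controlled by a single Ranger property; the property name below is an assumption, so confirm it against your CM version before changing it:
ranger.audit.solr.config.ttl = 25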
Ranger - Delete ranger_audits collection
Ensure all Solr Servers are healthy and available. Then, in order to restructure it, fully delete the ranger_audits collection and monitor the status (example below).
NOTE - the date suffix 8Mar2022 in the example below is included for audit-trail purposes; it records the date that the full collection deletion was performed.
DELETE RANGER AUDITS COLLECTION
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=DELETE&name=ranger_audits&async=del_ranger_audits8Mar2022
REQUEST THE STATUS OF AN ASYNC CALL
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=REQUESTSTATUS&requestid=del_ranger_audits8Mar2022
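On a Kerberized cluster, the same two calls can be issued from the command line with curl (mirroring the curl --negotiate style used in the complementary Ranger Audit Verbosity article); quote the URL so the shell does not interpret the & characters:
curl --negotiate -u: "http://<Infra-Solr-Server>:18983/solr/admin/collections?action=DELETE&name=ranger_audits&async=del_ranger_audits8Mar2022"
curl --negotiate -u: "http://<Infra-Solr-Server>:18983/solr/admin/collections?action=REQUESTSTATUS&requestid=del_ranger_audits8Mar2022"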
An example of a successful delete command issued to the URL:
{
  "responseHeader": {
    "status": 0,
    "QTime": 9
  },
  "requestid": "del_ranger_audits20Apr2022"
}
Restart Ranger Admin service. When you perform the restart, it will recreate the ranger_audits collection based on the parameters defined earlier.
06-10-2023
03:30 AM
Summary
Infra-Solr service exhibits fundamental stability issues after upgrading CDH to CDP.
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
The Infra-Solr service hosts the ranger_audits collection which is used to display cluster audit information within the Ranger Admin UI. Perform preliminary analysis using Ranger Admin UI - Audits for a single day as demonstrated below. [NOTE: these sample screenshots were taken after resolving the issues; your audit counts will likely be much higher].
Total audits daily count: 136,553,808
Total Impala audits daily count: 5,901,146
Total hbaseregional audits daily count: 1,178,831
Total hbaseregional (access type scanneropen) audits daily count: 0
(due to the complete exclusion of these events)
Total hdfs audits daily count: 128,681,418
Total hdfs (access type liststatus) audits daily count: 0
(due to the complete exclusion of these events)
Assemble and analyze audit counts. The actual pre-resolution values for this case study were:
Total number of Ranger audits - 705,875,710
Application - Impala - 6,719,878
Application - hbaseRegional - 389,896,166
Application - hbaseRegional; Access Type - scannerOpen - 261,735,436
Application - hdfs - 308,644,209
Application - hdfs; Access Type - listStatus - 212,728,345
The total count of Ranger audits (roughly 700M) is excessive. Audit verbosity is a primary contributing factor to Infra-Solr service instability because Ranger audits are stored within an Infra-Solr collection (ranger_audits) and presented within the Ranger Admin UI. The ranger_audits collection was overwhelming the Infra-Solr servers, leading to Web Server Status Unknown / API Liveness check failures.
To reduce audit verbosity, identify meaningful and meaningless events using the Infra-Solr API.
URL examples for reference only:
Query by date/time range
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-16T00:00:00.000Z+TO+2022-02-16T11:59:59.000Z]&sort=evtTime+desc
select all: oldest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000
select all: newest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000
Curl examples for reference only:
> Query by date/time range && number of rows to capture (important)
> -g required to disable globbing of the date range
> This is verbose
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, but narrowing down fewer fields (that you want to see)
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Cenforcer%2Cagent%2Crepo%2CreqUser%2Cresource%2Caction&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, but narrowing down ever fewer fields (that you want to see)
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> Select all: oldest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000" > RangerAuditSolrOutput17Feb22.text
> Select all: newest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000" > RangerAuditSolrOutput17Feb22.text
In this case study, 48 curl commands were executed to get a balanced picture over a 24-hour period, pulling 100,000 audit events every 30 minutes.
NOTE: The Infra-Solr server must render this output, and 100,000+ events can easily crash a 30GB Infra-Solr server, so do not pull more than that for any single time interval.
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:30:00.000Z+TO+2022-02-17T23:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2330-2359.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:00:00.000Z+TO+2022-02-17T23:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2300-2329.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:30:00.000Z+TO+2022-02-17T22:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2230-2259.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:00:00.000Z+TO+2022-02-17T22:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2200-2229.text
..
REPEAT THE COMMANDS WITH RELEVANT EXAMPLES
..
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:30:00.000Z+TO+2022-02-17T01:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0130-0159.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:00:00.000Z+TO+2022-02-17T01:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0100-0129.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:30:00.000Z+TO+2022-02-17T00:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0030-0059.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T00:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0000-0029.text
The 48 output files were then parsed to identify the most frequent Ranger audit access types (see the examples below and adapt them when creating your own):
grep access RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
egrep "listStatus|scannerOpen" RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
This example counts the audit events by access type to assist you in selecting what is meaningful and what is not:
grep access RangerAuditSolrOutput17Feb22_100000Rows.text | sort -rn | uniq -c | sort -rn
50517 "access":"listStatus",
23782 "access":"scannerOpen",
14559 "access":"get",
5193 "access":"put",
2081 "access":"open",
1884 "access":"delete",
1394 "access":"WRITE",
336 "access":"rename",
126 "access":"contentSummary",
84 "access":"checkAndPut",
26 "access":"mkdirs",
6 "access":"compactSelection",
5 "access":"flush",
4 "access":"getAclStatus",
3 "access":"compact",
In this case study, up to 1B audit events were being recorded per day, with 65-70% coming from HDFS listStatus and HBase scannerOpen. Such pure metadata-operation events were meaningless to DevOps; nevertheless, we verified that they were also meaningless to the business before excluding them. Retain the ‘get’, ‘put’, ‘open’, ‘delete’, and other key audits.
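As a quick sanity check against the pre-resolution counts listed above: (261,735,436 scannerOpen + 212,728,345 listStatus) / 705,875,710 total audits ≈ 67%, consistent with the 65-70% estimate.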
Assess the Infra-Solr and ranger_audits collection design - the Infra-Solr server count and the shard and replica counts play an important role in stability. This complementary document covers those assessment steps: Ranger - Rebuild ranger_audits.
Resolution
Tune Ranger to exclude unwanted event collection.
Edit the cm_hdfs service configuration:
Exclude the ‘listStatus’ audit type from the ‘Audit Filter’ section:
Edit the cm_hbase service configuration:
Exclude the ‘scannerOpen’ audit type from the ‘Audit Filter’ section:
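Both exclusions are added in the Audit Filter section of the corresponding service repository (cm_hdfs / cm_hbase). As a rough sketch of the underlying audit-filter entries - the field names follow the Ranger audit-filter JSON schema and should be checked against what your Ranger version actually generates - the two exclusions look something like:
cm_hdfs:  { "actions": ["listStatus"], "isAudited": false }
cm_hbase: { "actions": ["scannerOpen"], "isAudited": false }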
Excluding these low-value events provided 3 benefits:
Infra-Solr and ranger_audits collection stability greatly improved, which also made the service easier to manage.
Infra-Solr and the ranger_audits collection required only 30-35% of the resources to perform the same tasks.
Ranger audit history required only 30-35% of the HDFS disk space when writing to /ranger/….
06-10-2023
03:15 AM
Summary
It is always a good idea to review your Kudu Rebalancer settings so that all hardware is optimally utilized when Kudu Rebalancing activities are being performed.
Investigation
Kudu Configuration
Balancer configuration properties
Although the default Kudu parameters have not been shown to adversely impact Kudu Rebalancing operations, the following property change is recommended to speed that process up.
Property                | Default | Cloudera Chosen Value
------------------------+---------+-----------------------
rb_max_moves_per_server | 5       | 10
Avoid Landmines
Some key notes before performing the rebalancing activities after setting up the services/disks:
Never run both the HDFS & Kudu Rebalancers at the same time; the contention between the two may cause issues.
Perform the rebalancing activities in the order of Kudu first, HDFS second, because Kudu is unable to track capacity utilization.
Performing Kudu Rebalancing Activities
We recommend that you perform these actions from within CM to provide full visibility into the Rebalancer status as well as when the action has started and finished.
Kudu
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool
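If you need to run the rebalance from the command line instead of CM, an equivalent sketch using the kudu CLI is shown below (the master FQDN placeholders follow the convention used elsewhere in this series; confirm the flag name against your Kudu version):
sudo -u kudu kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> --max_moves_per_server=10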
06-10-2023
03:10 AM
Summary
Are you having issues with too many queries being handled by a single Impala Coordinator?
Does this eventually lead to OOM scenarios?
Suppose you have 3 Impala Coordinators within your cluster and notice that queries skew onto one of them and overwhelm it.
Note how one of the Impala Coordinators in the above example has 73 running queries, and the other 2 have relatively few.
Investigation
Source IP Persistence
To ascertain why any Impala Coordinator can skew the number of running queries that are active on it, look at the way the proxy is set up to handle incoming queries.
‘Source IP Persistence’ means routing sessions from the same IP address to the same coordinator. This setting is required when setting up high availability with Hue. It is also required to avoid the Hue message ‘results have expired’, which appears when a query is submitted to the cluster through one coordinator but the results are fetched back through a different coordinator/Hue Server.
Example HAProxy Configuration for Source IP Persistence
The public docs for setting up HAProxy for Impala - Configuring Load Balancer for Impala.
Example setup of Hue-Impala connectivity within /etc/haproxy/haproxy.cfg as follows:
listen impala-hue :21052
mode tcp
stats enable
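# 'balance source' (below) provides Source IP Persistence: connections from the
# same client IP are always routed to the same Impala coordinator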
balance source
timeout connect 5000ms
timeout queue 5000ms
timeout client 3600000ms
timeout server 3600000ms
# Impala Nodes
server impala-coordinator-001.fqdn impala-coordinator-001.fqdn:21050 check
server impala-coordinator-002.fqdn impala-coordinator-002.fqdn:21050 check
server impala-coordinator-003.fqdn impala-coordinator-003.fqdn:21050 check
Now let’s review what can impact the overall connection count into an Impala Coordinator: Hue, Hive & Impala timeout settings.
Example Timeout Settings
The following settings might mimic what you have currently set within your Hue, Hive & Impala services.
Hue
Hive
Impala
Proposed Timeout Settings
Whilst the actual settings will vary from cluster to cluster, we recommend moving away from the defaults and, as a starting point, setting all of the idle parameters to 2 hours across the board in all 3 services: Hue, Hive & Impala.
This initial goal lets you introduce timeouts whilst monitoring the user experience. The ultimate best practice in this area is to move toward:
Idle Query Timeouts of 300 seconds (or 5 minutes)
Idle Session Timeouts of 600 seconds (or 10 minutes)
NOTE - all of the parameters being discussed relate to ‘idle’ sessions and queries; in other words, the user has to have left the session or query in an idle state before the idle parameters kick in. No active session or query will be affected by this change in service behavior.
Resolution
Hue
Steps to perform:
Go to CM - Hue - Configuration
Search for “Auto Logout Timeout”
Change to 2 hours
Restart Hue Service
Hive
Steps to perform:
Go to CM - Hive - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive Service
Hive on Tez
Steps to perform:
Go to CM - Hive on Tez - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive on Tez Service
Impala
Steps to perform:
Go to CM - Impala - Configuration
Search for “Idle Query Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Impala Service
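For reference, the CM fields above map onto underlying service properties/flags; the names below are assumptions based on the standard Hive and Impala option names, so verify them against your release documentation:
Hive / Hive on Tez: hive.server2.idle.operation.timeout, hive.server2.idle.session.timeout
Impala (impalad startup flags): --idle_query_timeout=300 --idle_session_timeout=600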
06-10-2023
03:04 AM
Summary
After you experience a disk failure on a worker node and then replace the disk, you’ll need to ensure that the disk’s data is suitably rebalanced within the Kudu Service at the local level.
Investigation & Resolution
Purging a Tablet Server
There isn’t currently a method to rebalance the replicas on a single Tablet Server disk array. This means that we need to empty the node and reintroduce it so that it can be used again from scratch. We begin by quiescing the Tablet Server.
Quiesce the Tablet Server
Quiesce essentially means to stop the Tablet Server from hosting any leaders in order to:
Make other replicas on live Tablet Servers become the leaders
Prevent this Tablet Server from becoming a leader for any other reason
Allow this Tablet Server to be read from (the replicas that are still present)
Check Quiesce Status
sudo -u kudu kudu tserver quiesce status <Worker-Node-FQDN>
Quiescing | Tablet Leaders | Active Scanners
-----------+----------------+-----------------
true | 0 | 0
Quiesce Start
sudo -u kudu kudu tserver quiesce start <Worker-Node-FQDN>
Put the Tablet Server into Maintenance Mode
Maintenance Mode stops the Tablet Server from being used completely. The maintenance mode commands require you to retrieve the UUID of the Tablet Server first. We can get this information from a tserver list command:
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN>
An example that then targets the server you want to work on
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> | grep <Worker-Node-FQDN>
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050
Put the Tablet Server into Maintenance Mode
sudo -u kudu kudu tserver state enter_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Run ksck to check the status of Kudu Service / TS to be purged
This will confirm the status of both Quiesce and Maintenance Mode for every Tablet Server in the cluster (in our example, <Worker-Node-FQDN>):
sudo -u kudu kudu cluster ksck <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 2>&1 | tee ksck.out
The above command outputs the ksck to both the terminal and a file called ‘ksck.out’. This allows us to review the information from both perspectives and also create a record of the output in the file. But taking our example of purging <Worker-Node-FQDN> into account, the following information is key:
Tablet Server Summary
This is a list of all Tablet Servers in the cluster. We’ve focused on just <Worker-Node-FQDN> and the surrounding Tablet Servers for illustrative purposes. Notice the row for UUID 5e103ac84707495e843a4553ac622f20 (shown in red in the original output): <Worker-Node-FQDN> is quiescing and has no leaders running on it.
Tablet Server Summary
UUID | Address | Status | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+---------------------------------+---------+-------------+-----------+----------------+-----------------
…
59e6ca5107754c24b649ee9c9acfccfb | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetE01 | false | 47 | 0
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA08 | true | 0 | 0
5edf82f0516b4897b3a7991a7e67d71c | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA07 | false | 1452 | 0
…
Tablet Server State (maintenance mode)
This section shows that the TS is in maintenance mode.
Tablet Server States
Server | State
----------------------------------+------------------
5e103ac84707495e843a4553ac622f20 | MAINTENANCE_MODE
Purge the Tablet Server
The following command instructs kudu to ignore the <Worker-Node-FQDN> node AND move replicas away from it:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers
Again, importantly, the Tablet Server has to have been successfully quiesced and put into maintenance mode to avoid any issues with the Kudu service.
A simple break in the VPN or shell terminal will kill the rebalance command. This won't affect Kudu, but it will stop the process. To work around this and retain information during the process, use the following command to output the rebalance status to the active terminal session as well as a file:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers 2>&1 | tee <Worker-Node-FQDN>-rebalance.out &
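Note that backgrounding with & alone may not survive a terminal hang-up. As an optional variation (an assumption, not part of the original procedure), prefix the command with nohup, or run it inside screen/tmux, so the rebalance keeps going even if the session drops:
sudo -u kudu nohup /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers > <Worker-Node-FQDN>-rebalance.out 2>&1 &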
Re-introduce the Tablet Server
After the Kudu Tablet Server has been purged, it’s time to reintroduce it into the Kudu service so that it can be used again.
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Unquiesce the Tablet Server
sudo -u kudu kudu tserver quiesce stop <Worker-Node-FQDN>
Rebalance the Kudu Service
We now have a Kudu Tablet Server that has been quiesced and purged. It’s time to rebalance the Kudu service and share the Tablets back onto the recently purged Kudu Tablet Server.
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool:
06-10-2023
03:01 AM
Summary
When you have experienced a disk failure on a worker node and have had the disk replaced, you’ll need to ensure that the disk is suitably rebalanced within the HDFS service at the local (intra-DataNode) level.
Investigation
HDFS Disk Balancer - Explained
This is an area that already has a great blog written about it:
How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop
Please read through the blog and follow the guidance to verify that you have already set up the HDFS service to be able to perform this necessary action.
Resolution
HDFS Disk Balancer - Execution
Let’s go through the process of performing an HDFS Intra-DataNode Disk Rebalancing process.
Obtain a local HDFS DataNode Kerberos Ticket
cd /var/run/cloudera-scm-agent/process/`ls -larth /var/run/cloudera-scm-agent/process | grep -i hdfs-DATANODE | tail -1 | awk '{print $9}'`
kinit -kt hdfs.keytab hdfs/`hostname -f`@<ClusterDomain>
Create a Disk Balancer Plan
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
Example of a successful creation of a disk balancer plan:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Disk Volume set 76c137f0-5d0c-4de3-b166-5c0ac29b77d1 Type : DISK plan completed.
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 46 ms
INFO command.Command: Writing plan to:
INFO command.Command: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Writing plan to:
/system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Execute a Disk Balancer Plan
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of a successful execution of a disk balancer plan:
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "execute plan" command
Query a running Disk Balancer Plan
hdfs diskbalancer -query `hostname -f`
Example of querying a running disk balancer plan:
hdfs diskbalancer -query `hostname -f`
INFO command.Command: Executing "query plan" command.
Plan File: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Plan ID: 9b0d03edee9d4285cfea5fe13247d8e23cb4557d
Result: PLAN_UNDER_PROGRESS
Cancel a running Disk Balancer Plan (if required)
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of cancelling a running disk balancer plan:
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "Cancel plan" command.
HDFS Disk Balancer - No Rebalancing Required Example
The following example is what you will see if you attempt to run the HDFS local disk balancer on a node that doesn’t require any rebalancing to occur:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 36 ms
INFO command.Command: No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
06-10-2023
02:59 AM
Summary
It is expected that you will experience worker node data disk failures whilst managing your CDP cluster. This blog takes you through the steps that you should take to gracefully replace the failed worker node disks with the least disruption to your CDP cluster.
Investigation
Cloudera Manager Notification
One easy method to identify that you have experienced a disk failure within your cluster is with the Cloudera Manager UI. You will see the following type of error:
Cloudera Manager will also track multiple disk failures:
HDFS NameNode - DataNode Volume Failures
The failed disks within your cluster can also be observed from within the HDFS NameNode UI:
This is also useful to quickly identify exactly which storage locations have failed.
Confirming from the Command Line
Taking the last example from HDFS NameNode - DataNode Volume Failures, we can see that /data/20 & /data/6 are both failed directories.
The following interaction from the Command Line on the worker node will also confirm the disk issue:
[root@<WorkerNode> ~]# ls -larth /data/20
ls: cannot access /data/20: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/6
ls: cannot access /data/6: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/1
total 0
drwxr-xr-x. 26 root root 237 Sep 30 02:54 ..
drwxr-xr-x. 3 root root 20 Oct 1 06:45 kudu
drwxr-xr-x. 3 root root 16 Oct 1 06:46 dfs
drwxr-xr-x. 3 root root 16 Oct 1 06:47 yarn
drwxr-xr-x. 3 root root 29 Oct 1 06:48 impala
drwxr-xr-x. 2 impala impala 6 Oct 1 06:48 cores
drwxr-xr-x. 7 root root 68 Oct 1 06:48 .
Resolution
Replace a disk on a Worker Node
You will have a number of roles that are running on any single worker node host. This is an example of a worker node that is showing a failed disk:
Decommission the Worker Node
As there are multiple roles running on a worker node, it’s best to use the decommissioning process to gracefully remove the worker node from running services. This can be found by navigating to the host within Cloudera Manager and using “Actions > Begin Maintenance”
It will then take you to the following page:
Click “Begin Maintenance” and wait for the process to complete.
Expect this process to take hours on a busy cluster. The time the process takes to complete is dependent upon:
The number of regions that the HBase RegionServer is hosting
The number of blocks that the HDFS DataNode is hosting
The number of tablets that the Kudu TabletServer is hosting
Replace and Configure the disks
Once the worker node is fully decommissioned, the disks are ready to be replaced and configured physically within your datacenter by your infrastructure team.
Every cluster is going to have its own internal processes to configure the newly replaced disks. Let’s go through an example of how this work can be verified for reference; a short sketch of preparing a replacement disk follows the /etc/fstab listing below.
List the attached block devices
[root@<WorkerNode> ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.7T 0 disk /data/1
sdb 8:16 0 3.7T 0 disk /data/2
sdc 8:32 0 3.7T 0 disk /data/3
sdd 8:48 0 3.7T 0 disk /data/4
sde 8:64 0 3.7T 0 disk /data/5
sdf 8:80 0 3.7T 0 disk /data/6
sdg 8:96 0 3.7T 0 disk /data/7
sdh 8:112 0 3.7T 0 disk /data/8
sdi 8:128 0 3.7T 0 disk /data/9
sdj 8:144 0 3.7T 0 disk /data/10
sdk 8:160 0 3.7T 0 disk /data/11
sdl 8:176 0 3.7T 0 disk /data/12
sdm 8:192 0 3.7T 0 disk /data/13
sdn 8:208 0 3.7T 0 disk /data/14
sdo 8:224 0 3.7T 0 disk /data/15
sdp 8:240 0 3.7T 0 disk /data/16
sdq 65:0 0 3.7T 0 disk /data/17
sdr 65:16 0 3.7T 0 disk /data/18
sds 65:32 0 3.7T 0 disk /data/19
sdt 65:48 0 3.7T 0 disk /data/20
sdu 65:64 0 3.7T 0 disk /data/21
sdv 65:80 0 3.7T 0 disk /data/22
sdw 65:96 0 3.7T 0 disk /data/23
sdx 65:112 0 3.7T 0 disk /data/24
sdy 65:128 0 1.8T 0 disk
├─sdy1 65:129 0 1G 0 part /boot
├─sdy2 65:130 0 20G 0 part [SWAP]
└─sdy3 65:131 0 1.7T 0 part
├─vg01-root 253:0 0 500G 0 lvm /
├─vg01-kuduwal 253:1 0 100G 0 lvm /kuduwal
├─vg01-home 253:2 0 50G 0 lvm /home
└─vg01-var 253:3 0 100G 0 lvm /var
List the IDs of the block devices
[root@<WorkerNode> ~]# blkid
/dev/sdy1: UUID="4b2f1296-460c-4cbc-8aca-923c9309d4fe" TYPE="xfs"
/dev/sdy2: UUID="af9c4c79-21b9-4d02-9453-ede88b920c1f" TYPE="swap"
/dev/sdy3: UUID="j9n4QD-60xB-rqpQ-Ck3y-s2m0-FdSo-IGWrN9" TYPE="LVM2_member"
/dev/sdb: UUID="4865e719-e77c-4d1e-b1e0-80ae1d0d6e82" TYPE="xfs"
/dev/sdc: UUID="59ae0b91-3cfc-4c53-a02f-e20bdf0ac209" TYPE="xfs"
/dev/sdd: UUID="b80473e0-bce8-413c-9740-934e8ed7006e" TYPE="xfs"
/dev/sda: UUID="684e32c8-eeb2-4215-b861-880543b1f96b" TYPE="xfs"
/dev/sdg: UUID="0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff" TYPE="xfs"
/dev/sde: UUID="06c0e908-dd67-4a42-8615-7b7335a7e0f6" TYPE="xfs"
/dev/sdf: UUID="9346fa04-dc1a-4dcc-8233-a5cb65495998" TYPE="xfs"
/dev/sdn: UUID="8f05d1dd-94d1-4376-9409-d5683ad4c225" TYPE="xfs"
/dev/sdo: UUID="5e0413d1-0b82-4ec1-b3f9-bb072db39071" TYPE="xfs"
/dev/sdh: UUID="08063201-f252-49dd-8402-042afbea78a2" TYPE="xfs"
/dev/sdl: UUID="1e5ace85-f93c-46f7-bf65-353f774cfeaa" TYPE="xfs"
/dev/sdk: UUID="195967b5-a1a0-43bb-9a33-9cf7a36fdcb6" TYPE="xfs"
/dev/sdq: UUID="db81b056-587e-47a6-844e-2d952278324b" TYPE="xfs"
/dev/sdr: UUID="45b4cf68-6f10-4dc7-8128-c2006e7aba5d" TYPE="xfs"
/dev/sds: UUID="a8e591e9-33c8-478a-b580-aeac9ad4cf44" TYPE="xfs"
/dev/sdi: UUID="a0187ae0-7598-44c4-805c-ef253dea6e7a" TYPE="xfs"
/dev/sdm: UUID="720836d8-ddd6-406d-a33f-f1b92f9b40d5" TYPE="xfs"
/dev/sdv: UUID="df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999" TYPE="xfs"
/dev/sdw: UUID="701f3516-03bc-461b-930c-ab34d0b417d7" TYPE="xfs"
/dev/sdu: UUID="5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72" TYPE="xfs"
/dev/sdj: UUID="264b85f8-9740-418b-a811-20666a305caa" TYPE="xfs"
/dev/sdt: UUID="53f2f06e-71e9-4796-86a3-2212c0f652ea" TYPE="xfs"
/dev/sdp: UUID="e6b984c0-6d85-4df2-9a7d-cc1c87238c49" TYPE="xfs"
/dev/mapper/vg01-root: UUID="18bc42fe-dbfd-4005-8e13-6f5d2272d9a7" TYPE="xfs"
/dev/sdx: UUID="53e4023f-583a-4219-bfd2-1a94e15f34ef" TYPE="xfs"
/dev/mapper/vg01-kuduwal: UUID="a1441e2f-718b-42eb-b398-28ce20ee50ad" TYPE="xfs"
/dev/mapper/vg01-home: UUID="fbc8e522-64da-4cc3-87b6-89ea83fb0aa0" TYPE="xfs"
/dev/mapper/vg01-var: UUID="93b1537f-a1a9-4616-b79a-cab9a1e39bf1" TYPE="xfs"
View the /etc/fstab
[root@<WorkerNode> ~]# cat /etc/fstab
/dev/mapper/vg01-root / xfs defaults 0 0
UUID=4b2f1296-460c-4cbc-8aca-923c9309d4fe /boot xfs defaults 0 0
/dev/mapper/vg01-home /home xfs defaults 0 0
/dev/mapper/vg01-kuduwal /kuduwal xfs defaults 0 0
/dev/mapper/vg01-var /var xfs defaults 0 0
UUID=af9c4c79-21b9-4d02-9453-ede88b920c1f swap swap defaults 0 0
UUID=684e32c8-eeb2-4215-b861-880543b1f96b /data/1 xfs noatime,nodiratime 0 0
UUID=4865e719-e77c-4d1e-b1e0-80ae1d0d6e82 /data/2 xfs noatime,nodiratime 0 0
UUID=59ae0b91-3cfc-4c53-a02f-e20bdf0ac209 /data/3 xfs noatime,nodiratime 0 0
UUID=b80473e0-bce8-413c-9740-934e8ed7006e /data/4 xfs noatime,nodiratime 0 0
UUID=06c0e908-dd67-4a42-8615-7b7335a7e0f6 /data/5 xfs noatime,nodiratime 0 0
UUID=9346fa04-dc1a-4dcc-8233-a5cb65495998 /data/6 xfs noatime,nodiratime 0 0
UUID=0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff /data/7 xfs noatime,nodiratime 0 0
UUID=08063201-f252-49dd-8402-042afbea78a2 /data/8 xfs noatime,nodiratime 0 0
UUID=a0187ae0-7598-44c4-805c-ef253dea6e7a /data/9 xfs noatime,nodiratime 0 0
UUID=264b85f8-9740-418b-a811-20666a305caa /data/10 xfs noatime,nodiratime 0 0
UUID=195967b5-a1a0-43bb-9a33-9cf7a36fdcb6 /data/11 xfs noatime,nodiratime 0 0
UUID=1e5ace85-f93c-46f7-bf65-353f774cfeaa /data/12 xfs noatime,nodiratime 0 0
UUID=720836d8-ddd6-406d-a33f-f1b92f9b40d5 /data/13 xfs noatime,nodiratime 0 0
UUID=8f05d1dd-94d1-4376-9409-d5683ad4c225 /data/14 xfs noatime,nodiratime 0 0
UUID=5e0413d1-0b82-4ec1-b3f9-bb072db39071 /data/15 xfs noatime,nodiratime 0 0
UUID=e6b984c0-6d85-4df2-9a7d-cc1c87238c49 /data/16 xfs noatime,nodiratime 0 0
UUID=db81b056-587e-47a6-844e-2d952278324b /data/17 xfs noatime,nodiratime 0 0
UUID=45b4cf68-6f10-4dc7-8128-c2006e7aba5d /data/18 xfs noatime,nodiratime 0 0
UUID=a8e591e9-33c8-478a-b580-aeac9ad4cf44 /data/19 xfs noatime,nodiratime 0 0
UUID=53f2f06e-71e9-4796-86a3-2212c0f652ea /data/20 xfs noatime,nodiratime 0 0
UUID=5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72 /data/21 xfs noatime,nodiratime 0 0
UUID=df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999 /data/22 xfs noatime,nodiratime 0 0
UUID=701f3516-03bc-461b-930c-ab34d0b417d7 /data/23 xfs noatime,nodiratime 0 0
UUID=53e4023f-583a-4219-bfd2-1a94e15f34ef /data/24 xfs noatime,nodiratime 0 0
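As a reference-only sketch of preparing a single replacement disk (the device name /dev/sdt and mount point /data/20 are illustrative assumptions; follow your own infrastructure team's standards and tooling):
mkfs.xfs /dev/sdt          # format the replacement disk (destroys any data on /dev/sdt)
blkid /dev/sdt             # note the new UUID
vi /etc/fstab              # update the existing /data/20 entry with the new UUID
mount -a                   # mount everything listed in fstab
df -h /data/20             # confirm the replacement disk is mounted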
Recommission the Worker Node
Once the disk(s) have been suitably replaced, it’s time to use the recommissioning process to gracefully reintroduce the worker node back into the cluster. This can be found by navigating to the host within Cloudera Manager and using “Actions > End Maintenance”.
After the node has completed its recommission cycle, follow the guidance in the next sections to perform local disk rebalancing where appropriate.
Address local disk HDFS Balancing
Most clusters utilize HDFS. This service has a local disk balancer that you can make use of. Please find some helpful guidance within the following - Rebalance your HDFS Disks (single node)
Address local disk Kudu Balancing
If you are running Kudu within your cluster, you will need to rebalance the existing Kudu data on the local disks of the worker node. Please find some helpful guidance within the following - Rebalance your Kudu Disks (single node)