07-23-2024
06:44 AM
Cloudera Manager also has a great way to set class-level debug logging on the fly, without a restart, when you need to troubleshoot. Navigate to: http://<cm-server>:7180/cmf/debug/logLevel Once at this page, choose the class you wish to change, select the radio button for the level you want, and hit the Submit button. Note that these changes will NOT persist across a restart of the server, but logging at the new level begins as soon as you hit Submit on the logLevel page.
07-23-2024
06:33 AM
In JDK 8, the HotSpot JVM stores the representation of class metadata in native memory, in an area called Metaspace. The permanent generation has been removed. The PermSize and MaxPermSize options are ignored, and a warning is issued if they are present on the command line.
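For example, if a service's Java options still carry the old PermGen flags, they can simply be removed; where you genuinely need to cap class-metadata usage, the Metaspace equivalents apply (the 256m/512m sizes and app.jar below are illustrative only):
# JDK 7 style - ignored with a warning on JDK 8
java -XX:PermSize=256m -XX:MaxPermSize=512m -jar app.jar
# JDK 8 style - Metaspace sizing
java -XX:MetaspaceSize=256m -XX:MaxMetaspaceSize=512m -jar app.jar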
07-17-2024
11:25 PM
Has anyone encountered error 255 before? i.e., "engine exited with error 255"
06-10-2023
03:36 AM
Summary
After the upgrade from CDH to CDP, fundamental instability was observed within the Ranger Audits UI (within the Ranger Admin Service), and Infra-Solr Roles were constantly exhibiting API liveness errors.
Total audits daily count: 136,553,808
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
Initial analysis of the number of daily audits within the Ranger service confirmed that there were as many as 1B audits per day. With only 2 Infra-Solr servers, the default configuration left behind by a CDH to CDP upgrade needed to be tweaked to follow best practices for the Infra-Solr ranger_audits collection:
Reduce Ranger Audits verbosity (see this complementary document: Ranger Audit Verbosity).
Assess the server design. The number of Infra-Solr servers is important when deciding how the ranger_audits collection should be built. A single Solr server is not recommended as it would not be resilient. Allow for at least 2 replicas per shard for any collection; in this case the ranger_audits collection was split into 6 shards with 2 replicas each, while still staying within the best practice guidelines for Solr.
Resolution
The following public documentation will assist with a deeper understanding of how you might choose to align with best practices given the hardware you have available and the volume of audits being recorded within the service - Calculating Infra Solr resource needs.
Configure the following 3 parameters within the Ranger Service (within CM) according to best practices. The example below is for a cluster of 3 Infra-Solr servers, with 3 shards configured for the ranger_audits collection, 2 replicas per shard, and the maximum number of shards limited to 6 (a multiple of the first two parameters):
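The three parameter names are not listed above; as an assumption based on the standard Ranger audit/Solr property names (verify the exact names exposed in your CM version), they are typically:
ranger.audit.solr.no.shards = 3
ranger.audit.solr.no.replica = 2
ranger.audit.solr.max.shards.per.node = 6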
Configure the TTL (Time To Live) for audits that are propagated into the Infra-Solr ranger_audits collection. The retention period should be defined by the business (for instance, 25 days). Note that the TTL only affects audit visibility within the Ranger UI; all audits remain accessible within HDFS.
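In CM this retention is usually controlled by a single Ranger property; the property name below is an assumption, so confirm it against your CM version before changing it:
ranger.audit.solr.config.ttl = 25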
Ranger - Delete ranger_audits collection
Ensure all Solr Servers are healthy and available. Then, in order to restructure it, fully delete the ranger_audits collection and monitor the status (example below).
NOTE - the date suffix 8Mar2022 in the example below is included for audit-trail purposes; it records the date that the full collection deletion was performed.
DELETE RANGER AUDITS COLLECTION
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=DELETE&name=ranger_audits&async=del_ranger_audits8Mar2022
REQUEST THE STATUS OF AN ASYNC CALL
http://<Infra-Solr-Server>:18983/solr/admin/collections?action=REQUESTSTATUS&requestid=del_ranger_audits8Mar2022
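On a Kerberized cluster, the same two calls can be issued from the command line with curl (mirroring the curl --negotiate style used in the complementary Ranger Audit Verbosity article); quote the URL so the shell does not interpret the & characters:
curl --negotiate -u: "http://<Infra-Solr-Server>:18983/solr/admin/collections?action=DELETE&name=ranger_audits&async=del_ranger_audits8Mar2022"
curl --negotiate -u: "http://<Infra-Solr-Server>:18983/solr/admin/collections?action=REQUESTSTATUS&requestid=del_ranger_audits8Mar2022"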
An example of a successful delete command issued to the URL:
{
  "responseHeader": {
    "status": 0,
    "QTime": 9
  },
  "requestid": "del_ranger_audits20Apr2022"
}
Restart Ranger Admin service. When you perform the restart, it will recreate the ranger_audits collection based on the parameters defined earlier.
06-10-2023
03:30 AM
Summary
Infra-Solr service exhibits fundamental stability issues after upgrading CDH to CDP.
Sample screenshot of Infra-Solr Health check errors
Sample screenshot of specific Infra-Solr Server Health check errors
Investigation
The Infra-Solr service hosts the ranger_audits collection which is used to display cluster audit information within the Ranger Admin UI. Perform preliminary analysis using Ranger Admin UI - Audits for a single day as demonstrated below. [NOTE: these sample screenshots were taken after resolving the issues; your audit counts will likely be much higher].
Total audits daily count: 136,553,808
Total Impala audits daily count: 5,901,146
Total hbaseregional audits daily count: 1,178,831
Total hbaseregional (access type scanneropen) audits daily count: 0
(due to the complete exclusion of these events)
Total hdfs audits daily count: 128,681,418
Total hdfs (access type liststatus) audits daily count: 0
(due to the complete exclusion of these events)
Assemble and analyze audit counts. The actual pre-resolution values for this case study were:
Total number of Ranger audits - 705,875,710
Application - Impala - 6,719,878
Application - hbaseRegional - 389,896,166
Application - hbaseRegional; Access Type - scannerOpen - 261,735,436
Application - hdfs - 308,644,209
Application - hdfs; Access Type - listStatus - 212,728,345
The total count of Ranger audits (roughly 700M) is excessive. Audit verbosity is a primary contributing factor to Infra-Solr service instability because Ranger audits are stored within an Infra-Solr collection (ranger_audits) and presented within the Ranger Admin UI. The ranger_audits collection was overwhelming the Infra-Solr servers, leading to Web Server Status Unknown / API Liveness check failures.
To reduce audit verbosity, identify meaningful and meaningless events using the Infra-Solr API.
URL examples for reference only:
Query by date/time range
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-16T00:00:00.000Z+TO+2022-02-16T11:59:59.000Z]&sort=evtTime+desc
select all: oldest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000
select all: newest
http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000
Curl examples for reference only:
> Query by date/time range && number of rows to capture (important)
> -g required to disable globbing of the date range
> This is verbose
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, but narrowing down fewer fields (that you want to see)
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Cenforcer%2Cagent%2Crepo%2CreqUser%2Cresource%2Caction&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> This is the above query, but narrowing down ever fewer fields (that you want to see)
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T11:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows.text
> Select all: oldest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+asc&rows=1000" > RangerAuditSolrOutput17Feb22.text
> Select all: newest
curl --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?q=*:*&sort=evtTime+desc&rows=1000" > RangerAuditSolrOutput17Feb22.text
In this case study, 48 curl commands were executed to get a balanced picture over a 24-hour period, pulling 100,000 audit events every 30 minutes.
NOTE: The Infra-Solr server must render this output, and 100,000+ events can easily crash a 30GB Infra-Solr server, so do not pull more than that for any single time interval.
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:30:00.000Z+TO+2022-02-17T23:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2330-2359.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T23:00:00.000Z+TO+2022-02-17T23:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2300-2329.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:30:00.000Z+TO+2022-02-17T22:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2230-2259.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T22:00:00.000Z+TO+2022-02-17T22:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_2200-2229.text
..
REPEAT THE COMMANDS WITH RELEVANT EXAMPLES
..
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:30:00.000Z+TO+2022-02-17T01:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0130-0159.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T01:00:00.000Z+TO+2022-02-17T01:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0100-0129.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:30:00.000Z+TO+2022-02-17T00:59:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0030-0059.text
curl -g --negotiate -u: "http://lannister-005.edh.cloudera.com:18983/solr/ranger_audits/select?fl=access%2Crepo&q=evtTime:[2022-02-17T00:00:00.000Z+TO+2022-02-17T00:29:59.999Z]&rows=100000&sort=evtTime+desc" > RangerAuditSolrOutput17Feb22_100000Rows_0000-0029.text
The 48 output files were then parsed to identify the most frequent Ranger audit access types (see the examples below and adapt them when creating your own):
grep access RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
egrep "listStatus|scannerOpen" RangerAuditSolrOutput17Feb22* | sort -rn | uniq -c | sort -rn | awk -F ' ' '{sum+=$1;}END{print sum;}'
This example counts the audit events by access type to assist you in selecting what is meaningful and what is not:
grep access RangerAuditSolrOutput17Feb22_100000Rows.text | sort -rn | uniq -c | sort -rn
50517 "access":"listStatus",
23782 "access":"scannerOpen",
14559 "access":"get",
5193 "access":"put",
2081 "access":"open",
1884 "access":"delete",
1394 "access":"WRITE",
336 "access":"rename",
126 "access":"contentSummary",
84 "access":"checkAndPut",
26 "access":"mkdirs",
6 "access":"compactSelection",
5 "access":"flush",
4 "access":"getAclStatus",
3 "access":"compact",
In this case study, up to 1B audit events were being recorded per day, with 65-70% coming from HDFS listStatus and HBase scannerOpen. Such pure metadata-operation events were meaningless to DevOps; nevertheless, we verified that they were also meaningless to the business before excluding them. Retain the ‘get’, ‘put’, ‘open’, ‘delete’, and other key audits.
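As a quick sanity check against the pre-resolution counts listed above: (261,735,436 scannerOpen + 212,728,345 listStatus) / 705,875,710 total audits ≈ 67%, consistent with the 65-70% estimate.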
Assess the Infra-Solr and ranger_audits collection design - the Infra-Solr server count and the shard and replica counts play an important role in stability. This complementary document covers those assessment steps: Ranger - Rebuild ranger_audits.
Resolution
Tune Ranger to exclude unwanted event collection.
Edit the cm_hdfs service configuration:
Exclude the ‘listStatus’ audit type from the ‘Audit Filter’ section:
Edit the cm_hbase service configuration:
Exclude the ‘scannerOpen’ audit type from the ‘Audit Filter’ section:
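Both exclusions are added in the Audit Filter section of the corresponding service repository (cm_hdfs / cm_hbase). As a rough sketch of the underlying audit-filter entries - the field names follow the Ranger audit-filter JSON schema and should be checked against what your Ranger version actually generates - the two exclusions look something like:
cm_hdfs:  { "actions": ["listStatus"], "isAudited": false }
cm_hbase: { "actions": ["scannerOpen"], "isAudited": false }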
Excluding these low-value events provided 3 benefits:
Infra-Solr and ranger_audits collection stability greatly improved, which also made the service easier to manage.
Infra-Solr and the ranger_audits collection required only 30-35% of the resources to perform the same tasks.
Ranger audit history required only 30-35% of the HDFS disk space when writing to /ranger/….
06-10-2023
03:15 AM
Summary
It is always a good idea to review your Kudu Rebalancer settings so that all hardware is optimally utilized when Kudu Rebalancing activities are being performed.
Investigation
Kudu Configuration
Balancer configuration properties
Although the default Kudu parameters have not been shown to adversely impact Kudu Rebalancing operations, the following property change is recommended to speed that process up.
Property                | Default | Cloudera Chosen Value
------------------------+---------+-----------------------
rb_max_moves_per_server | 5       | 10
Avoid Landmines
Some key notes before performing the rebalancing activities after setting up the services/disks:
Never run both the HDFS & Kudu Rebalancers at the same time; the contention between the two may cause issues.
Perform the rebalancing activities in the order of Kudu first, HDFS second, because Kudu is unable to track capacity utilization.
Performing Kudu Rebalancing Activities
We recommend that you perform these actions from within CM to provide full visibility into the Rebalancer status as well as when the action has started and finished.
Kudu
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool
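If you need to run the rebalance from the command line instead of CM, an equivalent sketch using the kudu CLI is shown below (the master FQDN placeholders follow the convention used elsewhere in this series; confirm the flag name against your Kudu version):
sudo -u kudu kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> --max_moves_per_server=10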
06-10-2023
03:10 AM
Summary
Are you having issues with too many queries being handled by a single Impala Coordinator?
Does this eventually lead to OOM scenarios?
Suppose you have 3 Impala Coordinators within your cluster and notice that queries skew onto one of them and overwhelm it.
Note how one of the Impala Coordinators in the above example has 73 running queries, and the other 2 have relatively few.
Investigation
Source IP Persistence
To ascertain why any Impala Coordinator can skew the number of running queries that are active on it, look at the way the proxy is set up to handle incoming queries.
‘Source IP Persistence’ means routing sessions from the same IP address to the same coordinator. This setting is required when setting up high availability with Hue. It is also required to avoid the Hue message ‘results have expired’, which appears when a query is submitted to the cluster through one coordinator but the results are fetched back through a different coordinator/Hue Server.
Example HAProxy Configuration for Source IP Persistence
The public docs for setting up HAProxy for Impala - Configuring Load Balancer for Impala.
Example setup of Hue-Impala connectivity within /etc/haproxy/haproxy.cfg as follows:
listen impala-hue :21052
mode tcp
stats enable
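# 'balance source' (below) provides Source IP Persistence: connections from the
# same client IP are always routed to the same Impala coordinator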
balance source
timeout connect 5000ms
timeout queue 5000ms
timeout client 3600000ms
timeout server 3600000ms
# Impala Nodes
server impala-coordinator-001.fqdn impala-coordinator-001.fqdn:21050 check
server impala-coordinator-002.fqdn impala-coordinator-002.fqdn:21050 check
server impala-coordinator-003.fqdn impala-coordinator-003.fqdn:21050 check
Now let’s review what can impact the overall connection count into an Impala Coordinator: Hue, Hive & Impala timeout settings.
Example Timeout Settings
The following settings might mimic what you have currently set within your Hue, Hive & Impala services.
Hue
Hive
Impala
Proposed Timeout Settings
Whilst the actual settings will vary from cluster to cluster, we recommend moving away from the defaults and, as a starting point, setting all of the idle parameters to 2 hours across the board in all 3 services: Hue, Hive & Impala.
This initial goal lets you introduce timeouts whilst monitoring the user experience. The ultimate best practice in this area is to move toward:
Idle Query Timeouts of 300 seconds (or 5 minutes)
Idle Session Timeouts of 600 seconds (or 10 minutes)
NOTE - all of the parameters being discussed relate to ‘idle’ sessions and queries; in other words, the user has to have left the session or query in an idle state before the idle parameters kick in. No active session or query will be affected by this change in service behavior.
Resolution
Hue
Steps to perform:
Go to CM - Hue - Configuration
Search for “Auto Logout Timeout”
Change to 2 hours
Restart Hue Service
Hive
Steps to perform:
Go to CM - Hive - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive Service
Hive on Tez
Steps to perform:
Go to CM - Hive on Tez - Configuration
Search for “Idle Operation Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Hive on Tez Service
Impala
Steps to perform:
Go to CM - Impala - Configuration
Search for “Idle Query Timeout”
Change to 300 seconds
Search for “Idle Session Timeout”
Change to 600 seconds
Restart Impala Service
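For reference, the CM fields above map onto underlying service properties/flags; the names below are assumptions based on the standard Hive and Impala option names, so verify them against your release documentation:
Hive / Hive on Tez: hive.server2.idle.operation.timeout, hive.server2.idle.session.timeout
Impala (impalad startup flags): --idle_query_timeout=300 --idle_session_timeout=600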
06-10-2023
03:04 AM
Summary
After you experience a disk failure on a worker node and then replace the disk, you’ll need to ensure that the disk’s data is suitably rebalanced within the Kudu Service at the local level.
Investigation & Resolution
Purging a Tablet Server
There isn’t currently a method to rebalance the replicas on a single Tablet Server disk array. This means that we need to empty the node and reintroduce it so that it can be used again from scratch. We begin by quiescing the Tablet Server.
Quiesce the Tablet Server
Quiesce essentially means to stop the Tablet Server from hosting any leaders in order to:
Make other replicas on live Tablet Servers become the leaders
Prevent this Tablet Server from becoming a leader for any other reason
Allow this Tablet Server to be read from (the replicas that are still present)
Check Quiesce Status
sudo -u kudu kudu tserver quiesce status <Worker-Node-FQDN>
Quiescing | Tablet Leaders | Active Scanners
-----------+----------------+-----------------
true | 0 | 0
Quiesce Start
sudo -u kudu kudu tserver quiesce start <Worker-Node-FQDN>
Put the Tablet Server into Maintenance Mode
Maintenance Mode stops the Tablet Server from being used completely. The maintenance mode commands require you to retrieve the UUID of the Tablet Server first. We can get this information from a tserver list command:
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN>
An example that then targets the server you want to work on
sudo -u kudu kudu tserver list <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> | grep <Worker-Node-FQDN>
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050
Put the Tablet Server into Maintenance Mode
sudo -u kudu kudu tserver state enter_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Run ksck to check the status of Kudu Service / TS to be purged
This will confirm the status of both Quiesce and Maintenance Mode for every Tablet Server in the cluster (in our example, <Worker-Node-FQDN>):
sudo -u kudu kudu cluster ksck <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 2>&1 | tee ksck.out
The above command outputs the ksck to both the terminal and a file called ‘ksck.out’. This allows us to review the information from both perspectives and also create a record of the output in the file. But taking our example of purging <Worker-Node-FQDN> into account, the following information is key:
Tablet Server Summary
This is a list of all Tablet Servers in the cluster. We’ve focused on just <Worker-Node-FQDN> and the surrounding Tablet Servers for illustrative purposes. Notice the row for UUID 5e103ac84707495e843a4553ac622f20 (shown in red in the original output): <Worker-Node-FQDN> is quiescing and has no leaders running on it.
Tablet Server Summary
UUID | Address | Status | Location | Quiescing | Tablet Leaders | Active Scanners
----------------------------------+---------------------------------+---------+-------------+-----------+----------------+-----------------
…
59e6ca5107754c24b649ee9c9acfccfb | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetE01 | false | 47 | 0
5e103ac84707495e843a4553ac622f20 | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA08 | true | 0 | 0
5edf82f0516b4897b3a7991a7e67d71c | <Worker-Node-FQDN>:7050 | HEALTHY | /CabinetA07 | false | 1452 | 0
…
Tablet Server State (maintenance mode)
This section shows that the TS is in maintenance mode.
Tablet Server States
Server | State
----------------------------------+------------------
5e103ac84707495e843a4553ac622f20 | MAINTENANCE_MODE
Purge the Tablet Server
The following command instructs kudu to ignore the <Worker-Node-FQDN> node AND move replicas away from it:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers
Again, importantly, the Tablet Server has to have been successfully quiesced and put into maintenance mode to avoid any issues with the Kudu service.
A simple break in the VPN or shell terminal will kill the rebalance command. This won't affect Kudu, but it will stop the process. To work around this and retain information during the process, use the following command to output the rebalance status to the active terminal session as well as a file:
sudo -u kudu /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers 2>&1 | tee <Worker-Node-FQDN>-rebalance.out &
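Note that backgrounding with & alone may not survive a terminal hang-up. As an optional variation (an assumption, not part of the original procedure), prefix the command with nohup, or run it inside screen/tmux, so the rebalance keeps going even if the session drops:
sudo -u kudu nohup /tmp/kudu cluster rebalance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> -ignored_tservers=5e103ac84707495e843a4553ac622f20 -move_replicas_from_ignored_tservers > <Worker-Node-FQDN>-rebalance.out 2>&1 &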
Re-introduce the Tablet Server
After the Kudu Tablet Server has been purged, it’s time to reintroduce it into the Kudu service so that it can be used again.
Exit the Tablet Server from Maintenance Mode
sudo -u kudu kudu tserver state exit_maintenance <Master-Node1-FQDN>,<Master-Node2-FQDN>,<Master-Node3-FQDN> 5e103ac84707495e843a4553ac622f20
Unquiesce the Tablet Server
sudo -u kudu kudu tserver quiesce stop <Worker-Node-FQDN>
Rebalance the Kudu Service
We now have a Kudu Tablet Server that has been quiesced and purged. It’s time to rebalance the Kudu service and share the Tablets back onto the recently purged Kudu Tablet Server.
Go to CM - Kudu - Actions - Run Kudu Rebalancer Tool:
06-10-2023
03:01 AM
Summary
When you have experienced a disk failure on a worker node and have had the disk replaced, you’ll need to ensure that the disk is suitably rebalanced within the HDFS service at the local (intra-DataNode) level.
Investigation
HDFS Disk Balancer - Explained
This is an area that already has a great blog written about it:
How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop
Please read through the blog and follow the guidance to verify that you have already set up the HDFS service to be able to perform this necessary action.
Resolution
HDFS Disk Balancer - Execution
Let’s go through the process of performing an HDFS Intra-DataNode Disk Rebalancing process.
Obtain a local HDFS DataNode Kerberos Ticket
cd /var/run/cloudera-scm-agent/process/`ls -larth /var/run/cloudera-scm-agent/process | grep -i hdfs-DATANODE | tail -1 | awk '{print $9}'`
kinit -kt hdfs.keytab hdfs/`hostname -f`@<ClusterDomain>
Create a Disk Balancer Plan
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
Example of a successful creation of a disk balancer plan:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Disk Volume set 76c137f0-5d0c-4de3-b166-5c0ac29b77d1 Type : DISK plan completed.
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 46 ms
INFO command.Command: Writing plan to:
INFO command.Command: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Writing plan to:
/system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Execute a Disk Balancer Plan
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of a successful execution of a disk balancer plan:
hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "execute plan" command
Query a running Disk Balancer Plan
hdfs diskbalancer -query `hostname -f`
Example of querying a running disk balancer plan:
hdfs diskbalancer -query `hostname -f`
INFO command.Command: Executing "query plan" command.
Plan File: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Plan ID: 9b0d03edee9d4285cfea5fe13247d8e23cb4557d
Result: PLAN_UNDER_PROGRESS
Cancel a running Disk Balancer Plan (if required)
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
Example of cancelling a running disk balancer plan:
hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
INFO command.Command: Executing "Cancel plan" command.
HDFS Disk Balancer - No Rebalancing Required Example
The following example is what you will see if you attempt to run the HDFS local disk balancer on a node that doesn’t require any rebalancing to occur:
hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5
INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second
INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec
INFO block.BlockTokenSecretManager: Setting block keys
INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec
INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867
INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 36 ms
INFO command.Command: No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0
06-10-2023
02:59 AM
Summary
It is expected that you will experience worker node data disk failures whilst managing your CDP cluster. This blog takes you through the steps that you should take to gracefully replace the failed worker node disks with the least disruption to your CDP cluster.
Investigation
Cloudera Manager Notification
One easy method to identify that you have experienced a disk failure within your cluster is with the Cloudera Manager UI. You will see the following type of error:
Cloudera Manager will also track multiple disk failures:
HDFS NameNode - DataNode Volume Failures
The failed disks within your cluster can also be observed from within the HDFS NameNode UI:
This is also useful to quickly identify exactly which storage locations have failed.
Confirming from the Command Line
Taking the last example from HDFS NameNode - DataNode Volume Failures, we can see that /data/20 & /data/6 are both failed directories.
The following interaction from the Command Line on the worker node will also confirm the disk issue:
[root@<WorkerNode> ~]# ls -larth /data/20
ls: cannot access /data/20: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/6
ls: cannot access /data/6: Input/output error
[root@<WorkerNode> ~]# ls -larth /data/1
total 0
drwxr-xr-x. 26 root root 237 Sep 30 02:54 ..
drwxr-xr-x. 3 root root 20 Oct 1 06:45 kudu
drwxr-xr-x. 3 root root 16 Oct 1 06:46 dfs
drwxr-xr-x. 3 root root 16 Oct 1 06:47 yarn
drwxr-xr-x. 3 root root 29 Oct 1 06:48 impala
drwxr-xr-x. 2 impala impala 6 Oct 1 06:48 cores
drwxr-xr-x. 7 root root 68 Oct 1 06:48 .
Resolution
Replace a disk on a Worker Node
You will have a number of roles that are running on any single worker node host. This is an example of a worker node that is showing a failed disk:
Decommission the Worker Node
As there are multiple roles running on a worker node, it’s best to use the decommissioning process to gracefully remove the worker node from running services. This can be found by navigating to the host within Cloudera Manager and using “Actions > Begin Maintenance”
It will then take you to the following page:
Click “Begin Maintenance” and wait for the process to complete.
Expect this process to take hours on a busy cluster. The time the process takes to complete is dependent upon:
The number of regions that the HBase RegionServer is hosting
The number of blocks that the HDFS DataNode is hosting
The number of tablets that the Kudu TabletServer is hosting
Replace and Configure the disks
Once the worker node is fully decommissioned, the disks are ready to be replaced and configured physically within your datacenter by your infrastructure team.
Every cluster is going to have its own internal processes to configure the newly replaced disks. Let’s go through an example of how this work can be verified for reference; a short sketch of preparing a replacement disk follows the /etc/fstab listing below.
List the attached block devices
[root@<WorkerNode> ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 3.7T 0 disk /data/1
sdb 8:16 0 3.7T 0 disk /data/2
sdc 8:32 0 3.7T 0 disk /data/3
sdd 8:48 0 3.7T 0 disk /data/4
sde 8:64 0 3.7T 0 disk /data/5
sdf 8:80 0 3.7T 0 disk /data/6
sdg 8:96 0 3.7T 0 disk /data/7
sdh 8:112 0 3.7T 0 disk /data/8
sdi 8:128 0 3.7T 0 disk /data/9
sdj 8:144 0 3.7T 0 disk /data/10
sdk 8:160 0 3.7T 0 disk /data/11
sdl 8:176 0 3.7T 0 disk /data/12
sdm 8:192 0 3.7T 0 disk /data/13
sdn 8:208 0 3.7T 0 disk /data/14
sdo 8:224 0 3.7T 0 disk /data/15
sdp 8:240 0 3.7T 0 disk /data/16
sdq 65:0 0 3.7T 0 disk /data/17
sdr 65:16 0 3.7T 0 disk /data/18
sds 65:32 0 3.7T 0 disk /data/19
sdt 65:48 0 3.7T 0 disk /data/20
sdu 65:64 0 3.7T 0 disk /data/21
sdv 65:80 0 3.7T 0 disk /data/22
sdw 65:96 0 3.7T 0 disk /data/23
sdx 65:112 0 3.7T 0 disk /data/24
sdy 65:128 0 1.8T 0 disk
├─sdy1 65:129 0 1G 0 part /boot
├─sdy2 65:130 0 20G 0 part [SWAP]
└─sdy3 65:131 0 1.7T 0 part
├─vg01-root 253:0 0 500G 0 lvm /
├─vg01-kuduwal 253:1 0 100G 0 lvm /kuduwal
├─vg01-home 253:2 0 50G 0 lvm /home
└─vg01-var 253:3 0 100G 0 lvm /var
List the IDs of the block devices
[root@<WorkerNode> ~]# blkid
/dev/sdy1: UUID="4b2f1296-460c-4cbc-8aca-923c9309d4fe" TYPE="xfs"
/dev/sdy2: UUID="af9c4c79-21b9-4d02-9453-ede88b920c1f" TYPE="swap"
/dev/sdy3: UUID="j9n4QD-60xB-rqpQ-Ck3y-s2m0-FdSo-IGWrN9" TYPE="LVM2_member"
/dev/sdb: UUID="4865e719-e77c-4d1e-b1e0-80ae1d0d6e82" TYPE="xfs"
/dev/sdc: UUID="59ae0b91-3cfc-4c53-a02f-e20bdf0ac209" TYPE="xfs"
/dev/sdd: UUID="b80473e0-bce8-413c-9740-934e8ed7006e" TYPE="xfs"
/dev/sda: UUID="684e32c8-eeb2-4215-b861-880543b1f96b" TYPE="xfs"
/dev/sdg: UUID="0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff" TYPE="xfs"
/dev/sde: UUID="06c0e908-dd67-4a42-8615-7b7335a7e0f6" TYPE="xfs"
/dev/sdf: UUID="9346fa04-dc1a-4dcc-8233-a5cb65495998" TYPE="xfs"
/dev/sdn: UUID="8f05d1dd-94d1-4376-9409-d5683ad4c225" TYPE="xfs"
/dev/sdo: UUID="5e0413d1-0b82-4ec1-b3f9-bb072db39071" TYPE="xfs"
/dev/sdh: UUID="08063201-f252-49dd-8402-042afbea78a2" TYPE="xfs"
/dev/sdl: UUID="1e5ace85-f93c-46f7-bf65-353f774cfeaa" TYPE="xfs"
/dev/sdk: UUID="195967b5-a1a0-43bb-9a33-9cf7a36fdcb6" TYPE="xfs"
/dev/sdq: UUID="db81b056-587e-47a6-844e-2d952278324b" TYPE="xfs"
/dev/sdr: UUID="45b4cf68-6f10-4dc7-8128-c2006e7aba5d" TYPE="xfs"
/dev/sds: UUID="a8e591e9-33c8-478a-b580-aeac9ad4cf44" TYPE="xfs"
/dev/sdi: UUID="a0187ae0-7598-44c4-805c-ef253dea6e7a" TYPE="xfs"
/dev/sdm: UUID="720836d8-ddd6-406d-a33f-f1b92f9b40d5" TYPE="xfs"
/dev/sdv: UUID="df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999" TYPE="xfs"
/dev/sdw: UUID="701f3516-03bc-461b-930c-ab34d0b417d7" TYPE="xfs"
/dev/sdu: UUID="5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72" TYPE="xfs"
/dev/sdj: UUID="264b85f8-9740-418b-a811-20666a305caa" TYPE="xfs"
/dev/sdt: UUID="53f2f06e-71e9-4796-86a3-2212c0f652ea" TYPE="xfs"
/dev/sdp: UUID="e6b984c0-6d85-4df2-9a7d-cc1c87238c49" TYPE="xfs"
/dev/mapper/vg01-root: UUID="18bc42fe-dbfd-4005-8e13-6f5d2272d9a7" TYPE="xfs"
/dev/sdx: UUID="53e4023f-583a-4219-bfd2-1a94e15f34ef" TYPE="xfs"
/dev/mapper/vg01-kuduwal: UUID="a1441e2f-718b-42eb-b398-28ce20ee50ad" TYPE="xfs"
/dev/mapper/vg01-home: UUID="fbc8e522-64da-4cc3-87b6-89ea83fb0aa0" TYPE="xfs"
/dev/mapper/vg01-var: UUID="93b1537f-a1a9-4616-b79a-cab9a1e39bf1" TYPE="xfs"
View the /etc/fstab
[root@<WorkerNode> ~]# cat /etc/fstab
/dev/mapper/vg01-root / xfs defaults 0 0
UUID=4b2f1296-460c-4cbc-8aca-923c9309d4fe /boot xfs defaults 0 0
/dev/mapper/vg01-home /home xfs defaults 0 0
/dev/mapper/vg01-kuduwal /kuduwal xfs defaults 0 0
/dev/mapper/vg01-var /var xfs defaults 0 0
UUID=af9c4c79-21b9-4d02-9453-ede88b920c1f swap swap defaults 0 0
UUID=684e32c8-eeb2-4215-b861-880543b1f96b /data/1 xfs noatime,nodiratime 0 0
UUID=4865e719-e77c-4d1e-b1e0-80ae1d0d6e82 /data/2 xfs noatime,nodiratime 0 0
UUID=59ae0b91-3cfc-4c53-a02f-e20bdf0ac209 /data/3 xfs noatime,nodiratime 0 0
UUID=b80473e0-bce8-413c-9740-934e8ed7006e /data/4 xfs noatime,nodiratime 0 0
UUID=06c0e908-dd67-4a42-8615-7b7335a7e0f6 /data/5 xfs noatime,nodiratime 0 0
UUID=9346fa04-dc1a-4dcc-8233-a5cb65495998 /data/6 xfs noatime,nodiratime 0 0
UUID=0f0d12ac-7d93-4c76-9f5c-ac6b43f2eaff /data/7 xfs noatime,nodiratime 0 0
UUID=08063201-f252-49dd-8402-042afbea78a2 /data/8 xfs noatime,nodiratime 0 0
UUID=a0187ae0-7598-44c4-805c-ef253dea6e7a /data/9 xfs noatime,nodiratime 0 0
UUID=264b85f8-9740-418b-a811-20666a305caa /data/10 xfs noatime,nodiratime 0 0
UUID=195967b5-a1a0-43bb-9a33-9cf7a36fdcb6 /data/11 xfs noatime,nodiratime 0 0
UUID=1e5ace85-f93c-46f7-bf65-353f774cfeaa /data/12 xfs noatime,nodiratime 0 0
UUID=720836d8-ddd6-406d-a33f-f1b92f9b40d5 /data/13 xfs noatime,nodiratime 0 0
UUID=8f05d1dd-94d1-4376-9409-d5683ad4c225 /data/14 xfs noatime,nodiratime 0 0
UUID=5e0413d1-0b82-4ec1-b3f9-bb072db39071 /data/15 xfs noatime,nodiratime 0 0
UUID=e6b984c0-6d85-4df2-9a7d-cc1c87238c49 /data/16 xfs noatime,nodiratime 0 0
UUID=db81b056-587e-47a6-844e-2d952278324b /data/17 xfs noatime,nodiratime 0 0
UUID=45b4cf68-6f10-4dc7-8128-c2006e7aba5d /data/18 xfs noatime,nodiratime 0 0
UUID=a8e591e9-33c8-478a-b580-aeac9ad4cf44 /data/19 xfs noatime,nodiratime 0 0
UUID=53f2f06e-71e9-4796-86a3-2212c0f652ea /data/20 xfs noatime,nodiratime 0 0
UUID=5e1bd2f3-8ccc-4ba1-a0f7-bb55c8246d72 /data/21 xfs noatime,nodiratime 0 0
UUID=df4bdd58-e8d2-4bdb-8255-b9c7fcfe8999 /data/22 xfs noatime,nodiratime 0 0
UUID=701f3516-03bc-461b-930c-ab34d0b417d7 /data/23 xfs noatime,nodiratime 0 0
UUID=53e4023f-583a-4219-bfd2-1a94e15f34ef /data/24 xfs noatime,nodiratime 0 0
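As a reference-only sketch of preparing a single replacement disk (the device name /dev/sdt and mount point /data/20 are illustrative assumptions; follow your own infrastructure team's standards and tooling):
mkfs.xfs /dev/sdt          # format the replacement disk (destroys any data on /dev/sdt)
blkid /dev/sdt             # note the new UUID
vi /etc/fstab              # update the existing /data/20 entry with the new UUID
mount -a                   # mount everything listed in fstab
df -h /data/20             # confirm the replacement disk is mounted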
Recommission the Worker Node
Once the disk(s) have been suitably replaced, it’s time to use the recommissioning process to gracefully reintroduce the worker node back into the cluster. This can be found by navigating to the host within Cloudera Manager and using “Actions > End Maintenance”.
After the node has completed its recommission cycle, follow the guidance in the next sections to perform local disk rebalancing where appropriate.
Address local disk HDFS Balancing
Most clusters utilize HDFS. This service has a local disk balancer that you can make use of. Please find some helpful guidance within the following - Rebalance your HDFS Disks (single node)
Address local disk Kudu Balancing
If you are running Kudu within your cluster, you will need to rebalance the existing Kudu data on the local disks of the worker node. Please find some helpful guidance within the following - Rebalance your Kudu Disks (single node)