Cloudera Data Analytics (CDA) Articles

Cloudera Employee

Summary

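When a worker node has suffered a disk failure and the disk has been replaced, the replacement disk starts out empty while the node's other disks stay full. You'll need to rebalance data across the DataNode's local disks so that HDFS blocks are spread evenly over all of them, including the new one.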

Investigation

HDFS Disk Balancer - Explained

This topic is already covered in depth by an existing blog post:

How-to: Use the New HDFS Intra-DataNode Disk Balancer in Apache Hadoop

 

Read through that post and follow its guidance to confirm that the HDFS service is already configured to run the disk balancer, in particular that dfs.disk.balancer.enabled is set to true for the DataNodes.

Resolution

HDFS Disk Balancer - Execution

The following steps walk through an HDFS intra-DataNode disk rebalance. Run them on the worker node whose disk was replaced.

Obtain a local HDFS DataNode Kerberos Ticket

cd /var/run/cloudera-scm-agent/process/`ls -larth /var/run/cloudera-scm-agent/process | grep -i hdfs-DATANODE | tail -1 | awk '{print $9}'`

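kinit -kt hdfs.keytab hdfs/`hostname -f`@<ClusterDomain>

The first command changes into the most recent DataNode process directory, which contains hdfs.keytab. Replace <ClusterDomain> with the cluster's Kerberos realm.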
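The cd command above picks the newest DataNode process directory by parsing ls -l output by column position, which is fragile if the listing format changes. A slightly more defensive sketch follows. It assumes the default Cloudera Manager agent process path and directory names containing hdfs-DATANODE (matched case-sensitively, unlike the grep -i above); latest_dn_process_dir is a hypothetical helper, not part of Cloudera Manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Cloudera Manager or Hadoop.
# Print the most recently modified DataNode process directory under $1
# (defaults to the Cloudera Manager agent process directory).
# Returns non-zero, printing nothing, if no matching directory exists.
latest_dn_process_dir() {
  local base="${1:-/var/run/cloudera-scm-agent/process}"
  local dir
  # -t sorts newest first, -d lists directories rather than their contents,
  # -1 prints one entry per line. If the glob matches nothing, ls fails on
  # the literal pattern and its error is discarded, leaving $dir empty.
  dir=$(ls -1td -- "$base"/*hdfs-DATANODE* 2>/dev/null | head -n 1)
  [ -n "$dir" ] || return 1
  printf '%s\n' "$dir"
}
```

For example, run cd "$(latest_dn_process_dir)" || exit 1 and then the kinit command above unchanged.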

Create a Disk Balancer Plan

hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5

 

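Here -bandwidth caps the disk bandwidth the balancer may use during execution, in MB/s, and -thresholdPercentage sets how much data skew between the node's disks, as a percentage, is tolerated before a plan is generated.

Example of a successful creation of a disk balancer plan: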

hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5

INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second

INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec

INFO block.BlockTokenSecretManager: Setting block keys

INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec

INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867

INFO planner.GreedyPlanner: Disk Volume set 76c137f0-5d0c-4de3-b166-5c0ac29b77d1 Type : DISK plan completed.

INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 46 ms

INFO command.Command: Writing plan to:

INFO command.Command: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

Writing plan to:

/system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json
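When scripting this step, the plan path can be taken from the planner's output instead of being copied by hand. The sketch below assumes the output format shown in this article: a token ending in .plan.json when a plan is written, or a "No plan generated" message when the node is already balanced. plan_path_from_output is a hypothetical helper, not part of Hadoop.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Hadoop.
# Read the combined stdout/stderr of `hdfs diskbalancer -plan ...` on stdin
# and print the generated plan file path.
# Returns 1 and prints nothing if the planner reported "No plan generated"
# or if no .plan.json path could be found.
plan_path_from_output() {
  local out path
  out=$(cat)
  if grep -q 'No plan generated' <<<"$out"; then
    return 1
  fi
  # Take the last whitespace-delimited token ending in .plan.json; the path
  # is printed more than once (log line and plain line), so keep one copy.
  path=$(grep -oE '[^[:space:]]+\.plan\.json' <<<"$out" | tail -n 1)
  [ -n "$path" ] || return 1
  printf '%s\n' "$path"
}
```

For example: out=$(hdfs diskbalancer -plan "$(hostname -f)" -bandwidth 100 -thresholdPercentage 5 2>&1), then plan=$(plan_path_from_output <<<"$out") && hdfs diskbalancer -execute "$plan". The execute step is skipped automatically when no plan was generated.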

Execute a Disk Balancer Plan

hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

 

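The plan file is stored in HDFS under a timestamped directory in /system/diskbalancer, so give -execute the exact path printed by the plan step.

Example of a successful execution of a disk balancer plan: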

hdfs diskbalancer -execute /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

INFO command.Command: Executing "execute plan" command
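If the execute step is scripted, checking that the plan file exists first gives a clearer error for a mistyped or stale path. This is a sketch only: execute_plan is a hypothetical wrapper, and it relies on hdfs dfs -test -e, which exits 0 only when the given HDFS path exists.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper, not part of Hadoop.
# Verify that the plan file exists in HDFS, then submit it to the local
# DataNode. Exit codes: 2 = no path given, 1 = plan file not found,
# otherwise the exit code of `hdfs diskbalancer -execute`.
execute_plan() {
  local plan="$1"
  if [ -z "$plan" ]; then
    echo "usage: execute_plan <hdfs-plan-path>" >&2
    return 2
  fi
  # Fail early with a readable message if the path does not exist in HDFS.
  if ! hdfs dfs -test -e "$plan"; then
    echo "plan file not found in HDFS: $plan" >&2
    return 1
  fi
  hdfs diskbalancer -execute "$plan"
}
```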

Query a running Disk Balancer Plan

hdfs diskbalancer -query `hostname -f`

 

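The query reports the status of the plan on that DataNode. Re-run it periodically until Result changes from PLAN_UNDER_PROGRESS to PLAN_DONE.

Example of querying a running disk balancer plan: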

hdfs diskbalancer -query `hostname -f`

INFO command.Command: Executing "query plan" command.

Plan File: /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

Plan ID: 9b0d03edee9d4285cfea5fe13247d8e23cb4557d

Result: PLAN_UNDER_PROGRESS
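Rather than re-running the query by hand, a small polling loop can wait for the plan to finish. This sketch assumes the query output contains a "Result:" line as shown above; plan_result and wait_for_plan are hypothetical helpers, not part of Hadoop, and the 60-second default interval is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Hadoop.

# Read `hdfs diskbalancer -query` output on stdin and print the value of
# its "Result:" field, e.g. PLAN_UNDER_PROGRESS or PLAN_DONE.
plan_result() {
  sed -n 's/^.*Result:[[:space:]]*\([A-Z_]*\).*$/\1/p' | tail -n 1
}

# Query the DataNode on host $1 every $2 seconds (default 60) for as long as
# the plan is PLAN_UNDER_PROGRESS, then print the final result.
# Any other result, including an empty one (e.g. the query failed), stops
# the loop. Returns 0 only for PLAN_DONE.
wait_for_plan() {
  local host="$1" interval="${2:-60}" result
  while :; do
    result=$(hdfs diskbalancer -query "$host" 2>&1 | plan_result)
    [ "$result" = PLAN_UNDER_PROGRESS ] || break
    sleep "$interval"
  done
  printf '%s\n' "$result"
  [ "$result" = PLAN_DONE ]
}
```

For example: wait_for_plan "$(hostname -f)" 300 && echo "rebalance finished".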

Cancel a running Disk Balancer Plan (if required)

hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

 

Example of cancelling a running disk balancer plan:

hdfs diskbalancer -cancel /system/diskbalancer/2023-Mar-13-02-50-35/<Worker-Node-FQDN>.plan.json

INFO command.Command: Executing "Cancel plan" command.

HDFS Disk Balancer - No Rebalancing Required Example

If you run the planner on a node that doesn't need rebalancing, because its disks are already within the threshold, no plan is generated and you will see output like the following:

hdfs diskbalancer -plan `hostname -f` -bandwidth 100 -thresholdPercentage 5

INFO balancer.NameNodeConnector: getBlocks calls for hdfs://nameservice1 will be rate-limited to 20 per second

INFO balancer.KeyManager: Block token params received from NN: update interval=10hrs, 0sec, token lifetime=10hrs, 0sec

INFO block.BlockTokenSecretManager: Setting block keys

INFO balancer.KeyManager: Update block keys every 2hrs, 30mins, 0sec

INFO planner.GreedyPlanner: Starting plan for Node : <Worker-Node-FQDN>:9867

INFO planner.GreedyPlanner: Compute Plan for Node : <Worker-Node-FQDN>:9867 took 36 ms

INFO command.Command: No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0

No plan generated. DiskBalancing not needed for node: <Worker-Node-FQDN> threshold used: 5.0

 
