Member since: 03-22-2017
Posts: 63
Kudos Received: 18
Solutions: 12

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4194 | 07-08-2023 03:09 AM |
| | 7130 | 10-21-2021 12:49 AM |
| | 3356 | 04-01-2021 05:31 AM |
| | 4639 | 03-30-2021 04:23 AM |
| | 7565 | 03-23-2021 04:30 AM |
03-30-2021
04:23 AM
1 Kudo
Hello @Amn_468

Please note that you get the block count alert after hitting the warning/critical threshold value set in the HDFS configuration. It is a monitoring alert and doesn't impact any HDFS operations as such. You may increase the monitoring threshold value in CM (CM > HDFS > Configuration > DataNode Block Count Thresholds).

However, CM monitors the block counts on the DataNodes to ensure you are not writing many small files into HDFS. An increase in block counts on the DNs is an early warning of small-file accumulation in HDFS.

The simplest way to check whether you are hitting the small files issue is to check the average block size of HDFS files. Fsck should show the average block size. If it's too low a value (e.g. ~1 MB), you might be hitting the small files problem, which would be worth looking into; otherwise, there is no need to review the number of blocks.

```
$ hdfs fsck /
[..]
 Total blocks (validated):      2899 (avg. block size 11475601 B)   <<<<<
[..]
```

Similarly, you can get the average file size in HDFS by running a script as follows:

```
$ hdfs dfs -ls -R / | grep -v "^d" | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =",sum/NR/1024/1024 " MB"}'
```

The file size reported by Reports Manager under "HDFS Reports" in Cloudera Manager can differ, as that report is extracted from an FSImage that is more than an hour old (not the latest one).

Hope this helps. Any further questions, feel free to update the thread; else mark it solved.

Regards, Pabitra Das
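As a sanity check outside a live cluster, the averaging logic of the script above can be dry-run against a synthetic listing. The two file entries below are made-up sample data standing in for real `hdfs dfs -ls -R /` output (field 5 is the file size in bytes); the pipeline itself is the same as the one in the reply.

```shell
# Feed a fake 'hdfs dfs -ls -R' listing through the same averaging logic.
# Sizes here are hypothetical: 1 MiB and 3 MiB files -> 2 MB average.
out=$(printf '%s\n' \
  '-rw-r--r--   3 hdfs hadoop    1048576 2021-03-30 04:23 /data/a.txt' \
  '-rw-r--r--   3 hdfs hadoop    3145728 2021-03-30 04:23 /data/b.txt' \
  | grep -v "^d" \
  | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =",sum/NR/1024/1024 " MB"}')
echo "$out"
# Prints: AVG File Size = 2 MB
```

An average this low (~2 MB against a 128 MiB block size) is exactly the kind of result that would indicate a small files problem.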
03-23-2021
04:30 AM
1 Kudo
Hello @meenzoon

It seems the Cloudera Manager Service itself is not running. Could you please check the CM Server status on the host (`# service cloudera-scm-server status`)? If it is not running, please restart the CM service (cloudera-scm-server) and then check the role status.

If it still reports unknown health for the management host, check the health alert and share the message here. In case of a CM Server startup failure, check the CM Server log on the host; it would provide insight into the cause of the failure.
03-15-2021
04:40 AM
Hello @Babar Thank you for resolving the issue and marking the thread as solved. Glad to know that you identified the problem and resolved it. Please note that HDFS-14383 (Compute datanode load based on StoragePolicy) has been included in the recent releases CDP 7.1.5 and 7.2.x.
03-13-2021
04:53 AM
1 Kudo
Yes, it is applicable for the CDP 7.x releases, @novice_tester.
03-12-2021
11:00 AM
2 Kudos
Hello @novice_tester Cloudera validates and tests against all the latest browsers, like Google Chrome, Firefox, Safari, and MS Edge. Please refer to the supported browsers pages here:
- https://my.cloudera.com/supported-browsers.html
- https://docs.cloudera.com/management-console/cloud/requirements-aws/topics/mc-supported-browsers.html
03-12-2021
10:47 AM
Hello @Babar,

It seems the DN disk configuration (dfs.datanode.data.dir) is not appropriate. Could you please configure the disks as cited here - https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_heterogeneous_storage_oview.html#admin_heterogeneous_storage_config

If your SSD disks are mounted as below:

```
/dn_vg1/vol1_ssd  ----> mounted as ----> /data/1
/dn_vg2/vol2_ssd  ----> mounted as ----> /data/2
/dn_vg3/vol3_ssd  ----> mounted as ----> /data/3
```

and the SCSI/SATA disks are mounted as below:

```
/dn_vg1/vol1_disk ----> mounted as ----> /data/4
/dn_vg2/vol2_disk ----> mounted as ----> /data/5
```

then configure the DN data directories (dfs.datanode.data.dir) as follows:

- dn-1: "[SSD]/data/1/dfs/dn"
- dn-2: "[SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn"
- dn-3: "[DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn"

You need to create the /dfs/dn directories with ownership hdfs:hadoop and permission 700 on each mount point so that the volume can be used to store blocks. Please check the mount points and reconfigure the data directories.
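A minimal sketch of that directory preparation step, using the hypothetical /data/1 through /data/5 mount points from the example above. The `DN_ROOT` prefix is an assumption added here so the loop can be dry-run in a scratch directory; on the real DataNode host you would run it as root against the actual mounts and additionally run the chown shown in the comment.

```shell
# DN_ROOT is a hypothetical prefix for dry-running; set DN_ROOT=/ on the
# actual host. Mount points /data/1../data/5 are from the example above.
DN_ROOT="${DN_ROOT:-$(mktemp -d)}"
for mp in data/1 data/2 data/3 data/4 data/5; do
  mkdir -p "${DN_ROOT}/${mp}/dfs/dn"
  chmod 700 "${DN_ROOT}/${mp}/dfs/dn"
  # On the real host, as root, also:
  # chown -R hdfs:hadoop "${DN_ROOT}/${mp}/dfs/dn"
done
```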
03-05-2021
04:33 AM
2 Kudos
Hello @uxadmin,

Thank you for asking a follow-up question. Please note that the NameNode is responsible for keeping the metadata of the files/blocks written into HDFS. Hence an increase in block count means the NameNode has to keep more metadata and may need more heap memory. As a rule of thumb, we suggest 1 GB of heap memory for the NameNode for every 1 million blocks in HDFS. Similarly, every 1 million blocks on a DN requires ~1 GB of heap memory to operate smoothly.

As I said earlier, there is no hard limit on the number of blocks a DN can store, but having too many blocks is an indication of small-file accumulation in HDFS. You need to check the average block size in HDFS to understand whether you are hitting the small files issue. Fsck should show the average block size. If it's too low a value (e.g. ~1 MB), you might be hitting the small files problem, which would be worth looking into; otherwise, there is no need to review the number of blocks.

```
$ hdfs fsck /
[..]
 Total blocks (validated):      2899 (avg. block size 11475601 B)   <<<<<
[..]
```

In short, there is no hard block count threshold for a DN, but an increase in the block count of a DN is an early indicator of a small files issue in the cluster. Of course, more small files mean a larger heap memory requirement for both NN and DN.

In a perfect world where all files are created with a 128 MiB block size (the default block size of HDFS), a 1 TB filesystem on a DN can hold 8192 blocks (1024*1024/128). By that calculation, a DN with 23 TB can hold 188,416 blocks, but realistically not all files are created with a 128 MiB block size and not all files occupy an entire block. So in a normal CDH cluster installation, we keep a minimal value of 500,000 as the warning threshold for DN block counts. However, depending upon your use case and file writes into HDFS, the block count may hit the threshold over a period of time.

A value for the block count threshold can be determined from the DataNode disk size used for storing blocks. Say you have allocated 10 disks of 2 TB each (/data/1/dfs/dn through /data/10/dfs/dn) for block writes on a DataNode. That means 20 TB is available to write blocks, and if you are writing files with an average block size of 10 MB, you can accommodate a maximum of 2,097,152 blocks (20 TB / 10 MB) on that DN. So a threshold value of 1M (1,000,000) is a good warning threshold.

Hope this helps. Any further questions, feel free to revert back. Cheers!

In case your question has been answered, make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
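The capacity arithmetic above can be re-derived in a couple of lines of shell, using the same assumed figures (10 disks of 2 TB each, 10 MB average block size):

```shell
# Re-derive: 10 disks x 2 TB = 20 TB total; at a 10 MB average block
# size that bounds the number of blocks the DN can hold.
disks=10; disk_tb=2; avg_block_mb=10
total_mb=$(( disks * disk_tb * 1024 * 1024 ))   # 20 TB expressed in MB
max_blocks=$(( total_mb / avg_block_mb ))
echo "max blocks per DN: ${max_blocks}"
# Prints: max blocks per DN: 2097152
```

Swapping in your own disk count and average block size gives a cluster-specific ceiling to set the warning threshold below.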
03-04-2021
10:30 AM
Hello @samglo,

Please note Solr CDCR is not supported in CDP yet. Refer to the Cloudera blog on Solr CDCR (Cross Data Center Replication) support:
- https://blog.cloudera.com/backup-and-disaster-recovery-for-cloudera-search/

> Solr CDCR
> The future holds the promise of a Solr to Solr replication feature as well, a.k.a. CDCR. This is still maturing upstream and will need some time to further progress before it can be considered for mission critical production environments. Once it matures we will evaluate its value in addition to all our existing options of recovery for Search. The above solutions, presented in this blog, are production-proven and provide very good coverage along with flexibility for today's workloads.

However, you can refer to the Apache document on Solr CDCR below for some information about setup:
- https://solr.apache.org/guide/6_6/cross-data-center-replication-cdcr.html

or the Cloudera Community article:
- https://community.cloudera.com/t5/Community-Articles/How-to-setup-cross-data-center-replication-in-SolrCloud-6/ta-p/247945
03-04-2021
07:25 AM
Hello @uxadmin, please note that the block count threshold configuration is intended for DataNodes only. This is a DataNode health test that checks whether the DataNode has too many blocks, because having too many blocks on a DataNode may affect its performance.

There is no hard limit on the number of blocks writable to a DN, as block size is merely a logical concept, not a physical layout. However, the block count alert serves as an early warning of a growing small files issue. While your DN can handle a lot of blocks in general, going too high will cause performance issues. Your processing speeds may drop if you keep a lot of tiny files on HDFS (depending on your use case, of course), so it would be worth looking into.

You can find the block count threshold in the HDFS config by navigating to CM > HDFS > Configuration > DataNode Block Count Thresholds. When the block count on any DN goes above the threshold, CM triggers an alert, so you need to adjust the threshold value based on the block counts on each DN. You can determine the block counts on each DN by navigating to CM > HDFS > WebUI > Active NN > DataNodes tab > Block counts column under the Datanode section.

Hope this helps.
03-01-2021
09:06 AM
1 Kudo
Hello @muslihuddin,

Please note that while enabling HA, CM puts all 3 JournalNodes into a single group called "Default Group" by default, assuming you are going to use the same config value for the 3 JN directories. Since you are using /app/jn for one node and /data/jn for the other 2 JN nodes, it created two separate JN config groups.

However, to prevent the CM alert, you can set /data/jn in the JN default group config so that 2 JNs are part of the Default config group rather than a separate one, and the 3rd JN will continue to operate in a separate config group until it uses the /data/jn directory as its edits directory. Just in case you need to change the JN directory on any JN, refer to the steps here - https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_mc_jn.html