Member since: 03-22-2017
Posts: 63
Kudos Received: 18
Solutions: 12

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 4194 | 07-08-2023 03:09 AM |
| | 7130 | 10-21-2021 12:49 AM |
| | 3356 | 04-01-2021 05:31 AM |
| | 4639 | 03-30-2021 04:23 AM |
| | 7565 | 03-23-2021 04:30 AM |
03-30-2021
04:23 AM
1 Kudo
Hello @Amn_468

Please note that you get the block count alert after hitting the warning/critical threshold value set in the HDFS configuration. It is a monitoring alert and doesn't impact any HDFS operations as such. You may increase the monitoring threshold value in CM (CM > HDFS > Configuration > DataNode Block Count Thresholds).

However, CM monitors the block counts on the DataNodes to ensure you are not writing many small files into HDFS. An increase in block counts on the DNs is an early warning of small-file accumulation in HDFS.

The simplest way to check whether you are hitting the small files issue is to check the average block size of HDFS files. Fsck should show the average block size. If it's too low a value (e.g. ~1 MB), you might be hitting the small files problem, which would be worth looking into; otherwise, there is no need to review the number of blocks.

```
$ hdfs fsck /
[..]
 Total blocks (validated):      2899 (avg. block size 11475601 B)   <<<<<
[..]
```

Similarly, you can get the average file size in HDFS by running a script as follows:

```
$ hdfs dfs -ls -R / | grep -v "^d" | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =",sum/NR/1024/1024 " MB"}'
```

The file size reported by Reports Manager under "HDFS Reports" in Cloudera Manager can differ, as that report is extracted from an FSImage that is more than an hour old (not the latest one).

Hope this helps. Any further questions, feel free to update the thread; else mark it solved.

Regards, Pabitra Das
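As a sanity check outside a live cluster, the averaging logic of the script above can be dry-run against a synthetic listing. The two file entries below are made-up sample data standing in for real `hdfs dfs -ls -R /` output (field 5 is the file size in bytes); the pipeline itself is the same as the one in the reply.

```shell
# Feed a fake 'hdfs dfs -ls -R' listing through the same averaging logic.
# Sizes here are hypothetical: 1 MiB and 3 MiB files -> 2 MB average.
out=$(printf '%s\n' \
  '-rw-r--r--   3 hdfs hadoop    1048576 2021-03-30 04:23 /data/a.txt' \
  '-rw-r--r--   3 hdfs hadoop    3145728 2021-03-30 04:23 /data/b.txt' \
  | grep -v "^d" \
  | awk '{OFMT="%f"; sum+=$5} END {print "AVG File Size =",sum/NR/1024/1024 " MB"}')
echo "$out"
# Prints: AVG File Size = 2 MB
```

An average this low (~2 MB against a 128 MiB block size) is exactly the kind of result that would indicate a small files problem.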
03-23-2021
04:30 AM
1 Kudo
Hello @meenzoon

It seems the Cloudera Manager Service itself is not running. Could you please check the CM Server status on the host (`# service cloudera-scm-server status`)? If it is not running, please restart the CM service (cloudera-scm-server) and then check the role status.

If it still reports unknown health for the management host, check the health alert and share the message here. In case of a CM Server startup failure, check the CM Server log on the host; it would provide insight into the cause of the failure.
03-15-2021
04:40 AM
Hello @Babar Thank you for resolving the issue and marking the thread as solved. Glad to know that you identified the problem and resolved it. Please note that HDFS-14383 (Compute datanode load based on StoragePolicy) has been included in the recent releases CDP 7.1.5 and 7.2.x.
03-13-2021
04:53 AM
1 Kudo
Yes, it is applicable for the CDP 7.x releases, @novice_tester.
03-12-2021
11:00 AM
2 Kudos
Hello @novice_tester Cloudera validates and tests against all the latest browsers, like Google Chrome, Firefox, Safari, and MS Edge. Please refer to the supported browsers pages here:
- https://my.cloudera.com/supported-browsers.html
- https://docs.cloudera.com/management-console/cloud/requirements-aws/topics/mc-supported-browsers.html
03-12-2021
10:47 AM
Hello @Babar,

It seems the DN disk configuration (dfs.datanode.data.dir) is not appropriate. Could you please configure the disks as cited here - https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_heterogeneous_storage_oview.html#admin_heterogeneous_storage_config

If your SSD disks are mounted as below:

```
/dn_vg1/vol1_ssd  ----> mounted as ----> /data/1
/dn_vg2/vol2_ssd  ----> mounted as ----> /data/2
/dn_vg3/vol3_ssd  ----> mounted as ----> /data/3
```

and the SCSI/SATA disks are mounted as below:

```
/dn_vg1/vol1_disk ----> mounted as ----> /data/4
/dn_vg2/vol2_disk ----> mounted as ----> /data/5
```

then configure the DN data directories (dfs.datanode.data.dir) as follows:

- dn-1: "[SSD]/data/1/dfs/dn"
- dn-2: "[SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn"
- dn-3: "[DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn"

You need to create the /dfs/dn directories with ownership hdfs:hadoop and permission 700 on each mount point so that the volume can be used to store blocks. Please check the mount points and reconfigure the data directories.
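A minimal sketch of that directory preparation step, using the hypothetical /data/1 through /data/5 mount points from the example above. The `DN_ROOT` prefix is an assumption added here so the loop can be dry-run in a scratch directory; on the real DataNode host you would run it as root against the actual mounts and additionally run the chown shown in the comment.

```shell
# DN_ROOT is a hypothetical prefix for dry-running; set DN_ROOT=/ on the
# actual host. Mount points /data/1../data/5 are from the example above.
DN_ROOT="${DN_ROOT:-$(mktemp -d)}"
for mp in data/1 data/2 data/3 data/4 data/5; do
  mkdir -p "${DN_ROOT}/${mp}/dfs/dn"
  chmod 700 "${DN_ROOT}/${mp}/dfs/dn"
  # On the real host, as root, also:
  # chown -R hdfs:hadoop "${DN_ROOT}/${mp}/dfs/dn"
done
```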
03-05-2021
04:33 AM
2 Kudos
Hello @uxadmin,

Thank you for asking a follow-up question. Please note that the NameNode is responsible for keeping the metadata of the files/blocks written into HDFS. Hence an increase in block count means the NameNode has to keep more metadata and may need more heap memory. As a rule of thumb, we suggest 1 GB of heap memory for the NameNode for every 1 million blocks in HDFS. Similarly, every 1 million blocks on a DN requires ~1 GB of heap memory to operate smoothly.

As I said earlier, there is no hard limit on the number of blocks a DN can store, but having too many blocks is an indication of small-file accumulation in HDFS. You need to check the average block size in HDFS to understand whether you are hitting the small files issue. Fsck should show the average block size. If it's too low a value (e.g. ~1 MB), you might be hitting the small files problem, which would be worth looking into; otherwise, there is no need to review the number of blocks.

```
$ hdfs fsck /
[..]
 Total blocks (validated):      2899 (avg. block size 11475601 B)   <<<<<
[..]
```

In short, there is no hard block count threshold for a DN, but an increase in the block count of a DN is an early indicator of a small files issue in the cluster. Of course, more small files mean a larger heap memory requirement for both NN and DN.

In a perfect world where all files are created with a 128 MiB block size (the default block size of HDFS), a 1 TB filesystem on a DN can hold 8192 blocks (1024*1024/128). By that calculation, a DN with 23 TB can hold 188,416 blocks, but realistically not all files are created with a 128 MiB block size and not all files occupy an entire block. So in a normal CDH cluster installation, we keep a minimal value of 500,000 as the warning threshold for DN block counts. However, depending upon your use case and file writes into HDFS, the block count may hit the threshold over a period of time.

A value for the block count threshold can be determined from the DataNode disk size used for storing blocks. Say you have allocated 10 disks of 2 TB each (/data/1/dfs/dn through /data/10/dfs/dn) for block writes on a DataNode. That means 20 TB is available to write blocks, and if you are writing files with an average block size of 10 MB, you can accommodate a maximum of 2,097,152 blocks (20 TB / 10 MB) on that DN. So a threshold value of 1M (1,000,000) is a good warning threshold.

Hope this helps. Any further questions, feel free to revert back. Cheers!

In case your question has been answered, make sure to mark the answer as the accepted solution. If you find a reply useful, say thanks by clicking on the thumbs up button.
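The capacity arithmetic above can be re-derived in a couple of lines of shell, using the same assumed figures (10 disks of 2 TB each, 10 MB average block size):

```shell
# Re-derive: 10 disks x 2 TB = 20 TB total; at a 10 MB average block
# size that bounds the number of blocks the DN can hold.
disks=10; disk_tb=2; avg_block_mb=10
total_mb=$(( disks * disk_tb * 1024 * 1024 ))   # 20 TB expressed in MB
max_blocks=$(( total_mb / avg_block_mb ))
echo "max blocks per DN: ${max_blocks}"
# Prints: max blocks per DN: 2097152
```

Swapping in your own disk count and average block size gives a cluster-specific ceiling to set the warning threshold below.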
03-04-2021
10:30 AM
Hello @samglo,

Please note Solr CDCR is not supported in CDP yet. Refer to the Cloudera blog on Solr CDCR (Cross Data Center Replication) support:
- https://blog.cloudera.com/backup-and-disaster-recovery-for-cloudera-search/

> Solr CDCR
> The future holds the promise of a Solr to Solr replication feature as well, a.k.a. CDCR. This is still maturing upstream and will need some time to further progress before it can be considered for mission critical production environments. Once it matures we will evaluate its value in addition to all our existing options of recovery for Search. The above solutions, presented in this blog, are production-proven and provide very good coverage along with flexibility for today's workloads.

However, you can refer to the Apache document on Solr CDCR below for some information about setup:
- https://solr.apache.org/guide/6_6/cross-data-center-replication-cdcr.html

or the Cloudera Community article:
- https://community.cloudera.com/t5/Community-Articles/How-to-setup-cross-data-center-replication-in-SolrCloud-6/ta-p/247945
03-04-2021
07:25 AM
Hello @uxadmin, please note that the block count threshold configuration is intended for DataNodes only. This is a DataNode health test that checks whether the DataNode has too many blocks, because having too many blocks on a DataNode may affect its performance.

There is no hard limit on the number of blocks writable to a DN, as block size is merely a logical concept, not a physical layout. However, the block count alert serves as an early warning of a growing small files issue. While your DN can handle a lot of blocks in general, going too high will cause performance issues. Your processing speeds may drop if you keep a lot of tiny files on HDFS (depending on your use case, of course), so it would be worth looking into.

You can find the block count threshold in the HDFS config by navigating to CM > HDFS > Configuration > DataNode Block Count Thresholds. When the block count on any DN goes above the threshold, CM triggers an alert, so you need to adjust the threshold value based on the block counts on each DN. You can determine the block counts on each DN by navigating to CM > HDFS > WebUI > Active NN > DataNodes tab > Block counts column under the Datanode section.

Hope this helps.
03-01-2021
09:06 AM
1 Kudo
Hello @muslihuddin,

Please note that while enabling HA, CM puts all 3 JournalNodes into a single group called "Default Group" by default, assuming you are going to use the same config value for the 3 JN directories. Since you are using /app/jn for one node and /data/jn for the other 2 JN nodes, it created two separate JN config groups.

However, to prevent the CM alert, you can set /data/jn in the JN default group config so that 2 JNs are part of the Default config group rather than a separate one, and the 3rd JN will continue to operate in a separate config group until it uses the /data/jn directory as its edits directory. Just in case you need to change the JN directory on any JN, refer to the steps here - https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_mc_jn.html