Hi,
I have a test cluster running HDP 3.1.5 and Ambari 2.7.5 where, using Ambari config groups, I've configured three datanodes with the following values of "dfs.datanode.data.dir":
- dn-1: "[SSD]file:///dn_vg1/vol1_ssd"
- dn-2: "[SSD]file:///dn_vg1/vol1_ssd,[SSD]file:///dn_vg2/vol2_ssd"
- dn-3: "[DISK]file:///dn_vg1/vol1_disk,[SSD]file:///dn_vg3/vol3_ssd,[DISK]file:///dn_vg2/vol2_disk"
"dfs.replication" is set to 1 and the storage policies are all default (HOT).
Most of my attempts to "hdfs dfs -put" a file into HDFS fail with:
2021-03-11 14:58:33,315 WARN blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(432)) - Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology
So it appears that HDFS attempts to use an unsuitable datanode (one with no "DISK" storage type) before realising it is missing a "DISK" storage type.
Running "hdfs storagepolicies -setStoragePolicy -path / -policy All_SSD" makes it so that all "hdfs dfs -put"s go through without issue.
I've tried running "hadoop daemonlog -setlevel <namenode>:<port> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG" to get debug logs but that returns:
Connecting to http://<namenode>:<port>/logLevel?log=org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy&level=DEBUG
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://<namenode>:<port>/logLevel?log=org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy&level=DEBUG
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
at org.apache.hadoop.log.LogLevel$CLI.process(LogLevel.java:297)
at org.apache.hadoop.log.LogLevel$CLI.doSetLevel(LogLevel.java:244)
at org.apache.hadoop.log.LogLevel$CLI.sendLogLevelRequest(LogLevel.java:130)
at org.apache.hadoop.log.LogLevel$CLI.run(LogLevel.java:110)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
at org.apache.hadoop.log.LogLevel.main(LogLevel.java:72)
How can I ensure HDFS selects the correct datanodes?
Any help would be appreciated.
Created 03-12-2021 10:47 AM
Hello @Babar, it seems the DN disk configuration (dfs.datanode.data.dir) is not appropriate. Could you please configure the disks as described here: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_heterogeneous_storage_oview.ht...
If your SSD disks are mounted as below:
/dn_vg1/vol1_ssd ----> mounted as ----> /data/1
/dn_vg2/vol2_ssd ----> mounted as ----> /data/2
/dn_vg3/vol3_ssd ----> mounted as ----> /data/3
and your SCSI/SATA disks are mounted as below:
/dn_vg1/vol1_disk ----> mounted as ----> /data/4
/dn_vg2/vol2_disk ----> mounted as ----> /data/5
Then configure the DN data directories (dfs.datanode.data.dir) as follows:
- dn-1: "[SSD]/data/1/dfs/dn"
- dn-2: "[SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn"
- dn-3: "[DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn"
You need to create the /dfs/dn directories with ownership hdfs:hadoop and permissions 700 on each mount point so that the volumes can be used to store blocks.
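For example, on dn-1 something along these lines should work (a sketch only; adjust the mount points per node):
# create the DataNode block directory on the mount point
mkdir -p /data/1/dfs/dn
# ownership must be hdfs:hadoop, permissions 700
chown -R hdfs:hadoop /data/1/dfs/dn
chmod 700 /data/1/dfs/dn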
Please check the mount points and reconfigure the data directories.
Created 03-13-2021 03:44 PM
I hit the same error after applying your suggested changes. However, I think I've now "fixed" it.
I managed to get "hadoop daemonlog" commands working by adding the group "hadoop" to "dfs.permissions.superusergroup" and "dfs.cluster.administrators". It turns out I had the same problem as described here: https://www.gresearch.co.uk/article/hdfs-troubleshooting-why-does-a-tier-get-blacklisted/
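After updating those two properties (via Ambari in my case) and restarting HDFS, the daemonlog commands went through, e.g.:
# same placeholders as before: the NameNode host and its HTTP port
hadoop daemonlog -setlevel <namenode>:<port> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG
hadoop daemonlog -getlevel <namenode>:<port> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy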
For now I've set "dfs.namenode.replication.considerLoad.factor" to 3, which solved the problem. A proper fix that makes the block placement policy account for storage policies should eventually come with Hadoop 3.4, whenever that releases: https://issues.apache.org/jira/browse/HDFS-14383
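In case it helps anyone else: I set that property in hdfs-site.xml via Ambari and restarted the NameNode, then sanity-checked the value with something like:
# reads the locally effective hdfs-site.xml value (assumes you run it on the NameNode host)
hdfs getconf -confKey dfs.namenode.replication.considerLoad.factor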
Created 03-15-2021 04:40 AM
Hello @Babar, thank you for resolving the issue and marking the thread as solved.
Glad to know that you identified the problem and resolved it. Please note that HDFS-14383 (Compute datanode load based on StoragePolicy) has been included in the recent releases of CDP 7.1.5 and 7.2.x.