HDFS attempting to use invalid datanodes when StoragePolicies are configured

New Contributor

Hi,
I have a test cluster running HDP 3.1.5 and Ambari 2.7.5 where, using Ambari config groups, I've configured 3 datanodes with the following values of "dfs.datanode.data.dir":
- dn-1: "[SSD]file:///dn_vg1/vol1_ssd"

- dn-2: "[SSD]file:///dn_vg1/vol1_ssd,[SSD]file:///dn_vg2/vol2_ssd"

- dn-3: "[DISK]file:///dn_vg1/vol1_disk,[SSD]file:///dn_vg3/vol3_ssd,[DISK]file:///dn_vg2/vol2_disk"

 

"dfs.replication" is set to 1 and the storage policies are all default (HOT).
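
To make the layout concrete, here is a minimal Python sketch (not part of HDFS) that parses the three "dfs.datanode.data.dir" values above and lists which datanodes expose a given storage type. The helper names `storage_types` and `nodes_with` are hypothetical, and the sketch assumes that a directory without a `[TYPE]` prefix counts as DISK.

```python
import re

# dfs.datanode.data.dir values exactly as given in the question.
DATA_DIRS = {
    "dn-1": "[SSD]file:///dn_vg1/vol1_ssd",
    "dn-2": "[SSD]file:///dn_vg1/vol1_ssd,[SSD]file:///dn_vg2/vol2_ssd",
    "dn-3": "[DISK]file:///dn_vg1/vol1_disk,[SSD]file:///dn_vg3/vol3_ssd,"
            "[DISK]file:///dn_vg2/vol2_disk",
}

# Matches a leading storage-type tag such as [SSD] or [DISK].
_TAG = re.compile(r"^\[(\w+)\]")


def storage_types(value):
    """Return the set of storage types named in one dfs.datanode.data.dir value."""
    types = set()
    for entry in value.split(","):
        match = _TAG.match(entry.strip())
        # Assumption: an untagged directory is treated as DISK.
        types.add(match.group(1).upper() if match else "DISK")
    return types


def nodes_with(storage_type, data_dirs=DATA_DIRS):
    """Return the sorted names of datanodes that expose the given storage type."""
    return sorted(
        name for name, value in data_dirs.items()
        if storage_type in storage_types(value)
    )


print(nodes_with("DISK"))  # ['dn-3']
print(nodes_with("SSD"))   # ['dn-1', 'dn-2', 'dn-3']
```

Under the default HOT policy, which asks for DISK only, dn-3 is the single datanode in this layout that can hold a new block.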

 

Most of my attempts to "hdfs dfs -put" a file into HDFS fail with:

 

2021-03-11 14:58:33,315 WARN  blockmanagement.BlockPlacementPolicy (BlockPlacementPolicyDefault.java:chooseTarget(432)) - Failed to place enough replicas, still in need of 1 to reach 1 (unavailableStorages=[], storagePolicy=BlockStoragePolicy{HOT:7, storageTypes=[DISK], creationFallbacks=[], replicationFallbacks=[ARCHIVE]}, newBlock=true) For more information, please enable DEBUG log level on org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy and org.apache.hadoop.net.NetworkTopology

 

 

So it appears that HDFS attempts to use an invalid datanode (one with no "DISK" storage type) before realising that the node lacks the "DISK" storage type.

 

Running "hdfs storagepolicies -setStoragePolicy -path / -policy All_SSD" makes it so that all "hdfs dfs -put"s go through without issue.

 

I've tried running "hadoop daemonlog -setlevel <namenode>:<port> org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy DEBUG" to get debug logs but that returns:

 

Connecting to http://<namenode>:<port>/logLevel?log=org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy&level=DEBUG
Exception in thread "main" java.io.IOException: Server returned HTTP response code: 403 for URL: http://<namenode>:<port>/logLevel?log=org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicy&level=DEBUG
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498)
	at org.apache.hadoop.log.LogLevel$CLI.process(LogLevel.java:297)
	at org.apache.hadoop.log.LogLevel$CLI.doSetLevel(LogLevel.java:244)
	at org.apache.hadoop.log.LogLevel$CLI.sendLogLevelRequest(LogLevel.java:130)
	at org.apache.hadoop.log.LogLevel$CLI.run(LogLevel.java:110)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
	at org.apache.hadoop.log.LogLevel.main(LogLevel.java:72)

 

 

How can I ensure HDFS selects the correct datanodes?

Any help would be appreciated.

1 ACCEPTED SOLUTION

New Contributor

I hit the same error after applying your suggested changes. However I think I've "fixed" it.


I managed to get the "hadoop daemonlog" commands working by adding the group "hadoop" to "dfs.permissions.superusergroup" and "dfs.cluster.administrators". It turns out I had the same problem as described here: https://www.gresearch.co.uk/article/hdfs-troubleshooting-why-does-a-tier-get-blacklisted/

 

For now I've set "dfs.namenode.replication.considerLoad.factor" to 3, which solved the problem. A proper fix that makes the block placement policy account for storage policies should eventually arrive with Hadoop 3.4, whenever that is released: https://issues.apache.org/jira/browse/HDFS-14383
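
As a rough illustration of why raising the factor helps, here is a simplified Python model of the load check, not the actual NameNode code. It assumes a candidate datanode is rejected when its load exceeds the factor times the cluster-wide average load, and that the default factor is 2.0; neither detail is stated in this thread. The function name `too_busy` and the load numbers are hypothetical.

```python
def too_busy(node_load, all_loads, factor=2.0):
    """Simplified model: reject a node whose load exceeds factor * cluster average."""
    max_load = factor * (sum(all_loads) / len(all_loads))
    return max_load > 0 and node_load > max_load


# Hypothetical load figures: only dn-3 has DISK, so it carries every HOT write,
# while the SSD-only nodes sit idle and drag the average down.
loads = {"dn-1": 0, "dn-2": 0, "dn-3": 6}
all_loads = list(loads.values())

print(too_busy(loads["dn-3"], all_loads, factor=2.0))  # True: 6 > 2.0 * 2
print(too_busy(loads["dn-3"], all_loads, factor=3.0))  # False: 6 is not > 3.0 * 2
```

In this model, when one of three nodes carries all the load, the average is a third of that node's load, so a factor of 3 puts the threshold exactly at its load and the node is no longer rejected.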


3 REPLIES

Expert Contributor

Hello @Babar, it seems the DN disk configuration (dfs.datanode.data.dir) is not appropriate. Could you please configure the disks as described here: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_heterogeneous_storage_oview.ht...

 

If your SSD disks are mounted as follows:

/dn_vg1/vol1_ssd -----> mounted as ----> /data/1

/dn_vg2/vol2_ssd -----> mounted as -----> /data/2

/dn_vg3/vol3_ssd -----> mounted as -----> /data/3

and the SCSI/SATA disks are mounted as follows:

/dn_vg1/vol1_disk  -----> mounted as ----> /data/4

/dn_vg2/vol2_disk ------> mounted as -----> /data/5

 

Then configure the DN data directories (dfs.datanode.data.dir) as follows:

- dn-1: "[SSD]/data/1/dfs/dn"

- dn-2: "[SSD]/data/1/dfs/dn,[SSD]/data/2/dfs/dn"

- dn-3: "[DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn"

You need to create the /dfs/dn directories with ownership of hdfs:hadoop and permission of 700 on each mount point so that the volume can be used to store the blocks. 
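
The directory preparation and the resulting config value could be sketched in Python roughly as follows. This is an illustrative helper, not a Cloudera or Ambari tool; the function names `prepare_data_dir` and `data_dir_value` are hypothetical, and changing ownership to hdfs:hadoop normally requires running as root.

```python
import os
import shutil


def prepare_data_dir(mount_point, owner=None, group=None):
    """Create <mount_point>/dfs/dn with mode 700 and return its path."""
    path = os.path.join(mount_point, "dfs", "dn")
    os.makedirs(path, exist_ok=True)
    os.chmod(path, 0o700)
    if owner or group:
        # Changing ownership normally requires root privileges.
        shutil.chown(path, user=owner, group=group)
    return path


def data_dir_value(volumes):
    """Build a dfs.datanode.data.dir value from (storage_type, mount_point) pairs."""
    return ",".join(
        f"[{storage_type}]{os.path.join(mount, 'dfs', 'dn')}"
        for storage_type, mount in volumes
    )


# dn-3 layout from the mount points listed above.
print(data_dir_value([("DISK", "/data/4"), ("SSD", "/data/3"), ("DISK", "/data/5")]))
# [DISK]/data/4/dfs/dn,[SSD]/data/3/dfs/dn,[DISK]/data/5/dfs/dn
```

On a real datanode the call would look like `prepare_data_dir("/data/4", owner="hdfs", group="hadoop")`, run once per mount point.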

 

Please check the mount points and reconfigure the data directories. 

Expert Contributor

Hello @Babar, thank you for resolving the issue and marking the thread as solved.

Glad to know that you identified the problem and resolved it. Please note that HDFS-14383 (Compute datanode load based on StoragePolicy) has been included in the recent releases CDP 7.1.5 and 7.2.x.