
hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Contributor

The DataNode refresh command hdfs dfsadmin -reconfig datanode <datanode>:50020 start has been hung for more than 20 hours. It should take only a few seconds.

 

Here is the output of the status command:

hdfs dfsadmin -reconfig datanode <datanode>:50020 status

Reconfiguring status for DataNode[datanod]: started at Thu Nov 10 18:34:59 PST 2016 and is still running.

 

 

There is no command such as hdfs dfsadmin -reconfig datanode <datanode>:50020 stop that would let me abort the operation and retry. Does anyone have a workaround?

 

Thanks!

Re: hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Rising Star

I am not aware of any incidents where DataNode reconfiguration hangs. In addition, abruptly interrupting a reconfiguration might leave the DataNode in an inconsistent state, so I don't think adding a command to stop it is a good idea.

 

You might want to take a look at the DataNode log to see if something went wrong (paused by garbage collection, maybe?) or maybe just restart the DataNode.
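As a rough sketch of that log check, the shell function below pulls reconfiguration-related lines out of a DataNode log. The log path in the usage comment is hypothetical (it varies by distribution), and matching on JvmPauseMonitor assumes Hadoop's JVM pause monitor is writing to the same log on your version.

```shell
#!/bin/sh
# Print DataNode log lines related to reconfiguration or JVM pauses.
# $1: path to a DataNode log file (the location varies by install).
# The first three patterns come from log lines seen during a reconfig;
# JvmPauseMonitor is an assumption about how long GC pauses are reported.
reconfig_log_lines() {
    grep -E 'ReconfigurableBase|Reconfiguring|Deactivating volumes|JvmPauseMonitor' "$1"
}

# Usage (hypothetical log path):
#   reconfig_log_lines /var/log/hadoop-hdfs/hadoop-hdfs-datanode-myhost.log
```

If the reconfiguration lines stop abruptly with no error after them, that points toward a stuck thread rather than a failed task.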

Re: hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Expert Contributor
Agreed. The next step is probably to look at the logs for a potential root cause. The fix will likely be to restart the DataNode, which reloads the config on startup.

Re: hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Contributor

How long does it take for the reconfig command to run?

 

This is what I saw in the DataNode log when removing the /hadoop2 partition:

 

2016-12-07 16:42:07,935 INFO org.apache.hadoop.conf.ReconfigurableBase: Starting reconfiguration task.
2016-12-07 16:42:08,047 INFO org.apache.hadoop.conf.ReconfigurableBase: Change property: dfs.datanode.data.dir from "file:///hadoop1/data,file:///hadoop2/data,file:///hadoop3/data,file:///hadoop4/data,file:///hadoop5/data,file:///hadoop6/data,file:///hadoop7/data,file:///hadoop8/data,file:///hadoop9/data,file:///hadoop10/data,file:///hadoop11/data,file:///hadoop12/data" to "file:///hadoop1/data,file:///hadoop3/data,file:///hadoop4/data,file:///hadoop5/data,file:///hadoop6/data,file:///hadoop7/data,file:///hadoop8/data,file:///hadoop9/data,file:///hadoop10/data,file:///hadoop11/data,file:///hadoop12/data".
2016-12-07 16:42:08,047 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reconfiguring dfs.datanode.data.dir to file:///hadoop1/data,file:///hadoop3/data,file:///hadoop4/data,file:///hadoop5/data,file:///hadoop6/data,file:///hadoop7/data,file:///hadoop8/data,file:///hadoop9/data,file:///hadoop10/data,file:///hadoop11/data,file:///hadoop12/data
2016-12-07 16:42:08,114 WARN org.apache.hadoop.hdfs.server.common.Util: Path /hadoop2/data should be specified as a URI in configuration files. Please update hdfs configuration.
2016-12-07 16:42:08,115 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Deactivating volumes (clear failure=true): /hadoop2/data
2016-12-07 16:42:08,115 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Removing /hadoop2/data from FsDataset.
2016-12-07 16:42:08,115 INFO org.apache.hadoop.hdfs.server.datanode.BlockScanner: Removing scanner for volume /hadoop2/data (StorageID DS-8d8c5851-258b-440d-85da-d6219389c9d1)
2016-12-07 16:42:08,116 INFO org.apache.hadoop.hdfs.server.datanode.VolumeScanner: VolumeScanner(/hadoop2/data, DS-8d8c5851-258b-440d-85da-d6219389c9d1) exiting.

 

I assume these messages indicate that the refresh completed, but the hdfs dfsadmin -reconfig datanode mynode:50020 status command still shows the reconfig as running long afterwards.
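To avoid re-running the status command by hand, a small polling wrapper can help. This is only a sketch: the function names and the 5-second interval are made up for illustration, and the only status text it relies on is the "is still running" phrase quoted in this thread.

```shell
#!/bin/sh
# Succeeds (exit 0) if the status text on stdin says the task is still running.
# The matched phrase is copied from the status output quoted in this thread.
is_still_running() {
    grep -q 'is still running'
}

# Poll the reconfig status until it stops reporting "still running"
# or the timeout expires.
# $1: DataNode IPC address (host:port), $2: timeout in seconds (default 60).
# Caveat: if the hdfs command itself fails, its output will not match either,
# so the loop ends as if the task were no longer running.
wait_for_reconfig() {
    addr=$1
    timeout=${2:-60}
    elapsed=0
    while hdfs dfsadmin -reconfig datanode "$addr" status 2>&1 | is_still_running; do
        if [ "$elapsed" -ge "$timeout" ]; then
            echo "reconfig on $addr still running after ${timeout}s" >&2
            return 1
        fi
        sleep 5
        elapsed=$((elapsed + 5))
    done
    echo "reconfig on $addr is no longer reported as running"
}

# Usage:
#   wait_for_reconfig mynode:50020 120
```

A non-zero exit after the timeout is a reasonable trigger for collecting logs or a stack trace before restarting the DataNode.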

Re: hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Contributor

After more testing, I found that the reconfig command hangs if I remove a DataNode partition from the config. It does not hang if I add a partition.

Re: hdfs dfsadmin -reconfig datanode <datanode>:50020 start hung

Expert Contributor
That looks like a bug, then. The log shows the DataNode removing the drive from its configuration. The WARN just before the deactivation starts is suspicious; I'm curious whether something is preventing the drive removal from completing and thus hanging the reconfig command. A stack trace of the DataNode taken during the reconfig would tell us more.
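One way to capture that stack trace is with the JDK's jstack tool, sketched below. The pgrep pattern assumes the DataNode runs with its usual main class on the command line (a secure DataNode started through jsvc may look different), and the helper name and /tmp path are just for illustration.

```shell
#!/bin/sh
# Print only the thread blocks of a jstack dump (read from stdin) that mention
# ReconfigurableBase, the class that logs "Starting reconfiguration task."
# jstack separates threads with blank lines, so awk paragraph mode (RS="")
# treats each thread's stack as one record.
reconfig_threads() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" } /ReconfigurableBase/'
}

# Usage, run as the user that owns the DataNode JVM (often hdfs),
# while the reconfig status still reports "is still running":
#   pid=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -n 1)
#   jstack "$pid" > /tmp/datanode.jstack
#   reconfig_threads < /tmp/datanode.jstack
```

Taking two or three dumps a few seconds apart makes it easier to tell a thread that is genuinely blocked from one that is just slow. Keep the full dump as well, since the thread holding the lock the reconfig is waiting on may not mention ReconfigurableBase itself.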