Requirement:

Currently we have /hadoop/hdfs/data and /hadoop/hdfs/data1 as datanode directories.

I have a new mount point (/hadoop/hdfs/data/datanew) backed by a faster disk, and I want to keep only this mount point as the DataNode directory.
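
For reference, you can confirm the current DataNode directory setting before touching anything; this is a minimal check, assuming the HDFS client configuration is available on the node:

# Print the current DataNode data directories (the value Ambari manages)
hdfs getconf -confKey dfs.datanode.data.dir
# Expected to show the two existing directories, e.g.
# /hadoop/hdfs/data,/hadoop/hdfs/data1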

Steps:

  1. Stop the cluster.
  2. In the Ambari HDFS configuration, edit the DataNode directories setting: remove /hadoop/hdfs/data and /hadoop/hdfs/data1, add /hadoop/hdfs/datanew, and save.
  3. Log in to each DataNode VM and copy the contents of /data and /data1 into /datanew (a sketch of steps 3 and 4 follows this list).
  4. Change the ownership of /datanew and everything under it to “hdfs”.
  5. Start the cluster.
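
A rough sketch of steps 3 and 4 on a single DataNode, run as root while the cluster is stopped. It assumes the old directories are /hadoop/hdfs/data and /hadoop/hdfs/data1 and the new one is /hadoop/hdfs/datanew; adjust the paths and the group to match your environment.

# Merge the contents of both old directories into the new mount point,
# preserving ownership, permissions and timestamps
cp -rp /hadoop/hdfs/data/. /hadoop/hdfs/datanew/
cp -rp /hadoop/hdfs/data1/. /hadoop/hdfs/datanew/
# Hand the whole new tree over to the hdfs user (group "hadoop" is an assumption)
chown -R hdfs:hadoop /hadoop/hdfs/datanew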
Comments

@Ancil McBarnett is there any way to do this without downtime? Could you add a disk drive into a hot-swappable bay, add it to the DataNode's list of directories, force a rebalance, and then remove one of the old drives?


@Vladimir Zlatkin that should work as well. You can add a new drive, mount it, and add the new mount point to the list of HDFS directories. If you have a lot of drives or mount points to change, I'd probably decommission the DataNode and recommission it once the changes are finished. Keep in mind that the latter can cause some additional network traffic.
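
A minimal sketch of that approach, assuming a hypothetical new disk /dev/sdd and mount point /hadoop/hdfs/data2; the actual device, filesystem, and path will differ in your environment.

# Format and mount the new drive (device and mount point are examples)
mkfs.ext4 /dev/sdd
mkdir -p /hadoop/hdfs/data2
mount /dev/sdd /hadoop/hdfs/data2
chown hdfs:hadoop /hadoop/hdfs/data2
# Then append /hadoop/hdfs/data2 to the DataNode directories
# (dfs.datanode.data.dir) in the Ambari HDFS configuration and restart
# (or hot-reconfigure) the DataNode.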


I found the documentation on how to do this without downtime: https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsUserGuide.html#DataNode_Ho...

The only challenge I encountered was the host:port in the command. It comes from the dfs.datanode.ipc.address parameter in hdfs-site.xml. My full command looked like this:

su - hdfs -c "hdfs dfsadmin -reconfig datanode sandbox.hortonworks.com:8010 start"
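
For reference, a small sketch of how to look up that port and watch the reconfiguration finish; hdfs getconf and the status subcommand are standard, while the sandbox.hortonworks.com host name is just this example's.

# Look up the DataNode IPC address (the port used by the reconfig command)
hdfs getconf -confKey dfs.datanode.ipc.address
# Poll until the reconfiguration task reports it has completed
su - hdfs -c "hdfs dfsadmin -reconfig datanode sandbox.hortonworks.com:8010 status"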