We currently have 13 EBS volumes on each datanode, and we have around 10 such datanodes.
Each disk on a datanode has about 1.5 TB used. We want to copy the 1.5 TB of data from one volume to a new disk.
Copying 100 GB takes an hour or more, so at that rate 1.5 TB will take 15 hours or more. Is there a faster way to copy the data?
I am using the rsync command to copy to the new disk.
Hi @Madhura Mhatre,
This may not be the perfect answer, but you can try it this way:
First test the process below on one datanode; if it works, repeat it on the others.
1. Create a new config group containing one datanode,
for example one that currently has /data1 (1.5 TB) configured as a data directory.
2. Override the dfs.datanode.data.dir config parameter in the new config group:
remove /data1 and add /data2 (1 TB) and /data3 (1 TB).
3. Save the changes and restart the required services.
4. Blocks are then automatically re-replicated from the other datanodes, since the old drive is no longer in the configuration.
Note: The cluster may slow down while the other datanodes copy large amounts of data.
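To see when the re-replication in step 4 has finished, you can watch the under-replicated block count. The following is only a rough sketch: `under_replicated` is a hypothetical helper, and it assumes that `hdfs fsck /` prints a summary line of the form "Under-replicated blocks: N (...)", which may differ between Hadoop versions.

```shell
#!/bin/sh
# Hypothetical helper: reads `hdfs fsck /` output on stdin and prints
# the under-replicated block count from the summary line.
# Assumption: the summary line looks like "Under-replicated blocks:<tab>12 (0.5 %)".
under_replicated() {
    awk -F: '/Under-replicated blocks/ { split($2, a, " "); print a[1]; exit }'
}

# Runs against the cluster only when called with --check,
# so sourcing this file has no side effects.
if [ "${1:-}" = "--check" ]; then
    # Per-datanode capacity and usage, to confirm the new directories are in use.
    hdfs dfsadmin -report
    echo "under-replicated blocks: $(hdfs fsck / 2>/dev/null | under_replicated)"
fi
```

Once the count stays at 0, the blocks that used to live on the removed directory have been copied back from the other datanodes, and it should be safe to move on to the next datanode.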
The fastest way to copy files between 2 EBS volumes attached to the same instance is to use 'dd', which copies everything (including filesystem structures). It works best if you can unmount both drives, or at least remount the first one as read-only.
dd if=/dev/device1 of=/dev/device2
Since you are copying to a bigger volume, you might want to run 'resize2fs /dev/device2' after that to expand the filesystem (resize2fs applies to ext2/ext3/ext4 filesystems).
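Because dd copies the whole block device, the target has to be at least as large as the source, so it is worth checking the sizes first. Here is a minimal sketch of the whole sequence, assuming GNU dd, an ext4 filesystem, and both devices unmounted; `fits` and `clone_volume` are hypothetical helper names, the device paths are the placeholders from above, and the 64M block size is just a starting point to tune.

```shell
#!/bin/sh
# Placeholder device names; confirm the real ones with lsblk before running.
SRC="${SRC:-/dev/device1}"
DST="${DST:-/dev/device2}"

# fits SRC_BYTES DST_BYTES: succeeds only if the destination can hold the source.
fits() {
    [ "$2" -ge "$1" ]
}

# clone_volume SRC_DEV DST_DEV: size check, raw copy, then grow the filesystem.
clone_volume() {
    src_bytes=$(blockdev --getsize64 "$1") || return 1
    dst_bytes=$(blockdev --getsize64 "$2") || return 1
    if ! fits "$src_bytes" "$dst_bytes"; then
        echo "target $2 ($dst_bytes bytes) is smaller than source $1 ($src_bytes bytes)" >&2
        return 1
    fi
    # A large block size avoids dd's slow 512-byte default.
    dd if="$1" of="$2" bs=64M status=progress conv=fsync || return 1
    # resize2fs expects a freshly checked filesystem when it is unmounted.
    e2fsck -f "$2"
    resize2fs "$2"
}

# Nothing is copied unless the script is called with --run.
if [ "${1:-}" = "--run" ]; then
    clone_volume "$SRC" "$DST"
fi
```

If the size check fails, a raw dd copy is not an option for that pair of volumes, and a file-level copy onto a freshly created filesystem is the usual fallback.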
Hey @Geoffrey Shelton Okot, I tried this command, but it does not work: my current disk is 2.5 TB and the disk I am copying to is 1.7 TB, and this command requires both disks to be the same size. Is there an alternative way?