Created on 04-02-2024 11:59 PM - edited 04-03-2024 03:07 AM
We have a requirement to transfer files from an HDFS directory to a remote server.
I've noticed options to copy files from HDFS to the local filesystem first (using copyToLocal) and then transfer them from the local filesystem to the remote server (using scp). But is there any direct method to copy files from HDFS to a remote server, such as Sqoop functionality or some other approach, without first copying to the local file system?
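For reference, the two-step approach we use today looks roughly like this (the paths, user, and host names are placeholders):

```bash
# Step 1: copy the files from HDFS to the local file system of an edge node
hdfs dfs -copyToLocal /data/source_dir /tmp/staging_dir

# Step 2: push the local copy to the remote server over SSH
scp -r /tmp/staging_dir user@remote-host:/data/target_dir
```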
Created 04-03-2024 03:11 AM
Hi @s198
You can use the DistCp command to achieve this.
Refer to https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
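For example, a typical DistCp run between two Hadoop-compatible file systems looks like this (the NameNode addresses and paths below are placeholders):

```bash
# Distributed copy from a source cluster to a destination cluster;
# both endpoints must be Hadoop-compatible file systems
hadoop distcp hdfs://source-nn:8020/data/source_dir hdfs://dest-nn:8020/data/target_dir
```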
Created on 04-03-2024 03:20 AM - edited 04-03-2024 03:41 AM
Thanks @jAnshula for your suggestion.
My remote server does not support Hadoop-compatible file systems, so the DistCp command does not work for me. The primary objective is to copy the HDFS files as-is to a Linux machine.
Created 04-03-2024 04:23 AM
You may try using NiFi; refer to the article below.
Created 04-03-2024 05:13 AM
Thanks @jAnshula for your suggestion.
We are trying to achieve this without using NiFi. Could you please let us know if any options are available using hdfs/hadoop commands?
Created on 04-03-2024 09:30 AM - edited 04-03-2024 09:31 AM
Hi @s198, You do not need a Hadoop file system or a DataNode role on the remote server. You just need to set up an HDFS gateway (client) on the remote server and pull the files using DistCp. If you are using HDP or CDP, you can add the remote server as a gateway host and run DistCp from the remote server.
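For example, once the HDFS gateway/client configuration is in place on the remote server, a pull along these lines should work when run on the remote server itself (the NameNode address and paths are placeholders):

```bash
# Run on the remote server, which now has the HDFS gateway/client config.
# file:/// refers to the local file system of the machine running the job,
# so this pulls the HDFS directory down to a local directory.
hadoop distcp hdfs://namenode-host:8020/data/source_dir file:///data/landing_dir
```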
Another option is to share one of the directories on the remote server (for example over NFS), mount it on a Hadoop cluster node, and perform DistCp to that mounted directory, as sketched below.
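As a sketch of this second option, assuming the remote directory is exported over NFS and the mount point and paths below are placeholders:

```bash
# On a Hadoop cluster node: mount the directory shared by the remote server
sudo mount -t nfs remote-host:/data/landing_dir /mnt/remote_share

# Copy from HDFS to the mounted directory; file:/// is the cluster node's
# local file system, which now includes the NFS mount
hadoop distcp hdfs://namenode-host:8020/data/source_dir file:///mnt/remote_share
```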