Looking for your advice on this issue.
We have a use case where almost 50 TB of files need to be moved from a local file server to HDFS. The files are kept under multiple folders on the local file system, and we need to maintain a similar folder structure in HDFS. We are looking for suggestions on any utilities through which we can achieve this objective.
Let me know in case you need any more information.
Thanks and Regards,
Have you considered NiFi? It has out-of-the-box processors to list and fetch files and to write to HDFS. You can also use a NiFi cluster if you want to distribute the load across several nodes.
@rajdip chaudhuri As mentioned by @Abdelkrim Hadjidj, NiFi is a great candidate for this kind of problem! He talked about the processors that list and fetch files. As the names suggest, they list and fetch the data without you having to write any code, and they also give you the properties attached to those files, for example the directory structure, which is an important requirement in your use case.
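To make the "keep the same folder structure" requirement concrete, here is a minimal Python sketch of the mapping that has to happen, whichever tool does the copy. It is not NiFi code: the function name `plan_hdfs_targets` and the roots used in the usage note are hypothetical, and it only plans destination paths without copying anything. In NiFi the same idea would presumably be expressed by building the HDFS directory from the relative-path attribute that the list processor attaches to each file.

```python
import os
import posixpath


def plan_hdfs_targets(local_root, hdfs_root):
    """Yield (local_file, hdfs_file) pairs that mirror the local tree under hdfs_root."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        # Folder path relative to the root, e.g. "2017/01" ("." for the root itself).
        rel_dir = os.path.relpath(dirpath, local_root)
        for name in sorted(filenames):
            rel_file = name if rel_dir == "." else os.path.join(rel_dir, name)
            # HDFS paths always use forward slashes, whatever the local OS uses.
            hdfs_file = posixpath.join(hdfs_root, *rel_file.split(os.sep))
            yield os.path.join(dirpath, name), hdfs_file
```

For example, calling `plan_hdfs_targets("/mnt/fileserver", "/data/landing")` (both paths are placeholders) would pair a local file `/mnt/fileserver/a/b/deep.txt` with `/data/landing/a/b/deep.txt`.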
There are a bunch of advantages to using NiFi for this use case.
Let me know if you need any other help!
If this is a one-off and the file server is visible to all nodes in the cluster, you can actually use distcp with the source being a file://store/path URL and the destination hdfs://hdfsserver:port/path. Use the -bandwidth option to limit the maximum bandwidth of each mapper so that the (mappers * bandwidth) value is less than the bandwidth of the file server.
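The bandwidth arithmetic can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper `build_distcp_command`, the 80% headroom, the 10 mappers, the 1000 MB/s server figure, and port 8020 are all made-up placeholders, and the source is written with three slashes (file:///store/path) on the assumption that the share is mounted at the same local path on every node. The `-m` (number of maps) and `-bandwidth` (MB/s per map) options are standard distcp options.

```python
import shlex


def build_distcp_command(src, dst, mappers, server_bandwidth_mb, headroom_pct=80):
    """Build a distcp argument list whose total throughput stays under the file server's limit."""
    if mappers < 1:
        raise ValueError("mappers must be at least 1")
    # Whole MB/s per mapper, so mappers * per_mapper <= headroom_pct% of the server bandwidth.
    per_mapper = (server_bandwidth_mb * headroom_pct // 100) // mappers
    if per_mapper < 1:
        raise ValueError("too many mappers for the available bandwidth")
    return ["hadoop", "distcp", "-m", str(mappers), "-bandwidth", str(per_mapper), src, dst]


# Placeholder values: substitute your own paths, NameNode address, and measured bandwidth.
cmd = build_distcp_command(
    "file:///store/path", "hdfs://hdfsserver:8020/path",
    mappers=10, server_bandwidth_mb=1000,
)
print(shlex.join(cmd))
```

Leaving some headroom rather than using the full figure is a design choice here, so that other clients of the file server are not starved while the copy runs.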