We have large data folders on Windows shares that we would like to replicate to HDFS for processing. I tried NFS, but it is too slow. I tried NiFi, which works much better, but I still run into issues. For example, the ListFile processor takes a very long time to run and frequently errors out or crashes NiFi. Is there a better way to move large preexisting data into HDFS from a remote source using NiFi?
Your best bet is to use NiFi. If NiFi is crashing, check its memory settings, the number of threads you are using to read data, and other performance-related parameters. How much data do you need to move?
After posting this I reconfigured NiFi and was able to get it to copy all of the data (it was a Java heap size issue). The data sizes vary depending on what we get from clients. The job I am copying right now is over 600 GB.
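For anyone hitting the same crash: the JVM heap for NiFi is normally set in conf/bootstrap.conf. The fragment below is only a sketch of what such a change can look like. The 4g values are an assumption for illustration, not the settings used in this thread, and the java.arg index numbers may differ between NiFi versions, so check your own file before editing.

```
# conf/bootstrap.conf (JVM memory settings)
# Assumed values for illustration only; size these to the memory available on your host.
java.arg.2=-Xms4g
java.arg.3=-Xmx4g
```

NiFi needs to be restarted for a heap change to take effect.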