How to add the disconnect=true option when copying files from FTP into HDFS with "-get"?

I was attempting to copy multiple files into HDFS from an FTP location via "hadoop fs -get <ftp url> <hdfs location>" and was seeing the following error:



-get: Fatal internal error
org.apache.hadoop.fs.ftp.FTPException: Failed to get home directory
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getHomeDirectory(
Caused by: Connection closed without indication.



Looking here, I am assuming that when "hadoop fs -get <ftp url> <hdfs location>" is called multiple times, the connection is left open and goes stale after some time, which causes the error. Do let me know if it actually comes from something else. I will note that I was not getting this error until I increased the concurrency to a certain limit.


When I added the "disconnect=true" option to the FTP URL, I got an error where hadoop treated "?disconnect=true" as part of the file name:

get: `ftp://myuser:mypassword@': No such file or directory
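
My guess is that the whole string after the host is being taken as a path, instead of "?disconnect=true" being split off as a query string. Below is a minimal Python sketch of that difference. It is only an illustration and not Hadoop's own parsing code, and the host "ftp.example.com" and the path "/data/file.csv" are made-up placeholders rather than my real values.

```python
from urllib.parse import urlsplit

# Hypothetical URL: the host and path are placeholders for illustration only.
url = "ftp://myuser:mypassword@ftp.example.com/data/file.csv?disconnect=true"

# A generic URI parser splits the query string off from the path.
parts = urlsplit(url)
uri_path = parts.path
uri_query = parts.query

# A parser with no notion of a query component keeps everything after the
# authority (user:password@host) as the path, "?disconnect=true" included.
authority_end = url.index("/", len("ftp://"))
path_without_query_support = url[authority_end:]

print(uri_path)
print(uri_query)
print(path_without_query_support)
```

If the command parses the argument the second way, it would look for a file literally named "file.csv?disconnect=true", which would match the "No such file or directory" message above.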

So how can that option be added to the command properly?