Add disconnect=true option when copying files from ftp into hdfs with "-get"?

I am attempting to copy multiple files from an FTP location into HDFS via "hadoop fs -get <ftp url> <hdfs location>" and am seeing errors like the following:

-get: Fatal internal error
....
org.apache.hadoop.fs.ftp.FTPException: Failed to get home directory
    at org.apache.hadoop.fs.ftp.FTPFileSystem.getHomeDirectory(FTPFileSystem.java:699)
    ....
Caused by: org.apache.commons.net.ftp.FTPConnectionClosedException: Connection closed without indication.
    at org.apache.commons.net.ftp.FTP.__getReply(FTP.java:324)
    ....

Based on this answer (https://stackoverflow.com/a/34506003/8236733), I am assuming that when "hadoop fs -get <ftp url> <hdfs location>" is called multiple times, the connection is left open and goes stale after some time, which causes the error. Do let me know if it actually comes from something else, though I will note that I was not getting this error until I increased the concurrency past a certain limit.
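For context, the pattern that triggers the error looks roughly like this (file names and the destination path are hypothetical placeholders; a DRY_RUN guard is included here since actually running it needs a live FTP server and a Hadoop install):

```shell
# Hypothetical sketch of the concurrent invocations described above.
# DRY_RUN=1 just prints the commands instead of executing hadoop.
DRY_RUN=1
for f in file1.tsv file2.tsv file3.tsv; do
  cmd=(hadoop fs -get "ftp://myuser:mypassword@172.18.5.27/path/to/some/$f" /hdfs/dest/)
  if [ "$DRY_RUN" = "1" ]; then
    echo "${cmd[@]}"
  else
    "${cmd[@]}" &   # each -get opens its own FTP connection
  fi
done
wait
```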

When I add the "disconnect=true" option to the FTP URL, as in

ftp://myuser:mypassword@172.18.5.27/path/to/some/file.tsv?disconnect=true

I get an error where Hadoop treats "?disconnect=true" as part of the file name:

get: `ftp://myuser:mypassword@172.18.5.27/path/to/some/file.tsv?disconnect=true': No such file or directory
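One detail worth noting for anyone reproducing this: "?" is a bash glob character, so the URL should be single-quoted to guarantee the shell passes it through verbatim. Since the error message echoes the full string back, it appears Hadoop itself is receiving the URL intact and treating the query part as the path. A minimal check of the quoting (credentials and path are the placeholders from above):

```shell
# The '?' in the URL is a bash glob character; single quotes ensure the
# shell passes the URL through unchanged (placeholder credentials/path).
url='ftp://myuser:mypassword@172.18.5.27/path/to/some/file.tsv?disconnect=true'
echo "$url"
```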

So how can that option be added to the command properly?