How to get files from SFTP to HDFS?

How can I get files that are dropped into a directory on an SFTP server into HDFS? These files are generated on a schedule, and I need to move them to HDFS.


Super Collaborator

NiFi has GetFTP (and GetSFTP) and PutHDFS processors. Are you using an HDF cluster?

Hi Jordan, I'm using an HDP 2.6 cluster and have no experience with NiFi. Can we install NiFi on an HDP cluster? If I install NiFi, how do I get files from SFTP to HDFS? The files are dropped into one folder; every day a new folder is created with the corresponding date. Please help me develop this.

Super Collaborator

NiFi is not your only option. You could install a Flume agent on the SFTP server to read this folder as a spooling directory.

You can also use Spark to read from the FTP directory and write to HDFS, since it's just a filesystem. Add a Java FTP client to your code and read from the folder.
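As a rough illustration of the "add a client to your code and read from a folder" idea, here is a minimal sketch. It is a plain Python script using the third-party `paramiko` library, not the Spark job with a Java FTP client described above. The `/outgoing` base path, the `YYYY-MM-DD` folder naming, the staging directory, and key-based login are all assumptions made for the example; adjust them to the real server.

```python
import datetime
import os
import posixpath
import stat


def remote_dir_for(day, base="/outgoing"):
    """Return the dated remote folder, e.g. /outgoing/2018-03-01.

    The base path and the YYYY-MM-DD naming are assumptions, not from the thread.
    """
    return posixpath.join(base, day.strftime("%Y-%m-%d"))


def fetch_day(host, user, key_file, day, local_root="/tmp/sftp_staging"):
    """Download every regular file from the day's remote folder; return local paths."""
    import paramiko  # third-party: pip install paramiko

    remote_dir = remote_dir_for(day)
    local_dir = os.path.join(local_root, day.strftime("%Y-%m-%d"))
    os.makedirs(local_dir, exist_ok=True)

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # only hosts already in known_hosts are accepted
    client.connect(host, username=user, key_filename=key_file)
    downloaded = []
    try:
        sftp = client.open_sftp()
        for entry in sftp.listdir_attr(remote_dir):
            if not stat.S_ISREG(entry.st_mode):
                continue  # skip sub-directories and other non-files
            local_path = os.path.join(local_dir, entry.filename)
            sftp.get(posixpath.join(remote_dir, entry.filename), local_path)
            downloaded.append(local_path)
    finally:
        client.close()
    return downloaded
```

Calling `fetch_day("sftp.example.com", "etl", "/home/etl/.ssh/id_rsa", datetime.date.today())` (hypothetical host, user, and key) would only stage the files locally; they still have to be copied into HDFS afterwards, for example with `hdfs dfs -put`.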

Whichever route you choose, you need one of the following:

1. Additional software installed on the SFTP server itself.

2. A process set up "upstream" of the SFTP server that also sends the files to HDFS. That could be through WebHDFS, HttpFS, or an NFS Gateway.

3. Some software that HDP does not provide out of the box, sitting between that server and HDFS. This includes NiFi, but StreamSets is another option. The official documentation for those tools will tell you more than I can here.
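For the WebHDFS route in option 2, the upload is a two-step REST call: a `PUT` with `op=CREATE` to the NameNode, which answers with a 307 redirect, then a second `PUT` with the file content to the DataNode URL from the `Location` header. The sketch below uses only the Python standard library. It assumes an unsecured (non-Kerberos) cluster that accepts the `user.name` parameter and the Hadoop 2.x default NameNode HTTP port 50070; the host and paths are placeholders.

```python
import http.client
import os
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_create_url(host, port, hdfs_path, user, overwrite=False):
    """Build the NameNode URL for the WebHDFS CREATE operation."""
    if not hdfs_path.startswith("/"):
        raise ValueError("hdfs_path must be absolute")
    query = urlencode({
        "op": "CREATE",
        "user.name": user,
        "overwrite": "true" if overwrite else "false",
    })
    return f"http://{host}:{port}/webhdfs/v1{quote(hdfs_path)}?{query}"


def _put(url, body=None, headers=None):
    """Send one PUT request and return (status, Location header or None)."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port)
    try:
        conn.request("PUT", f"{parts.path}?{parts.query}", body=body, headers=headers or {})
        resp = conn.getresponse()
        resp.read()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def webhdfs_upload(local_path, create_url):
    """Two-step upload: ask the NameNode, then send the data to the DataNode."""
    # Step 1: no body; the NameNode replies 307 with the DataNode location.
    status, location = _put(create_url)
    if status != 307 or not location:
        raise RuntimeError(f"expected 307 redirect from NameNode, got {status}")
    # Step 2: send the file content to the DataNode; 201 Created means success.
    size = os.path.getsize(local_path)
    with open(local_path, "rb") as fh:
        status, _ = _put(location, body=fh, headers={"Content-Length": str(size)})
    if status != 201:
        raise RuntimeError(f"upload failed with HTTP {status}")
```

A call might look like `webhdfs_upload("a.csv", webhdfs_create_url("namenode.example.com", 50070, "/landing/a.csv", "etl"))`, where the host, user, and target path are hypothetical.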

If you want to use HDF, see whether this documentation suits your needs.

New Contributor

Hi @Jordan Moore, what option would you suggest if you have 100 different SFTP sources with 10-15 files in each of them? Configuring individual NiFi processors is not an option here. I've played around with NiFi processors and they are not very good at working with parameters. Would Spark be a good solution for my case?



Hi @Jordan, thanks for the response. I don't have any experience with NiFi, and I'm using HDP 2.6. Can I achieve the same thing? Please guide me.

Thanks @Jordan Moore. In my case the remote SFTP server belongs to a third party, so I cannot install any software on it. The only thing I can do is access it over SFTP and get the files onto my local filesystem, but I want those files in HDFS. What do I have to do for this?

For WebHDFS I would also need to configure something on the remote server. Is there any way to get the files without doing anything on the remote server?
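If the files can already be pulled onto a local filesystem on a machine that has the HDFS client installed, nothing has to change on the remote server: the last hop can be the standard `hdfs dfs -put` command. Below is a minimal sketch that wraps it from Python; the local staging directory and the HDFS target path are hypothetical names chosen for the example.

```python
import glob
import os
import subprocess


def put_commands(local_files, hdfs_dir):
    """Return the HDFS CLI commands that copy local_files into hdfs_dir."""
    # -mkdir -p creates the target folder (and parents) if it does not exist yet.
    commands = [["hdfs", "dfs", "-mkdir", "-p", hdfs_dir]]
    if local_files:
        # -f overwrites files that already exist in HDFS, so reruns are safe.
        commands.append(["hdfs", "dfs", "-put", "-f", *local_files, hdfs_dir])
    return commands


def copy_dir_to_hdfs(local_dir, hdfs_dir):
    """Copy every regular file in local_dir to hdfs_dir; return the files copied."""
    files = sorted(
        p for p in glob.glob(os.path.join(local_dir, "*")) if os.path.isfile(p)
    )
    for cmd in put_commands(files, hdfs_dir):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return files
```

For the daily dated folders, something like `copy_dir_to_hdfs("/tmp/sftp_staging/2018-03-01", "/landing/2018-03-01")` could be run once a day from cron after the SFTP download finishes.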

I have an HDP cluster, but not HDF, so I cannot use NiFi from there.

Can I install NiFi on an HDP cluster?

How do I configure Flume NG to fetch files from the SFTP server? What additional things are required for this? Flume is already installed in my cluster, so we could use it instead of installing something new.

Super Collaborator

@Ravikiran Dasari, you can of course install NiFi as an extra service, just like anything else. You are not locked into only the packages HDP provides. You just lose the advantage of using Ambari to monitor and configure it. Feel free to read over the NiFi installation documentation if you want to use it.

Or you can install HDF services (such as NiFi) to your existing HDP cluster.

If you want to use Flume, it seems there is an external FTP source; however, I personally don't know how to install or configure it.

Also see

Thank you, @Jordan Moore.