Support Questions

Can we import files from SFTP to HDFS directly?

I need to get files from an SFTP server into HDFS directly.

Accepted Solution

Re: Can we import files from SFTP to HDFS directly?

Super Guru

Apache NiFi can read from SFTP (with the GetSFTP processor, or ListSFTP plus FetchSFTP) and then use the PutHDFS processor to put the raw file in an HDFS directory.

Two boxes (processors), one line (a connection between them), no code.

5 minutes of work.

6 Replies

Re: Can we import files from SFTP to HDFS directly?

Expert Contributor

If you have an HDF cluster running, you can create a NiFi flow to accomplish this. Otherwise, you will need a client to download the file first and then import it into HDFS.
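Without NiFi, the "download first, then import" approach can be sketched as two commands: an OpenSSH `sftp` batch-mode download (`-b -` reads the batch commands from standard input) followed by `hdfs dfs -put`. This is a minimal sketch, not a tested tool: it assumes key-based (non-interactive) SSH authentication, paths without spaces, a local staging directory, and that the `sftp` and `hdfs` clients are on the PATH. The function names, host, user, and paths are hypothetical.

```python
import subprocess
from pathlib import PurePosixPath


def build_transfer_commands(user, host, remote_path, local_dir, hdfs_dir):
    """Return (sftp_batch, sftp_argv, hdfs_argv) for a download-then-put copy.

    Hypothetical helper: assumes paths contain no spaces (no quoting is done).
    """
    file_name = PurePosixPath(remote_path).name
    local_path = str(PurePosixPath(local_dir) / file_name)
    # sftp batch command: download the remote file into the local staging dir.
    sftp_batch = f"get {remote_path} {local_path}\n"
    # "-b -" makes sftp read batch commands from stdin; needs non-interactive auth.
    sftp_argv = ["sftp", "-b", "-", f"{user}@{host}"]
    # "hdfs dfs -put -f" copies the staged file into HDFS, overwriting if present.
    hdfs_argv = ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]
    return sftp_batch, sftp_argv, hdfs_argv


def run_transfer(user, host, remote_path, local_dir, hdfs_dir):
    """Run both steps; raises CalledProcessError if either command fails."""
    sftp_batch, sftp_argv, hdfs_argv = build_transfer_commands(
        user, host, remote_path, local_dir, hdfs_dir
    )
    subprocess.run(sftp_argv, input=sftp_batch, text=True, check=True)
    subprocess.run(hdfs_argv, check=True)
```

For example, `build_transfer_commands("etl", "sftp.example.com", "/outbound/data.csv", "/tmp/staging", "/landing/sftp")` plans a download to `/tmp/staging/data.csv` and a put into `/landing/sftp`. Keeping command construction separate from execution makes the plan easy to inspect before anything touches the servers.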

Re: Can we import files from SFTP to HDFS directly?

Hi @anarasimham, thanks for the response.

I have an HDP cluster. How can I use NiFi with it? I also need to automate this job, because I need to load those files into Hive tables continuously.

Re: Can we import files from SFTP to HDFS directly?

Expert Contributor

If you're using Ambari 2.5.2, you should be able to install NiFi on the same cluster using the HDF management pack: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_installing-hdf-and-hdp/content/ch_inst...

Yes, you can automate the job with NiFi. You'll have to create a way to query your SFTP endpoint for incremental changes and then fetch only the new files.
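"Querying for incremental changes" boils down to remembering what earlier polls already saw. The Python below is a minimal sketch of that bookkeeping, assuming each listing entry carries a path, a modification time, and a size; `RemoteFile` and `find_new_files` are hypothetical names, and in NiFi the ListSFTP processor keeps comparable listing state for you.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RemoteFile:
    path: str
    mtime: int  # modification time (epoch seconds) from the SFTP listing
    size: int   # size in bytes from the SFTP listing


def find_new_files(listing, seen):
    """Return files that are new or changed since earlier polls.

    listing: iterable of RemoteFile from the current directory listing.
    seen: dict mapping path -> (mtime, size); updated in place.
    """
    new_files = []
    # Oldest first, so downstream steps process files in arrival order.
    for f in sorted(listing, key=lambda f: (f.mtime, f.path)):
        if seen.get(f.path) != (f.mtime, f.size):
            new_files.append(f)
            seen[f.path] = (f.mtime, f.size)
    return new_files
```

Comparing both modification time and size treats a rewritten file as new again. A real job would also need to persist `seen` between runs, which this sketch leaves out.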

Re: Can we import files from SFTP to HDFS directly?

Hi @Timothy Spann, thank you for the response. Can we automate this job? I need to load files into Hive tables continuously. I have an HDP cluster; can we install NiFi on it?

Re: Can we import files from SFTP to HDFS directly?

Super Guru

Yes, continuously and automatically.

By default it polls for new files every 60 seconds; you can shorten that interval.
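The polling behavior described above amounts to the loop below, shown as a minimal Python sketch only to make the "every 60 seconds" schedule concrete. In NiFi you would just change the processor's polling/scheduling setting instead of writing this. `poll`, `list_files`, and `handle` are hypothetical names, and the listing function is injected so the loop can be exercised without an SFTP server.

```python
import time
from typing import Callable, Iterable, Optional


def poll(list_files: Callable[[], Iterable[str]],
         handle: Callable[[str], None],
         interval_s: float = 60.0,
         max_polls: Optional[int] = None) -> None:
    """Call list_files() every interval_s seconds and hand each unseen name to handle().

    max_polls=None means poll forever; a number stops after that many polls.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for name in list_files():
            if name not in seen:  # only pass along files not handled before
                seen.add(name)
                handle(name)
        polls += 1
        # Sleep between polls, but not after the final one.
        if max_polls is None or polls < max_polls:
            time.sleep(interval_s)
```

Shrinking the interval, as suggested above, corresponds to passing a smaller `interval_s`; the trade-off is more listing requests against the SFTP server.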

You can also convert those files to Apache ORC and automatically build new Hive tables on them if the files are CSV, TSV, Avro, Excel, JSON, XML, EDI, HL7 or C-CDA.
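To make "build new Hive tables on them" concrete for the CSV case, here is a minimal Python sketch that infers column types from a CSV sample and emits a `CREATE EXTERNAL TABLE ... STORED AS ORC` statement. It only generates the DDL and does not write ORC files; in NiFi the conversion is done by processors such as ConvertAvroToORC. The type rules (BIGINT, then DOUBLE, else STRING), function names, table name, and HDFS location are all assumptions for illustration.

```python
import csv
import io


def infer_hive_type(values):
    """Pick BIGINT, DOUBLE, or STRING: the narrowest type every sample value parses as."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "BIGINT"
    if values and all_parse(float):
        return "DOUBLE"
    return "STRING"


def orc_table_ddl(table, csv_text, hdfs_location):
    """Build a Hive external-table DDL (ORC storage) from a CSV header plus sample rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    columns = []
    for i, name in enumerate(header):
        # Ignore missing or empty cells when inferring the column type.
        col_values = [r[i] for r in samples if i < len(r) and r[i] != ""]
        columns.append(f"  `{name.strip()}` {infer_hive_type(col_values)}")
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        + ",\n".join(columns)
        + f"\n)\nSTORED AS ORC\nLOCATION '{hdfs_location}'"
    )
```

For a sample such as `id,price,name` followed by rows `1,2.5,foo` and `2,3,bar`, the columns come out as `id` BIGINT, `price` DOUBLE, and `name` STRING. Inferring from a small sample is fragile; a production flow would rely on a declared schema instead.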

Install Apache NiFi on an edge node. There are ways to combine HDP 2.6 and HDF 3 under the new Ambari, but to start, it's easiest to run Apache NiFi on a separate node.

You can also just download NiFi, unzip it, and run it on a laptop that has JDK 8 installed:

https://www.apache.org/dyn/closer.lua?path=/nifi/1.4.0/nifi-1.4.0-bin.zip

[Image: 42948-sftp.png]