
Can we import files from SFTP to HDFS directly?

Rising Star

I need to get files from an SFTP server into HDFS directly.

1 ACCEPTED SOLUTION

Master Guru

Apache NiFi can read from SFTP and then use PutHDFS to put the raw file in an HDFS directory.

Two boxes, one line, no code.

5 minutes of work.


6 REPLIES

Super Collaborator

If you have an HDF cluster running, you can create a NiFi flow to accomplish this. Otherwise you will need a client to download the file first before importing into HDFS.
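For the "download first, then import" route, a rough Python sketch might look like the following. It assumes the third-party paramiko library for SFTP and the `hdfs` command-line client on the machine doing the copy; the function names, host, credentials, and paths are all hypothetical placeholders, not anything from this thread.

```python
import subprocess
from pathlib import PurePosixPath


def hdfs_put_command(local_path, hdfs_dir):
    """Build the `hdfs dfs -put` command that copies a local file into HDFS."""
    # -f overwrites the target if it already exists
    return ["hdfs", "dfs", "-put", "-f", str(local_path), str(hdfs_dir)]


def fetch_and_load(host, user, password, remote_path, staging_dir, hdfs_dir):
    """Download one file over SFTP to a staging directory, then push it to HDFS."""
    import paramiko  # third-party (assumed installed); imported lazily

    file_name = PurePosixPath(remote_path).name
    local_path = str(PurePosixPath(staging_dir) / file_name)

    client = paramiko.SSHClient()
    client.load_system_host_keys()  # assumes the server is already in known_hosts
    client.connect(host, username=user, password=password)
    try:
        sftp = client.open_sftp()
        sftp.get(remote_path, local_path)  # remote file -> local staging copy
        sftp.close()
    finally:
        client.close()

    # Raises CalledProcessError if the hdfs client reports a failure
    subprocess.run(hdfs_put_command(local_path, hdfs_dir), check=True)
    return local_path
```

This stages each file on local disk, so the staging directory needs enough free space for the largest file you expect.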

Rising Star

Hi @anarasimham, thanks for the response.

I have an HDP cluster. How do I set up NiFi on my cluster? I also need to automate this job, because I need to load those files into Hive tables continuously.

Super Collaborator

If you're using Ambari 2.5.2, you should be able to install NiFi on the same cluster using the HDF management pack: https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1.1/bk_installing-hdf-and-hdp/content/ch_inst...

Yes, you can automate the job with NiFi. You'll need a way to query your SFTP endpoint for incremental changes and then fetch the new files.
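To illustrate what "incremental changes" means here, this is a minimal hand-rolled sketch in Python, not how NiFi implements it. It assumes you already have a directory listing as a mapping of file name to modification time; `select_new_files` and the `seen` dictionary are hypothetical names.

```python
def select_new_files(listing, seen):
    """Return the files that are new or modified since the previous poll.

    listing: dict of file name -> modification time from the current listing
    seen:    dict of file name -> modification time recorded on earlier polls
             (updated in place so the next poll skips files already handled)
    """
    new_files = []
    for name, mtime in sorted(listing.items()):
        if seen.get(name) != mtime:  # never seen, or changed since last time
            new_files.append(name)
            seen[name] = mtime
    return new_files


# Example: two polls against the same directory
seen = {}
first = select_new_files({"a.csv": 100, "b.csv": 105}, seen)
second = select_new_files({"a.csv": 100, "b.csv": 110, "c.csv": 120}, seen)
```

In a real job the `seen` state would have to be persisted between runs, otherwise every restart re-fetches everything.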

Rising Star

Hi @Timothy Spann, thank you for the response. Can we automate this job? I need to load files into Hive tables continuously. I have an HDP cluster; can we install NiFi on it?

Master Guru

Yes, continuously and automatically.

By default it polls for new files every 60 seconds; you can shorten that interval.

You can also convert those files to Apache ORC and auto build new Hive tables on them if the files are CSV, TSV, Avro, Excel, JSON, XML, EDI, HL7 or C-CDA.

Install Apache NiFi on an edge node. There are ways to combine HDP 2.6 and HDF 3 with the new Ambari, but it's easiest to start with a separate node for Apache NiFi.

You can also just download NiFi, unzip it, and run it on a laptop that has JDK 8 installed:

https://www.apache.org/dyn/closer.lua?path=/nifi/1.4.0/nifi-1.4.0-bin.zip
