How to copy files from a remote linux box using Nifi

Hi,

I have some files sitting on a Linux box outside the NiFi cluster.

How can I copy these files using NiFi?

I want to copy these files, make a few changes, and store them on S3.

ListFile and FetchFile only work with the local disk, I believe.

Thanks

Obaid

4 replies

Hi @Obaid Salikeen,

You can use the SFTP or FTP processors to pull the data from your remote box:

FetchSFTP, GetFTP, GetSFTP, ListSFTP
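ListSFTP and FetchSFTP split the work: ListSFTP periodically lists a remote directory, keeps track of what it has already seen, and emits one FlowFile per new file; FetchSFTP then pulls each file's content (GetSFTP lists and pulls in a single processor). SFTP is served by the SSH daemon itself (OpenSSH's sftp subsystem, usually enabled by default), so these processors normally work against any Linux box that accepts SSH logins, with no separate FTP server. The Python sketch below only illustrates the list-then-fetch idea over an in-memory listing; the names `RemoteFile`, `ListingState`, `list_new_files` and `fetch_all` are hypothetical, and NiFi's real state handling is more involved.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class RemoteFile:
    """One entry from a (hypothetical) remote directory listing."""
    path: str
    mtime: int  # last-modified time, seconds since the epoch


@dataclass
class ListingState:
    """Simplified stand-in for the state a List processor keeps between runs."""
    latest_mtime: int = -1
    names_at_latest: set[str] = field(default_factory=set)


def list_new_files(listing: list[RemoteFile], state: ListingState) -> list[RemoteFile]:
    """Return files not emitted by earlier runs, and advance the state.

    A file counts as new if it is strictly newer than the latest timestamp
    seen so far, or shares that timestamp but was not emitted before (so a
    file written in the same second as the previous newest one is not lost).
    """
    new = [
        f for f in listing
        if f.mtime > state.latest_mtime
        or (f.mtime == state.latest_mtime and f.path not in state.names_at_latest)
    ]
    if new:
        newest = max(f.mtime for f in new)
        if newest > state.latest_mtime:
            # A newer timestamp replaces the remembered one.
            state.latest_mtime = newest
            state.names_at_latest = set()
        state.names_at_latest |= {f.path for f in new if f.mtime == newest}
    return sorted(new, key=lambda f: (f.mtime, f.path))


def fetch_all(files: list[RemoteFile], read_remote: Callable[[str], bytes]) -> dict[str, bytes]:
    """Fetch each listed file's content, one call per file (the "fetch" half)."""
    return {f.path: read_remote(f.path) for f in files}
```

Calling `list_new_files` twice on an unchanged listing returns an empty list the second time, which is the behaviour you want from a scheduled listing step; `read_remote` stands in for whatever actually downloads a file.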

Thanks

Thanks for your reply,

A follow-up question: the files are sitting on a plain Linux server that does not have an FTP server (or any similar server) set up.

So do I need to explicitly serve those files with some kind of server, or can NiFi simply SSH (or connect some other way) into the host and copy the files?

Obaid

Hi,

If you run into trouble getting those processors to work, another option is a Remote Process Group (RPG), which uses NiFi's site-to-site protocol to transfer data from your remote box to your cluster. For that, you need a NiFi instance running on the remote server, and the cluster must be reachable from it. The flow could be:

On the remote box:

GetFile --> Remote Process Group [pointing at the cluster URL]

On the NiFi cluster:

Input Port --> [transformations you need] --> [upload to S3, e.g. PutS3Object]
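Since the RPG only works if the remote box can reach the cluster, a plain TCP connect from the remote box is a quick sanity check before building the flow. As I understand it, the RPG first contacts the cluster URL you give it (the NiFi web port) and, for the raw socket transport, then connects to the port set in `nifi.remote.input.socket.port`, so both are worth testing. This is a minimal sketch; `can_reach` is a hypothetical helper, not part of NiFi.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS failures.
        return False
```

For example, run `can_reach("nifi-node1.example.com", 8080)` on the remote box for the web port and again for the site-to-site port; the hostname and port numbers are placeholders for your own cluster's values.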

You can refer to the docs below to set up the RPG:

https://nifi.apache.org/docs/nifi-docs/html/user-guide.html#Remote_Group_Transmission

https://docs.hortonworks.com/HDPDocuments/HDF1/HDF-1.1.1/bk_UserGuide/content/site-to-site.html

In particular, see the Site to Site Properties section of that page.
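The site-to-site properties live in each cluster node's `conf/nifi.properties`. Below is a minimal sketch that reads them and reports which transports are enabled. It assumes simple `key=value` lines and the NiFi 1.x property names `nifi.remote.input.socket.port`, `nifi.remote.input.secure` and `nifi.remote.input.http.enabled`; older releases (such as the NiFi version behind HDF 1.x) may lack the HTTP setting. The helper names are hypothetical.

```python
from __future__ import annotations


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # or ! comments.

    Simplified: ignores the line continuations and ':' separators that full
    Java .properties syntax also allows.
    """
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def site_to_site_summary(props: dict[str, str]) -> dict[str, object]:
    """Summarize which site-to-site transports the parsed properties enable."""
    port = props.get("nifi.remote.input.socket.port", "")
    return {
        # An empty port value means the raw socket transport is disabled.
        "raw_socket_port": int(port) if port else None,
        "http_enabled": props.get("nifi.remote.input.http.enabled", "").lower() == "true",
        "secure": props.get("nifi.remote.input.secure", "").lower() == "true",
    }
```

Pass it the text of `nifi.properties` (e.g. `site_to_site_summary(parse_properties(open(path).read()))`) to confirm the cluster accepts site-to-site connections before pointing an RPG at it.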

Thanks

Hi @Obaid Salikeen,

Another option is to use ExecuteProcess or ExecuteStreamCommand to run a custom script that uses SCP to copy files from your remote Linux instance.
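A minimal sketch of such a script in Python, assuming the NiFi node has the OpenSSH `scp` client installed and key-based (passwordless) SSH access to the remote box. The function names, user, host and paths are hypothetical.

```python
import subprocess
import sys
from typing import List, Optional


def build_scp_command(user: str, host: str, remote_path: str, local_dir: str,
                      identity_file: Optional[str] = None, port: int = 22) -> List[str]:
    """Build an OpenSSH scp argument list (passed as a list, so no local shell)."""
    # -B: batch mode, never prompt for a password (nobody is there to type one)
    # -p: keep the remote file's modification/access times and mode
    cmd = ["scp", "-B", "-p", "-P", str(port)]
    if identity_file:
        cmd += ["-i", identity_file]  # private key used for key-based login
    cmd += [f"{user}@{host}:{remote_path}", local_dir]
    return cmd


def fetch_with_scp(user: str, host: str, remote_path: str, local_dir: str,
                   identity_file: Optional[str] = None, port: int = 22) -> int:
    """Run scp and return its exit code; scp's error output is copied to stderr."""
    cmd = build_scp_command(user, host, remote_path, local_dir, identity_file, port)
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        sys.stderr.write(result.stderr)
    return result.returncode
```

To call it from ExecuteProcess, save it as a script with an entry point such as `sys.exit(fetch_with_scp(*sys.argv[1:5]))`. One way to continue the flow (an assumption, not the only option) is to let GetFile or ListFile/FetchFile pick up whatever lands in the local directory, then transform and upload to S3 as usual.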

There is also a JIRA for an SCP processor:

https://issues.apache.org/jira/browse/NIFI-539

Hope this helps.