
What is the best big data framework for collecting data from FTP/SFTP?

Explorer

Hello everyone, I am new to big data frameworks. I am currently working on collecting many types of files from an FTP/SFTP server and putting them into HDFS. I have researched Apache NiFi and it seems good to go, but its interface does not look good for monitoring because all the flows are in the same place. Could anyone with more experience recommend the best framework for this task?

 

Thanks,

 

1 ACCEPTED SOLUTION

Contributor

Hi,

Please find below a sample flow that uses the ListSFTP and FetchSFTP processors and puts the files into the target HDFS path.

 

adhishankarit_0-1639138383603.png

 

1. ListSFTP processor - keeps watching the input folder, for example /opt/landing/project/data, on the file share server. Once a new file arrives, ListSFTP takes only the name of the file and passes it to the FetchSFTP processor, which fetches the file from the source folder.

The properties to set in the ListSFTP processor are highlighted below.

 

adhishankarit_1-1639138765376.png
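As a rough sketch of the key ListSFTP settings (the property names come from the standard NiFi ListSFTP processor; the host, credentials, and paths below are placeholders for your environment):

ListSFTP
Hostname:            sftp.example.com              # placeholder SFTP host
Port:                22
Username:            ingest_user                   # placeholder account
Private Key Path:    /home/nifi/.ssh/id_rsa        # or set Password instead
Remote Path:         /opt/landing/project/data     # source folder to watch
Search Recursively:  false
File Filter Regex:   .*                            # narrow this to your file types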

 

2. Once the latest file has been identified by the ListSFTP processor, the FetchSFTP processor fetches the file from the source path.

The properties to configure in the FetchSFTP processor are highlighted below.

adhishankarit_2-1639138923513.png
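A corresponding sketch for FetchSFTP. ListSFTP writes the remote host, path, and filename into flow-file attributes, so FetchSFTP can reference them with Expression Language (the values below are typical, not mandatory):

FetchSFTP
Hostname:             ${sftp.remote.host}
Port:                 ${sftp.remote.port}
Username:             ingest_user                  # placeholder account
Remote File:          ${path}/${filename}          # built from ListSFTP attributes
Completion Strategy:  None                         # or Move File / Delete File after fetch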

3. In the PutHDFS processor, configure the highlighted values for your project and the required target folder.

adhishankarit_3-1639139017431.png
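A sketch of the PutHDFS side, assuming the usual Hadoop client configuration files are available on the NiFi node (the paths are placeholders):

PutHDFS
Hadoop Configuration Resources:  /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
Directory:                       /data/project/landing     # placeholder target HDFS path
Conflict Resolution Strategy:    replace                   # or fail / ignore / append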

If your cluster is Kerberos-enabled, please add the Kerberos controller service so that NiFi can access HDFS.
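For example, a KeytabCredentialsService controller service can be configured roughly like this (the principal and keytab path are placeholders) and then selected in the PutHDFS "Kerberos Credentials Service" property:

KeytabCredentialsService
Kerberos Principal:  nifi@EXAMPLE.COM                      # placeholder principal
Kerberos Keytab:     /etc/security/keytabs/nifi.keytab     # placeholder keytab path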

4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying.
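One hedged way to wire this up: route both the success and failure relationships of PutHDFS through an AttributesToJSON processor (to turn the flow-file attributes into a JSON record) and into PutHBaseJSON. The table, row key, and column family below are placeholders:

PutHBaseJSON
HBase Client Service:  (an HBase client controller service for your cluster)
Table Name:            ingestion_status              # placeholder table
Row Identifier:        ${filename}-${now()}          # placeholder row-key scheme
Column Family:         status                        # placeholder column family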

 

 


3 REPLIES

Contributor

Hi, 

We are currently using NiFi to pull files from an SFTP server and put them into HDFS using the ListSFTP and FetchSFTP processors. We can track whether each flow succeeds or fails, and the ingestion status can be stored in persistent storage such as HBase or HDFS, so the status of an ingestion can be queried at any time.

 

Another option, which we used in previous projects, was to pull the files from NAS storage to the local file system (edge node or gateway node) with the Unix sftp command and then put them into HDFS with hdfs commands. The whole process was done in a shell script scheduled through Control-M; a minimal sketch is shown below.
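A minimal sketch of that scripted approach (the host, directories, and file pattern are placeholders; in the real setup this script would be scheduled through Control-M):

#!/bin/bash
# Pull files from the SFTP server to the edge node, then push them to HDFS.
set -euo pipefail

REMOTE="ingest_user@sftp.example.com"    # placeholder SFTP account and host
SRC_DIR="/opt/landing/project/data"      # placeholder remote source folder
LOCAL_DIR="/data/staging"                # placeholder edge-node staging dir
HDFS_DIR="/data/project/landing"         # placeholder target HDFS path

# 1. Copy the remote files to the local staging directory
sftp "$REMOTE" <<EOF
cd $SRC_DIR
lcd $LOCAL_DIR
mget *
EOF

# 2. Put the staged files into HDFS and clean up the staging directory
hdfs dfs -mkdir -p "$HDFS_DIR"
hdfs dfs -put -f "$LOCAL_DIR"/* "$HDFS_DIR"
rm -f "$LOCAL_DIR"/*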

By the way, what do you mean by all the flows being in the same place?

We can develop a single NiFi flow that pulls the files from the SFTP server and puts them into the target path on the Hadoop file system.

 

Explorer

Thanks for the reply. Do you have an example of how to build an FTP/SFTP flow in NiFi?

 
