Hello everyone, I am new to big data frameworks. I am currently working on collecting many types of files from FTP/SFTP servers and putting them into HDFS. I have researched Apache NiFi and it looks like a good fit, but the interface does not seem easy to monitor because all the flows are in the same place. Could anyone with more experience recommend the best framework for this task?
Thanks,
Created 11-24-2021 11:58 AM
Hi,
Currently we are using NiFi to pull files from an SFTP server and put them into HDFS, using the ListSFTP and FetchSFTP processors. We can track whether the flow succeeded or failed, and the ingestion status can be stored in persistent storage, for example HBase or HDFS, so we can query the status of an ingestion at any time.
Another option we used in previous projects: we pulled the files from NAS storage to the local file system (an edge node or gateway node) with SFTP/copy Unix commands, then put the files into HDFS with hdfs commands. The whole process was done in a shell script, scheduled through Control-M.
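A minimal sketch of that shell-script approach (the host, user, and paths below are placeholders, and key-based SFTP authentication is assumed):

    #!/usr/bin/env bash
    # Pull files from the source server to the edge node, then push them to HDFS.
    set -euo pipefail

    SRC_HOST="nas.example.com"            # placeholder source host
    SRC_USER="ingest_user"                # placeholder user (key-based auth assumed)
    SRC_DIR="/export/project/data"        # placeholder source directory
    LOCAL_DIR="/data/staging/project"     # staging area on the edge/gateway node
    HDFS_DIR="/data/project/landing"      # target HDFS directory

    mkdir -p "$LOCAL_DIR"

    # Batch-download everything in the source directory via SFTP.
    printf 'cd %s\nlcd %s\nmget *\n' "$SRC_DIR" "$LOCAL_DIR" \
        | sftp -b - "$SRC_USER@$SRC_HOST"

    # Put each staged file into HDFS; remove the local copy only on success.
    for f in "$LOCAL_DIR"/*; do
        [ -f "$f" ] || continue
        hdfs dfs -put -f "$f" "$HDFS_DIR/" && rm -f "$f"
    done

Control-M would then invoke this script on the desired schedule.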
By the way, what do you mean by "all flow in the same place"?
We can develop a single NiFi flow to pull the files from the SFTP server and put them into the target path on the Hadoop file system.
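At a high level it is just three processors connected in sequence (the same flow described in detail in the answer below):

    ListSFTP --> FetchSFTP --> PutHDFS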
Created 11-25-2021 01:38 AM
Thanks for the reply. Do you have an example of how to build an FTP/SFTP flow in NiFi?
Created 12-10-2021 04:25 AM
Hi,
Please find below a sample flow that uses the ListSFTP and FetchSFTP processors and puts the files into the target HDFS path.
1. ListSFTP processor - keeps listening to the input folder, for example /opt/landing/project/data, on the file-share server. When a new file arrives, ListSFTP takes only the name of the file and passes it to the FetchSFTP processor, which fetches the file from the source folder.
The main properties to set in the ListSFTP processor are listed below.
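These are standard ListSFTP properties; the values shown are placeholders, so substitute your own host, credentials, and paths:

    Listing Strategy     : Tracking Timestamps
    Hostname             : sftp.example.com
    Port                 : 22
    Username             : ingest_user
    Password             : ********          (or Private Key Path / Private Key Passphrase)
    Remote Path          : /opt/landing/project/data
    Search Recursively   : false
    File Filter Regex    : .*                (narrow this, e.g. .*\.csv, if needed)

The polling interval is set on the processor's Scheduling tab (Run Schedule).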
2. Once the latest file has been identified by the ListSFTP processor, the FetchSFTP processor fetches the file from the source path.
The main properties to configure in the FetchSFTP processor are listed below.
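Again, the values are placeholders; Remote File uses the path and filename attributes that ListSFTP writes on each flow file:

    Hostname                   : sftp.example.com
    Port                       : 22
    Username                   : ingest_user
    Remote File                : ${path}/${filename}
    Completion Strategy        : None             (or Move File / Delete File)
    Move Destination Directory : /opt/landing/project/processed   (only for Move File)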
3. In the PutHDFS processor, configure the values for your project and the required target folder.
If your cluster is Kerberos-enabled, add a Kerberos controller service so that NiFi can access HDFS.
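For example (the paths and service name below are placeholders; the Kerberos Credentials Service is only needed on a kerberized cluster):

    Hadoop Configuration Resources : /etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
    Directory                      : /data/project/landing
    Conflict Resolution Strategy   : replace
    Kerberos Credentials Service   : KeytabCredentialsService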
4. The success and failure relationships of the PutHDFS processor can be used to monitor the flow status, and the status can be stored in HBase for querying the flow status later.
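One way to wire this up (a sketch only; the attribute, table, and column-family names are placeholder assumptions, while the processor names are standard NiFi processors): route both relationships through UpdateAttribute to stamp a status attribute, convert the attributes to JSON, and write the record to HBase.

    PutHDFS success --> UpdateAttribute (ingest.status = SUCCESS) --+
    PutHDFS failure --> UpdateAttribute (ingest.status = FAILURE) --+--> AttributesToJSON --> PutHBaseJSON

    AttributesToJSON : Attributes List = filename, path, ingest.status
                       Destination     = flowfile-content
    PutHBaseJSON     : HBase Client Service = HBase_1_1_2_ClientService
                       Table Name           = ingestion_status
                       Row Identifier       = ${filename}
                       Column Family        = status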