I have two servers that need to put files into HDFS. Some files are created every 15 minutes and some files are created once a day.
I am trying to set up a secure way of transferring files from a server and loading them into HDFS. What would you recommend as the best practice for this?
@Sagar Shimpi I would consider using Apache NiFi; a combination of the ListFile, FetchFile, EncryptContent, MergeContent and PutHDFS processors would give you a very convenient process for this.
Apache NiFi is good, but a lot of the time this comes down to simple scripts anyway. I like Python.
I normally either delete correctly uploaded files or create a little .done marker file. Just make sure you can rerun the script with the same results. Also create an empty _SUCCESS file in the HDFS folder after a successful upload (and then mark the files on the local system), and make sure only one instance of the script runs at a time (a lock file helps).
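The steps above can be sketched in Python. This is a minimal sketch, not a finished tool: the directories and the `hdfs dfs` CLI calls are assumptions you would adapt to your own environment.

```python
#!/usr/bin/env python3
"""Idempotent upload sketch: a lock file so only one run is active,
per-file .done markers so reruns give the same result, and an empty
_SUCCESS flag in HDFS once the folder is complete. All paths and the
`hdfs dfs` calls are assumptions for illustration."""
import os
import subprocess
import sys

LOCAL_DIR = "/data/outgoing"      # hypothetical local staging directory
HDFS_DIR = "/landing/myfeed"      # hypothetical HDFS target folder
LOCK_FILE = "/tmp/uploader.lock"

def pending(names, done_exists):
    """Files still needing upload: skip .done markers and already-done files."""
    return [n for n in sorted(names)
            if not n.endswith(".done") and not done_exists(n)]

def main():
    # Lock file: O_EXCL fails if another run is already in progress.
    try:
        lock_fd = os.open(LOCK_FILE, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        sys.exit("another run is in progress")
    try:
        all_ok = True
        todo = pending(
            os.listdir(LOCAL_DIR),
            lambda n: os.path.exists(os.path.join(LOCAL_DIR, n + ".done")))
        for name in todo:
            path = os.path.join(LOCAL_DIR, name)
            rc = subprocess.call(["hdfs", "dfs", "-put", "-f", path, HDFS_DIR])
            if rc == 0:
                # Mark the file on the local system so a rerun skips it.
                open(path + ".done", "w").close()
            else:
                all_ok = False
        if all_ok:
            # Empty _SUCCESS flag tells downstream jobs the folder is complete.
            subprocess.call(["hdfs", "dfs", "-touchz", HDFS_DIR + "/_SUCCESS"])
    finally:
        os.close(lock_fd)
        os.remove(LOCK_FILE)

if __name__ == "__main__":
    main()
```

Deleting uploaded files instead of writing .done markers works the same way; the important part is that the skip condition makes the script safe to rerun.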
Then use Falcon/Oozie to pick up these 15-minute or hourly folders and process them further (the coordinators wait for the _SUCCESS file).
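For the Oozie side, waiting on the success file is done with a done-flag on the dataset. A rough coordinator sketch (names, dates and paths here are placeholders, not real values):

```xml
<coordinator-app name="ingest-15min" frequency="${coord:minutes(15)}"
                 start="2016-01-01T00:00Z" end="2017-01-01T00:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <dataset name="incoming" frequency="${coord:minutes(15)}"
             initial-instance="2016-01-01T00:00Z" timezone="UTC">
      <uri-template>hdfs:///landing/myfeed/${YEAR}${MONTH}${DAY}${HOUR}${MINUTE}</uri-template>
      <!-- the coordinator waits for this flag before the action fires -->
      <done-flag>_SUCCESS</done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="input" dataset="incoming">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>hdfs:///apps/myfeed-workflow</app-path>
    </workflow>
  </action>
</coordinator-app>
```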
Regarding security: use sftp or rsync to move data to the edge node, or mount the file server on the edge node (sshfs is nice). You can also use WebHDFS, but I prefer a native client on an edge node (called through Python system calls).
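The transfer path I mean could look roughly like this in Python: rsync over SSH from the file server to the edge node, then the native hdfs client invoked via a system call. The hostnames and directories below are placeholders, not real values.

```python
#!/usr/bin/env python3
"""Sketch of a secure transfer: rsync over SSH to the edge node, then the
native `hdfs dfs` client run there. Hosts and paths are hypothetical."""
import subprocess

EDGE_NODE = "loader@edge.example.com"   # hypothetical edge node
LOCAL_DIR = "/data/outgoing/"           # hypothetical source on the file server
EDGE_DIR = "/staging/myfeed/"           # hypothetical landing dir on the edge
HDFS_DIR = "/landing/myfeed"            # hypothetical HDFS target

def rsync_cmd(src, dest):
    """rsync over SSH: encrypted in flight, resumable, copies only deltas."""
    return ["rsync", "-az", "-e", "ssh", src, dest]

def hdfs_put_cmd(src_glob, hdfs_dir):
    """Native HDFS client upload, run remotely on the edge node."""
    return ["ssh", EDGE_NODE, "hdfs dfs -put -f %s %s" % (src_glob, hdfs_dir)]

def transfer():
    subprocess.check_call(rsync_cmd(LOCAL_DIR, EDGE_NODE + ":" + EDGE_DIR))
    subprocess.check_call(hdfs_put_cmd(EDGE_DIR + "*", HDFS_DIR))

if __name__ == "__main__":
    transfer()
```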
@Sagar Shimpi Hi, I now have the same requirement. How did you achieve it? I don't know NiFi, and in my case the server is located somewhere else. Please help.