Secure way to transfer files to HDFS

I have two servers that need to put files into HDFS. Some files are created every 15 minutes and some files are created once a day.

I am trying to set up a secure way of transferring files from a server and loading them into HDFS. What would you recommend as the best practice for this?

3 REPLIES

@Sagar Shimpi I would consider using Apache NiFi. A combination of the ListFile, FetchFile, EncryptContent, MergeContent, and PutHDFS processors would give you a convenient pipeline for this.

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.EncryptContent...

Apache NiFi is good, but a lot of the time a simple script is all you need. I like Python.

I normally either delete files once they have uploaded correctly or create a small "done" marker file next to them. Just make sure you can rerun the script and get the same results. Also create an empty _SUCCESS file in the HDFS folder after a successful upload (and only then mark the files on the local system), and make sure only one instance of the script runs at a time (a lock file helps).
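To make that concrete, here is a minimal sketch of such a script, not a finished tool. It assumes a Linux edge node with the `hdfs` command on the PATH, and that `hdfs_dir` is a per-batch folder (for example one folder per 15-minute window). The names `upload_batch`, `pending_files`, the `.done` suffix, and `upload.lock` are made up for illustration. The `run` parameter only exists so the HDFS calls can be swapped out when testing.

```python
import fcntl
import subprocess
from pathlib import Path


def pending_files(local_dir):
    """Files that have no '<name>.done' marker next to them yet."""
    return sorted(
        p for p in Path(local_dir).iterdir()
        if p.is_file()
        and p.suffix not in (".done", ".lock")
        and not p.with_name(p.name + ".done").exists()
    )


def upload_batch(local_dir, hdfs_dir, run=subprocess.run):
    """Upload pending files, write _SUCCESS, then mark them locally.

    Returns the uploaded paths, or None if another run holds the lock.
    """
    lock_path = Path(local_dir) / "upload.lock"  # hypothetical lock file name
    with open(lock_path, "w") as lock:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None

        files = pending_files(local_dir)
        if not files:
            return []
        run(["hdfs", "dfs", "-mkdir", "-p", hdfs_dir], check=True)
        for f in files:
            # -f overwrites, so rerunning after a partial failure is safe.
            run(["hdfs", "dfs", "-put", "-f", str(f), hdfs_dir], check=True)
        # Only written once every file in the batch has been uploaded.
        run(["hdfs", "dfs", "-touchz", f"{hdfs_dir}/_SUCCESS"], check=True)
        # Mark local files last, so a crash before this point means a retry.
        for f in files:
            f.with_name(f.name + ".done").touch()
        return files
```

Because `check=True` raises on any failed command, a failed upload leaves no _SUCCESS file and no `.done` markers, so the next run simply repeats the whole batch.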

Then use Falcon/Oozie to pick up these 15-minute or 1-hour folders and process them further (they can wait for the _SUCCESS file).

Regarding security: use sftp or rsync to move data to the edge node, or mount the file server on the edge node (sshfs is nice). You can also use WebHDFS, but I prefer a native client on an edge node (called through Python system calls).
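As a rough illustration of the "Python system calls" part, the sketch below builds the two commands as argument lists and runs them without a shell. The helper names and the example host and paths are placeholders, and it assumes SSH key authentication is already set up between the edge node and the source server.

```python
import subprocess


def rsync_command(source, staging_dir):
    """Build an rsync-over-SSH command; 'source' looks like 'user@host:/path/'."""
    return ["rsync", "-az", "-e", "ssh", source, staging_dir]


def hdfs_put_command(local_path, hdfs_dir):
    """Build a call to the native HDFS client installed on the edge node."""
    return ["hdfs", "dfs", "-put", "-f", local_path, hdfs_dir]


def run_checked(cmd, run=subprocess.run):
    """Run a command given as a list (no shell), raising if it fails."""
    return run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder host and paths, for illustration only.
    run_checked(rsync_command("etl@fileserver:/exports/", "/staging/exports/"))
    run_checked(hdfs_put_command("/staging/exports/data.csv", "/data/incoming"))
```

Passing a list instead of a single string keeps file names with spaces or shell metacharacters from being interpreted by a shell.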

@Sagar Shimpi Hi, I now have the same requirement. How did you achieve that? I don't know NiFi, and in my case the server is located somewhere else. Please help me.