How to copy data from a local HDFS to another HDFS in AWS using NiFi

I have a local HDP cluster on my local machine, with NiFi installed as well. On AWS I have installed another HDP cluster, which is not kerberized. My problem is how to copy all my data from the local cluster to the cluster in AWS using NiFi. Can I use PutHDFS? How do I configure it for AWS? I would be thankful if someone could help.

Super Guru

@Chokri Ben Necib

Read the data from HDFS with the local NiFi install, then send it to a NiFi instance installed in AWS using the site-to-site protocol. Here is a link to the documentation on site-to-site configuration:

https://docs.hortonworks.com/HDPDocuments/HDF3/HDF-3.0.1/bk_user-guide/content/configure-site-to-sit...
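
For orientation, the receiving NiFi instance (the one in AWS) needs site-to-site input enabled in its `nifi.properties`. The sketch below only renders the relevant entries. The helper name, host name, and port are placeholders that do not come from this thread, and the unsecured setting is an assumption based on the AWS cluster being described as not kerberized.

```python
def site_to_site_properties(host: str, port: int, secure: bool = False) -> str:
    """Render the nifi.properties entries that enable site-to-site input.

    host and port are hypothetical values for the receiving (AWS) NiFi node.
    """
    entries = {
        "nifi.remote.input.host": host,
        "nifi.remote.input.socket.port": str(port),
        "nifi.remote.input.secure": str(secure).lower(),
        "nifi.remote.input.http.enabled": "true",
    }
    return "\n".join(f"{key}={value}" for key, value in entries.items())


if __name__ == "__main__":
    # Placeholder host and port; replace with the AWS node's actual values.
    print(site_to_site_properties("nifi-aws.example.com", 10000))
```

On the local side, the flow would then point a Remote Process Group at the AWS NiFi URL and feed it from a processor that reads HDFS.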

If NiFi is not an option, DistCp can be used. It is widely used for copying data between clusters.
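
As a rough illustration of the DistCp route, the sketch below assembles a `hadoop distcp` command line between two NameNode URIs. The helper name, host names, port, and path are hypothetical, and it assumes the local cluster can reach the AWS NameNode and DataNodes over the network.

```python
def build_distcp_command(src_uri: str, dst_uri: str, update: bool = True) -> list:
    """Assemble a `hadoop distcp` invocation as an argument list."""
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ at the destination.
        cmd.append("-update")
    cmd += [src_uri, dst_uri]
    return cmd


if __name__ == "__main__":
    # Placeholder NameNode addresses; substitute the real ones.
    cmd = build_distcp_command(
        "hdfs://local-nn.example.com:8020/data",
        "hdfs://aws-nn.example.com:8020/data",
    )
    print(" ".join(cmd))
```

The printed command would be run on a node of the source cluster; passing the list to `subprocess.run(cmd, check=True)` is one way to launch it from Python.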

For the security keys, please see whether you can use the following method. The document shows it for S3, but I am wondering whether you might be able to use the same method for your keys as well.

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.1/bk_cloud-data-access/content/s3-credential-...
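
To make the S3 method a little more concrete: it stores the keys in a Hadoop credential provider file and points the job at that file through `hadoop.security.credential.provider.path`. The sketch below builds such a DistCp command. The helper name, provider path, bucket, and host names are made up for illustration, and whether this carries over to the keys in question is not established in this thread.

```python
def distcp_with_credential_provider(provider_path: str, src_uri: str, dst_uri: str) -> list:
    """Build a `hadoop distcp` command that reads secrets from a credential provider."""
    return [
        "hadoop",
        "distcp",
        # Generic -D options go right after the tool name, before the paths.
        "-D",
        f"hadoop.security.credential.provider.path={provider_path}",
        src_uri,
        dst_uri,
    ]


if __name__ == "__main__":
    # All values below are placeholders, not taken from the thread.
    cmd = distcp_with_credential_provider(
        "jceks://hdfs@local-nn.example.com:8020/user/hdfs/aws.jceks",
        "hdfs://local-nn.example.com:8020/data",
        "s3a://example-bucket/data",
    )
    print(" ".join(cmd))
```

The provider file itself would be created beforehand with `hadoop credential create`, as the linked document describes for the S3 access and secret keys.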

@mqureshi thanks for the idea. At the moment NiFi is not installed on the AWS cluster (edge node), only locally. Does that have to be done first? I think it could work with the GetHDFS and PutHDFS processors of the NiFi instance at the local cluster!

Super Guru

@Chokri Ben Necib

You need to install NiFi both in AWS and in your local data center, from where you will be moving the data. You cannot put data into a remote HDFS cluster. Even if it worked (it shouldn't, for security reasons if nothing else), it would be ridiculously slow.

OK. Installing NiFi on AWS would take a lot of time. Is there another way that does not use NiFi? DistCp is also a tool for copying data between two clusters, but I am not sure that it works for an AWS cluster.

Super Guru

I just updated my answer. Yes, DistCp can be used.

Unfortunately, DistCp does not work for the AWS cluster. I could not add the AWS credentials (the xxxx.pem file) to the command.

Super Guru

@Chokri Ben Necib

Please see my updated answer. Not sure if it will help, but it might work.
