Created 09-26-2017 08:34 PM
I have an HDP cluster on my local machine, with NiFi installed as well. On AWS I installed another HDP cluster, which is not kerberized. My problem is how to copy all my data from the local cluster to the cluster in AWS using NiFi. Can I use PutHDFS? How can I configure it for AWS? I would be thankful if someone could help.
Created 09-27-2017 12:27 AM
Read the data from HDFS using the local NiFi install, then send it to a NiFi instance installed in AWS using the site-to-site protocol. Here is a link to the documentation on site-to-site configuration.
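As a rough illustration of the receiving side, the sketch below renders the site-to-site entries that would go into `nifi.properties` on the AWS NiFi instance. The property names are the standard `nifi.remote.input.*` settings; the host name, the port 10000, and the helper function itself are assumptions made up for this example, and an unsecured setup is shown only because the cluster in the question is not kerberized.

```python
def site_to_site_properties(host, port, secure=False, http_enabled=True):
    """Render nifi.properties lines for a NiFi instance that receives site-to-site data.

    host and port are placeholders (assumptions); use the address the local
    NiFi can actually reach, e.g. the public DNS name of the AWS node.
    """
    props = {
        "nifi.remote.input.host": host,
        "nifi.remote.input.secure": str(secure).lower(),
        "nifi.remote.input.socket.port": str(port),
        "nifi.remote.input.http.enabled": str(http_enabled).lower(),
    }
    return "\n".join(f"{key}={value}" for key, value in props.items())


if __name__ == "__main__":
    # Hypothetical host name and port, for illustration only.
    print(site_to_site_properties("nifi-aws.example.com", 10000))
```

The local flow would then be something like GetHDFS feeding a Remote Process Group that points at the AWS NiFi, with an Input Port feeding PutHDFS on the AWS side. The chosen port also has to be reachable from the local machine, for example through the AWS security group.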
If NiFi is not an option, DistCp can be used. DistCp is widely used for copying data between clusters when NiFi is not available.
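A minimal sketch of what such a DistCp call could look like, written as a small Python helper that only builds the command line. The NameNode addresses and the `/data` path are placeholders, not values from this thread, and the sketch assumes the NameNodes and DataNodes of both clusters can reach each other over the network.

```python
import shlex


def build_distcp_command(src_namenode, dst_namenode, path, update=True, max_maps=None):
    """Build a `hadoop distcp` argument list for copying `path` between two clusters.

    src_namenode and dst_namenode are host:port strings (placeholders here).
    """
    cmd = ["hadoop", "distcp"]
    if update:
        # -update copies only files that are missing or differ on the target.
        cmd.append("-update")
    if max_maps is not None:
        # -m caps the number of simultaneous map tasks used for the copy.
        cmd += ["-m", str(max_maps)]
    cmd.append(f"hdfs://{src_namenode}{path}")
    cmd.append(f"hdfs://{dst_namenode}{path}")
    return cmd


if __name__ == "__main__":
    # Hypothetical NameNode addresses, for illustration only.
    cmd = build_distcp_command("local-nn.example.com:8020", "aws-nn.example.com:8020", "/data")
    print(shlex.join(cmd))
    # To actually run it on a cluster node: subprocess.run(cmd, check=True)
```

The helper returns a list rather than a string so it can be passed to `subprocess.run` without shell quoting issues.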
For security keys, please see if you can use the following method. The document shows it for S3, but I am wondering whether you might be able to use the same method for your keys as well.
Created 09-27-2017 01:08 PM
@mqureshi thanks for the idea. Right now NiFi is not installed on the AWS cluster (edge node), only locally. Does that have to be done first? I think it could work with the GetHDFS and PutHDFS processors of NiFi on the local cluster.
Created 09-28-2017 02:27 PM
You need to install NiFi both on AWS and in the local data center you will be moving data from. You cannot put data into a remote HDFS cluster. Even if it works (it shouldn't, if for nothing else, then at least for security reasons), it would be ridiculously slow.
Created 09-29-2017 02:25 PM
OK. Installing NiFi on AWS would take a lot of time. Is there another way without using NiFi? DistCp is also a tool for copying data between two clusters, but I am not sure that it works for an AWS cluster.
Created 09-29-2017 02:36 PM
I just updated my answer. Yes, Distcp can be used.
Created 09-29-2017 07:47 PM
Unfortunately, DistCp does not work for the AWS cluster. I could not add the AWS credentials (the xxxx.pem file) to the command.
Created 09-30-2017 06:46 AM
Please see my updated answer. Not sure if it will help, but it might work.