I have a local HDP cluster on my local machine, with NiFi installed as well. On AWS I installed another HDP cluster, which is not kerberized. My problem is how to copy all my data from the local cluster to the cluster in AWS using NiFi. Can I use PutHDFS? How can I configure it for AWS? I would be thankful if someone could help.
Read the data from HDFS using the local NiFi install, then send it to a NiFi instance installed in AWS using the site-to-site protocol. Here is a link to the documentation on site-to-site configuration.
If NiFi is not an option, DistCp can be used. DistCp is widely used for copying data between clusters when NiFi is not used.
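A minimal sketch of what a DistCp call might look like, wrapped in Python so the argument list can be inspected before anything is run. The host names `local-nn` and `aws-nn`, the port 8020, and the `/data` path are placeholders and not values from this thread; `-update` and `-m` are standard DistCp options.

```python
import shlex
import subprocess


def build_distcp_command(src_uri, dst_uri, max_maps=None, update=True):
    """Return the argv list for a `hadoop distcp` run (nothing is executed)."""
    cmd = ["hadoop", "distcp"]
    if update:
        # Copy only files that are missing or differ at the target.
        cmd.append("-update")
    if max_maps is not None:
        # Upper bound on the number of simultaneous map tasks.
        cmd += ["-m", str(max_maps)]
    cmd += [src_uri, dst_uri]
    return cmd


def run_distcp(cmd):
    """Run the command on a node where the `hadoop` client is installed."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical NameNode addresses; replace with the real fs.defaultFS values.
    cmd = build_distcp_command(
        "hdfs://local-nn:8020/data",
        "hdfs://aws-nn:8020/data",
        max_maps=10,
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to verify the source and target URIs before calling `run_distcp(cmd)` on the local cluster.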
For security keys, please see if you can use the following method. The document shows it for S3, but I am wondering whether you might be able to use the same method for your keys as well.
@mqureshi thanks for the idea. Right now NiFi is not installed on the AWS cluster (edge node), only locally. Does that have to be done first? I think the GetHDFS and PutHDFS processors of NiFi on the local cluster could work!
You need to install NiFi both on AWS and in your local data center from where you will be moving data. You cannot put data into a remote HDFS cluster. Even if it works (it shouldn't, if for nothing else then at least for security reasons), it would be ridiculously slow.
OK. Installing NiFi on AWS would take a lot of time. Is there another way without using NiFi? DistCp is also a tool for copying data between two clusters, but I am not sure that it works for an AWS cluster.
Unfortunately, DistCp does not work for the AWS cluster. I could not add the AWS credentials (the xxxx.pem file) to the command.
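One thing that may be worth checking here: as far as I know, DistCp does not take an SSH key at all. The .pem file is only used for SSH logins to the EC2 instances, while DistCp talks to the remote NameNode and DataNodes over the normal Hadoop ports, so those ports have to be reachable from the local cluster (for example, allowed in the AWS security group). Below is a small sketch to test reachability; the host name `aws-nn` and port 8020 are assumptions, so check `fs.defaultFS` on the AWS cluster for the real values.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


if __name__ == "__main__":
    # Hypothetical NameNode address of the AWS cluster.
    print(port_is_open("aws-nn", 8020))
```

If this reports that the port is closed, the problem is more likely network access than credentials. The DataNode ports would need the same check.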