What are the best practices for copying data between two clusters located in different datacenters on different LANs? The goal is to limit loops.
You can use Apache Falcon: http://hortonworks.com/hadoop/falcon/
Or see this article on Apache NiFi (HDF) data flow across data centers: https://community.hortonworks.com/articles/9933/apache-nifi-aka-hdf-data-flow-across-data-center.htm...
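Since all the data is in HDFS, a direct cluster-to-cluster copy with DistCp may also be worth a look. As far as I know, Falcon's replication runs DistCp under the hood. Below is a minimal sketch that only builds the command line. The NameNode hostnames, the path, and the helper name `build_distcp_command` are made up for illustration, and it assumes the two clusters can reach each other over the network.

```python
import shlex


def build_distcp_command(src_uri, dst_uri, max_maps=20, bandwidth_mb=None):
    """Return the argv list for a DistCp copy from src_uri to dst_uri.

    Hypothetical helper for illustration; it only assembles the command.
    """
    # -update copies only files that are missing or differ on the target.
    # -m caps the number of simultaneous map tasks doing the copy.
    cmd = ["hadoop", "distcp", "-update", "-m", str(max_maps)]
    if bandwidth_mb is not None:
        # -bandwidth limits throughput per map task, in MB/second.
        cmd += ["-bandwidth", str(bandwidth_mb)]
    cmd += [src_uri, dst_uri]
    return cmd


if __name__ == "__main__":
    # Placeholder NameNode addresses and path; replace with your own.
    cmd = build_distcp_command(
        "hdfs://nn-dc1.example.com:8020/data/events",
        "hdfs://nn-dc2.example.com:8020/data/events",
        bandwidth_mb=50,
    )
    # Print the command for review instead of running it.
    print(shlex.join(cmd))
```

The printed command can be run from an edge node or scheduled; capping maps and bandwidth keeps the copy from saturating the link between the two datacenters.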
Today the client uses a couple of staging/FTP servers but wants to know whether there are other practices. All the data is in HDFS.