Hi. We have a client who has two clusters. On the security cluster, they hold sensitive data that they redact and copy to the analysis cluster. For security reasons, they would like to minimize the number of open ports on the security cluster. We have successfully tested copying the data with distcp from the shell with only port 8020 open. They would now like to automate the process through Oozie. In testing, we ran into an error saying that port 8042 (Node Manager External Port) is not open.
We do not understand why distcp works fine without port 8042 available when run through the shell but fails when called through Oozie.
Any help would be appreciated. Thanks.
Thanks for your reply. Could we ask a related question? Our client is very reluctant to open ports on these two clusters. Could you tell us which ports need to be open for distcp to function properly? After many failures, our client has briefly allowed all ports to be open. With that change, distcp is working properly. We have already looked at the ports specified in https://www.cloudera.com/documentation/enterprise/latest/topics/install_ports_distcp.html#topic_9_1.
Are there any hidden ports or secondary ports beyond the above documentation that could be causing the problem?
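In case it helps anyone narrowing this down, here is a rough sketch of how we could check which ports are reachable from a given node before running distcp. It is only a plain TCP connect test, not anything specific to Hadoop. The hostname is a made-up placeholder, and the port list contains only the two ports mentioned in this thread (8020 and 8042), so it would need to be extended with whatever else the documentation lists.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def probe(host, ports, timeout=3.0):
    """Return a dict mapping each port to True (reachable) or False."""
    return {port: is_port_open(host, port, timeout) for port in ports}


if __name__ == "__main__":
    # Hypothetical hostname; replace with a real node on the remote cluster.
    host = "namenode.security.example.com"
    # Only the ports named in this thread; extend as needed.
    ports = [8020, 8042]
    for port, ok in probe(host, ports).items():
        print(f"{host}:{port} {'open' if ok else 'blocked or closed'}")
```

Since the shell test and the Oozie run behaved differently for us, it is probably worth running this from the worker nodes as well as from the edge node where the shell test was done.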
Harsh J: Thanks for the help on the previous issue. We finally resolved the issue. It was due to an undocumented port required in the CDH 6.2 to CDH 6.2 distcp. Now, we are migrating the task over to Oozie and having some trouble. Could you elaborate a bit more or give us some links or pointers? Thanks.
We could not find "mapreduce.job.hdfs-servers". Where is that property set?