When I try to run distcp from an insecure to a secure Hadoop cluster, I get the error below.
hdfs@master02:~> hadoop distcp -Dipc.client.fallback-to-simple-auth-allowed=true hdfs://HDP23:8020/test01.txt hdfs://HDP24:8020/
17/04/05 00:09:28 ERROR tools.DistCp: Invalid arguments: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
Does anyone have any idea about this error, and what changes need to be made on the source or target cluster? Please suggest.
Security settings dictate whether DistCp should be run on the source cluster or the destination cluster. The general rule of thumb is that if one cluster is secure and the other is not, DistCp should be run from the secure cluster (whereas you are trying to distcp from the insecure cluster to the secure one); otherwise there may be security-related issues.
When copying data from a secure cluster to a non-secure cluster, the following configuration setting is required for the DistCp client:
<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value>
</property>
When copying data from a secure cluster to a secure cluster, the following configuration setting is required in the core-site.xml file:
<property>
  <name>hadoop.security.auth_to_local</name>
  <value></value>
  <description>Maps kerberos principals to local user names</description>
</property>
The Hortonworks recommendation states that if "one cluster is secure and the other is not secure, DistCp should be run from the secure cluster".
To pull data from the insecure cluster to the secure cluster, connect to the secure cluster and run the following:
Run kinit as your user of choice, then klist to confirm that you have a ticket.
hdfs dfs -D ipc.client.fallback-to-simple-auth-allowed=true -ls hdfs://<insecure cluster>/
hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://<insecure cluster>/path/to/source destination
Note the space after the -D parameter.
I prefer this way rather than modifying the global HDFS configuration to allow simple auth.
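The steps above can be sketched as a small wrapper, to be run on a node of the secure cluster. This is only an illustration, not a tested production script: the function names and the DRY_RUN variable are made up for this example, and it assumes a klist that supports the -s (silent, exit-status-only) option, as MIT Kerberos does.

```shell
# Sketch only: function names and DRY_RUN are hypothetical helpers,
# not part of Hadoop. Run from a node of the SECURE cluster.

FALLBACK_OPT="ipc.client.fallback-to-simple-auth-allowed=true"

# Print the distcp command line for a source and destination URI.
build_distcp_cmd() {
  printf 'hadoop distcp -D %s %s %s\n' "$FALLBACK_OPT" "$1" "$2"
}

# Succeed only if the credential cache holds a valid Kerberos ticket.
# Assumes klist supports -s (silent mode, result via exit status).
have_ticket() {
  klist -s
}

# Check the ticket, list the source to confirm access, then copy.
# With DRY_RUN=1 the command is printed instead of being executed.
pull_from_insecure() {
  src="$1"
  dst="$2"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    build_distcp_cmd "$src" "$dst"
    return 0
  fi
  have_ticket || { echo "no Kerberos ticket, run kinit first" >&2; return 1; }
  hdfs dfs -D "$FALLBACK_OPT" -ls "$src" || return 1
  hadoop distcp -D "$FALLBACK_OPT" "$src" "$dst"
}
```

For example, `DRY_RUN=1 pull_from_insecure "hdfs://<insecure cluster>/path/to/source" destination` only prints the distcp command, which makes it easy to check the option and the paths before copying anything.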
If you are not using HA on the insecure cluster, you can use the insecure cluster's active namenode address and port for <insecure cluster>.
If you are using HA, you can use the nameservice of the insecure cluster as the value for <insecure cluster>, but only if HDFS on the secure cluster has already been configured to know about the insecure cluster's nameservice. Let me know if you need help with that also.
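As a rough sketch of what "knowing about" the remote nameservice involves, the helper below prints the standard HDFS HA client properties as -D options. The function name and all argument values are placeholders invented for this example. Passing these per command instead of adding them to hdfs-site.xml is an assumption on my part, so verify it on your Hadoop version before relying on it.

```shell
# Sketch only: ha_client_opts is a hypothetical helper, not a Hadoop tool.
# Arguments: local nameservice, remote nameservice,
#            remote namenode 1 host:port, remote namenode 2 host:port.
# The ids nn1 and nn2 are arbitrary labels chosen for this example.
ha_client_opts() {
  local_ns="$1"
  remote_ns="$2"
  nn1_addr="$3"
  nn2_addr="$4"
  # The client must see both nameservices, local and remote.
  printf '%s\n' "-D dfs.nameservices=${local_ns},${remote_ns}"
  # Namenode ids of the remote nameservice and their RPC addresses.
  printf '%s\n' "-D dfs.ha.namenodes.${remote_ns}=nn1,nn2"
  printf '%s\n' "-D dfs.namenode.rpc-address.${remote_ns}.nn1=${nn1_addr}"
  printf '%s\n' "-D dfs.namenode.rpc-address.${remote_ns}.nn2=${nn2_addr}"
  # Standard failover proxy provider for client-side HA.
  printf '%s\n' "-D dfs.client.failover.proxy.provider.${remote_ns}=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
}
```

The output is meant to be expanded unquoted on the command line, together with the fallback option, for example `hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true $(ha_client_opts <local nameservice> <insecure nameservice> <nn1 host:port> <nn2 host:port>) hdfs://<insecure nameservice>/path/to/source destination`.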