
Running distcp between two cluster: One Kerberized and the other is not

Solved


hadoop distcp -i -log /tmp/ hdfs://xxx:8020/apps/yyyy hdfs://xxx_cid/tmp/

In this case, "xxx" is the unsecured cluster, while "xxx_cid" is the secured cluster.

We are launching the job from the Kerberized cluster, with the appropriate kinit for the user, and we get the following error:

java.io.IOException: Failed on local exception: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.; Host Details : local host is: "xxx/10.x.x.x"; destination host is: "xxx":8020;

...

Caused by: java.io.IOException: Server asks us to fall back to SIMPLE auth, but this client is configured to only allow secure connections.

I thought that by launching the job from the secure cluster we could avoid any access issues, but it appears that the processes are kicked off from the "source" cluster. In this case, that's the insecure cluster.

Ideas on getting around this?


16 REPLIES

Re: Running distcp between two cluster: One Kerberized and the other is not

It sounds like the DistCp process is running in secure mode but is configured not to allow fallback to simple (unauthenticated) connections.

Try setting the config option:

ipc.client.fallback-to-simple-auth-allowed=true

Re: Running distcp between two cluster: One Kerberized and the other is not

@dstreever@hortonworks.com To use DistCp for copying between a secure cluster and an insecure one, add the following to the HDFS core-site.xml (for example, via Ambari):

<property>
  <name>ipc.client.fallback-to-simple-auth-allowed</name>
  <value>true</value> 
</property>

Re: Running distcp between two cluster: One Kerberized and the other is not


Adding this property in core-site.xml helped resolve the error.

Re: Running distcp between two cluster: One Kerberized and the other is not

@Pardeep Nice find!

When copying data from a secure cluster to a secure cluster, the following configuration setting is required in the core-site.xml file:

<property>
    <name>hadoop.security.auth_to_local</name>
    <value></value>
    <description>Maps kerberos principals to local user names</description>
</property> 
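For reference, that property normally carries principal-to-user mapping rules; leaving the value empty, as the doc shows, just means the defaults apply. A non-empty value might look like the fragment below. This is purely illustrative: `EXAMPLE.COM` is a placeholder realm, not anything from this thread.

```xml
<property>
    <name>hadoop.security.auth_to_local</name>
    <!-- Illustrative rules only; EXAMPLE.COM is a placeholder realm. -->
    <value>
        RULE:[1:$1@$0](.*@EXAMPLE\.COM)s/@.*//
        DEFAULT
    </value>
    <description>Maps kerberos principals to local user names</description>
</property>
```

The `RULE` line strips the realm suffix, so a principal like `alice@EXAMPLE.COM` maps to the local user `alice`; `DEFAULT` applies the standard mapping for principals in the cluster's own realm.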

Re: Running distcp between two cluster: One Kerberized and the other is not

I recommend not setting this in core-site.xml, and instead setting it on the command line invocation specifically for the DistCp command that needs to communicate with the unsecured cluster. Setting it in core-site.xml means that all RPC connections for any application are eligible for fallback to simple authentication. This potentially expands the attack surface for man-in-the-middle attacks.

Here is an example of overriding the setting on the command line while running DistCp:

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

The command must be run while logged into the secured cluster, not the unsecured cluster.
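To make the moving parts explicit, the sketch below assembles and prints the full sequence: authenticate on the secure side, then run DistCp with the per-command override. All hostnames, paths, and the principal are hypothetical placeholders, not values from this thread; review the printed commands before running them for real.

```shell
# All names below are hypothetical placeholders; substitute your own.
PRINCIPAL="myuser@EXAMPLE.COM"            # Kerberos principal on the secure cluster
SRC="hdfs://insecure-nn:8020/apps/data"   # NameNode URI of the unsecured cluster
DST="hdfs://secure-nn:8020/tmp/data"      # NameNode URI of the secured cluster

# Step 1: obtain a Kerberos ticket on the secure side.
echo "kinit $PRINCIPAL"

# Step 2: run DistCp; the -D override must precede the source/target URIs.
echo "hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true $SRC $DST"
```

Printing the commands (rather than executing them) keeps the sketch safe to run anywhere; drop the `echo`s once the names are filled in.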

Re: Running distcp between two cluster: One Kerberized and the other is not

@Chris Nauroth Thanks for sharing this. Could you update the answer with more details? I believe this is the best answer if you can add more details.


Re: Running distcp between two cluster: One Kerberized and the other is not

@Neeraj Sabharwal, thank you. I updated the answer to show an example of overriding the property from the DistCp command line.

Re: Running distcp between two cluster: One Kerberized and the other is not

I am getting the error below after running the command "hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo":

java.io.EOFException: End of File Exception between local host is: ***; destination host is: ***;

Please suggest.