Created on 06-08-2016 04:45 PM - edited 09-16-2022 03:24 AM
I'm trying to communicate programmatically with a kerberized Hadoop cluster (CDH 5.3/HDFS 2.5.0).
I have a valid Kerberos token on the client side, but I'm getting the error below: "No common protection layer between client and server".
What does this error mean and are there any ways to fix or work around it?
Is this something related to HDFS-5688? The ticket seems to imply that the property "hadoop.rpc.protection" must be set, presumably to "authentication" (also per e.g. this).
Would this need to be set on all servers in the cluster and then the cluster bounced? I don't have easy access to the cluster so I need to understand whether 'hadoop.rpc.protection' is the actual cause. It seems that 'authentication' should be the value used by default, at least according to the core-default.xml documentation.
java.io.IOException: Failed on local exception: java.io.IOException: Couldn't setup connection for principal1/server1.acme.net@xxx.acme.net to server2.acme.net/10.XX.XXX.XXX:8020; Host Details : local host is: "some-host.acme.net/168.XX.XXX.XX"; destination host is: "server2.acme.net":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy24.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:707)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1785)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1068)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1064)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1398)
... 11 more
Caused by: java.io.IOException: Couldn't setup connection for principal1/server1.acme.net@xxx.acme.net to server2.acme.net/10.XX.XXX.XXX:8020;
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:671)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:642)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 31 more
Caused by: javax.security.sasl.SaslException: No common protection layer between client and server
at com.sun.security.sasl.gsskerb.GssKrb5Client.doFinalHandshake(GssKrb5Client.java:251)
at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:186)
at org.apache.hadoop.security.SaslRpcClient.saslEvaluateToken(SaslRpcClient.java:483)
at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:427)
at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:552)
at org.apache.hadoop.ipc.Client$Connection.access$1800(Client.java:367)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:717)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:713)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
... 34 more
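For context, the client code boils down to something like the following (a minimal sketch rather than the exact code; the principal, keytab path, and NameNode URI are placeholders):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberizedHdfsCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Tell the Hadoop client to authenticate over SASL/Kerberos.
        conf.set("hadoop.security.authentication", "kerberos");
        // The QOP setting the error seems to be about; "authentication" is the documented default.
        conf.set("hadoop.rpc.protection", "authentication");

        UserGroupInformation.setConfiguration(conf);
        // Keytab-based login (placeholder principal/keytab); a kinit ticket cache works as well.
        UserGroupInformation.loginUserFromKeytab(
                "principal1/server1.acme.net@XXX.ACME.NET", "/path/to/principal1.keytab");

        // Placeholder NameNode URI; the exists() call below is where the SASL error surfaces.
        FileSystem fs = FileSystem.get(URI.create("hdfs://server2.acme.net:8020"), conf);
        System.out.println(fs.exists(new Path("/some/path")));
    }
}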
Created 06-08-2016 05:33 PM
Created 06-08-2016 08:10 PM
Hi Harsh,
My Hadoop dependencies are: hadoop-common, hadoop-hdfs, version=2.5.0, since we're running CDH 5.3. Does that sound like the right version?
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>2.5.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>2.5.0</version>
</dependency>
Thanks.
- Dmitry
Created 06-08-2016 09:25 PM
Created 06-09-2016 06:25 AM
Harsh,
Thanks for the hadoop-client suggestion; I've changed the pom file. That did not make any difference to the issue, however.
As for downloading a client configuration zip, is that something I could do via Hue? I do not have access to the main SCM interface. Is there any other means of retrieving it?
Per your comment "You certainly do need to set hadoop.rpc.protection to the exact value the cluster expects", I've tried the other values. Neither "authentication" nor "integrity" made a difference; I was still getting the error "No common protection layer between client and server".
However, setting "hadoop.rpc.protection" to "privacy" caused a different type of error (see below). Any recommendations at this point? Thanks.
Exception in thread "main" org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby
at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:87)
at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:1775)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1402)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4221)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:881)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2044)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2040)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2038)
at org.apache.hadoop.ipc.Client.call(Client.java:1405)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:744)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1912)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1089)
at org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1085)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1400)
Created 06-09-2016 07:31 AM
Created 06-09-2016 06:56 PM
Harsh,
There are 3 host names at play: A, B, and C. Things actually started working when I set fs.defaultFS to one of them (B); originally I was using A. I'm told, however, that all 3 are supposed to be 'active'.
>> you are passing only a single hostname for the NN
Per this comment you made, should I be passing in all 3 hostnames? The doc only states that fs.defaultFS is "The name of the default file system," so a) should all 3 names be passed, and b) if so, how?
Thanks for your help.
Created 06-10-2016 02:07 AM
HDFS is currently thoroughly tested only with 2 NameNodes, so while you can technically run 3 NNs, not everything will behave as intended. There is ongoing work to support more than 2 NameNodes in a future version of HDFS.
The HDFS HA architecture is also Active-Standby based, so having 2 NNs active at once is not possible, at least by HDFS HA design. If you're using CDH, this certainly isn't available, so I'm unsure what they mean by 3 active NameNodes.
As for HA configuration, it involves a few properties that are tied to one another. Below are example core-site.xml and hdfs-site.xml properties relevant to HA, taken from one such cluster. You can adapt them to your hostnames, but once again I'd recommend obtaining a client configuration zip from your administrator: deploying with your cluster's own configs is easier than hand-setting each relevant property. If you have access to some form of command/gateway/edge host, you can also usually find such config files under its /etc/hadoop/conf/ directory:
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://ha-nameservice-name</value>
  </property>
  …
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>ha-nameservice-name</value>
  </property>
  <property>
    <name>dfs.client.failover.proxy.provider.ha-nameservice-name</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
  <property>
    <name>dfs.ha.automatic-failover.enabled.ha-nameservice-name</name>
    <value>true</value>
  </property>
  <property>
    <name>ha.zookeeper.quorum</name>
    <value>ZKHOST:2181</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.ha-nameservice-name</name>
    <value>namenode10,namenode142</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ha-nameservice-name.namenode10</name>
    <value>NN1HOST:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.ha-nameservice-name.namenode10</name>
    <value>NN1HOST:8022</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ha-nameservice-name.namenode10</name>
    <value>NN1HOST:20101</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.ha-nameservice-name.namenode10</name>
    <value>NN1HOST:20102</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ha-nameservice-name.namenode142</name>
    <value>NN2HOST:8020</value>
  </property>
  <property>
    <name>dfs.namenode.servicerpc-address.ha-nameservice-name.namenode142</name>
    <value>NN2HOST:8022</value>
  </property>
  <property>
    <name>dfs.namenode.http-address.ha-nameservice-name.namenode142</name>
    <value>NN2HOST:20101</value>
  </property>
  <property>
    <name>dfs.namenode.https-address.ha-nameservice-name.namenode142</name>
    <value>NN2HOST:20102</value>
  </property>
  …
</configuration>
With this configuration in place, all HDFS paths must be accessed via the FS URI hdfs://ha-nameservice-name. Ideally you want to use the same nameservice name your cluster uses, so that remote services can reuse it too, which is why grabbing an actual client configuration set from the cluster is important.
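As a quick illustration, client code against such an HA setup would look roughly like this (a sketch; it assumes the two files above are readable at /etc/hadoop/conf/ and that any required Kerberos login has already been done as discussed earlier):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HaClientSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Load the cluster's client configs explicitly (they are also picked up
        // automatically if the config directory is on the classpath).
        conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"));
        conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        // The logical nameservice URI; the failover proxy provider resolves it to
        // whichever NameNode is currently active.
        FileSystem fs = FileSystem.get(URI.create("hdfs://ha-nameservice-name"), conf);
        System.out.println(fs.exists(new Path("/")));
    }
}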
Created 06-10-2016 06:02 AM
Thanks, Harsh, very helpful.
I've been poking around on an edge node, so I do have access to hdfs-site.xml and core-site.xml. We may have to munge these files before we can use them, as they contain some cluster-internal values, such as the host names in fs.defaultFS; we'll have to use different host names to be able to get in from outside the cluster...
Since we deal with multiple clusters organized by stage (dev, prod, etc.), we'd have to maintain multiple pairs of core-site.xml and hdfs-site.xml files and load them dynamically at runtime via the Configuration.addResource() method...
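Roughly along these lines (a sketch; the per-stage directory layout and the "env" system property are just illustrative, not something we have in place):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class StageAwareHdfsConfig {
    // Build a Configuration from the core-site.xml/hdfs-site.xml pair kept for a stage.
    // The "/opt/myapp/conf/<stage>/" layout is hypothetical.
    static Configuration forStage(String stage) {
        Configuration conf = new Configuration();
        conf.addResource(new Path("/opt/myapp/conf/" + stage + "/core-site.xml"));
        conf.addResource(new Path("/opt/myapp/conf/" + stage + "/hdfs-site.xml"));
        return conf;
    }

    public static void main(String[] args) throws Exception {
        // Stage chosen at deployment time, e.g. -Denv=dev or -Denv=prod.
        Configuration conf = forStage(System.getProperty("env", "dev"));
        FileSystem fs = FileSystem.get(conf); // uses fs.defaultFS from the loaded files
        System.out.println(fs.getUri());
    }
}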
Created 06-16-2016 05:47 AM
Because of our requirements (being able to target a different cluster depending on the deployment, and the HDFS config files potentially containing cluster-internal host names), we're going with the approach of maintaining the minimal set of Configuration properties required to make Kerberos work on the client side (sketched below). These are, again:
* dfs.namenode.kerberos.principal
* hadoop.rpc.protection
Having said that, Harsh's comments are all valid and relevant.
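In code, the minimal setup ends up looking something like this (a sketch; the principal, realm, keytab path, and NameNode URI are placeholders, and hadoop.rpc.protection must match whatever the cluster expects, "privacy" being the value that got past the SASL error in our case):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class MinimalKerberosHdfsClient {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Needed client-side unless it already comes from a loaded config file (an assumption here).
        conf.set("hadoop.security.authentication", "kerberos");
        // The two minimal properties listed above (values are placeholders):
        conf.set("dfs.namenode.kerberos.principal", "hdfs/_HOST@XXX.ACME.NET");
        conf.set("hadoop.rpc.protection", "privacy");

        UserGroupInformation.setConfiguration(conf);
        // Keytab-based login; a kinit ticket cache works as well.
        UserGroupInformation.loginUserFromKeytab(
                "principal1/server1.acme.net@XXX.ACME.NET", "/path/to/principal1.keytab");

        // The target NameNode / nameservice URI is supplied per deployment stage.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.dev.acme.net:8020"), conf);
        System.out.println(fs.exists(new Path("/some/path")));
    }
}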