Created 08-27-2018 04:52 AM
The Hadoop cluster has 8 nodes with ResourceManager high availability enabled. The active ResourceManager runs on node 3 and the standby ResourceManager runs on node 2.
When I submit an application in cluster mode, the driver container can land on any of the 8 nodes. If the driver container goes to node 3 (where the active ResourceManager service is running), I can open the ApplicationMaster UI. In all other cases it does not open, and after some time Ambari shows a critical alert with the message "Connection failed to resource manager host url".
The ResourceManager logs show an AccessControlException for the spark user when it calls getServiceState.
Here is the full stack trace:
2018-08-25 05:02:30,209 WARN resourcemanager.AdminService (RMServerUtils.java:verifyAdminAccess(185)) - User spark doesn't have permission to call 'getServiceState'
2018-08-25 05:02:30,210 WARN resourcemanager.RMAuditLogger (RMAuditLogger.java:logFailure(345)) - USER=spark IP=11.111.1.11 OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE DESCRIPTION=Unauthorized user PERMISSIONS=
2018-08-25 05:02:30,210 INFO ipc.Server (Server.java:logException(2294)) - IPC Server handler 0 on 8033, call org.apache.hadoop.ha.HAServiceProtocol.getServiceStatus from 11.111.1.11:40169 Call#51845 Retry#0
org.apache.hadoop.security.AccessControlException: User spark doesn't have permission to call 'getServiceState'
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:191)
	at org.apache.hadoop.yarn.server.resourcemanager.RMServerUtils.verifyAdminAccess(RMServerUtils.java:157)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.checkAccess(AdminService.java:232)
	at org.apache.hadoop.yarn.server.resourcemanager.AdminService.getServiceStatus(AdminService.java:365)
	at org.apache.hadoop.ha.protocolPB.HAServiceProtocolServerSideTranslatorPB.getServiceStatus(HAServiceProtocolServerSideTranslatorPB.java:131)
	at org.apache.hadoop.ha.proto.HAServiceProtocolProtos$HAServiceProtocolService$2.callBlockingMethod(HAServiceProtocolProtos.java:4464)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2206)
	at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2202)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2200)
2018-08-25 05:05:43,300 INFO client.DefaultHttpClient (DefaultRequestDirector.java:tryExecute(726)) - I/O exception (org.apache.http.NoHttpResponseException) caught when processing request: The target server failed to respond
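To see which users and hosts are being denied, the RMAuditLogger entries can be filtered for RESULT=FAILURE. The sketch below assumes one log entry per line and the USER=/IP=/OPERATION=/TARGET=/RESULT= fields shown in the audit entry above; failed_audit_events is a hypothetical helper, not part of Hadoop. Grouping the output by IP would show whether the failing getServiceState calls come from the nodes hosting the driver container.

```python
import re
from typing import Dict, List

# Key=value fields written by RMAuditLogger, e.g.
# "USER=spark IP=11.111.1.11 OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE"
AUDIT_FIELD = re.compile(r"\b(USER|IP|OPERATION|TARGET|RESULT)=(\S+)")


def failed_audit_events(log_text: str) -> List[Dict[str, str]]:
    """Return the parsed fields of every RMAuditLogger line whose RESULT is FAILURE.

    Assumes one log entry per line, as in a normal ResourceManager log file.
    """
    events = []
    for line in log_text.splitlines():
        if "RMAuditLogger" not in line:
            continue  # skip non-audit entries such as stack-trace frames
        fields = dict(AUDIT_FIELD.findall(line))
        if fields.get("RESULT") == "FAILURE":
            events.append(fields)
    return events


sample = (
    "2018-08-25 05:02:30,210 WARN resourcemanager.RMAuditLogger "
    "(RMAuditLogger.java:logFailure(345)) - USER=spark IP=11.111.1.11 "
    "OPERATION=getServiceState TARGET=AdminService RESULT=FAILURE "
    "DESCRIPTION=Unauthorized user PERMISSIONS="
)

for event in failed_audit_events(sample):
    print(event["USER"], event["OPERATION"], event["IP"])
    # prints: spark getServiceState 11.111.1.11
```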
Created 08-27-2018 05:47 AM
Hi @Shashi Vk,
Can you add the user 'spark' to the YARN admin ACL (yarn.admin.acl)?
Please accept the answer if this helped.
Created 08-28-2018 06:35 AM
Hi Akhil, I updated yarn.admin.acl to yarn,spark and restarted all the required components, but I am still facing the same issue. This time, when I click on the ApplicationMaster link, even the ResourceManager UI does not open for some time. The ResourceManager logs the line below many times:
2018-08-28 06:24:17,048 INFO webproxy.WebAppProxyServlet (WebAppProxyServlet.java:doGet(370)) - dr.who is accessing unchecked http://111.11.1.11:58944 which is the app master GUI of application_1535435544017_0008 owned by spark
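A value such as yarn,spark in yarn.admin.acl is read as a Hadoop ACL string: comma-separated users, then a single space, then comma-separated groups, with "*" meaning everyone. The sketch below is a simplified model of that rule (it ignores netgroups and real group lookup); parse_acl and is_allowed are hypothetical helpers, not Hadoop API. It only illustrates why yarn,spark should admit the spark user to AdminService calls such as getServiceState, and why a space instead of a comma would turn spark into a group name.

```python
from typing import Iterable, Set, Tuple


def parse_acl(acl: str) -> Tuple[bool, Set[str], Set[str]]:
    """Split a Hadoop-style ACL string into (allow_all, users, groups).

    Simplified model: "users groups", split on the FIRST space only, each
    part comma-separated; a leading space means "no users, groups only".
    """
    if acl.strip() == "*":
        return True, set(), set()
    parts = acl.split(" ", 1)
    users = {u.strip() for u in parts[0].split(",") if u.strip()}
    groups = set()
    if len(parts) > 1:
        groups = {g.strip() for g in parts[1].split(",") if g.strip()}
    return False, users, groups


def is_allowed(acl: str, user: str, user_groups: Iterable[str] = ()) -> bool:
    """True if the user is listed by name or belongs to a listed group."""
    allow_all, users, groups = parse_acl(acl)
    if allow_all:
        return True
    return user in users or any(g in groups for g in user_groups)


print(is_allowed("yarn", "spark"))                  # False
print(is_allowed("yarn,spark", "spark"))            # True
print(is_allowed("yarn spark", "spark", ["hadoop"]))  # False: "spark" is a group here
```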