Created on 06-13-2017 11:24 AM - edited 09-16-2022 04:44 AM
Hello!
I'm facing an error caused by an "Exception from container-launch".
It always happens with the same application: a resource-intensive job that fails randomly.
I copied the logs below from the history server web interface, the ResourceManager, and the NodeManager.
User: my_user
Name: my_app_name
Application Type: MAPREDUCE
Application Tags: (none)
State: FAILED
FinalStatus: FAILED
Started: Sat May 27 04:00:19 +0000 2017
Elapsed: 12hrs, 29mins, 43sec
Tracking URL: History
Diagnostics:
Application application_3091752970321_0018 failed 2 times due to AM Container for appattempt_3091752970321_0018_000002 exited with exitCode: 255
For more detailed output, check application tracking page: http://my_active_name_node:8088/proxy/application_3091752970321_0018/ Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_3091752970321_0018_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:367)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output:
main : command provided 1
main : run as user is my_user
main : requested yarn user is my_user
Writing to tmp file /storage/vol/06/yarn/local/nmPrivate/application_3091752970321_0018/container_3091752970321_0018_02_000001/container_3091752970321_0018_02_000001.pid.tmp
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application.
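Since the diagnostics only show the launch failure and not its cause, a first step is usually to pull the aggregated logs for the failed AM attempt. A sketch, using the application and container ids from the diagnostics above (requires log aggregation to be enabled; the `-containerId`/`-nodeAddress` flags are available on many, but not all, Hadoop 2.x releases — `<nm-host:port>` is a placeholder for the NodeManager that ran the container):

```shell
# Fetch all aggregated logs for the failed application.
yarn logs -applicationId application_3091752970321_0018 > app_0018.log

# Narrow to the failing AM container, if your Hadoop version supports it.
yarn logs -applicationId application_3091752970321_0018 \
  -containerId container_3091752970321_0018_02_000001 \
  -nodeAddress <nm-host:port> > am_attempt2.log
```

The AM container's stderr in that output typically shows the real reason for exit code 255 (e.g. an OutOfMemoryError), which the RM diagnostics swallow.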
LOGS on resourcemanager
2017-05-27 16:30:03,637 WARN org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl: Unable to write fail flag file for application appattempt_3091752970321_0018_000002
org.apache.hadoop.security.AccessControlException: Permission denied: user=yarn, access=WRITE, inode="/tmp/hadoop-yarn":hdfs:supergroup:drwxrwxr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:242)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:169)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6590)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6572)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6524)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4292)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4265)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:867)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:322)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:73)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3084)
at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:3049)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:957)
at org.apache.hadoop.hdfs.DistributedFileSystem$18.doCall(DistributedFileSystem.java:953)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirsInternal(DistributedFileSystem.java:953)
at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:946)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1861)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.writeFlagFileForFailedAM(RMAppImpl.java:1351)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.access$3500(RMAppImpl.java:115)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedFinalStateSavedTransition.transition(RMAppImpl.java:1035)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$AttemptFailedFinalStateSavedTransition.transition(RMAppImpl.java:1027)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalStateSavedTransition.transition(RMAppImpl.java:1016)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl$FinalStateSavedTransition.transition(RMAppImpl.java:1010)
at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:780)
at org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl.handle(RMAppImpl.java:114)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:787)
at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$ApplicationEventDispatcher.handle(ResourceManager.java:771)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=yarn, access=WRITE, inode="/tmp/hadoop-yarn":hdfs:supergroup:drwxrwxr-x
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:281)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:262)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:242)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:169)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6590)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6572)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6524)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInternal(FSNamesystem.java:4322)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirsInt(FSNamesystem.java:4292)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:4265)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:867)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.mkdirs(AuthorizationProviderProxyClientProtocol.java:322)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)
at org.apache.hadoop.ipc.Client.call(Client.java:1471)
at org.apache.hadoop.ipc.Client.call(Client.java:1408)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:230)
at com.sun.proxy.$Proxy84.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.mkdirs(ClientNamenodeProtocolTranslatorPB.java:544)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:256)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy85.mkdirs(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.primitiveMkdir(DFSClient.java:3082)
... 24 more
2017-05-27 16:30:03,653 WARN org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=my_user OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_3091752970321_0018 failed 2 times due to AM Container for appattempt_3091752970321_0018_000002 exited with exitCode: 255
For more detailed output, check application tracking page:http://my_active_name_node:8088/proxy/application_3091752970321_0018/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_3091752970321_0018_02_000001
Exit code: 255
Stack trace: ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:367)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Shell output: main : command provided 1
main : run as user is my_user
main : requested yarn user is my_user
Writing to tmp file /storage/vol/06/yarn/local/nmPrivate/application_3091752970321_0018/container_3091752970321_0018_02_000001/container_3091752970321_0018_02_000001.pid.tmp
Container exited with a non-zero exit code 255
Failing this attempt. Failing the application. APPID=application_3091752970321_0018
2017-05-27 16:30:03,653 INFO org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary: appId=application_3091752970321_0018,name=my_app_name,user=my_user,queue=root.my_user,state=FAILED,trackingUrl=http://my_active_name_node:8088/cluster/app/application_3091752970321_0018,appMasterHost=N/A,startTi...
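Separately from the exit-255 failure itself, the AccessControlException above shows that the yarn user cannot write under /tmp/hadoop-yarn (owned hdfs:supergroup, mode drwxrwxr-x), so the ResourceManager could not record its fail-flag file. A possible remedy, sketched below — assuming your security policy allows widening the permissions (adjust the mode or ownership to your distribution's conventions):

```shell
# Confirm the current owner and mode of the directory from the exception.
hdfs dfs -ls -d /tmp/hadoop-yarn

# Least-strict sketch: allow the 'yarn' user to write there.
# Alternatively, chown/chgrp so 'yarn' gains group write access.
sudo -u hdfs hdfs dfs -chmod 777 /tmp/hadoop-yarn
```

This only fixes the secondary "Unable to write fail flag file" warning; it does not address why the AM container exited with 255.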
LOGS on nodemanager
2017-05-27 16:28:55,543 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1495752970061_0114_01_000411
2017-05-27 16:30:02,143 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_000001 is : 255
2017-05-27 16:30:02,143 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exception from container-launch with container ID: container_3091752970321_0018_02_000001 and exit code: 255
ExitCodeException exitCode=255:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:561)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:367)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2017-05-27 16:30:02,144 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch: Container exited with a non-zero exit code 255
2017-05-27 16:30:02,189 WARN org.apache.hadoop.yarn.server.nodemanager.NMAuditLogger: USER=my_user OPERATION=Container Finished - Failed TARGET=ContainerImpl RESULT=FAILURE DESCRIPTION=Container failed with state: EXITED_WITH_FAILURE APPID=application_3091752970321_0018 CONTAINERID=container_3091752970321_0018_02_000001
2017-05-27 16:30:02,189 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_000001
2017-05-27 16:30:02,857 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_3091752970321_0018_000002 (auth:SIMPLE)
2017-05-27 16:30:02,866 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_3091752970321_0018_000002 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
2017-05-27 16:30:02,900 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019321 is : 143
2017-05-27 16:30:02,955 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019321
2017-05-27 16:30:03,929 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019064 is : 143
2017-05-27 16:30:04,015 INFO SecurityLogger.org.apache.hadoop.ipc.Server: Auth successful for appattempt_1495752970061_0114_000001 (auth:SIMPLE)
2017-05-27 16:30:04,018 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for appattempt_1495752970061_0114_000001 (auth:TOKEN) for protocol=interface org.apache.hadoop.yarn.api.ContainerManagementProtocolPB
2017-05-27 16:30:04,094 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019136 is : 143
2017-05-27 16:30:04,205 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019173 is : 143
2017-05-27 16:30:04,337 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019210 is : 143
2017-05-27 16:30:04,417 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019247 is : 143
2017-05-27 16:30:04,562 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019284 is : 143
2017-05-27 16:30:04,604 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019358 is : 143
2017-05-27 16:30:04,636 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019432 is : 143
2017-05-27 16:30:04,675 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019469 is : 143
2017-05-27 16:30:04,721 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019540 is : 143
2017-05-27 16:30:04,767 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019610 is : 143
2017-05-27 16:30:04,811 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019645 is : 143
2017-05-27 16:30:04,851 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019679 is : 143
2017-05-27 16:30:04,974 WARN org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor: Exit code from container container_3091752970321_0018_02_019715 is : 143
2017-05-27 16:30:05,086 INFO org.apache.spark.network.yarn.YarnShuffleService: Initializing container container_1495752970061_0114_01_000412
2017-05-27 16:30:05,088 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019064
2017-05-27 16:30:05,088 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019136
2017-05-27 16:30:05,088 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019173
2017-05-27 16:30:05,088 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019210
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019247
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019284
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019358
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019432
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019469
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019540
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019610
2017-05-27 16:30:05,089 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019645
2017-05-27 16:30:05,090 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019679
2017-05-27 16:30:05,090 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping container container_3091752970321_0018_02_019715
2017-05-27 16:30:05,092 INFO org.apache.spark.network.yarn.YarnShuffleService: Stopping application application_3091752970321_0018
2017-05-27 16:30:05,092 ERROR org.apache.spark.network.yarn.YarnShuffleService: Exception when stopping application application_3091752970321_0018
java.lang.NullPointerException
at org.apache.spark.network.yarn.YarnShuffleService.stopApplication(YarnShuffleService.java:174)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:215)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.handle(AuxServices.java:49)
at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:176)
at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:108)
at java.lang.Thread.run(Thread.java:745)
Thanks in advance!
Created on 06-13-2017 12:56 PM - edited 06-13-2017 12:59 PM
Thanks @mbigelow!
Sure I can.
mapreduce.map.memory.mb = 6144
mapreduce.reduce.memory.mb = 12288
mapreduce.map.java.opts = -Xmx4300m (mapreduce.map.memory.mb * 0.7)
mapreduce.reduce.java.opts = -Xmx8601m (mapreduce.reduce.memory.mb * 0.7)
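For reference, the 0.7 heap-to-container ratio used above can be checked with a quick calculation (the `heap_opt` helper is hypothetical; bash integer division floors the exact products 4300.8 and 8601.6 down to the configured 4300m and 8601m):

```shell
# Sketch: derive the -Xmx value from a YARN container size (MB)
# using the 0.7 heap-to-container ratio described above.
heap_opt() {
  # multiply by 7 and divide by 10 in integer math, i.e. floor(mb * 0.7)
  echo "-Xmx$(( $1 * 7 / 10 ))m"
}

heap_opt 6144    # map container    -> -Xmx4300m
heap_opt 12288   # reduce container -> -Xmx8601m
```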
I think it is an OOME. I created another post with more data that confirms this; you can find it at http://community.cloudera.com/t5/Batch-Processing-and-Workflow/MapReduce-application-failed-with-Out...
The log there is from a different application, but the job is the same.
Thanks!
Guido.
Created 08-23-2018 10:06 PM
Kindly check the log detail on the ResourceManager UI:
<clusterip>:8088
Then open a terminal and check the actual problem in the logs:
yarn logs -applicationId <APP_ID>
Example: APP_ID = application_1535002188113_0001
In my case it showed a permission issue on the directory '/user/history', so I granted access with:
sudo -u hdfs hadoop fs -chmod 775 /user/history (or, more permissively: sudo -u hdfs hadoop fs -chmod 777 /user/history)
Then it worked.