Created 06-15-2018 12:46 PM
Log:
WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(232)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.
I found JIRA AMBARI-17633, which looks related.
Please let me know how I can resolve the issue.
HDP version - Hadoop 2.7.3.2.6.1.0-129
Created 06-15-2018 02:46 PM
This is only a warning and shouldn't be what is causing the problem. Can you check whether the logs contain any other errors?
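For context, the two permission strings in the warning differ only in the sticky bit (the trailing `t` instead of `x`). Here is a minimal sketch that converts both strings to octal so the difference is easy to see. The helper name `symbolic_to_octal` is made up for illustration and is not part of Hadoop.

```python
def symbolic_to_octal(perm: str) -> str:
    """Convert a 9-character permission string such as 'rwxrwxrwt' to 4-digit octal."""
    if len(perm) != 9:
        raise ValueError(f"expected 9 characters, got {len(perm)}: {perm!r}")
    special = 0
    digits = []
    # For each triplet (user, group, other): the special-bit value, the character
    # meaning "special bit + execute", and the character meaning "special bit only".
    special_bits = [(4, "s", "S"), (2, "s", "S"), (1, "t", "T")]
    for i, (bit, with_x, without_x) in enumerate(special_bits):
        r, w, x = perm[3 * i: 3 * i + 3]
        value = 0
        if r == "r":
            value += 4
        if w == "w":
            value += 2
        if x in ("x", with_x):
            value += 1
        if x in (with_x, without_x):
            special += bit
        digits.append(str(value))
    return str(special) + "".join(digits)


# The two strings quoted in the NodeManager warning.
print("expected:", symbolic_to_octal("rwxrwxrwt"))
print("found:   ", symbolic_to_octal("rwxrwxrwx"))
```

If you do want to clear the warning, something like `hdfs dfs -chmod 1777 /app-logs` run as the HDFS superuser should set the expected mode, assuming nothing else in your setup resets it afterwards.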
Created 06-18-2018 10:23 AM
@Aditya Sirna, I'm not able to execute the job.
LOG:
2018-06-14 13:12:53,921 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:12:56,940 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:12:59,996 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:03,017 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:06,037 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:09,128 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:09,940 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(810)) - Start request for container_e29_1528274329273_0037_01_000001 by user elf
2018-06-14 13:13:09,942 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(850)) - Creating a new application reference for app application_1528274329273_0037
2018-06-14 13:13:09,942 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from NEW to INITING
2018-06-14 13:13:10,021 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(232)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.
2018-06-14 13:13:10,130 INFO application.ApplicationImpl (ApplicationImpl.java:transition(304)) - Adding container_e29_1528274329273_0037_01_000001 to application application_1528274329273_0037
2018-06-14 13:13:10,130 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from INITING to RUNNING
2018-06-14 13:13:10,130 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from NEW to LOCALIZING
2018-06-14 13:13:10,130 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId application_1528274329273_0037
2018-06-14 13:13:10,130 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(184)) - Initializing container container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:10,130 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(287)) - Initializing container container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:10,131 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip transitioned from INIT to DOWNLOADING
2018-06-14 13:13:10,131 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/home/elf/.sparkStaging/application_1528274329273_0037/__spark_conf__.zip transitioned from INIT to DOWNLOADING
2018-06-14 13:13:10,131 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(712)) - Created localizer for container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:10,203 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1194)) - Writing credentials to the nmPrivate file /hadoop/yarn/local/nmPrivate/container_e29_1528274329273_0037_01_000001.tokens. Credentials list:
2018-06-14 13:13:10,534 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(646)) - Initializing user elf
2018-06-14 13:13:10,535 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(126)) - Copying from /hadoop/yarn/local/nmPrivate/container_e29_1528274329273_0037_01_000001.tokens to /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037/container_e29_1528274329273_0037_01_000001.tokens
2018-06-14 13:13:10,536 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(133)) - Localizer CWD set to /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037 = file:/hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037
2018-06-14 13:13:10,721 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(1017)) - { file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip, 1528962187000, ARCHIVE, null } failed: File file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip does not exist
java.io.FileNotFoundException: File file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)
at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)
at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)
at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2018-06-14 13:13:10,721 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip(->/hadoop/yarn/local/usercache/elf/filecache/0/14727/__spark_libs__1218228472113576092.zip) transitioned from DOWNLOADING to FAILED
2018-06-14 13:13:10,721 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED
2018-06-14 13:13:10,723 INFO localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(165)) - Container container_e29_1528274329273_0037_01_000001 sent RELEASE event on a resource request { file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip, 1528962187000, ARCHIVE, null } not present in cache.
2018-06-14 13:13:10,723 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(675)) - Unknown localizer with localizerId container_e29_1528274329273_0037_01_000001 is sending heartbeat. Ordering it to DIE
2018-06-14 13:13:10,723 WARN ipc.Client (Client.java:call(1462)) - interrupted waiting to send rpc request to server
java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy90.heartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:257)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:174)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:139)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
2018-06-14 13:13:10,724 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from LOCALIZATION_FAILED to DONE
2018-06-14 13:13:10,725 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1134)) - Localizer failed
java.io.IOException: java.io.IOException: java.lang.InterruptedException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:177)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:139)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)
Caused by: java.io.IOException: java.lang.InterruptedException
at org.apache.hadoop.ipc.Client.call(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy90.heartbeat(Unknown Source)
at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:257)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:174)
... 2 more
Caused by: java.lang.InterruptedException
at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)
at java.util.concurrent.FutureTask.get(FutureTask.java:191)
at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)
at org.apache.hadoop.ipc.Client.call(Client.java:1457)
... 8 more
2018-06-14 13:13:10,725 WARN event.AsyncDispatcher (AsyncDispatcher.java:handle(254)) - AsyncDispatcher thread interrupted
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:251)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)
2018-06-14 13:13:10,725 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[LocalizerRunner for container_e29_1528274329273_0037_01_000001,5,main] threw an Exception.
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:259)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)
Caused by: java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:251)
... 1 more
2018-06-14 13:13:10,726 INFO application.ApplicationImpl (ApplicationImpl.java:transition(347)) - Removing container_e29_1528274329273_0037_01_000001 from application application_1528274329273_0037
2018-06-14 13:13:10,727 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:startContainerLogAggregation(512)) - Considering container container_e29_1528274329273_0037_01_000001 for log-aggregation
2018-06-14 13:13:10,727 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId application_1528274329273_0037
2018-06-14 13:13:10,727 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(190)) - Stopping container container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:10,727 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(293)) - Stopping container container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:11,744 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(553)) - Removed completed containers from NM context: [container_e29_1528274329273_0037_01_000001]
2018-06-14 13:13:11,748 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP
2018-06-14 13:13:11,748 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event APPLICATION_STOP for appId application_1528274329273_0037
2018-06-14 13:13:11,748 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopApplication(171)) - Stopping application application_1528274329273_0037
2018-06-14 13:13:11,748 INFO shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(206)) - Application application_1528274329273_0037 removed, cleanupLocalDirs = false
2018-06-14 13:13:11,749 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopApplication(266)) - Stopping application application_1528274329273_0037
2018-06-14 13:13:11,749 INFO shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1528274329273_0037 removed, cleanupLocalDirs = false
2018-06-14 13:13:11,749 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED
2018-06-14 13:13:11,749 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1528274329273_0037
2018-06-14 13:13:11,750 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(494)) - Deleting absolute path : /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037
2018-06-14 13:13:11,906 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(567)) - Uploading logs for container container_e29_1528274329273_0037_01_000001. Current good log dirs are /hadoop/yarn/log
2018-06-14 13:13:12,138 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(425)) - Stopping resource-monitoring for container_e29_1528274329273_0037_01_000001
2018-06-14 13:13:12,141 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(503)) - Deleting path : /hadoop/yarn/log/application_1528274329273_0037
2018-06-14 13:13:12,172 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:15,202 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:18,251 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:21,355 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:24,373 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:27,393 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:30,417 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:33,439 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:36,498 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:39,525 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
2018-06-14 13:13:42,561 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used
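With this much output, the WARN and ERROR entries are easy to miss among the repeated memory-usage lines. Below is a rough sketch for pulling them out, assuming the log line format shown above (timestamp, level, then message). The function name `problem_entries` and the regex are illustrative only and are not part of any Hadoop tooling.

```python
import re

# Matches lines such as "2018-06-14 13:13:10,723 WARN ipc.Client (...) - message".
LOG_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) (\w+)\s+(.*)$"
)


def problem_entries(lines, levels=("WARN", "ERROR", "FATAL")):
    """Return (timestamp, level, message, first_continuation_line) tuples.

    The continuation line is the first non-blank line following the entry
    that is not itself a timestamped log line (typically the exception name),
    or None if there is no such line.
    """
    entries = []
    current = None
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            ts, level, msg = match.groups()
            current = [ts, level, msg, None] if level in levels else None
            if current is not None:
                entries.append(current)
        elif current is not None and current[3] is None and line.strip():
            current[3] = line.strip()
    return [tuple(entry) for entry in entries]


# A few lines copied from the log above.
sample = [
    "2018-06-14 13:13:10,723 WARN ipc.Client (Client.java:call(1462)) - interrupted waiting to send rpc request to server",
    "java.lang.InterruptedException",
    "at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)",
    "2018-06-14 13:13:10,724 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from LOCALIZATION_FAILED to DONE",
]

for ts, level, msg, detail in problem_entries(sample):
    print(ts, level, msg)
    if detail:
        print("    ", detail)
```

To run it over a whole file, pass the open file object instead of `sample`; the path to the NodeManager log depends on the installation.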