
Issue while running spark-submit on YARN


Log:

WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(232)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.

I found JIRA AMBARI-17633.

Please let me know how I can resolve this issue.

HDP version: Hadoop 2.7.3.2.6.1.0-129
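
The warning compares two symbolic permission strings, and the only difference is the last character: `t` (sticky bit set) versus `x`. As a rough illustration, the sketch below converts such a string to its octal form. The function name `symbolic_to_octal` is made up for this example, and it only handles the 9-character form shown in the log message.

```python
def symbolic_to_octal(mode):
    """Convert a 9-character mode such as 'rwxrwxrwt' to octal, e.g. '1777'.

    Only the sticky bit ('t'/'T' in the last position) is handled, since
    that is the only special bit that appears in the warning.
    """
    if len(mode) != 9:
        raise ValueError("expected a 9-character mode string, got %r" % mode)

    # A trailing 't' or 'T' means the sticky bit is set.
    sticky = 1 if mode[8] in "tT" else 0

    digits = []
    for i in range(0, 9, 3):
        r, w, x = mode[i:i + 3]
        value = 0
        if r == "r":
            value += 4
        if w == "w":
            value += 2
        if x in "xt":  # lowercase 't' means execute plus sticky
            value += 1
        digits.append(value)

    return "{}{}{}{}".format(sticky, *digits)


print(symbolic_to_octal("rwxrwxrwt"))  # expected by the NodeManager
print(symbolic_to_octal("rwxrwxrwx"))  # found on /app-logs
```

So the NodeManager expects mode 1777 on /app-logs and finds 0777. A mode like this is normally applied with `hdfs dfs -chmod 1777 /app-logs` as an HDFS superuser; whether that change is appropriate on this cluster is an assumption to verify with the cluster administrator.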

2 REPLIES

Re: Issue while running spark-submit on YARN

@Sayantan Dash,

This is just a warning message and shouldn't be the cause of the failure. Can you check whether there are any other errors in the logs?
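
One way to look for other errors is to filter the NodeManager log down to WARN/ERROR/FATAL entries and the exception lines that follow them. The sketch below is only a starting point: it assumes the log4j layout seen in HDP NodeManager logs (timestamp, then level), and the function name `interesting_lines` and the sample lines are illustrative.

```python
import re

# Timestamped log4j line, e.g. "2018-06-14 13:13:10,721 WARN localizer...."
LEVEL_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} (DEBUG|INFO|WARN|ERROR|FATAL) "
)
# First line of a Java stack trace, optionally prefixed with "Caused by: ".
EXCEPTION_RE = re.compile(r"^(Caused by: )?[\w.$]+(Exception|Error)\b")


def interesting_lines(lines):
    """Return WARN/ERROR/FATAL entries and exception header lines."""
    hits = []
    for raw in lines:
        line = raw.strip()
        match = LEVEL_RE.match(line)
        if match and match.group(1) in ("WARN", "ERROR", "FATAL"):
            hits.append(line)
        elif EXCEPTION_RE.match(line):
            hits.append(line)
    return hits


# Shortened sample lines in the same layout as a NodeManager log.
sample = [
    "2018-06-14 13:12:53,921 INFO monitor.ContainersMonitorImpl - Memory usage",
    "2018-06-14 13:13:10,721 WARN localizer.ResourceLocalizationService - failed",
    "java.io.FileNotFoundException: File file:/tmp/example.zip does not exist",
    "at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)",
    "Caused by: java.lang.InterruptedException",
]

for hit in interesting_lines(sample):
    print(hit)
```

To run it against a real log, pass an open file object to `interesting_lines` instead of the sample list.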

Re: Issue while running spark-submit on YARN


@Aditya Sirna, I'm not able to execute the job.


LOG:

2018-06-14 13:12:53,921 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:12:56,940 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:12:59,996 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:03,017 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:06,037 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:09,128 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:09,940 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(810)) - Start request for container_e29_1528274329273_0037_01_000001 by user elf

2018-06-14 13:13:09,942 INFO containermanager.ContainerManagerImpl (ContainerManagerImpl.java:startContainerInternal(850)) - Creating a new application reference for app application_1528274329273_0037

2018-06-14 13:13:09,942 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from NEW to INITING

2018-06-14 13:13:10,021 WARN logaggregation.LogAggregationService (LogAggregationService.java:verifyAndCreateRemoteLogDir(232)) - Remote Root Log Dir [/app-logs] already exist, but with incorrect permissions. Expected: [rwxrwxrwt], Found: [rwxrwxrwx]. The cluster may have problems with multiple users.

2018-06-14 13:13:10,130 INFO application.ApplicationImpl (ApplicationImpl.java:transition(304)) - Adding container_e29_1528274329273_0037_01_000001 to application application_1528274329273_0037

2018-06-14 13:13:10,130 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from INITING to RUNNING

2018-06-14 13:13:10,130 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from NEW to LOCALIZING

2018-06-14 13:13:10,130 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_INIT for appId application_1528274329273_0037

2018-06-14 13:13:10,130 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(184)) - Initializing container container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:10,130 INFO yarn.YarnShuffleService (YarnShuffleService.java:initializeContainer(287)) - Initializing container container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:10,131 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip transitioned from INIT to DOWNLOADING

2018-06-14 13:13:10,131 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/home/elf/.sparkStaging/application_1528274329273_0037/__spark_conf__.zip transitioned from INIT to DOWNLOADING

2018-06-14 13:13:10,131 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:handle(712)) - Created localizer for container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:10,203 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:writeCredentials(1194)) - Writing credentials to the nmPrivate file /hadoop/yarn/local/nmPrivate/container_e29_1528274329273_0037_01_000001.tokens. Credentials list:

2018-06-14 13:13:10,534 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:createUserCacheDirs(646)) - Initializing user elf

2018-06-14 13:13:10,535 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(126)) - Copying from /hadoop/yarn/local/nmPrivate/container_e29_1528274329273_0037_01_000001.tokens to /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037/container_e29_1528274329273_0037_01_000001.tokens

2018-06-14 13:13:10,536 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:startLocalizer(133)) - Localizer CWD set to /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037 = file:/hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037

2018-06-14 13:13:10,721 WARN localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(1017)) - { file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip, 1528962187000, ARCHIVE, null } failed: File file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip does not exist

java.io.FileNotFoundException: File file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip does not exist

at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:624)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:850)

at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:614)

at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:422)

at org.apache.hadoop.yarn.util.FSDownload.copy(FSDownload.java:253)

at org.apache.hadoop.yarn.util.FSDownload.access$000(FSDownload.java:63)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:361)

at org.apache.hadoop.yarn.util.FSDownload$2.run(FSDownload.java:359)

at java.security.AccessController.doPrivileged(Native Method)

at javax.security.auth.Subject.doAs(Subject.java:422)

at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:358)

at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)

at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

2018-06-14 13:13:10,721 INFO localizer.LocalizedResource (LocalizedResource.java:handle(203)) - Resource file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip(->/hadoop/yarn/local/usercache/elf/filecache/0/14727/__spark_libs__1218228472113576092.zip) transitioned from DOWNLOADING to FAILED

2018-06-14 13:13:10,721 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from LOCALIZING to LOCALIZATION_FAILED

2018-06-14 13:13:10,723 INFO localizer.LocalResourcesTrackerImpl (LocalResourcesTrackerImpl.java:handle(165)) - Container container_e29_1528274329273_0037_01_000001 sent RELEASE event on a resource request { file:/tmp/spark-a57d5eb4-528a-4d71-965b-f8afd494fbf1/__spark_libs__1218228472113576092.zip, 1528962187000, ARCHIVE, null } not present in cache.

2018-06-14 13:13:10,723 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:processHeartbeat(675)) - Unknown localizer with localizerId container_e29_1528274329273_0037_01_000001 is sending heartbeat. Ordering it to DIE

2018-06-14 13:13:10,723 WARN ipc.Client (Client.java:call(1462)) - interrupted waiting to send rpc request to server

java.lang.InterruptedException

at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)

at java.util.concurrent.FutureTask.get(FutureTask.java:191)

at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)

at org.apache.hadoop.ipc.Client.call(Client.java:1457)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy90.heartbeat(Unknown Source)

at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:257)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:174)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:139)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)

2018-06-14 13:13:10,724 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e29_1528274329273_0037_01_000001 transitioned from LOCALIZATION_FAILED to DONE

2018-06-14 13:13:10,725 INFO localizer.ResourceLocalizationService (ResourceLocalizationService.java:run(1134)) - Localizer failed

java.io.IOException: java.io.IOException: java.lang.InterruptedException

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:177)

at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.startLocalizer(DefaultContainerExecutor.java:139)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1114)

Caused by: java.io.IOException: java.lang.InterruptedException

at org.apache.hadoop.ipc.Client.call(Client.java:1463)

at org.apache.hadoop.ipc.Client.call(Client.java:1398)

at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)

at com.sun.proxy.$Proxy90.heartbeat(Unknown Source)

at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:257)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:174)

... 2 more

Caused by: java.lang.InterruptedException

at java.util.concurrent.FutureTask.awaitDone(FutureTask.java:404)

at java.util.concurrent.FutureTask.get(FutureTask.java:191)

at org.apache.hadoop.ipc.Client$Connection.sendRpcRequest(Client.java:1094)

at org.apache.hadoop.ipc.Client.call(Client.java:1457)

... 8 more

2018-06-14 13:13:10,725 WARN event.AsyncDispatcher (AsyncDispatcher.java:handle(254)) - AsyncDispatcher thread interrupted

java.lang.InterruptedException

at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)

at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)

at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)

at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:251)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)

2018-06-14 13:13:10,725 ERROR yarn.YarnUncaughtExceptionHandler (YarnUncaughtExceptionHandler.java:uncaughtException(68)) - Thread Thread[LocalizerRunner for container_e29_1528274329273_0037_01_000001,5,main] threw an Exception.

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.InterruptedException

at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:259)

at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run(ResourceLocalizationService.java:1138)

Caused by: java.lang.InterruptedException

at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)

at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)

at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)

at org.apache.hadoop.yarn.event.AsyncDispatcher$GenericEventHandler.handle(AsyncDispatcher.java:251)

... 1 more

2018-06-14 13:13:10,726 INFO application.ApplicationImpl (ApplicationImpl.java:transition(347)) - Removing container_e29_1528274329273_0037_01_000001 from application application_1528274329273_0037

2018-06-14 13:13:10,727 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:startContainerLogAggregation(512)) - Considering container container_e29_1528274329273_0037_01_000001 for log-aggregation

2018-06-14 13:13:10,727 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event CONTAINER_STOP for appId application_1528274329273_0037

2018-06-14 13:13:10,727 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(190)) - Stopping container container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:10,727 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopContainer(293)) - Stopping container container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:11,744 INFO nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(553)) - Removed completed containers from NM context: [container_e29_1528274329273_0037_01_000001]

2018-06-14 13:13:11,748 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from RUNNING to APPLICATION_RESOURCES_CLEANINGUP

2018-06-14 13:13:11,748 INFO containermanager.AuxServices (AuxServices.java:handle(215)) - Got event APPLICATION_STOP for appId application_1528274329273_0037

2018-06-14 13:13:11,748 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopApplication(171)) - Stopping application application_1528274329273_0037

2018-06-14 13:13:11,748 INFO shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(206)) - Application application_1528274329273_0037 removed, cleanupLocalDirs = false

2018-06-14 13:13:11,749 INFO yarn.YarnShuffleService (YarnShuffleService.java:stopApplication(266)) - Stopping application application_1528274329273_0037

2018-06-14 13:13:11,749 INFO shuffle.ExternalShuffleBlockResolver (ExternalShuffleBlockResolver.java:applicationRemoved(186)) - Application application_1528274329273_0037 removed, cleanupLocalDirs = false

2018-06-14 13:13:11,749 INFO application.ApplicationImpl (ApplicationImpl.java:handle(464)) - Application application_1528274329273_0037 transitioned from APPLICATION_RESOURCES_CLEANINGUP to FINISHED

2018-06-14 13:13:11,749 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:finishLogAggregation(520)) - Application just finished : application_1528274329273_0037

2018-06-14 13:13:11,750 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(494)) - Deleting absolute path : /hadoop/yarn/local/usercache/elf/appcache/application_1528274329273_0037

2018-06-14 13:13:11,906 INFO logaggregation.AppLogAggregatorImpl (AppLogAggregatorImpl.java:doContainerLogAggregation(567)) - Uploading logs for container container_e29_1528274329273_0037_01_000001. Current good log dirs are /hadoop/yarn/log

2018-06-14 13:13:12,138 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(425)) - Stopping resource-monitoring for container_e29_1528274329273_0037_01_000001

2018-06-14 13:13:12,141 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(503)) - Deleting path : /hadoop/yarn/log/application_1528274329273_0037

2018-06-14 13:13:12,172 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:15,202 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:18,251 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:21,355 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:24,373 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:27,393 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:30,417 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:33,439 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:36,498 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:39,525 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used

2018-06-14 13:13:42,561 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 92027 for container-id container_e29_1528274329273_0019_01_000001: 335.1 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used