01-10-2018
07:18 AM
- last edited on
01-10-2018
07:34 AM
by
cjervis
I am running a spark streaming application that simply read messages from a Kafka topic, enrich them and then write the enriched messages in another kafka topic. I already tried it in Standalone mode (both client and cluster deploy mode) and in YARN client mode, successfully. When I submit the application in cluster mode it gives me the following messages:
18/01/10 12:13:34 INFO Client: Submitting application application_1515582681419_0001 to ResourceManager 18/01/10 12:13:34 INFO YarnClientImpl: Submitted application application_1515582681419_0001 18/01/10 12:13:35 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED) 18/01/10 12:13:35 INFO Client: client token: N/A diagnostics: AM container is launched, waiting for AM container to Register with RM ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1515582814080 final status: UNDEFINED tracking URL: http://ambari1.internal:8088/proxy/application_1515582681419_0001/ user: root 18/01/10 12:13:36 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED) 18/01/10 12:13:37 INFO Client: Application report for application_1515582681419_0001 (state: ACCEPTED)
And keeps stuck in ACCEPTED Status until after around 4-5 minutes, exit with the following error message:
18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms 18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]) 18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook 18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped 18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately
Funny fact: If I visit the age of the application, I can see that the Spark Context has been started and it processes some messages.
Could anyone help me on this? PS: These are the resources of my YARN cluster:
01-10-2018 07:51 PM
Could you share the resourcemanager logs and nodemanager logs to dig more
01-11-2018 12:06 AM
In addition to csguna's request for rm log, can you please share are you using command line to submit the job? If yes then please share the property file.
01-11-2018 02:21 AM
Thank you guys for replying.
I wanted to edit the question in order to give more logs there. But I don't think I can... So... here you are:
hadoop-mapreduce.jobsummary.log 2018-01-10 12:11:23,189 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515514117634_0006,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/proxy/application_1515514117634_0006/,appMasterHost=N/A,startTime=1515521579503,finishTime=1515521786904,finalStatus=FAILED,memorySeconds=421688,vcoreSeconds=205,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK 2018-01-10 12:11:23,189 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515521947770_0001,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/proxy/application_1515521947770_0001/,appMasterHost=N/A,startTime=1515522042260,finishTime=1515522252527,finalStatus=FAILED,memorySeconds=320665,vcoreSeconds=208,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK 2018-01-10 12:11:23,190 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515521947770_0002,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/proxy/application_1515521947770_0002/,appMasterHost=N/A,startTime=1515581922574,finishTime=1515582130269,finalStatus=FAILED,memorySeconds=316332,vcoreSeconds=205,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK 2018-01-10 12:17:02,638 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515582681419_0001,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/cluster/app/application_1515582681419_0001,appMasterHost=N/A,startTime=1515582814080,finishTime=1515583022633,finalStatus=FAILED,memorySeconds=317834,vcoreSeconds=206,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK 2018-01-10 14:32:07,537 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515582681419_0002,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/cluster/app/application_1515582681419_0002,appMasterHost=N/A,startTime=1515590479857,finishTime=1515591127527,finalStatus=FAILED,memorySeconds=992297,vcoreSeconds=645,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK 2018-01-10 15:00:56,548 INFO resourcemanager.RMAppManager$ApplicationSummary: appId=application_1515582681419_0003,name=iit.cnr.it.socialpipeline.NormalizeJSON,user=root,queue=default,state=FAILED,trackingUrl=http://ambari1.internal:8088/cluster/app/application_1515582681419_0003,appMasterHost=N/A,startTime=1515592208420,finishTime=1515592856545,finalStatus=FAILED,memorySeconds=1654602,vcoreSeconds=645,preemptedAMContainers=0,preemptedNonAMContainers=0,preemptedResources=<memory:0\, vCores:0>,applicationType=SPARK
rm-audit.log Container exited with a non-zero exit code 10. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : f time 1515583020000 ms 18/01/10 12:17:00 INFO JobScheduler: Total delay: 0.013 s for time 1515583020000 ms (execution: 0.009 s) 18/01/10 12:17:00 INFO KafkaRDD: Removing RDD 8 from persistence list 18/01/10 12:17:00 INFO BlockManager: Removing RDD 8 18/01/10 12:17:00 INFO ReceivedBlockTracker: Deleting batches: 18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms 18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]) 18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook 18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped 18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately 18/01/10 12:17:02 INFO RecurringTimer: Stopped timer for JobGenerator after time 1515583020000 18/01/10 12:17:02 INFO JobGenerator: Stopped JobGenerator 18/01/10 12:17:02 INFO JobScheduler: Stopped JobScheduler 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fd4adf6{/streaming,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@4db63574{/streaming/batch,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5aa67d0f{/static/streaming,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO StreamingContext: StreamingContext stopped successfully 18/01/10 12:17:02 INFO SparkContext: Invoking stop() from shutdown hook 18/01/10 12:17:02 INFO AbstractConnector: Stopped Spark@39c6ad48{HTTP/1.1,[http/1.1]}{0.0.0.0:0} 18/01/10 12:17:02 INFO SparkUI: Stopped Spark web UI at http://192.168.100.25:34356 18/01/10 12:17:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 18/01/10 12:17:02 INFO MemoryStore: MemoryStore cleared 18/01/10 12:17:02 INFO BlockManager: BlockManager stopped 18/01/10 12:17:02 INFO BlockManagerMaster: BlockManagerMaster stopped 18/01/10 12:17:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 18/01/10 12:17:02 INFO SparkContext: Successfully stopped SparkContext 18/01/10 12:17:02 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]) 18/01/10 12:17:02 INFO ApplicationMaster: Deleting staging directory hdfs://ambari1.internal:8020/user/root/.sparkStaging/application_1515582681419_0001 18/01/10 12:17:02 INFO ShutdownHookManager: Shutdown hook called 18/01/10 12:17:02 INFO ShutdownHookManager: Deleting directory /hadoop/yarn/local/usercache/root/appcache/application_1515582681419_0001/spark-58754dbc-fb87-4868-a4df-164f898d3ff5 Failing this attempt. Failing the application. APPID=application_1515582681419_0001 2018-01-10 15:25:31,389 WARN resourcemanager.RMAuditLogger: USER=root OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1515582681419_0002 failed 2 times due to AM Container for appattempt_1515582681419_0002_000002 exited with exitCode: 10 For more detailed output, check the application tracking page: http://ambari1.internal:8088/cluster/app/application_1515582681419_0002 Then click on links to logs of each attempt. Diagnostics: Exception from container-launch. Container id: container_e05_1515582681419_0002_02_000001 Exit code: 10
01-11-2018 02:24 AM
2018-01-10 12:13:32,405 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(297)) - Allocated new applicationId: 1 2018-01-10 12:13:34,080 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(423)) - The specific max attempts: 0 for application: 1 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead. 2018-01-10 12:13:34,081 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1105)) - Storing application with id application_1515582681419_0001 2018-01-10 12:13:34,082 INFO resourcemanager.ClientRMService (ClientRMService.java:submitApplication(615)) - Application with id 1 submitted by user root 2018-01-10 12:13:34,084 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0001 State change from NEW to NEW_SAVING 2018-01-10 12:13:34,084 INFO recovery.RMStateStore (RMStateStore.java:transition(199)) - Storing info for app: application_1515582681419_0001 2018-01-10 12:13:34,095 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0001 State change from NEW_SAVING to SUBMITTED 2018-01-10 12:13:34,096 INFO capacity.ParentQueue (ParentQueue.java:addApplication(360)) - Application added - appId: application_1515582681419_0001 user: root leaf-queue of parent: root #applications: 1 2018-01-10 12:13:34,097 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplication(744)) - Accepted application application_1515582681419_0001 from user: root, in queue: default 2018-01-10 12:13:34,097 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0001 State change from SUBMITTED to ACCEPTED 2018-01-10 12:13:34,119 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(679)) - Registering app attempt : appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,120 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from NEW to SUBMITTED 2018-01-10 12:13:34,131 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(720)) - Application application_1515582681419_0001 from user: root activated in queue: default 2018-01-10 12:13:34,131 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(744)) - Application added - appId: application_1515582681419_0001 user: root, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1 2018-01-10 12:13:34,132 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(773)) - Added Application Attempt appattempt_1515582681419_0001_000001 to scheduler from user root in queue default 2018-01-10 12:13:34,133 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from SUBMITTED to SCHEDULED 2018-01-10 12:13:34,701 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_01_000001 Container Transitioned from NEW to ALLOCATED 2018-01-10 12:13:34,702 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container container_e05_1515582681419_0001_01_000001 of capacity <memory:1536, vCores:1> on host ambari2.internal:45454, which has 1 containers, <memory:1536, vCores:1> used and <memory:10496, vCores:3> available after allocation 2018-01-10 12:13:34,702 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(89)) - assignedContainer application attempt=appattempt_1515582681419_0001_000001 container=container_e05_1515582681419_0001_01_000001 queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@6d756235 clusterResource=<memory:12032, vCores:4> type=OFF_SWITCH 2018-01-10 12:13:34,716 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : ambari2.internal:45454 for container : container_e05_1515582681419_0001_01_000001 2018-01-10 12:13:34,724 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_01_000001 Container Transitioned from ALLOCATED to ACQUIRED 2018-01-10 12:13:34,724 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=0.12765957 absoluteUsedCapacity=0.12765957 used=<memory:1536, vCores:1> cluster=<memory:12032, vCores:4> 2018-01-10 12:13:34,725 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,725 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(1955)) - Storing attempt: AppId: application_1515582681419_0001 AttemptId: appattempt_1515582681419_0001_000001 MasterContainer: Container: [ContainerId: container_e05_1515582681419_0001_01_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] 2018-01-10 12:13:34,728 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from SCHEDULED to ALLOCATED_SAVING 2018-01-10 12:13:34,731 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from ALLOCATED_SAVING to ALLOCATED 2018-01-10 12:13:34,751 INFO amlauncher.AMLauncher (AMLauncher.java:run(266)) - Launching masterappattempt_1515582681419_0001_000001 2018-01-10 12:13:34,770 INFO amlauncher.AMLauncher (AMLauncher.java:launch(111)) - Setting up container Container: [ContainerId: container_e05_1515582681419_0001_01_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] for AM appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,771 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,774 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,985 INFO amlauncher.AMLauncher (AMLauncher.java:launch(132)) - Done launching container Container: [ContainerId: container_e05_1515582681419_0001_01_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] for AM appattempt_1515582681419_0001_000001 2018-01-10 12:13:34,986 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from ALLOCATED to LAUNCHED 2018-01-10 12:13:35,701 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_01_000001 Container Transitioned from ACQUIRED to RUNNING 2018-01-10 12:15:18,860 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_01_000001 Container Transitioned from RUNNING to COMPLETED 2018-01-10 12:15:18,860 INFO scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(220)) - Released container container_e05_1515582681419_0001_01_000001 of capacity <memory:1536, vCores:1> on host ambari2.internal:45454, which currently has 0 containers, <memory:0, vCores:0> used and <memory:12032, vCores:4> available, release resources=true 2018-01-10 12:15:18,861 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1209)) - Updating application attempt appattempt_1515582681419_0001_000001 with final state: FAILED, and exit status: 10 2018-01-10 12:15:18,862 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from LAUNCHED to FINAL_SAVING 2018-01-10 12:15:18,862 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onContainerFinished(144)) - Container container_e05_1515582681419_0001_01_000001 finished, purging container-level records 2018-01-10 12:15:18,862 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) - records under / with ID container_e05_1515582681419_0001_01_000001 and policy container: {} 2018-01-10 12:15:18,870 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(685)) - Unregistering app attempt : appattempt_1515582681419_0001_000001 2018-01-10 12:15:18,871 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1515582681419_0001_000001 2018-01-10 12:15:18,872 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000001 State change from FINAL_SAVING to FAILED 2018-01-10 12:15:18,872 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1331)) - The number of failed attempts is 1. The max attempts is 2 2018-01-10 12:15:18,873 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(812)) - Application Attempt appattempt_1515582681419_0001_000001 is done. finalState=FAILED 2018-01-10 12:15:18,873 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:registerAppAttempt(679)) - Registering app attempt : appattempt_1515582681419_0001_000002 2018-01-10 12:15:18,873 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(124)) - Application application_1515582681419_0001 requests cleared 2018-01-10 12:15:18,873 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from NEW to SUBMITTED 2018-01-10 12:15:18,873 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(795)) - Application removed - appId: application_1515582681419_0001 user: root queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0 2018-01-10 12:15:18,873 INFO capacity.LeafQueue (LeafQueue.java:activateApplications(720)) - Application application_1515582681419_0001 from user: root activated in queue: default 2018-01-10 12:15:18,873 INFO capacity.LeafQueue (LeafQueue.java:addApplicationAttempt(744)) - Application added - appId: application_1515582681419_0001 user: root, leaf-queue: default #user-pending-applications: 0 #user-active-applications: 1 #queue-pending-applications: 0 #queue-active-applications: 1 2018-01-10 12:15:18,873 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplicationAttempt(773)) - Added Application Attempt appattempt_1515582681419_0001_000002 to scheduler from user root in queue default 2018-01-10 12:15:18,877 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from SUBMITTED to SCHEDULED 2018-01-10 12:15:19,860 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_02_000001 Container Transitioned from NEW to ALLOCATED 2018-01-10 12:15:19,860 INFO scheduler.SchedulerNode (SchedulerNode.java:allocateContainer(152)) - Assigned container container_e05_1515582681419_0001_02_000001 of capacity <memory:1536, vCores:1> on host ambari2.internal:45454, which has 1 containers, <memory:1536, vCores:1> used and <memory:10496, vCores:3> available after allocation 2018-01-10 12:15:19,861 INFO allocator.AbstractContainerAllocator (AbstractContainerAllocator.java:getCSAssignmentFromAllocateResult(89)) - assignedContainer application attempt=appattempt_1515582681419_0001_000002 container=container_e05_1515582681419_0001_02_000001 queue=org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.allocator.RegularContainerAllocator@2cf52214 clusterResource=<memory:12032, vCores:4> type=OFF_SWITCH 2018-01-10 12:15:19,862 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:createAndGetNMToken(200)) - Sending NMToken for nodeId : ambari2.internal:45454 for container : container_e05_1515582681419_0001_02_000001 2018-01-10 12:15:19,862 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_02_000001 Container Transitioned from ALLOCATED to ACQUIRED 2018-01-10 12:15:19,863 INFO security.NMTokenSecretManagerInRM (NMTokenSecretManagerInRM.java:clearNodeSetForAttempt(146)) - Clear node set for appattempt_1515582681419_0001_000002 2018-01-10 12:15:19,863 INFO capacity.ParentQueue (ParentQueue.java:assignContainers(475)) - assignedContainer queue=root usedCapacity=0.12765957 absoluteUsedCapacity=0.12765957 used=<memory:1536, vCores:1> cluster=<memory:12032, vCores:4> 2018-01-10 12:15:19,863 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:storeAttempt(1955)) - Storing attempt: AppId: application_1515582681419_0001 AttemptId: appattempt_1515582681419_0001_000002 MasterContainer: Container: [ContainerId: container_e05_1515582681419_0001_02_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] 2018-01-10 12:15:19,863 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from SCHEDULED to ALLOCATED_SAVING 2018-01-10 12:15:19,867 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from ALLOCATED_SAVING to ALLOCATED 2018-01-10 12:15:19,867 INFO amlauncher.AMLauncher (AMLauncher.java:run(266)) - Launching masterappattempt_1515582681419_0001_000002 2018-01-10 12:15:19,870 INFO amlauncher.AMLauncher (AMLauncher.java:launch(111)) - Setting up container Container: [ContainerId: container_e05_1515582681419_0001_02_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] for AM appattempt_1515582681419_0001_000002 2018-01-10 12:15:19,870 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createAndGetAMRMToken(195)) - Create AMRMToken for ApplicationAttempt: appattempt_1515582681419_0001_000002 2018-01-10 12:15:19,870 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:createPassword(307)) - Creating password for appattempt_1515582681419_0001_000002 2018-01-10 12:15:19,887 INFO amlauncher.AMLauncher (AMLauncher.java:launch(132)) - Done launching container Container: [ContainerId: container_e05_1515582681419_0001_02_000001, NodeId: ambari2.internal:45454, NodeHttpAddress: ambari2.internal:8042, Resource: <memory:1536, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 192.168.100.25:45454 }, ] for AM appattempt_1515582681419_0001_000002 2018-01-10 12:15:19,887 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from ALLOCATED to LAUNCHED 2018-01-10 12:15:20,862 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_02_000001 Container Transitioned from ACQUIRED to RUNNING 2018-01-10 12:17:02,626 INFO rmcontainer.RMContainerImpl (RMContainerImpl.java:handle(422)) - container_e05_1515582681419_0001_02_000001 Container Transitioned from RUNNING to COMPLETED 2018-01-10 12:17:02,626 INFO scheduler.SchedulerNode (SchedulerNode.java:releaseContainer(220)) - Released container container_e05_1515582681419_0001_02_000001 of capacity <memory:1536, vCores:1> on host ambari2.internal:45454, which currently has 0 containers, <memory:0, vCores:0> used and <memory:12032, vCores:4> available, release resources=true 2018-01-10 12:17:02,627 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:rememberTargetTransitionsAndStoreState(1209)) - Updating application attempt appattempt_1515582681419_0001_000002 with final state: FAILED, and exit status: 10 2018-01-10 12:17:02,627 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from LAUNCHED to FINAL_SAVING 2018-01-10 12:17:02,627 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onContainerFinished(144)) - Container container_e05_1515582681419_0001_02_000001 finished, purging container-level records 2018-01-10 12:17:02,627 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) - records under / with ID container_e05_1515582681419_0001_02_000001 and policy container: {} 2018-01-10 12:17:02,632 INFO resourcemanager.ApplicationMasterService (ApplicationMasterService.java:unregisterAttempt(685)) - Unregistering app attempt : appattempt_1515582681419_0001_000002 2018-01-10 12:17:02,632 INFO security.AMRMTokenSecretManager (AMRMTokenSecretManager.java:applicationMasterFinished(124)) - Application finished, removing password for appattempt_1515582681419_0001_000002 2018-01-10 12:17:02,632 INFO attempt.RMAppAttemptImpl (RMAppAttemptImpl.java:handle(809)) - appattempt_1515582681419_0001_000002 State change from FINAL_SAVING to FAILED 2018-01-10 12:17:02,632 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1331)) - The number of failed attempts is 2. The max attempts is 2 2018-01-10 12:17:02,633 INFO rmapp.RMAppImpl (RMAppImpl.java:rememberTargetTransitionsAndStoreState(1124)) - Updating application application_1515582681419_0001 with final state: FAILED 2018-01-10 12:17:02,634 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0001 State change from ACCEPTED to FINAL_SAVING 2018-01-10 12:17:02,634 INFO recovery.RMStateStore (RMStateStore.java:transition(228)) - Updating info for app: application_1515582681419_0001 2018-01-10 12:17:02,634 INFO capacity.CapacityScheduler (CapacityScheduler.java:doneApplicationAttempt(812)) - Application Attempt appattempt_1515582681419_0001_000002 is done. finalState=FAILED 2018-01-10 12:17:02,634 INFO scheduler.AppSchedulingInfo (AppSchedulingInfo.java:clearRequests(124)) - Application application_1515582681419_0001 requests cleared 2018-01-10 12:17:02,634 INFO capacity.LeafQueue (LeafQueue.java:removeApplicationAttempt(795)) - Application removed - appId: application_1515582681419_0001 user: root queue: default #user-pending-applications: 0 #user-active-applications: 0 #queue-pending-applications: 0 #queue-active-applications: 0 2018-01-10 12:17:02,636 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1064)) - Application application_1515582681419_0001 failed 2 times due to AM Container for appattempt_1515582681419_0001_000002 exited with exitCode: 10 For more detailed output, check the application tracking page: http://ambari1.internal:8088/cluster/app/application_1515582681419_0001 Then click on links to logs of each attempt. Diagnostics: Exception from container-launch. Container id: container_e05_1515582681419_0001_02_000001 Exit code: 10 Container exited with a non-zero exit code 10. Error file: prelaunch.err. Last 4096 bytes of prelaunch.err : Last 4096 bytes of stderr : f time 1515583020000 ms 18/01/10 12:17:00 INFO JobScheduler: Total delay: 0.013 s for time 1515583020000 ms (execution: 0.009 s) 18/01/10 12:17:00 INFO KafkaRDD: Removing RDD 8 from persistence list 18/01/10 12:17:00 INFO BlockManager: Removing RDD 8 18/01/10 12:17:00 INFO ReceivedBlockTracker: Deleting batches: 18/01/10 12:17:00 INFO InputInfoTracker: remove old batch metadata: 1515583000000 ms 18/01/10 12:17:02 ERROR ApplicationMaster: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223) at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:201) at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:423) at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:282) at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:768) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67) at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66) at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:766) at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala) 18/01/10 12:17:02 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]) 18/01/10 12:17:02 INFO StreamingContext: Invoking stop(stopGracefully=false) from shutdown hook 18/01/10 12:17:02 INFO ReceiverTracker: ReceiverTracker stopped 18/01/10 12:17:02 INFO JobGenerator: Stopping JobGenerator immediately 18/01/10 12:17:02 INFO RecurringTimer: Stopped timer for JobGenerator after time 1515583020000 18/01/10 12:17:02 INFO JobGenerator: Stopped JobGenerator 18/01/10 12:17:02 INFO JobScheduler: Stopped JobScheduler 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@6fd4adf6{/streaming,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@4db63574{/streaming/batch,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO ContextHandler: Stopped o.s.j.s.ServletContextHandler@5aa67d0f{/static/streaming,null,UNAVAILABLE,@Spark} 18/01/10 12:17:02 INFO StreamingContext: StreamingContext stopped successfully 18/01/10 12:17:02 INFO SparkContext: Invoking stop() from shutdown hook 18/01/10 12:17:02 INFO AbstractConnector: Stopped Spark@39c6ad48{HTTP/1.1,[http/1.1]}{0.0.0.0:0} 18/01/10 12:17:02 INFO SparkUI: Stopped Spark web UI at http://192.168.100.25:34356 18/01/10 12:17:02 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 18/01/10 12:17:02 INFO MemoryStore: MemoryStore cleared 18/01/10 12:17:02 INFO BlockManager: BlockManager stopped 18/01/10 12:17:02 INFO BlockManagerMaster: BlockManagerMaster stopped 18/01/10 12:17:02 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 18/01/10 12:17:02 INFO SparkContext: Successfully stopped SparkContext 18/01/10 12:17:02 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.util.concurrent.TimeoutException: Futures timed out after [100000 milliseconds]) 18/01/10 12:17:02 INFO ApplicationMaster: Deleting staging directory hdfs://ambari1.internal:8020/user/root/.sparkStaging/application_1515582681419_0001 18/01/10 12:17:02 INFO ShutdownHookManager: Shutdown hook called 18/01/10 12:17:02 INFO ShutdownHookManager: Deleting directory /hadoop/yarn/local/usercache/root/appcache/application_1515582681419_0001/spark-58754dbc-fb87-4868-a4df-164f898d3ff5 Failing this attempt. Failing the application. 2018-01-10 12:17:02,637 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0001 State change from FINAL_SAVING to FAILED 2018-01-10 12:17:02,637 INFO capacity.ParentQueue (ParentQueue.java:removeApplication(385)) - Application removed - appId: application_1515582681419_0001 user: root leaf-queue of parent: root #applications: 0 2018-01-10 12:17:02,638 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:onApplicationCompleted(119)) - Application application_1515582681419_0001 completed, purging application-level records 2018-01-10 12:17:02,638 INFO integration.RMRegistryOperationsService (RMRegistryOperationsService.java:purgeRecordsAsync(198)) - records under / with ID application_1515582681419_0001 and policy application: {} 2018-01-10 12:21:21,034 INFO scheduler.AbstractYarnScheduler (AbstractYarnScheduler.java:run(479)) - Release request cache is cleaned up 2018-01-10 13:21:20,999 INFO availability.MetricSinkWriteShardHostnameHashingStrategy (MetricSinkWriteShardHostnameHashingStrategy.java:findCollectorShard(42)) - Calculated collector shard ambari1.internal based on hostname: ambari1.internal 2018-01-10 14:21:18,164 INFO resourcemanager.ClientRMService (ClientRMService.java:getNewApplicationId(297)) - Allocated new applicationId: 2 2018-01-10 14:21:19,857 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(423)) - The specific max attempts: 0 for application: 2 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead. 2018-01-10 14:21:19,857 INFO resourcemanager.ClientRMService (ClientRMService.java:submitApplication(615)) - Application with id 2 submitted by user root 2018-01-10 14:21:19,857 INFO rmapp.RMAppImpl (RMAppImpl.java:transition(1105)) - Storing application with id application_1515582681419_0002 2018-01-10 14:21:19,858 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0002 State change from NEW to NEW_SAVING 2018-01-10 14:21:19,858 INFO recovery.RMStateStore (RMStateStore.java:transition(199)) - Storing info for app: application_1515582681419_0002 2018-01-10 14:21:19,863 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1515582681419_0002 State change from NEW_SAVING to SUBMITTED 2018-01-10 14:21:19,863 INFO capacity.ParentQueue (ParentQueue.java:addApplication(360)) - Application added - appId: application_1515582681419_0002 user: root leaf-queue of parent: root #applications: 1 2018-01-10 14:21:19,863 INFO capacity.CapacityScheduler (CapacityScheduler.java:addApplication(744)) - Accepted application application_1515582681419_0002 from user: root, in queue: default
resourcemanager.log
01-11-2018 03:49 AM
nodemanager.log
2018-01-10 15:32:29,309 INFO monitor.ContainersMonitorImpl (ContainersMonitorImpl.java:run(499)) - Memory usage of ProcessTree 15190 for container-id container_e07_1515594329635_0001_01_000001: 513.4 MB of 2.5 GB physical memory used; 3.7 GB of 5.3 GB virtual memory used 2018-01-10 15:32:29,553 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(249)) - Exit code from container container_e07_1515594329635_0001_01_000001 is : 10 2018-01-10 15:32:29,554 WARN nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:launchContainer(255)) - Exception from container-launch with container ID: container_e07_1515594329635_0001_01_000001 and exit code: 10 ExitCodeException exitCode=10: at org.apache.hadoop.util.Shell.runCommand(Shell.java:944) at org.apache.hadoop.util.Shell.run(Shell.java:848) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1142) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:326) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:87) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 2018-01-10 15:32:29,556 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(368)) - Exception from container-launch. 2018-01-10 15:32:29,557 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(368)) - Container id: container_e07_1515594329635_0001_01_000001 2018-01-10 15:32:29,557 INFO nodemanager.ContainerExecutor (ContainerExecutor.java:logOutput(368)) - Exit code: 10 2018-01-10 15:32:29,557 WARN launcher.ContainerLaunch (ContainerLaunch.java:handleContainerExitWithFailure(431)) - Container launch failed : Container exited with a non-zero exit code 10. 2018-01-10 15:32:29,563 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e07_1515594329635_0001_01_000001 transitioned from RUNNING to EXITED_WITH_FAILURE 2018-01-10 15:32:29,564 INFO launcher.ContainerLaunch (ContainerLaunch.java:cleanupContainer(541)) - Cleaning up container container_e07_1515594329635_0001_01_000001 2018-01-10 15:32:29,588 INFO nodemanager.DefaultContainerExecutor (DefaultContainerExecutor.java:deleteAsUser(492)) - Deleting absolute path : /hadoop/yarn/local/usercache/root/appcache/application_1515594329635_0001/container_e07_1515594329635_0001_01_000001 2018-01-10 15:32:29,595 INFO container.ContainerImpl (ContainerImpl.java:handle(1163)) - Container container_e07_1515594329635_0001_01_000001 transitioned from EXITED_WITH_FAILURE to DONE
01-11-2018 03:50 AM
spark-submit command CONF_FILE=hdfs://ambari1.internal:8020/user/hadoop/local.normalizer.conf ${SPARK_DIR}/bin/spark-submit --conf spark.yarn.am.waitTime=110s --class mypackage.NormalizeJSON --master yarn --deploy-mode cluster ${WORK_DIR}/target/${JAR_VERSION} ${CONF_FILE}