ApplicationMaster won't launch

Hello,

 

I am trying to launch a Spark app on YARN on a 2-node Hadoop cluster. When I do, it gets stuck with the message "Waiting on ApplicationMaster container to launch", and the app stays in ACCEPTED status. When I check the ApplicationMaster log in the ResourceManager UI, it says:

2021-10-25 12:22:53,788 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
2021-10-25 12:22:53,960 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57482.
2021-10-25 12:22:53,960 INFO netty.NettyBlockTransferService: Server created on slaveVM1:57482
2021-10-25 12:22:53,960 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-10-25 12:22:53,976 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManagerMasterEndpoint: Registering block manager slaveVM1:57482 with 366.3 MiB RAM, BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:54,194 INFO ui.ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2021-10-25 12:22:54,194 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fe6122a{/metrics/json,null,AVAILABLE,@Spark}
2021-10-25 12:22:54,288 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2021-10-25 12:22:54,366 INFO yarn.YarnRMClient: Registering the ApplicationMaster
2021-10-25 12:22:56,433 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-10-25 12:22:58,467 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-10-25 12:23:00,502 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

2021-10-25 14:32:25,915 INFO retry.RetryInvocationHandler: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort, while invoking ApplicationMasterProtocolPBClientImpl.registerApplicationMaster over null after 5 failover attempts. Trying to failover after sleeping for 27785ms.

In yarn-site.xml on the NodeManager, I set the resourcemanager.hostname property to masterIP.
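The log above shows the ApplicationMaster connecting to 0.0.0.0:8030. That is what YARN falls back to when `yarn.resourcemanager.hostname` is left at its default (`0.0.0.0`) and `yarn.resourcemanager.scheduler.address` is not set. So it may be worth confirming what the yarn-site.xml on the node running the AM (slaveVM1 in the log) actually contains. Below is a minimal sketch of such a check. The helper names and the sample XML are hypothetical, `masterIP` is the placeholder from the post, and Hadoop's `${...}` variable substitution is not handled.

```python
import xml.etree.ElementTree as ET


def read_hadoop_props(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def scheduler_address(props):
    """Address the AM would use to register with the ResourceManager.

    Mirrors YARN's defaults: an explicit scheduler address wins,
    otherwise <yarn.resourcemanager.hostname>:8030, and the hostname
    itself defaults to 0.0.0.0.
    """
    explicit = props.get("yarn.resourcemanager.scheduler.address")
    if explicit:
        return explicit
    host = props.get("yarn.resourcemanager.hostname", "0.0.0.0")
    return f"{host}:8030"


# Hypothetical file content for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>masterIP</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(scheduler_address(read_hadoop_props(SAMPLE)))  # property set
    print(scheduler_address({}))                         # property missing
```

If the second case matches what the node reports, the property is probably missing, misspelled, or set in a file that the node is not reading.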

 

UPDATE:

diagnostics: [lun ott 25 16:36:24 +0200 2021] Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2096, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1635172583960
final status: UNDEFINED
tracking URL: http://masterVM2:8088/proxy/applicati
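"Cluster resource is empty" suggests the ResourceManager has no NodeManager resources registered at that moment. One way to check is the ResourceManager REST endpoint `/ws/v1/cluster/metrics`, which reports `activeNodes` and `totalMB`. The sketch below assumes the RM web UI is reachable at masterVM2:8088 (taken from the tracking URL) over plain HTTP without authentication. The function names are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_web_address, timeout=5.0):
    """Fetch the clusterMetrics object from the ResourceManager REST API."""
    url = f"http://{rm_web_address}/ws/v1/cluster/metrics"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)["clusterMetrics"]


def has_resources(metrics):
    """True if at least one NodeManager is active and memory is registered."""
    return metrics.get("activeNodes", 0) > 0 and metrics.get("totalMB", 0) > 0


if __name__ == "__main__":
    # Assumed address; adjust to wherever the ResourceManager UI is served.
    metrics = fetch_cluster_metrics("masterVM2:8088")
    print("activeNodes:", metrics.get("activeNodes"))
    print("totalMB:", metrics.get("totalMB"))
    print("cluster has resources:", has_resources(metrics))
```

If `activeNodes` comes back as 0, the NodeManagers are not registering with the ResourceManager, which would also explain the AM resource limit of `<memory:0, vCores:0>`.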

 

Thanks for clarifying.

1 REPLY

Contributor

Can you share a few details?

1. Is it an Apache cluster or an enterprise distribution?

2. How are you launching/submitting your job?

3. Please check connectivity between your hosts.

4. Make sure the cluster is up and running.
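For the connectivity check, a rough sketch like the following can be run from each worker node. It only tests whether a TCP connection can be opened. The ports are the YARN defaults for the scheduler (8030), resource tracker (8031) and client (8032) addresses, `masterIP` is the placeholder from the question, and the helper name is hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    # Assumed values: replace masterIP with the ResourceManager host.
    for port in (8030, 8031, 8032):
        status = "ok" if can_connect("masterIP", port) else "unreachable"
        print(f"masterIP:{port} {status}")
```

If these ports are unreachable from a worker, look at firewall rules, /etc/hosts entries, and whether the ResourceManager is bound to the expected interface.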