Created on 10-25-2021 03:53 AM - edited 10-25-2021 07:37 AM
Hello,
I am trying to launch a Spark app on YARN on a 2-node Hadoop cluster. When I do, it gets stuck with the message "Waiting on ApplicationMaster container to launch" and the app stays in ACCEPTED status. When I check the ApplicationMaster log in the ResourceManager UI, it says:
2021-10-25 12:22:53,788 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
2021-10-25 12:22:53,960 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 57482.
2021-10-25 12:22:53,960 INFO netty.NettyBlockTransferService: Server created on slaveVM1:57482
2021-10-25 12:22:53,960 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
2021-10-25 12:22:53,976 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManagerMasterEndpoint: Registering block manager slaveVM1:57482 with 366.3 MiB RAM, BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:53,976 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, slaveVM1, 57482, None)
2021-10-25 12:22:54,194 INFO ui.ServerInfo: Adding filter to /metrics/json: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter
2021-10-25 12:22:54,194 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7fe6122a{/metrics/json,null,AVAILABLE,@Spark}
2021-10-25 12:22:54,288 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2021-10-25 12:22:54,366 INFO yarn.YarnRMClient: Registering the ApplicationMaster
2021-10-25 12:22:56,433 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-10-25 12:22:58,467 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-10-25 12:23:00,502 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2021-10-25 14:32:25,915 INFO retry.RetryInvocationHandler: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort, while invoking ApplicationMasterProtocolPBClientImpl.registerApplicationMaster over null after 5 failover attempts. Trying to failover after sleeping for 27785ms.
In yarn-site.xml on the NodeManager, I set the resourcemanager.hostname property to masterIP.
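The log above shows the ApplicationMaster connecting to 0.0.0.0:8030, which is what Hadoop falls back to when yarn.resourcemanager.hostname is not picked up on that node (its default is 0.0.0.0, and the scheduler address defaults to that hostname on port 8030). A minimal sketch, assuming a standard yarn-site.xml layout, of how one might check which scheduler address a given file would produce. The function names and the sample XML are hypothetical and are not taken from the actual cluster; the sample only illustrates what would happen if the property were entered without the yarn. prefix, which the post does not confirm.

```python
import xml.etree.ElementTree as ET

# Hadoop defaults (from yarn-default.xml)
DEFAULT_RM_HOSTNAME = "0.0.0.0"
DEFAULT_SCHEDULER_PORT = 8030


def load_properties(xml_text):
    """Return the <property> name/value pairs of a Hadoop config file as a dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def effective_scheduler_address(props):
    """Approximate the address the ApplicationMaster uses to reach the RM scheduler.

    Only the ${yarn.resourcemanager.hostname} substitution is handled;
    this is a simplification of Hadoop's real configuration resolution.
    """
    hostname = props.get("yarn.resourcemanager.hostname", DEFAULT_RM_HOSTNAME)
    address = props.get(
        "yarn.resourcemanager.scheduler.address",
        "${yarn.resourcemanager.hostname}:%d" % DEFAULT_SCHEDULER_PORT,
    )
    return address.replace("${yarn.resourcemanager.hostname}", hostname)


# Hypothetical sample: the property name lacks the "yarn." prefix,
# so Hadoop would ignore it and keep the 0.0.0.0 default.
SAMPLE = """
<configuration>
  <property>
    <name>resourcemanager.hostname</name>
    <value>masterIP</value>
  </property>
</configuration>
"""

print(effective_scheduler_address(load_properties(SAMPLE)))
```

If the result for the real file is 0.0.0.0:8030, the setting is probably not reaching the node that runs the ApplicationMaster container (here slaveVM1), either because of the property name or because that node reads a different yarn-site.xml.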
UPDATE:
diagnostics: [lun ott 25 16:36:24 +0200 2021] Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details : AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2096, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: default
start time: 1635172583960
final status: UNDEFINED
tracking URL: http://masterVM2:8088/proxy/applicati
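The "cluster resource is empty" diagnostic usually means the ResourceManager sees no registered NodeManagers with usable memory or vcores. A minimal sketch of how this could be confirmed through the ResourceManager REST endpoint /ws/v1/cluster/metrics, assuming the web UI at masterVM2:8088 (taken from the tracking URL) is reachable without authentication. The function names are hypothetical; the field names activeNodes, totalMB and totalVirtualCores come from the YARN REST API.

```python
import json
import urllib.request


def fetch_cluster_metrics(rm_base_url, timeout=5.0):
    """GET <rm_base_url>/ws/v1/cluster/metrics and return the decoded JSON."""
    url = rm_base_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def summarize_capacity(payload):
    """Extract node count and total capacity from a cluster metrics payload."""
    metrics = payload.get("clusterMetrics", {})
    active_nodes = int(metrics.get("activeNodes", 0))
    total_mb = int(metrics.get("totalMB", 0))
    total_vcores = int(metrics.get("totalVirtualCores", 0))
    return {
        "active_nodes": active_nodes,
        "total_mb": total_mb,
        "total_vcores": total_vcores,
        # No active nodes or no memory means no ApplicationMaster can be placed
        "empty": active_nodes == 0 or total_mb == 0,
    }
```

Calling summarize_capacity(fetch_cluster_metrics("http://masterVM2:8088")) from any host that can reach the ResourceManager would show whether the NodeManagers have registered. An "empty" result of True would be consistent with the Queue Resource Limit of <memory:0, vCores:0> reported above.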
Thanks for clarifying.
Created 12-07-2021 10:34 PM
Can you share a few details?
1. Is it an Apache cluster or an enterprise distribution?
2. How are you launching/submitting your job?
3. Please check connectivity between your hosts.
4. Make sure the cluster is up and running.
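For point 3, a minimal sketch of a connectivity check that can be run from each worker node. It assumes the ResourceManager uses the default ports (8030 scheduler, 8031 resource tracker, 8032 client RPC); the host name passed in and the function names are hypothetical.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to whether it is reachable on the given host."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, check_ports("masterVM2", [8030, 8031, 8032]) run on slaveVM1 would show whether the worker can reach the ResourceManager at all. A False on 8031 would also explain NodeManagers that never register.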