Member since 09-28-2017
88 Posts
3 Kudos Received
0 Solutions
04-01-2019
08:17 AM
Nope, still looking for a solution.
03-31-2019
06:30 AM
I just checked my configs, and dynamic allocation is already set:

Advanced spark2-thrift-sparkconf
spark.dynamicAllocation.enabled
spark.dynamicAllocation.initialExecutors
spark.dynamicAllocation.maxExecutors
spark.dynamicAllocation.minExecutors

Here are the logs for starting a job:

19/03/31 06:23:17 INFO RMProxy: Connecting to ResourceManager at grid-master.MyDomain.com/XXX.XXX.XXX:8030
19/03/31 06:23:17 INFO YarnRMClient: Registering the ApplicationMaster
19/03/31 06:23:17 INFO Configuration: found resource resource-types.xml at file:/etc/hadoop/3.1.0.0-78/0/resource-types.xml
19/03/31 06:23:17 INFO YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@grid-worker-102.MyDomain.com:44575)
19/03/31 06:23:17 INFO YarnAllocator: Will request 1220 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
19/03/31 06:23:17 INFO YarnAllocator: Submitted 1220 unlocalized container requests.
19/03/31 06:23:17 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000002 on host grid-02.MyDomain.com for executor with ID 1
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000003 on host grid-worker-101.MyDomain.com for executor with ID 2
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000004 on host grid-04.MyDomain.com for executor with ID 3
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000005 on host grid-05.MyDomain.com for executor with ID 4
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000006 on host grid-03.MyDomain.com for executor with ID 5
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000007 on host grid-worker-102.MyDomain.com for executor with ID 6
19/03/31 06:23:18 INFO YarnAllocator: Received 6 containers from YARN, launching executors on 6 of them.
19/03/31 06:23:18 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000008 on host grid-01.MyDomain.com for executor with ID 7
19/03/31 06:23:18 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000009 on host grid-02.MyDomain.com for executor with ID 8
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000010 on host grid-worker-101.MyDomain.com for executor with ID 9
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000011 on host grid-04.MyDomain.com for executor with ID 10
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000012 on host grid-05.MyDomain.com for executor with ID 11
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000013 on host grid-03.MyDomain.com for executor with ID 12
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000014 on host grid-worker-102.MyDomain.com for executor with ID 13
19/03/31 06:23:19 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000015 on host grid-01.MyDomain.com for executor with ID 14
19/03/31 06:23:19 INFO YarnAllocator: Received 7 containers from YARN, launching executors on 7 of them.
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000016 on host grid-worker-101.MyDomain.com for executor with ID 15
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000017 on host grid-02.MyDomain.com for executor with ID 16
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000018 on host grid-04.MyDomain.com for executor with ID 17
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000019 on host grid-03.MyDomain.com for executor with ID 18
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000020 on host grid-05.MyDomain.com for executor with ID 19
19/03/31 06:23:21 INFO YarnAllocator: Launching container container_e33_1553767920480_0028_01_000021 on host grid-worker-102.MyDomain.com for executor with ID 20
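A log like this can be easier to reason about once it is summarized. The following is a minimal Python sketch, not part of the original post, that pulls out the number of containers requested up front and counts launches per host. The function name `summarize` and the two regular expressions are my own and only match the `YarnAllocator` message shapes quoted above; other Spark versions may word these lines differently.

```python
import re

# These patterns mirror the YarnAllocator messages quoted in the post.
# They are assumptions about the message format, not a Spark API.
REQUEST_RE = re.compile(r"Will request (\d+) executor container\(s\)")
LAUNCH_RE = re.compile(
    r"Launching container \S+ on host (\S+) for executor with ID (\d+)"
)


def summarize(log_lines):
    """Return (requested_containers, launches_per_host) for the given lines."""
    requested = None
    per_host = {}
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match:
            requested = int(match.group(1))
            continue
        match = LAUNCH_RE.search(line)
        if match:
            host = match.group(1)
            per_host[host] = per_host.get(host, 0) + 1
    return requested, per_host


# Three lines copied from the log above, used as sample input.
sample = [
    "19/03/31 06:23:17 INFO YarnAllocator: Will request 1220 executor "
    "container(s), each with 1 core(s) and 2432 MB memory "
    "(including 384 MB of overhead)",
    "19/03/31 06:23:18 INFO YarnAllocator: Launching container "
    "container_e33_1553767920480_0028_01_000002 on host "
    "grid-02.MyDomain.com for executor with ID 1",
    "19/03/31 06:23:18 INFO YarnAllocator: Launching container "
    "container_e33_1553767920480_0028_01_000003 on host "
    "grid-worker-101.MyDomain.com for executor with ID 2",
]

requested, per_host = summarize(sample)
print("requested:", requested)
print("launched per host:", per_host)
```

The point of interest is the first number: the application asks for 1220 containers immediately at startup, so it may be worth comparing that figure against the configured initial and minimum executor counts.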
03-28-2019
11:52 AM
This is not the issue. Here is the queue structure: each user can use the entire cluster, but resources are freed only when the entire application (and all of its executors) is done. My problem is that when an executor finishes and there are no pending tasks, the executor is not freed. For instance, the application submitted a job with 200 tasks and got 200 executors (each with 1 core). As the tasks finish, the executors are not released; in the worst case the application is at 1/200 (1 running), but all 200 executors' resources are still allocated. I would expect the 199 idle ones to be returned to the cluster.
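For readers comparing their own setup, here is a minimal Python sketch of the Spark properties that usually govern whether idle executors are handed back to YARN. The property names are standard Spark settings; every value below is an illustrative assumption, not the poster's configuration, and the helper `to_spark_submit_flags` is hypothetical. On an Ambari-managed cluster the same keys would be set in the Spark config sections rather than passed on the command line.

```python
# Illustrative values only (assumptions); tune them for the actual cluster.
dynamic_allocation_conf = {
    "spark.dynamicAllocation.enabled": "true",
    # On YARN with Spark 2.x, dynamic allocation relies on the external
    # shuffle service so executors can be removed without losing shuffle files.
    "spark.shuffle.service.enabled": "true",
    # Executors are never released below this floor.
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "200",
    # An executor idle for longer than this becomes eligible for removal.
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
    # Executors holding cached blocks use this timeout instead;
    # Spark's default for it is infinity, so they are kept unless it is set.
    "spark.dynamicAllocation.cachedExecutorIdleTimeout": "300s",
}


def to_spark_submit_flags(conf):
    """Render a config dict as a flat list of spark-submit --conf arguments."""
    flags = []
    for key in sorted(conf):
        flags += ["--conf", f"{key}={conf[key]}"]
    return flags


flags = to_spark_submit_flags(dynamic_allocation_conf)
print(" ".join(flags))
```

Two things in this sketch relate to the symptom described: a high `minExecutors` (or a fixed executor count) keeps idle executors allocated, and executors holding cached data are not released by the idle timeout unless `cachedExecutorIdleTimeout` is set to a finite value.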
03-28-2019
09:23 AM
Does anyone have an idea? This is a major issue for us.
03-24-2019
08:48 AM
Hi, I have a large cluster with multiple queues, but in this scenario only two are used: bt.a and bt.b. Each queue can scale to 100% of the cluster (200 nodes in total). The problem occurs when one user is running a large job of 1000 executors and has fewer than 200 executors still running. For example, if 150 are running (850 completed), 50 should be allocated to the second user. But the second user does not receive new allocations and starts only when the first user has fully completed all executors (job completed). Sometimes only 1 worker is still running (999 completed) while 199 are "sort of pending" and hold all the resources; the cluster seems fully populated, and the resources are freed only when that 1 worker is done.
Labels: Apache Spark, Apache YARN
03-10-2019
07:20 AM
Can anyone give an idea of how to fix it?
03-05-2019
08:03 AM
Approximately 30% of the jobs I submit fail. Here are the logs with the error:

19/03/04 18:43:17 ERROR CoarseGrainedExecutorBackend: Executor self-exiting due to : Driver grid-05.test.com:36315 disassociated! Shutting down.
19/03/04 18:43:17 INFO DiskBlockManager: Shutdown hook called
19/03/04 18:43:17 INFO ShutdownHookManager: Shutdown hook called

Any idea how I can resolve it?
Labels: Apache YARN
03-04-2019
10:41 AM
I don't think this helps me, because I want to increase the failure limit to more than 20.
03-04-2019
09:05 AM
Can I set it via the Ambari config?
Labels: Apache Ambari, Apache Spark, Apache YARN
02-20-2019
01:26 PM
I think that did what I needed. Thanks.