I am trying to run two parallel Spark jobs through Oozie. The jobs run successfully when the Spark master is set to local[*], but when I set the Spark master to yarn, the workflow gets stuck in the following state:
1. Two Oozie MR jobs get stuck at 95%.
2. The first Spark job enters the Running state and gets stuck at 10%.
3. The second Spark job gets stuck in the Accepted state.
I am using a CDH 5.8 cluster on Amazon EC2 with one master and three slaves. I have set the following resource-related settings through Cloudera Manager:
1. Dynamic Resource Pool Configuration -> User Limits -> Default 10
2. Dynamic Resource Pool Configuration -> Resource Pools -> Max Running Apps 10
What could be the issue here? Any help is highly appreciated. Thanks in advance.
Are you running Spark on YARN?
If so, are there containers available on YARN?
Could you tell us your YARN configuration:
- vcores and memory per node?
- minimum memory and vcores per container?
Also, how many vcores and how much memory does your Spark application request?
This information will let you determine whether your "Accepted" job is waiting for an available container.
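Once you have those numbers, a rough back-of-the-envelope check can be done as in the sketch below. Every figure in it is a hypothetical placeholder, not taken from your cluster: the node sizes, the minimum allocation, the 1024 MB launcher and application master containers, and the 2048 MB executors with 384 MB overhead are all assumptions you should replace with your own values. It also assumes that each Oozie launcher occupies one application master container plus one map container, and that memory requests are rounded up to a multiple of the minimum allocation. It only compares total demand with total capacity and ignores how containers pack onto individual nodes, so treat a "fits" result as necessary rather than sufficient.

```python
import math


def round_up(request, step):
    """Round a request up to the next multiple of step."""
    return math.ceil(request / step) * step


def demand(requests, min_mb, min_vcores):
    """Sum memory and vcores over (count, memory_mb, vcores) requests."""
    total_mb = sum(count * round_up(mb, min_mb) for count, mb, _ in requests)
    total_vcores = sum(count * max(vc, min_vcores) for count, _, vc in requests)
    return total_mb, total_vcores


# Hypothetical cluster: three worker nodes (replace with your own values).
NODES = 3
NODE_MB = 8192      # memory YARN may use per node (assumed)
NODE_VCORES = 4     # vcores YARN may use per node (assumed)
MIN_MB = 1024       # minimum container memory (assumed)
MIN_VCORES = 1      # minimum container vcores (assumed)

# Hypothetical container requests: (count, memory_mb, vcores).
requests = [
    (2, 1024, 1),   # Oozie launcher application masters
    (2, 1024, 1),   # Oozie launcher map tasks
    (2, 1024, 1),   # Spark application masters
    (4, 2432, 1),   # Spark executors: 2 per job, 2048 MB + 384 MB overhead
]

need_mb, need_vcores = demand(requests, MIN_MB, MIN_VCORES)
have_mb, have_vcores = NODES * NODE_MB, NODES * NODE_VCORES
fits = need_mb <= have_mb and need_vcores <= have_vcores

print(f"needed:    {need_mb} MB, {need_vcores} vcores")
print(f"available: {have_mb} MB, {have_vcores} vcores")
print("fits" if fits else "does not fit: some containers will wait")
```

If the demand exceeds the capacity, the launchers and the first Spark job can hold all the containers while the second job stays in the Accepted state waiting for one, which would match the symptoms described.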