Support Questions

jayadeep_jayara · ‎10-06-2016

I was going through the documentation on YARN architecture and couldn't find answers to some of the questions that I had, any help in answering them will be greatly appreciated

1. It is mentioned that Resource Manager starts a container and that container becomes Application Master which in turn requests for other containers. So in this case how does RM decide which container becomes AM if more than 1 node is eligible to provide the resources. Is it based on round robin techniques?

2. If the application master dies in one node can one of the other container become an application master without disrupting the client services?

3. Are all the containers given the same resources or can there be a heterogeneous mix?

4. Does the application jar copied to all the container nodes or only the app master node and does the application jar need to reside in HDFS or can the client application ship the jars to hdfs location during initiation.

Thanks,

Jayadeep

LesterMartin · ‎10-06-2016

It is clear you've done some research already on YARN, so I'll try to respond briefly to each to see if my responses are what you are needing to proceed.

1. In general, the RM picks a worker node that can run the AM container, so yes, good to visualize it as a round-robin approach.

2. The RM has a subtask who is responsible to watch the AM's and if one of them dies can restart it's container on another node. The AM itself needs to be written in such a way that it is restartable, but generally speaking the ones we all use (MR, Tez, Spark, etc) are all restartable. That said, "the client" may, or may not, be affected, but the "job" itself can run to completion despite an AM failure.

3. This is up to the AM to request of the RM what container sizes it needs (and how many). So, yes, theoretically they could be of different sizes, but often will be the same size. Spark is a good example as we usually ask for N containers, but want them to all have the same # of cores and amount of memory.

4. I believe the RM is going to ensure all jars are shipped to the needed NodeManager (NM) instances on the worker nodes where your containers will be at. I also believe you have some options about pre-placing your jars on HDFS, but that's been a while since I toyed with than and it was with MapReduce.

Hope this helps your understanding some. Good luck!

View solution in original post

LesterMartin · ‎10-06-2016

It is clear you've done some research already on YARN, so I'll try to respond briefly to each to see if my responses are what you are needing to proceed.

1. In general, the RM picks a worker node that can run the AM container, so yes, good to visualize it as a round-robin approach.

2. The RM has a subtask who is responsible to watch the AM's and if one of them dies can restart it's container on another node. The AM itself needs to be written in such a way that it is restartable, but generally speaking the ones we all use (MR, Tez, Spark, etc) are all restartable. That said, "the client" may, or may not, be affected, but the "job" itself can run to completion despite an AM failure.

3. This is up to the AM to request of the RM what container sizes it needs (and how many). So, yes, theoretically they could be of different sizes, but often will be the same size. Spark is a good example as we usually ask for N containers, but want them to all have the same # of cores and amount of memory.

4. I believe the RM is going to ensure all jars are shipped to the needed NodeManager (NM) instances on the worker nodes where your containers will be at. I also believe you have some options about pre-placing your jars on HDFS, but that's been a while since I toyed with than and it was with MapReduce.

Hope this helps your understanding some. Good luck!

Cloudera Community

Support Questions

YARN Questions

Cloudera 7.4.4 - Yarn - Questions about Queues

yarn scheduler question

Understanding basics of HDFS and YARN

Apache Ranger Admin UI Login Question

Reserved Memory and Vcores in negative value in Ya...

Cloudera Data Platform (CDP) - Questions & Answers

Deep dive into YARN Log Aggregation / Deep dive in...

Common LLAP questions answered

Spark on YARN - Executor Resource Allocation Optim...

Cloudera DataFlow (CDF) - Questions & Answers