Created 02-18-2018 05:21 PM
Hi.
If I run a job with YARN that uses HDFS data, I understand that YARN will look for hardware resources to run it on.
But how does YARN interact with the NameNode?
In other words, YARN has to communicate with the NameNode at some point in order to know where the HDFS files that the job requires are located. When does it do that?
Can someone please clarify this for me?
Regards
Created 02-19-2018 05:12 AM
Hi @Lanic
When you submit a job, YARN is what provides information about the available resources. The driver gets the HDFS data locations needed to execute the job from the NameNode. The job's tasks are then scheduled, where possible, on the available resources closest to the data. YARN learns about the HDFS data locations through the job's resource requests, which carry the information originally supplied by the NameNode. Once all the jobs have completed, their status is reported back and the corresponding metastore is brought in sync.
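To make the first half of that flow concrete, here is a minimal sketch in Python. It is a toy model only: `NAMENODE_METADATA`, `get_block_locations` and `build_container_requests` are hypothetical names made up for illustration, not Hadoop APIs, and the file path and host names are invented. It shows the driver side asking for block locations and turning them into one container request per block, each carrying the hosts that hold a replica.

```python
# Toy model only: every name here is hypothetical, not a Hadoop API.

# Stand-in for the NameNode's metadata: file path -> list of blocks,
# where each block lists the hosts that hold a replica of it.
NAMENODE_METADATA = {
    "/data/input.csv": [
        ["node1", "node2", "node3"],
        ["node2", "node3", "node4"],
    ],
}


def get_block_locations(path):
    """What the driver asks the NameNode: where are this file's blocks?"""
    return NAMENODE_METADATA[path]


def build_container_requests(path):
    """Build one container request per block, carrying its replica hosts."""
    return [
        {"block": index, "preferred_hosts": hosts}
        for index, hosts in enumerate(get_block_locations(path))
    ]


requests = build_container_requests("/data/input.csv")
for request in requests:
    print(request)
```

The point of the sketch is that the host lists travel inside the requests, so whatever grants the containers never needs to look up block metadata itself.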
Hope it Helps!!
Created 02-19-2018 11:38 AM
Just to be more specific:
1. The driver talks to the NameNode to find the locations of the HDFS blocks.
2. That information is made available to the ApplicationMaster (AM).
3. The driver requests an AM, and the AM requests the required resources based on the block information.
4. YARN itself has no need to talk to the NameNode directly.
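
As a rough illustration of point 4, here is a small Python sketch of the scheduling side. It is a toy, not the real YARN ResourceManager: the `schedule` function, the request format and the host names are all assumptions made for this example, and rack-level locality is left out. It places each request on a preferred host if one has a free slot, and otherwise falls back to any host with capacity.

```python
# Toy model only: a hypothetical scheduler, not the real YARN ResourceManager.

def schedule(requests, free_slots):
    """Place each request on a preferred host when possible.

    Only the host names carried in the requests are consulted; nothing
    in this function looks up HDFS metadata.
    """
    free = dict(free_slots)  # copy so the caller's dict is not mutated
    placements = []
    for request in requests:
        # First choice: a preferred host that still has a free slot.
        candidates = [h for h in request["preferred_hosts"] if free.get(h, 0) > 0]
        locality = "node-local"
        if not candidates:
            # Fallback: any host with capacity, in name order.
            candidates = sorted(h for h, n in free.items() if n > 0)
            locality = "off-node"
        if not candidates:
            placements.append((request["block"], None, "unplaced"))
            continue
        host = candidates[0]
        free[host] -= 1
        placements.append((request["block"], host, locality))
    return placements


requests = [
    {"block": 0, "preferred_hosts": ["node1", "node2"]},
    {"block": 1, "preferred_hosts": ["node1"]},
    {"block": 2, "preferred_hosts": ["node1"]},
]
placements = schedule(requests, {"node1": 1, "node2": 1, "node3": 1})
for placement in placements:
    print(placement)
```

With one free slot per node, only the first request gets its preferred host; the other two fall back to whatever capacity is left, which is why data locality is a preference rather than a guarantee.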