Explorer
Posts: 69
Registered: ‎01-24-2017

yarn question

Hi All,

 

If one submits a job to a Hadoop cluster without explicitly using YARN (for example, via the Spark shell, Hive shell, HBase shell, Pig, MapReduce with the hadoop command, Impala, or the Hue interface), is the job still scheduled and controlled by YARN? Can one rely on everything going through YARN?

 

In YARN, can I partition my users into several queues with different priorities or different amounts of resources allocated to each queue?

 

Thank you,

Igor

 

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: yarn question

So jobs launched in YARN will run, well, in YARN.

Launching a shell does not by itself create a job. Nothing is submitted until the user tells the shell to do something that requires a job. That job is then run in YARN.

Each service/shell can be a bit tricky as well. For instance, a Select * query in Hive does not launch an MR job; it reads the data directly from HDFS. A Select Count(*) does launch an MR job, as it now needs to crunch the data to get the value (an exception is if stats are turned on).

Impala is a caveat: by default, queries run against it, whether from Hue or the shell, are executed by the Impala daemons rather than by YARN. It is effectively an MPP cluster that sits beside YARN and on top of HDFS. There is a service, Llama, that allows you to integrate Impala with YARN.
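
If you want to check whether a particular action actually went through YARN, one option is to ask the ResourceManager which applications it knows about. The snippet below is only a rough sketch in Python using the ResourceManager REST endpoint /ws/v1/cluster/apps. The host name, the default web port 8088, and the helper names (apps_url, summarize_apps, fetch_apps) are assumptions made for illustration, not something taken from your cluster, and a secured cluster would need authentication that is not shown here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def apps_url(rm_address, **filters):
    """Build the ResourceManager 'Cluster Applications' URL.

    rm_address is host:port of the RM web UI (hypothetical, e.g. "rm-host:8088").
    filters are optional query parameters such as states="RUNNING" or user="igor".
    """
    base = f"http://{rm_address}/ws/v1/cluster/apps"
    query = urlencode(filters)
    return f"{base}?{query}" if query else base


def summarize_apps(payload):
    """Return (id, applicationType, queue, user) for each app in an RM response."""
    # The RM answers {"apps": null} when no application matches the filters.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["applicationType"], a["queue"], a["user"]) for a in apps]


def fetch_apps(rm_address, **filters):
    """Query the RM over HTTP and summarize the applications it reports."""
    with urlopen(apps_url(rm_address, **filters), timeout=10) as resp:
        return summarize_apps(json.load(resp))


# Example call (needs a reachable ResourceManager; the host name is made up):
# for row in fetch_apps("rm-host.example.com:8088", states="RUNNING"):
#     print(row)
```

Run a Select Count(*) from the Hive shell and the job should appear in this list, together with the queue it landed in. A plain Select * or a default Impala query should not show up, since neither is submitted to YARN.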