07-01-2017 01:24 AM
07-01-2017 01:58 AM - edited 07-01-2017 01:59 AM
Cloudera recommends using the Fair Scheduler as the task scheduler.
Please refer to the link.
07-01-2017 07:40 AM
I'm really thankful for getting a response from you.
But what I'm asking about is not configuring the scheduler. What I'd like to change is making slots dynamically configurable, so that a reduce slot can process a map task while it's idle.
This can be done only by dynamic configuration.
As you will see in the photo in the following link, map slots are fixed at 12 and reduce slots are fixed at 6. Is there any way to make them changeable according to the requirements of the jobs at processing time?
07-03-2017 12:00 AM
Gotcha. I believe you are using MR1; correct me if I am wrong.
In MR1, the mapred.tasktracker.map.tasks.maximum and mapred.tasktracker.reduce.tasks.maximum properties dictated how many map and reduce slots each TaskTracker had.
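For reference, here is a sketch of how those two properties would look in mapred-site.xml (the values 12 and 6 match the slot counts you mentioned seeing in the screenshot; adjust them for your cluster):

```xml
<!-- mapred-site.xml (MR1): fixed per-TaskTracker slot counts -->
<property>
  <name>mapred.tasktracker.map.tasks.maximum</name>
  <value>12</value>
</property>
<property>
  <name>mapred.tasktracker.reduce.tasks.maximum</name>
  <value>6</value>
</property>
```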
These properties no longer exist in YARN (MR2). Instead, YARN uses yarn.nodemanager.resource.memory-mb and yarn.nodemanager.resource.cpu-vcores, which control the amount of memory and CPU on each node, both of which are available to both maps and reduces.
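The equivalent YARN settings go in yarn-site.xml; the values below are just example numbers for illustration, not recommendations:

```xml
<!-- yarn-site.xml (MR2/YARN): per-node resources, shared by maps and reduces -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>16384</value>
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
</property>
```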
07-04-2017 09:47 AM
Yes, I'm working on MapReduce v1, which uses slots instead of containers.
I also set up Cloudera CDH 4.7, and from browsing it I think it uses MR v1; can you correct me if I'm wrong?
07-13-2017 12:52 PM
But there's a research paper that proposed algorithms for dynamic slot allocation (dynamic slot configuration) in MRv1.
So for this I have to understand the core code and all the classes to know what I have to change.
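To make the idea concrete, here is a minimal, self-contained sketch of the dynamic-slot policy: a tracker whose idle reduce slots can be borrowed to run map tasks. This is not Hadoop source code; the class and method names are hypothetical, purely to illustrate the scheduling policy the paper describes.

```python
class DynamicSlotTracker:
    """Toy model of a TaskTracker whose idle reduce slots can run map tasks.
    Hypothetical names for illustration only, not real Hadoop classes."""

    def __init__(self, map_slots, reduce_slots):
        self.map_slots = map_slots          # slots reserved for map tasks
        self.reduce_slots = reduce_slots    # slots reserved for reduce tasks
        self.running_maps = 0
        self.running_reduces = 0            # counts any busy reduce slot

    def assign(self, task_type):
        """Try to place a task; return which kind of slot it got, or None."""
        if task_type == "map":
            if self.running_maps < self.map_slots:
                self.running_maps += 1
                return "map-slot"
            # The dynamic part: borrow an idle reduce slot for a map task.
            if self.running_reduces < self.reduce_slots:
                self.running_reduces += 1   # mark the reduce slot as busy
                return "borrowed-reduce-slot"
            return None
        if task_type == "reduce":
            if self.running_reduces < self.reduce_slots:
                self.running_reduces += 1
                return "reduce-slot"
            return None
```

In real MRv1 the place to experiment would be the TaskScheduler plugin and the TaskTracker slot accounting, but the borrowing logic above is the core of what "dynamic slot configuration" changes.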
07-14-2017 02:40 AM - edited 07-14-2017 02:43 AM
If you use the YARN framework there is no single point of failure, because you can configure High Availability. There is also a separate daemon that performs resource allocation, the ResourceManager, while the NodeManager manages compute containers; you can also configure HDFS HA in Hadoop 2. Whereas in Hadoop 1, the NameNode is a single point of failure, as conveyed earlier. The prerequisite for HA is ZooKeeper: HDFS HA needs it (along with ZKFC daemons for automatic failover), while ResourceManager HA needs only the ZooKeeper quorum, because the ActiveStandbyElector is embedded in the ResourceManager itself. In addition, you can perform a rolling restart of your cluster without any service interruption. Finally, the reason they pushed to MR2 is that there was a lot of stress on the JobTracker, as it performed both resource management and job scheduling.
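A minimal ResourceManager HA sketch for yarn-site.xml looks like this (the hostnames rm1.example.com, rm2.example.com, and the ZooKeeper addresses are placeholders you would replace with your own):

```xml
<!-- yarn-site.xml: ResourceManager HA (Hadoop 2) -->
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
<property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>rm1.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>rm2.example.com</value>
</property>
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

Note there is no separate failover-controller daemon to configure here; leader election runs inside the ResourceManager.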