Member since: 01-28-2015
Posts: 11
Kudos Received: 0
Solutions: 0
07-02-2015 07:50 AM
Thanks much, Wilfred. Regarding your reply to (1) — "yes, you will need to supply the setting every time. There is not something like 'if user == X then queue = Y'; you can write your own rule for it if you wanted one. The rules can be added." — I am just wondering how such rules can be written to assign users to queues. I don't see a users-and-queues specification section in the rules. In other words, how is it possible to add a rule that assigns users to queues the way my requirement needs, i.e. if user == X then queue = Y?
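To make the requirement concrete: the closest I can see in the stock Fair Scheduler rules is the user rule, which places a job into a queue named after the submitting user — so naming a queue after user X effectively gives "if user == X then queue = X". A sketch (queue and user names are placeholders for my setup):

<allocations>
  <!-- a queue named exactly after the user; the "user" rule will match it -->
  <queue name="userx">
    <aclSubmitApps>userx</aclSubmitApps>
  </queue>
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <!-- sends userx's jobs to root.userx because that queue exists -->
    <rule name="user" create="false" />
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>

An arbitrary mapping where user X and queue Y have different names does not seem to be expressible with the built-in rules, which I gather is what the remark about writing your own rule was about.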
06-25-2015 09:34 AM
Thanks Wilfred, yes that explains it. So what I understood is that aclSubmitApps just checks whether a user is allowed to run in the queue determined by the queue placement policy, and the queue placement policy is based on rules. Please help with the points below:
1. Fair Scheduler — say I create queues xyz and default in the config file, and I want user1's jobs to run under the xyz queue. Does user1 always need to specify the queue name on the command line to get jobs placed under xyz? Or is there a way, in the config file or via a custom rule, to specify that whenever user1 runs a job without specifying a queue name, it should be placed under the xyz queue?
2. How would I do the same thing with the Capacity Scheduler? (A sketch of what I am hoping for follows below.)
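To make (2) concrete, this is the kind of mapping I am hoping exists — a sketch for capacity-scheduler.xml, assuming a Hadoop version recent enough to support Capacity Scheduler queue mappings:

<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- u:<user>:<queue> places user1's jobs in the xyz queue by default -->
  <value>u:user1:xyz</value>
</property>
<property>
  <!-- when false, a queue given explicitly on the command line still wins -->
  <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
  <value>false</value>
</property>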
06-22-2015 08:30 AM
Thanks Wilfred. So you mean that every time a user submits a job, they should specify the queue name in the mapred.job.queue.name parameter:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount -Dmapred.job.queue.name=alpha.alpha1 /yarn/man*.txt /yarn/testout1

When I run the job specifying the queue name, it runs in the mentioned queue. So do we always need to specify the queue name? Won't it assign the queue automatically based on the aclSubmitApps usernames/groups?
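As an aside, mapred.job.queue.name is the old MR1-style property name; a sketch of the same submission with the newer MR2 property (same jar and paths as above):

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.job.queuename=alpha.alpha1 \
  /yarn/man*.txt /yarn/testout1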
06-18-2015 08:32 AM
Thanks Wilfred for the response. I was trying different combinations, and root.userx must have been created while create was not false. I checked the documentation but am unable to figure out what the queue placement should be in order for jobs submitted by users to be assigned to the appropriate queues. With the queue placement below, the jobs are always being placed into the default queue:

<queuePlacementPolicy>
  <rule name="user" create="false" />
  <rule name="primaryGroup" create="false" />
  <rule name="specified" create="false" />
  <rule name="default" />
</queuePlacementPolicy>

The documentation states: "Currently the only supported administrative action is killing an application. Anybody who may administer a queue may also submit applications to it." Does this mean the aclSubmitApps functionality is not enabled/supported beyond the admin ACL task of killing an application?
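For reference, the ordering I plan to try next — a sketch assuming the rules are evaluated top to bottom and that a rule with create="false" falls through when no matching queue exists:

<queuePlacementPolicy>
  <!-- honor an explicitly specified queue first -->
  <rule name="specified" create="false" />
  <!-- matches only if a queue named root.<username> already exists -->
  <rule name="user" create="false" />
  <!-- matches only if a queue named after the user's primary group exists -->
  <rule name="primaryGroup" create="false" />
  <rule name="default" />
</queuePlacementPolicy>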
06-15-2015 09:09 PM
Hello, I implemented the Capacity Scheduler in my Hadoop environment (3-node cluster) with the queues listed below and assigned certain users to queues. But when I run a job as one of the users assigned to a particular queue, the job runs under the default queue rather than the assigned queue. When running the job I am not specifying the queue on the MR job command line; when the job is run with the assigned queue given via -Dmapreduce.job.queuename, it does run under the mentioned queue. I tried the Fair Scheduler as well and found the same behavior. My understanding is that once these queues are defined and users are allocated to them, jobs run by those users should automatically be assigned to the allocated queues, without the need to change the jobs to specify the queue name. Please let me know if this is not how it works.

[root@xxxxxxxx ~]# cat /etc/gphd/hadoop/conf/fair-scheduler.xml
<allocations>
  <queue name="default">
    <minResources>14417mb,22vcores</minResources>
    <maxResources>1441792mb,132vcores</maxResources>
    <maxRunningApps>50</maxRunningApps>
    <weight>10</weight>
    <schedulingPolicy>fifo</schedulingPolicy>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
  </queue>
  <queue name="framework">
    <minResources>14417mb,88vcores</minResources>
    <maxResources>1441792mb,132vcores</maxResources>
    <maxRunningApps>5</maxRunningApps>
    <weight>30</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <aclSubmitApps>userx,svc_cpsi_s1,svc_pusd_s1,svc_ssyr_s1,svc_susd_s1</aclSubmitApps>
  </queue>
  <queue name="transformation">
    <minResources>14417mb,88vcores</minResources>
    <maxResources>1441792mb,132vcores</maxResources>
    <maxRunningApps>5</maxRunningApps>
    <weight>20</weight>
    <schedulingPolicy>fair</schedulingPolicy>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <aclSubmitApps>svc_bdli_s1,svc_bdlt_s1,svc_bdlm_s1</aclSubmitApps>
  </queue>
  <userMaxAppsDefault>50</userMaxAppsDefault>
  <fairSharePreemptionTimeout>6000</fairSharePreemptionTimeout>
  <defaultQueueSchedulingPolicy>fifo</defaultQueueSchedulingPolicy>
  <queuePlacementPolicy>
    <rule name="specified" create="false" />
    <rule name="user" create="false" />
    <rule name="group" create="false" />
    <rule name="default" />
  </queuePlacementPolicy>
</allocations>

The job is being run as userx, who per the config file should be placed in the framework queue:

[userx@xxxxxxxx tmp]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount /yarn/man*.txt /yarn/testout1 > /tmp/testout1
15/06/01 12:08:43 INFO client.RMProxy: Connecting to ResourceManager at xxxxxxxx.xyz.com/10.15.232.185:8032
15/06/01 12:08:43 INFO input.FileInputFormat: Total input paths to process : 5
15/06/01 12:08:43 INFO mapreduce.JobSubmitter: number of splits:5
15/06/01 12:08:44 INFO impl.YarnClientImpl: Submitted application application_1433174070996_0002 to ResourceManager at xxxxxxxx.xyz.com/10.15.232.185:8032
15/06/01 12:08:44 INFO mapreduce.Job: The url to track the job: http://xxxxxxxx.xyz.com:8088/proxy/application_1433174070996_0002/
15/06/01 12:08:44 INFO mapreduce.Job: Running job: job_1433174070996_0002

Instead of the framework queue, the job is being run under the root.userx queue:

[root@xxxxxxxx ~]# yarn application -list
15/06/01 12:09:02 INFO client.RMProxy: Connecting to ResourceManager at xxxxxxxx.xyz.com/10.15.232.185:8032
Total number of applications (application-types: [] and states: [SUBMITTED, ACCEPTED, RUNNING]):1
Application-Id Application-Name Application-Type User Queue State Final-State Progress Tracking-URL
application_1433174070996_0002 word count MAPREDUCE userx root.userx RUNNING UNDEFINED 5% http://xxxxxxxx.xyz.com:39001
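For completeness, the explicit submission that does land in the intended queue (same jar and input paths as above; the queue is written fully qualified, though the short form framework should resolve the same way):

[userx@xxxxxxxx tmp]$ hadoop jar /usr/lib/gphd/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount \
  -Dmapreduce.job.queuename=root.framework \
  /yarn/man*.txt /yarn/testout1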
Labels:
- Apache Hadoop
- Apache YARN
- MapReduce
03-24-2015 07:05 AM
Hi Mario/All, I am getting the below error when trying to run Rumen, which is required for YARN SLS:

java -cp "hadoop/*:hadoop/lib/*:hadoop-hdfs/*:hadoop-yarn/*:hadoop-mapreduce/*" org.apache.hadoop.tools.rumen.TraceBuilder /home/admin/nmakb/job-trace.json /home/admin/nmakb/topology.output hdfs://user/history/done
2015-03-24 04:45:16,693 WARN [main] util.NativeCodeLoader (NativeCodeLoader.java:<clinit>(62)) - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015-03-24 04:45:17,602 WARN [main] rumen.TraceBuilder (TraceBuilder.java:run(284)) - No job found in traces:

I have job info present in the job history folder, but I am not sure why it is complaining with "No job found in traces". Please help.
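One thing I will double-check on my side: in hdfs://user/history/done the URI parser treats user as the NameNode host rather than a path component, so TraceBuilder may be scanning the wrong location. A sketch of the invocation with a fully qualified path, assuming the done files really live under /user/history/done on the default filesystem:

java -cp "hadoop/*:hadoop/lib/*:hadoop-hdfs/*:hadoop-yarn/*:hadoop-mapreduce/*" \
  org.apache.hadoop.tools.rumen.TraceBuilder \
  /home/admin/nmakb/job-trace.json \
  /home/admin/nmakb/topology.output \
  hdfs:///user/history/done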
02-01-2015 10:39 PM
Hello, I am using the Cloudera VM for practice. I want to test the capacity and fair schedulers using YARN SLS, but I could not find the SLS XML files and scripts in the VM, and was not able to find instructions on where to download them for testing. If anyone has used it, please guide me.
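If it helps anyone searching later: in a stock Apache Hadoop tarball, SLS appears to ship under share/hadoop/tools/sls (a Cloudera VM or parcel may lay this out differently or omit it entirely). A sketch, assuming that layout and the bundled sample trace:

# copy the SLS sample configs next to the cluster configs, then run the simulator
cd $HADOOP_HOME/share/hadoop/tools/sls
cp sample-conf/*.xml $HADOOP_HOME/etc/hadoop/
bin/slsrun.sh --input-rumen=sample-data/2jobs2min-rumen-jh.json \
  --output-dir=/tmp/sls-out --print-simulation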
Labels:
- Apache YARN