Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1795 | 12-20-2017 08:26 PM |
 | 1820 | 03-09-2017 03:47 PM |
 | 1612 | 11-18-2016 09:00 AM |
 | 2331 | 05-18-2016 08:29 PM |
 | 2049 | 02-29-2016 01:14 AM |
06-09-2015
12:52 AM
Part of that we have already included via configuration when you run on YARN. There are still known issues on the Spark side which make recovery not straightforward and not robust in all failure cases. AM failures are notoriously hard to recover from, but I would expect a direct kill of the AM container to be seen as a failure and picked up by YARN for a restart. How do you kill the container (kill -9 of the JVM)? Wilfred
06-09-2015
12:44 AM
The scheduler information is important: all that configuration is part of the RM configuration (yarn-site.xml) and the config files for the different schedulers, like the fair-scheduler.xml file. The logs from the RM or the AM for the application will also be needed to see what is going on. Wilfred
06-09-2015
12:41 AM
Good to hear that this has been fixed! We have seen this issue in early CDH 5 releases, but it was fixed in CM/CDH 5.2 and later. Cloudera Manager should have deployed that configuration setting for you in the client config on all nodes. If you did not use CM then that could explain it; otherwise I would not know how that could have happened. Wilfred
06-05-2015
12:21 AM
Which scheduler is used? How is it configured (resources etc.)? What settings do you have for the (AM) containers? Is there a log for the AM container which shows more? Wilfred
06-05-2015
12:17 AM
It does depend on how the driver dies. As you said, the AM is retried based on the settings under certain circumstances. You seem to have stumbled onto a case which is not handled correctly. However, we would need to know a bit more: why did the driver fail? Do you have a log of the container that ran the driver so we can see what the cause of the driver failure was? Wilfred
06-02-2015
10:25 PM
I completely overlooked the fact that this is not the FairScheduler but the CapacityScheduler. The change that was made went into the overarching code for both, and we have seen the fix work for the FairScheduler: it recovers from the issue with the standard config. We recommend using the FairScheduler for a CDH release since we do far more testing, also at scale, and development work on it. That said: can you show the lines (20-25 at least) just after the error was thrown? That should shed some more light on what the scheduler is doing. Wilfred
06-02-2015
08:53 PM
In a cluster which is kerberised there is no SIMPLE authentication. Make sure that you have run kinit before you run the application. Second thing to check: in your application you need to do the right thing and pass on either the TOKEN or a KERBEROS ticket. When the job is submitted, and you have done a kinit, you will have a TOKEN to access HDFS; you would need to pass that on, or the KERBEROS ticket. You will need to handle this in your code. I cannot see exactly what you are doing at that point in the startup of your code, but any HDFS access will require a TOKEN or KERBEROS ticket. Cheers, Wilfred
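A minimal sketch of what handling this in code can look like when a keytab is available; the principal, keytab path and HDFS path below are placeholders, and the exact approach depends on how the application obtains its credentials:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

object SecureHdfsAccess {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    // With Kerberos enabled the client configuration normally already sets this.
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)

    // Log in explicitly from a keytab; principal and keytab path are placeholders.
    UserGroupInformation.loginUserFromKeytab(
      "myuser@EXAMPLE.COM", "/path/to/myuser.keytab")

    // Any HDFS access after the login uses the Kerberos credentials.
    val fs = FileSystem.get(conf)
    fs.listStatus(new Path("/user/myuser")).foreach(s => println(s.getPath))
  }
}
```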
06-02-2015
07:45 PM
This works for me in 2.6 and 2.7. It looks like you have another problem which is causing this, not related to the Python version. Based on the message about the resources I would look into the environment and make sure that all variables and paths are set to the same values after you switch Python versions. You can look at the PySpark master UI and check that you have executors etc. Wilfred
06-02-2015
07:06 PM
You can start multiple PySpark shells on one host under the same user name. The shell, just as with the Scala shell, will find an unused port and allow you to do what is needed. There is no limitation on the PySpark side that you need to work around. I am not sure how the notebook needs to be configured to allow multiple to run at once. Wilfred
06-01-2015
05:24 AM
If you are not running the yarn command as the owner of the application you might need to add -appOwner <username> to the yarn logs command line. If you do not have access, the error you showed could be thrown. We do not distinguish between not getting access and the aggregation not having finished. Wilfred
06-01-2015
04:49 AM
The reservations for memory and vcores are logged in the normal RM log. The RM UI also shows them in the main UI at the top, under the cluster metrics. There are two values: Memory Reserved and VCores Reserved. Wilfred
06-01-2015
04:35 AM
There is a known issue in releases before CDH 5.3.3 which could cause this to show up. That issue was introduced by the fix for a similar issue in an earlier release. Both issues were intermittent and related to HA. Unless you are on CDH 5.3.3 or later you could be seeing one of those. Wilfred
05-28-2015
04:40 AM
There have been API changes which make it impossible to compile certain parts of Spark against the new version of Hive. All the parts that can work are included in the Spark version that comes with CDH 5.4. Wilfred
05-28-2015
04:26 AM
Hive on Spark uses Spark as the execution framework, in the same way that MapReduce is an execution framework. Spark SQL uses Hive dependencies, but that side is not supported. Hive in CDH is newer than the Hive that Spark is designed against, and there are parts that do not work against the new release of Hive. Some parts will work, others will not. Wilfred
05-28-2015
04:13 AM
You need to provide a little more detail: standalone or on YARN, the command line and the environment settings would be a good start. The error points to something other than a Python version issue. Wilfred
05-28-2015
03:50 AM
Sorry, this slipped through the cracks. If you have already turned off the ACL then you should be able to get the logs via the command line:

yarn logs -applicationId <APPLICATION ID>

That should return the full log and also follow the normal process through all the proxies and checks to get the files, and we should hopefully be able to tell in more detail what is going on. Wilfred
05-27-2015
10:11 PM
The fact that the two traces are different means that there is still something going on inside the RM; not everything is locked up. I assume that you have looked for threads that have a Thread.State of BLOCKED or WAITING. TIMED_WAITING threads are OK, nothing wrong with those. BLOCKED or WAITING means that something more is going on. BLOCKED is the really bad one. WAITING could be OK for worker threads that are waiting for work to be released from a queue or something like it; they normally use a monitor for that. The larger stack trace snippet that you uploaded does not show anything wrong. The threads that are in WAITING are fine (at least the ones I can see there). Based on this information I cannot tell why things are hanging. I have not seen the issue in my local setup and cannot reproduce it either. Wilfred
05-26-2015
06:11 PM
In CM & CDH 5.4 you should unset it and let it use the one that is there on the nodes. Much faster. Wilfred
05-25-2015
07:18 PM
CDH 5.4 has a patched Spark 1.3 and is built on a patched Hadoop 2.6. Why not use that? The fact that you have this issue shows that you have old files or pointers to old configuration hanging around. You should also not build your own Spark but use the Spark that comes with CDH (which is the same version). If you use CM all the configuration you need is created for you and there is no need to do trial and error to find what needs to be set. Otherwise make sure that you have all Spark and YARN configuration on the host that executes the action (the Oozie host). BTW: oozie.service.SparkConfigurationService.spark.configurations is a comma-separated list of "key=value" pairs. Setting the master in the default conf also seems a bit strange since you must have it in the XML. Use the XML from the action to set as much as possible (see the Spark action docs); relying on the defaults can lead to strange behaviour if you use it outside Oozie. A good start would also be to run the Pi example that comes with Spark on the Oozie host, to check that all configuration is correct before building the Oozie action. Wilfred
05-25-2015
06:45 PM
Why are you using SparkFiles? The path that you try to open is not defined because SparkFiles expects paths to files added through SparkContext.addFile(). Unless you have done that, you should be using sc.textFile() and pass in the URI for the file (hdfs://... or something like it). Wilfred
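A minimal sketch of the two patterns being contrasted here; the file names and paths are placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext, SparkFiles}

object ReadFileExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("read-file-example"))

    // Pattern 1: read a file as an RDD straight from HDFS; no SparkFiles involved.
    val lines = sc.textFile("hdfs:///user/example/input.txt")
    println(lines.count())

    // Pattern 2: SparkFiles only resolves files that were first distributed
    // with SparkContext.addFile(); the resolved path is a local path on each node.
    sc.addFile("hdfs:///user/example/lookup.csv")
    val localPath = SparkFiles.get("lookup.csv")
    println(s"lookup.csv was copied to: $localPath")

    sc.stop()
  }
}
```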
05-25-2015
06:34 PM
In a recent version (CM/CDH 5.4 as an example) the directory should just look like what you have now. We do not push the assembly separately any more. By default it uses the assembly installed on the nodes, which is faster than using the one from HDFS. The setting is still there to allow custom assemblies to be used. The setting should be entered without the HDFS prefix in front, and the path will be pushed out with the HDFS prefix added (CM will handle that for you). Which version of CDH and CM are you using? Wilfred
05-25-2015
05:41 PM
I should have been clearer in my request: for a Java process we do not use the normal stack dump utility. We use "jstack", which comes with the JVM. It will show a nicely formatted dump if you run it as the user that owns the process. So for the RM I would run:

su - yarn
<path to java bin>/jps | grep ResourceManager
<path to java bin>/jstack <pid from RM>

That stack trace will show exactly what each thread is doing and what it is waiting on. Example from my RM:

2015-05-26 00:39:09
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode):

"807502238@qtp-2034297567-6432" daemon prio=10 tid=0x00007fbfec163000 nid=0x2edc in Object.wait() [0x00007fbfd942e000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0x00000000dbe2bd80> (a org.mortbay.thread.QueuedThreadPool$PoolThread)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:626)
        - locked <0x00000000dbe2bd80> (a org.mortbay.thread.QueuedThreadPool$PoolThread)

"Attach Listener" daemon prio=10 tid=0x00007fbffc196000 nid=0x2d9f waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

It also shows the system threads all the way at the end of the dump; those are less interesting than the top threads. Wilfred
05-25-2015
05:13 PM
1 Kudo
A1: check the HDFS Design page for details on what is stored where. The edits log and file system image are on the NN. Look for the section on persistence of file system data. For more detail on setting up the cluster follow Cluster Setup. A2: if you have the disks then having a mirrored disk will make it more resilient. Making a backup is still a good idea 😉 Wilfred
05-22-2015
09:00 AM
That depends on how big the stack trace is, but normally using the code insert (the button with <> on it) you can add it directly here in the message. You can also put them somewhere public (like a gist) and link them here. Wilfred
05-22-2015
08:42 AM
1 Kudo
On the master node HDFS will store things like the FSImage, the edits file and other relevant files on disk. Not huge, but it needs quick access.

For the DN:
- Even or odd does not matter, it can handle what you give it.
- The number of spindles (disks) is important for the number of containers you can run on the host. We normally say about 2 containers per disk can be supported.

Since you have a large number of CPU cores and a lot of memory, having a larger number of disks will allow you to run more containers on the node. Decreasing the number of disks means you should also lower the number of containers. Looking at the CPU cores and disks: they seem to be nicely balanced the way you have it now with the 300GB disks. Wilfred
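As a rough worked example of that rule of thumb, assuming a hypothetical data node with 12 data disks: about 12 × 2 = 24 containers could be supported, provided memory and vcores do not become the limiting factor first.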
05-21-2015
11:18 PM
1 Kudo
You do not need to mirror the disks (besides the OS) if you are running HDFS HA. On the master nodes: get one disk just for HDFS and you can store all logs on the other disk. One disk for HDFS will get you the best performance since writes are synchronous to that disk. Also make sure that the CM services store logs and DBs on the disk that does not have HDFS on it. On the data nodes: if you have 2 disks for the OS (mirrored) and thus have 300 GB available, I would not use the other 300 GB for apps and logs. Add those 2 disks to your HDFS disks. The logs and apps can live on the OS disk on those nodes. If you are going to use Spark, make sure that you use Spark on YARN. We recommend that instead of the standalone mode: it saves resources and it has been tested far better. We do have recommendations about vcores/memory/disks in our YARN tuning documentation. Wilfred
05-21-2015
10:59 PM
1 Kudo
There has been a change in the indirect dependencies that get added by Spark. Spark itself has no dependency on HBase and thus will not have any HBase jars on its path by default. The Hive integration does, however, and that used to give you all the classes to run an HBase application on Spark without the need to do anything. Hive and HBase have changed and this is not the case any more. That is the cause of this "breakage". However, an application should not have relied on this indirect dependency loading of jars, and you need to add whatever you need to the classpath yourself. This is the workaround for a customer to get this working (parcel based distribution using CM): add the HBase jars to the executor classpath via the following steps:
- log in to Cloudera Manager
- go to the Spark on YARN service
- go to the Configuration tab
- type "defaults" in the search box
- select Gateway in the scope
- add the entry: spark.executor.extraClassPath=/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar
- save the change and you will see an icon appear to deploy client configs (can take 30 sec to show)
- deploy the client config
- run the Spark application accessing HBase by executing the following: spark-submit --master yarn-cluster --driver-class-path /etc/hbase/conf:/opt/cloudera/parcels/CDH/lib/hbase/lib/htrace-core-3.1.0-incubating.jar ....
If you are not using CM you can make the changes manually as long as you make sure that the htrace jar (that specific version) is on the path. Wilfred
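A minimal sketch of the kind of Spark-on-HBase access that the classpath fix above enables; the table name is a placeholder and exact package locations can differ between HBase releases:

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.spark.{SparkConf, SparkContext}

object HBaseOnSparkSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("hbase-on-spark-sketch"))

    // hbase-site.xml is picked up from the classpath (/etc/hbase/conf above).
    val hbaseConf = HBaseConfiguration.create()
    hbaseConf.set(TableInputFormat.INPUT_TABLE, "my_table") // placeholder table name

    // Scan the table as an RDD of (row key, Result) pairs.
    val rows = sc.newAPIHadoopRDD(
      hbaseConf,
      classOf[TableInputFormat],
      classOf[ImmutableBytesWritable],
      classOf[Result])

    println(s"row count: ${rows.count()}")
    sc.stop()
  }
}
```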
05-20-2015
08:39 PM
That is a version which should not have a problem like this. If you can reproduce this each and every time, can you send me a set of stack traces (3, taken 5 seconds apart) so we can see what the state of the RM is? Also include the yarn-site.xml with that. We should not lock up when you request more than the allowed size, and I have not seen it happen for me. Wilfred
05-17-2015
04:16 PM
Based on the fact that the Pi example works, YARN works and it is something that is being done in the application code. In a secured cluster the container does not run as the same user as the NodeManager. Normally the NodeManager runs as the user yarn and the container as the user who started the application. The configuration directory that you are trying to access is the configuration directory of the service, not of the container. The container gets the configuration passed in. You should not be accessing the service configuration as it most likely differs from the container configuration. There are a lot of settings that are not relevant for a service and thus are not set, or are set to the Hadoop default. If they exist they are normally ignored and read from the configuration that was created by the application on submission. The application should never access those files. Wilfred
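A minimal sketch of the pattern described above: use the configuration the framework hands to the container instead of reading the service configuration directory from disk (the property printed is only an illustration):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object ContainerConfigSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("container-config-sketch"))

    // Avoid loading yarn-site.xml or core-site.xml from the service's own
    // configuration directory on the node; that belongs to the NodeManager,
    // not to this container.

    // Instead, use the Hadoop configuration Spark built from what was submitted
    // with the application; it already contains the relevant client settings.
    val hadoopConf = sc.hadoopConfiguration
    println("fs.defaultFS = " + hadoopConf.get("fs.defaultFS"))

    sc.stop()
  }
}
```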
05-15-2015
12:58 AM
Please always provide the CDH version and, if you use it, the CM version. I can see that you use CM based on that path. Can you tell me if you enabled Kerberos through the wizard or manually? If you run a simple Pi example job, does it work or does that fail also? For this failure can you provide a full stack trace? There is information missing and I would like to see where the exception is thrown from. Wilfred