Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3402 | 12-20-2017 08:26 PM
 | 3376 | 03-09-2017 03:47 PM
 | 2842 | 11-18-2016 09:00 AM
 | 5027 | 05-18-2016 08:29 PM
 | 3854 | 02-29-2016 01:14 AM
09-01-2015
07:35 AM
1 Kudo
Yes, CM generates this as part of the gateway (client config). The classpath text file is generated by CM based on the dependencies that are defined in the deployment; it is not something you can change. As you can see in the upstream docs we use a form of the Hadoop-free distribution, but we still only test this with CDH and the specific dependencies. Does that explain what you are looking for? Wilfred
09-01-2015
05:55 AM
For adding custom classes to the classpath you should use one of the two following options:

- add them via the command line options
- add them via the config

For the driver you have the option to use:

--driver-class-path /path/to/file

For the executor use:

--conf "spark.executor.extraClassPath=/path/to/jar"

Via the config, set the two values in spark-defaults.conf (or just one if you only need it on one side):

spark.driver.extraClassPath
spark.executor.extraClassPath

This can be done through the CM UI. Depending on the exact thing you are doing you might see limitations on which option you can use. Wilfred
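For example, the two command line options could be passed to spark-submit like this (a minimal sketch; the jar paths and the application class are placeholders, not real artifacts):

```
# Hypothetical jar and class names, shown only to illustrate the flags.
spark-submit \
  --driver-class-path /opt/libs/custom.jar \
  --conf "spark.executor.extraClassPath=/opt/libs/custom.jar" \
  --class com.example.MyApp \
  myapp.jar

# The equivalent spark-defaults.conf entries:
# spark.driver.extraClassPath   /opt/libs/custom.jar
# spark.executor.extraClassPath /opt/libs/custom.jar
```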
09-01-2015
05:18 AM
1 Kudo
What have you set for the maxAMShare on the queue or in the scheduler default? There is a setting called queueMaxAMShareDefault; it defaults to 50% (0.5f), which means that a queue cannot assign more than 50% of its resources to AM containers. Wilfred
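For illustration, the scheduler default and a per-queue override would look roughly like this in fair-scheduler.xml (the queue name is hypothetical):

```
<allocations>
  <!-- cluster-wide default: AMs may use at most 50% of a queue's resources -->
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queue name="myqueue">
    <!-- per-queue override of the default -->
    <maxAMShare>0.3</maxAMShare>
  </queue>
</allocations>
```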
08-26-2015
05:04 AM
This change will not happen. You cannot change the scheduler without restarting the resource manager. It is not a job-configurable setting but a server-side setting that is only read on startup. Wilfred
08-25-2015
02:02 AM
The only way to use Spark when you do not have a Spark action is to use the shell-based action and build the proper spark-submit command in it. You will need to make sure that the configuration, classpath etc. are set up from within the action. Wilfred
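A rough sketch of the script such a shell action could execute (the SPARK_HOME path, master string, paths and class name are all assumptions for illustration):

```
#!/bin/sh
# Hypothetical wrapper script run by the Oozie shell action.
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyJob \
  hdfs:///apps/myjob/myjob.jar
```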
08-20-2015
08:32 PM
No, sorry, I cannot deduce that from the data given. Check the RM logs; that is where all scheduling activity is logged. Wilfred
08-20-2015
04:20 PM
The setting is part of the queue in the dynamic resource pools configuration in Cloudera Manager. It is exposed in CM 5.4 and is only available from CDH 5.1 onwards. The scheduler does not look at disks as a resource (it might in the future); for now, follow the YARN tuning documentation, which takes the disks into account when you calculate the values. Wilfred
08-19-2015
12:16 AM
This is not a case of it not being documented: a har file is created by running an MR job. When accessing it you use the har URI and are really just following pointers. I would suggest that you look at sequence files and not at the har archives. Sequence files are the solution for the issue you are looking at and can be created and accessed using the standard API. Wilfred
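To make the difference concrete (the paths are placeholders): a har archive is only browsed through the har: URI scheme, while a sequence file is a regular HDFS file:

```
# Browse an existing archive via the har: scheme (read-only pointers)
hdfs dfs -ls har:///user/wilfred/archive.har

# Inspect a sequence file directly; -text decodes it on the fly
hadoop fs -text /user/wilfred/data.seq | head
```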
08-18-2015
11:58 PM
If you run Spark on YARN you have two modes: client and cluster. If you run in cluster mode the AM also runs the driver. If you run in client mode the driver runs locally and there is just an AM to manage the executors. Check the "Running on YARN" documentation for the full details.

The amount of resources that can be used for AMs is limited; check the "Max Application Master Share" for the queues.

On top of the memory requirement that you have set for the executor (the heap size), the overhead will be added (7%, with a minimum of 384MB). That will increase your container request, and you could see it being rounded up to 2GB based on the increment value you have set. Check yarn.scheduler.increment-allocation-mb.

Like you did, you need to check what is available on each node to see if you have room; having space in the cluster does not always mean that you can satisfy the request. Wilfred
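A worked example of that rounding, assuming a 1.5GB executor memory request and a 512MB increment (both values are assumptions, not taken from the question):

```
# executor memory (heap) request:           1536 MB
# overhead = max(384 MB, 7% of 1536 MB)  =   384 MB
# total container request = 1536 + 384   =  1920 MB
# rounded up to the next 512 MB multiple =  2048 MB (2GB)
```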
08-18-2015
11:17 PM
1 Kudo
Depending on how you have set up YARN, hive should be part of the "allowed.system.users" list for the NMs; that list whitelists system users below the "min.user.id". There is also a "banned.users" list of users that are not allowed to run containers at all. All three settings need to be in sync to allow a container to run. The hdfs user should not be allowed, since it is the superuser and could circumvent the HDFS access permissions.

When you execute a job from Hue, authentication is taken care of by Hue; it will make sure that some kind of Kerberos initialisation is performed. Wilfred
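A sketch of how those three settings relate in container-executor.cfg (the values are illustrative, not a recommendation):

```
# container-executor.cfg (illustrative values)
min.user.id=1000
# system users below min.user.id that may still run containers
allowed.system.users=hive,impala,hue
# users that may never run containers
banned.users=hdfs,yarn,mapred,bin
```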