Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3402 | 12-20-2017 08:26 PM
 | 3376 | 03-09-2017 03:47 PM
 | 2842 | 11-18-2016 09:00 AM
 | 5027 | 05-18-2016 08:29 PM
 | 3854 | 02-29-2016 01:14 AM
09-01-2015
07:35 AM
1 Kudo
Yes, CM generates this as part of the gateway (client config). The classpath text file is generated by CM based on the dependencies that are defined in the deployment; it is not something you can change. As you can see in the upstream docs we use a form of the Hadoop-free distribution, but we still only test this with CDH and the specific dependencies. Does that explain what you are looking for? Wilfred
09-01-2015
05:55 AM
For adding custom classes to the classpath you should use one of the two following options:

- add them via the command line options
- add them via the config

For the driver you have the option to use:

--driver-class-path /path/to/file

For the executor use:

--conf "spark.executor.extraClassPath=/path/to/jar"

Via the config, set the two values in spark-defaults.conf (or just one if you only need it on one side):

spark.driver.extraClassPath
spark.executor.extraClassPath

This can be done through the CM UI. Depending on the exact thing you are doing you might see limitations on which option you can use. Wilfred
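For example, the two command line options could be passed to spark-submit like this (a minimal sketch; the jar paths and the application class are placeholders, not real artifacts):

```
# Hypothetical jar and class names, shown only to illustrate the flags.
spark-submit \
  --driver-class-path /opt/libs/custom.jar \
  --conf "spark.executor.extraClassPath=/opt/libs/custom.jar" \
  --class com.example.MyApp \
  myapp.jar

# The equivalent spark-defaults.conf entries:
# spark.driver.extraClassPath   /opt/libs/custom.jar
# spark.executor.extraClassPath /opt/libs/custom.jar
```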
09-01-2015
05:18 AM
1 Kudo
What have you set for the maxAMShare on the queue or in the scheduler default? There is a setting called queueMaxAMShareDefault; it defaults to 50% (0.5f), which means that a queue cannot assign more than 50% of its resources to AM containers. Wilfred
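For illustration, the scheduler default and a per-queue override would look roughly like this in fair-scheduler.xml (the queue name is hypothetical):

```
<allocations>
  <!-- cluster-wide default: AMs may use at most 50% of a queue's resources -->
  <queueMaxAMShareDefault>0.5</queueMaxAMShareDefault>
  <queue name="myqueue">
    <!-- per-queue override of the default -->
    <maxAMShare>0.3</maxAMShare>
  </queue>
</allocations>
```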
08-26-2015
05:04 AM
This change will not happen. You cannot change the scheduler without restarting the resource manager. It is not a job-configurable setting but a server-side setting that is only read on startup. Wilfred
08-25-2015
02:02 AM
The only way to use Spark when you do not have a Spark action is to use the shell-based action and build the proper spark-submit command in it. You will need to make sure that the configuration, classpath etc. are set up from within the action. Wilfred
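A rough sketch of the script such a shell action could execute (the SPARK_HOME path, master string, paths and class name are all assumptions for illustration):

```
#!/bin/sh
# Hypothetical wrapper script run by the Oozie shell action.
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
$SPARK_HOME/bin/spark-submit \
  --master yarn-cluster \
  --class com.example.MyJob \
  hdfs:///apps/myjob/myjob.jar
```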
08-20-2015
08:32 PM
No, sorry, I cannot deduce that from the data given. Check the RM logs; that is where all scheduling activity is logged. Wilfred
08-20-2015
04:20 PM
The setting is part of the queue in the dynamic resource pools configuration in Cloudera Manager. It is exposed in CM 5.4 and is only available from CDH 5.1 onwards. The scheduler does not look at disks as a resource (it might in the future); for now, follow the YARN tuning documentation, which takes the disks into account when you calculate the values. Wilfred
08-19-2015
12:16 AM
This is not a case of it not being documented: a har file is created by running an MR job. When accessing it you use the har URI and are really just following pointers. I would suggest that you look at sequence files and not at the har archives. Sequence files are the solution for the issue you are looking at and can be created and accessed using the standard API. Wilfred
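To make the difference concrete (the paths are placeholders): a har archive is only browsed through the har: URI scheme, while a sequence file is a regular HDFS file:

```
# Browse an existing archive via the har: scheme (read-only pointers)
hdfs dfs -ls har:///user/wilfred/archive.har

# Inspect a sequence file directly; -text decodes it on the fly
hadoop fs -text /user/wilfred/data.seq | head
```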
08-18-2015
11:58 PM
If you run Spark on YARN you have two modes: client and cluster. If you run in cluster mode the AM also runs the driver. If you run in client mode the driver runs locally and there is just an AM to manage the executors. Check the "Running on YARN" documentation for the full details.

The amount of resources that can be used for AMs is limited; check the "Max Application Master Share" for the queues.

On top of the memory requirement that you have set for the executor (the heap size), the overhead will be added (7%, with a minimum of 384MB). That will increase your container request, and you could see it being rounded up to 2GB based on the increment value you have set. Check yarn.scheduler.increment-allocation-mb.

Like you did, you need to check what is available on each node to see if you have room; having space in the cluster does not always mean that you can satisfy the request. Wilfred
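A worked example of that rounding, assuming a 1.5GB executor memory request and a 512MB increment (both values are assumptions, not taken from the question):

```
# executor memory (heap) request:           1536 MB
# overhead = max(384 MB, 7% of 1536 MB)  =   384 MB
# total container request = 1536 + 384   =  1920 MB
# rounded up to the next 512 MB multiple =  2048 MB (2GB)
```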
08-18-2015
11:17 PM
1 Kudo
Depending on how you have set up YARN, hive should be part of the "allowed.system.users" list for the NMs; that list whitelists system users below the "min.user.id". There is also a "banned.users" list of users that are not allowed to run containers at all. All three settings need to be in sync to allow a container to run. The hdfs user should not be allowed, since it is the superuser and could circumvent the HDFS access permissions.

When you execute a job from Hue, authentication is taken care of by Hue; it will make sure that some kind of Kerberos initialisation is performed. Wilfred
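A sketch of how those three settings relate in container-executor.cfg (the values are illustrative, not a recommendation):

```
# container-executor.cfg (illustrative values)
min.user.id=1000
# system users below min.user.id that may still run containers
allowed.system.users=hive,impala,hue
# users that may never run containers
banned.users=hdfs,yarn,mapred,bin
```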