Member since: 01-16-2014
Posts: 336
Kudos Received: 43
Solutions: 31
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3369 | 12-20-2017 08:26 PM |
| | 3358 | 03-09-2017 03:47 PM |
| | 2823 | 11-18-2016 09:00 AM |
| | 4971 | 05-18-2016 08:29 PM |
| | 3804 | 02-29-2016 01:14 AM |
01-24-2016
03:09 PM
1 Kudo
A vcore is a virtual core, and you can define it however you want. For example, you could define a vcore as the processing power delivered by a 1GHz thread core; a 3GHz core would then be comparable to 3 vcores in the node manager. Your container request then needs to ask for multiple vcores, which accounts for the difference in speed. Not a lot of clusters do this, due to the administrative overhead and the fact that if end users do not request vcores correctly, it can overload the faster machines.

Wilfred
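To make that concrete, here is a minimal sketch of such a container request for a MapReduce job, assuming the 1 vcore = 1GHz convention above; the value 3 is illustrative, and the properties go in mapred-site.xml or the per-job configuration:

```xml
<!-- Illustrative only: request 3 vcores per map/reduce task so a task
     "costs" the same as one 3GHz core under the 1 vcore = 1GHz convention. -->
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>3</value>
</property>
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>3</value>
</property>
```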
01-20-2016
07:12 PM
1 Kudo
You need to set up each node with the proper vcores and memory available to the NodeManager. That should solve the problem: it will put more load on the larger nodes than on the small nodes. Containers are also scheduled on a node based on data locality, which is out of your control. You cannot, however, demand that processing of a specific split starts on a specific node.

Wilfred
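For example, the NodeManager capacity is advertised per node in yarn-site.xml; the values below are placeholders for whatever a given node actually has:

```xml
<!-- Example values only: advertise this node's real capacity to YARN,
     so larger nodes receive proportionally more containers. -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value> <!-- e.g. 48GB on a large node -->
</property>
<property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>16</value>
</property>
```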
01-20-2016
06:53 PM
1 Kudo
CDH 5.3 does not come with Spark 1.5, so you are running an unsupported cluster; please be aware of that. Weight has nothing to do with preemption. That is a common misunderstanding: the weight only decides which queue gets higher priority during the scheduling cycle. So if I have queues with the weights 3:1:1, then out of every 10 scheduling attempts, 6 will go to the queue with weight 3 and 2 attempts will go to each queue with weight 1, totalling 10 attempts. Minimum share preemption works only if you have the minimum and maximum shares set for a queue, so make sure you have that. The fair share of a queue is calculated based on the demand in the queue (i.e. the applications that are running in it), so you might simply not be hitting the fair share preemption threshold.

Wilfred
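As an illustration (the queue names, resource values, and timeout are made up), a Fair Scheduler allocation file that combines weights with the minimum/maximum shares that minimum share preemption depends on could look like this:

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Weights of 3:1:1 only steer scheduling priority, not preemption. -->
  <queue name="prod">
    <weight>3</weight>
    <!-- Minimum share preemption requires min (and max) shares to be set. -->
    <minResources>10000 mb,10 vcores</minResources>
    <maxResources>60000 mb,60 vcores</maxResources>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </queue>
  <queue name="dev">
    <weight>1</weight>
  </queue>
  <queue name="adhoc">
    <weight>1</weight>
  </queue>
</allocations>
```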
09-28-2015
08:48 PM
With the --files option you put the file in your working directory on the executor. You are trying to point to the file using an absolute path, which is not what the --files option does for you. Can you use just the name "rule2.xml" and not a path? When you read the documentation for --files, see the important note at the bottom of the "Running on YARN" page. Also, do not use Resources.getResource(); just open the file with a plain Java construct like new FileInputStream("rule2.xml"), or something like it.

Wilfred
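A minimal sketch of that pattern; the class name is made up, and the only point is opening the shipped file by its bare name:

```java
// Submitted with, e.g.: spark-submit --files rule2.xml ...
// --files ships rule2.xml into each executor's working directory,
// so open it by bare name, not by an absolute path.
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class RuleLoader {
    public static InputStream openRules() throws IOException {
        // Resolved relative to the container's working directory on the executor.
        return new FileInputStream("rule2.xml");
    }
}
```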
09-28-2015
08:08 PM
To rule out a custom jar issue, can you run the Pi example to check whether the cluster is set up correctly? We have documented how to run a Spark application, with that example, in our docs. The error that you show points to a classpath problem: the Spark classes cannot be found on your classpath.

Wilfred
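For reference, a typical invocation of the bundled Pi example looks like the following; the examples jar path is a placeholder and varies by CDH version and install:

```
# Sketch only: run the bundled SparkPi example on YARN.
# Adjust the examples jar path for your installation.
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn --deploy-mode cluster \
  /opt/cloudera/parcels/CDH/lib/spark/lib/spark-examples.jar 10
```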
09-22-2015
04:05 AM
To be completely in control, I often recommend using a shading tool for libraries like this. Using Maven Shade or Gradle Shadow to make sure that your code references your own version is a sure-fire way to get this working. When you build your project you "shade" the references in your code, which means it always uses your version.

Wilfred
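As a sketch of the Maven Shade approach (the package pattern com.example.lib is a placeholder for whichever library actually conflicts):

```xml
<!-- Illustrative maven-shade-plugin config: relocate a conflicting library
     so the references in your code always resolve to your bundled copy. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.example.lib</pattern>
            <shadedPattern>shaded.com.example.lib</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```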
09-21-2015
11:36 PM
I add a gateway role for Spark on new machines managed by CM on a regular basis, across different versions of CM, and have never had a problem submitting an application from a node like that. The classpath for the application is part of the submitted application context and is not based on the executor path. How would you otherwise add application-specific classes to the classpath?

Wilfred
09-21-2015
10:59 PM
The security group cache will only work within a JVM. If you have lots of JVMs, or short-lived JVMs for your jobs, then caching inside the JVM will only give limited relief. NSCD prevents the OS from going out for every call that is made, and it works across different JVMs. So instead of having one call per JVM, you will now have one call for a lot of JVMs.

Wilfred
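A minimal sketch of the relevant /etc/nscd.conf entries; the TTL values are examples to tune for your environment:

```
# /etc/nscd.conf -- cache user and group lookups OS-wide,
# shared by every JVM on the host.
enable-cache            passwd  yes
positive-time-to-live   passwd  600
enable-cache            group   yes
positive-time-to-live   group   600
```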
09-21-2015
03:30 AM
Whatever you use as a spark-submit command line is what you use in the Oozie shell action. Make sure that you have the proper gateway roles for Spark and YARN installed on the Oozie server so it has the configuration it needs. The rest works as for a standard Oozie shell action (i.e. create the workflow, properties, and shell script files) and place the files on the machine/HDFS so they can be found.

Wilfred
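As a rough sketch (the action name, ${appPath}, and submit.sh are placeholders), the shell action in workflow.xml could look like this, with your usual spark-submit command line inside submit.sh:

```xml
<!-- Illustrative workflow.xml fragment: submit.sh contains the same
     spark-submit command you would run by hand on the command line. -->
<action name="spark-shell-action">
  <shell xmlns="uri:oozie:shell-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>submit.sh</exec>
    <file>${appPath}/submit.sh#submit.sh</file>
  </shell>
  <ok to="end"/>
  <error to="fail"/>
</action>
```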
09-03-2015
11:48 AM
It should not pose a problem. If it does, let us know, but we have not seen an issue with this.

Wilfred