Member since
01-16-2014
336
Posts
43
Kudos Received
31
Solutions
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3428 | 12-20-2017 08:26 PM |
 | 3393 | 03-09-2017 03:47 PM |
 | 2871 | 11-18-2016 09:00 AM |
 | 5103 | 05-18-2016 08:29 PM |
 | 3905 | 02-29-2016 01:14 AM |
01-24-2016
03:09 PM
1 Kudo
A vcore is a virtual core. You can define it however you want. You could, as an example, define a vcore as the processing power delivered by one 1GHz thread core. A 3GHz core would then be comparable to 3 vcores in the node manager. Your container request then needs to use multiple vcores, which accounts for the difference in speed. Not many clusters do this because of the administrative overhead, and because end users who do not use vcores correctly can overload the faster machines. Wilfred
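As a rough illustration of the 1GHz-per-vcore convention described above, the sketch below works out how many vcores a node would advertise. The function name and the example node sizes are hypothetical; the result is the kind of value you would set for the NodeManager (for example in `yarn.nodemanager.resource.cpu-vcores`), and nothing here is computed by YARN itself.

```python
def node_vcores(physical_cores: int, clock_ghz: float, ghz_per_vcore: float = 1.0) -> int:
    """Vcores a node advertises if one vcore is defined as `ghz_per_vcore` of clock speed.

    Hypothetical helper for illustration only: YARN does not derive this value,
    an administrator picks the convention and configures each NodeManager.
    """
    return int(physical_cores * clock_ghz / ghz_per_vcore)


# Two example nodes with the same core count but different clock speeds.
fast_node = node_vcores(physical_cores=8, clock_ghz=3.0)  # 24 vcores
slow_node = node_vcores(physical_cores=8, clock_ghz=1.0)  # 8 vcores

print(f"fast node: {fast_node} vcores, slow node: {slow_node} vcores")
```

Under this convention a container that needs the equivalent of one full 3GHz core has to request 3 vcores, which is the extra step end users must get right.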
01-20-2016
07:12 PM
1 Kudo
You need to set up the nodes with the proper vcores and memory available for the NM. That should solve the problem. It will put more load on the larger nodes than on the small nodes. The container is also scheduled on a node based on data locality, which is out of your control. You cannot, however, request that the processing of a split starts on a specific node. Wilfred
01-20-2016
06:53 PM
1 Kudo
CDH 5.3 does not come with Spark 1.5. You are running an unsupported cluster. Please be aware of that. Weight has nothing to do with preemption. It is a common misunderstanding. The weight only decides which queue gets a higher priority during the scheduling cycle. So if I have queues with the weights 3:1:1, then out of every 10 scheduling attempts 6 will go to the queue with weight 3 and 2 will go to each queue with weight 1, totalling 10 attempts. Minimum share preemption works only if you have the minimum and maximum shares set for a queue. Make sure you have that. The fair share of a queue is calculated based on the demand in the queue (i.e. the applications that are running in the queue). You thus might not be hitting the fair share preemption threshold. Wilfred
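The 3:1:1 arithmetic above can be checked with a few lines. This is only a proportional split of scheduling attempts by weight, using hypothetical queue names; it is not the Fair Scheduler's actual algorithm and it says nothing about preemption.

```python
def attempts_per_queue(weights: dict, total_attempts: int) -> dict:
    """Split scheduling attempts across queues in proportion to their weights.

    Simplified model for illustration; the queue names below are made up.
    """
    total_weight = sum(weights.values())
    return {queue: total_attempts * weight / total_weight
            for queue, weight in weights.items()}


# Queues weighted 3:1:1, looking at 10 scheduling attempts.
shares = attempts_per_queue({"queue_a": 3, "queue_b": 1, "queue_c": 1}, 10)

for queue, attempts in shares.items():
    print(f"{queue}: {attempts:g} of 10 attempts")
```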
09-28-2015
08:48 PM
With the --files option you put the file in the working directory on the executor. You are trying to point to the file using an absolute path, which is not what the files option does for you. Can you use just the name "rule2.xml" and not a path? When you read the documentation for the files option, see the important note at the bottom of the page on running on YARN. Also, do not use Resources.getResource(); just open the file with a plain Java construct like new FileInputStream("rule2.xml") or something like it. Wilfred
09-28-2015
08:08 PM
To rule out a custom jar issue, can you run the pi example to make sure that the cluster is (not) set up correctly? We have documented how to run a Spark application, with the example, in our docs. The error that you show points to a classpath error: the Spark classes cannot be found on your classpath. Wilfred
09-22-2015
04:05 AM
To be completely in control, I often recommend using a shading tool for libraries like this. Using Maven Shade or Gradle Shadow to make sure that your code references your version is a sure-fire way to get this working. When you build your project you "shade" the references in your code, which means it always uses your version. Wilfred
09-21-2015
11:36 PM
I add a gateway role for Spark on a new machine managed by CM on a regular basis, for different versions of CM, and have never had a problem submitting an application from a node like that. The classpath for the application is part of the submitted application context and is not based on the executor path. How would you otherwise add application-specific classes to the classpath? Wilfred
09-21-2015
10:59 PM
The security group cache only works within a JVM. If you have lots of JVMs or short-lived JVMs for your jobs, then caching inside the JVM will give only limited relief. NSCD prevents the OS from going out for every call that is made, and it works across different JVMs. So instead of having one call per JVM you will now have one call for a lot of JVMs. Wilfred
09-21-2015
03:30 AM
Whatever you use as a spark-submit from the command line is what you use in the Oozie shell action. Make sure that you have the proper gateway for Spark and YARN installed on the Oozie server so it has the configuration needed. The rest works as for a standard Oozie shell action (i.e. create the workflow, properties and shell script files) and place the files on the machine/HDFS so they can be found. Wilfred
09-03-2015
11:48 AM
It should not pose a problem. If it does, let us know, but we have not seen an issue with this. Wilfred