Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5089 | 03-09-2016 01:21 AM |
 | 4304 | 03-07-2016 01:52 AM |
 | 13548 | 02-29-2016 04:40 AM |
 | 4024 | 02-22-2016 03:08 PM |
 | 5022 | 01-19-2016 02:13 PM |
10-06-2014
02:10 PM
Solved this by defining the following property in workflow.xml:

    <configuration>
      <property>
        <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
        <value>true</value>
      </property>
      .....
    </configuration>
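For orientation, here is a hedged sketch of where that block typically sits, inside an action's configuration; the workflow name, action name, java action type, and main class are placeholders rather than details from the original thread:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example-wf">
  <start to="example-action"/>
  <action name="example-action">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Make user-supplied jars win over the launcher's default classpath -->
        <property>
          <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
          <value>true</value>
        </property>
      </configuration>
      <main-class>com.example.Main</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```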
09-26-2014
02:01 AM
Have a look at the parent packaging pom to see some additional settings like this that affect CDH packaging. I don't know how well they're documented beyond that, as it's generally rare for anyone to try to rebuild the source. Still, it ought not to be hard.
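If it helps, a rough sketch of the relationship, with purely hypothetical coordinates (the real groupId/artifactId/version are whatever the parent pom in the CDH source tree declares):

```xml
<!-- Child module pom.xml: versions, plugin config, and packaging settings
     declared in the parent apply here through normal Maven inheritance. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <!-- Hypothetical coordinates; use whatever the CDH parent pom declares -->
    <groupId>com.cloudera.cdh</groupId>
    <artifactId>cdh-parent</artifactId>
    <version>5.1.0</version>
  </parent>
  <artifactId>rebuilt-component</artifactId>
</project>
```

Running `mvn help:effective-pom` on a module is an easy way to see exactly which inherited settings end up applying.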
09-15-2014
05:19 AM
Your signature is just a little bit off. The result of a join is not a triple, but a tuple whose second element is a tuple. You have: (_, (_, _), (_, _, device)) but I think you need: (_, ((_, _), (_, _, device)))
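A minimal sketch of the shapes involved, with made-up key and value types standing in for the original RDDs:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object JoinShape {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "join-shape")

    // Two pair RDDs keyed by the same id; the value types are placeholders.
    val left  = sc.parallelize(Seq(("id1", ("a", 1)), ("id2", ("b", 2))))
    val right = sc.parallelize(Seq(("id1", (10L, 0.5, "device-x"))))

    // join yields (key, (leftValue, rightValue)), i.e. a pair whose second
    // element is itself a tuple, not a flat triple.
    val joined = left.join(right)

    joined.foreach { case (id, ((s, n), (ts, score, device))) =>
      println(s"$id -> $s, $n, $ts, $score, $device")
    }

    sc.stop()
  }
}
```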
09-12-2014
06:49 AM
It will make a difference insofar as the driver program runs either out on the cluster (yarn-cluster) or locally on the submitting machine (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports -- but it affects where the driver is, and that affects which machine's ports need to be open. For example, if your ports are all open within your cluster, I expect yarn-cluster to work directly.
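As a concrete illustration of the two modes (Spark 1.x syntax; the class and jar names are placeholders):

```
# yarn-cluster: the driver runs in an ApplicationMaster container on the
# cluster, so driver/executor traffic stays between cluster nodes.
spark-submit --master yarn-cluster --class com.example.Main app.jar

# yarn-client: the driver runs on the machine you submit from, so that
# machine's driver ports must be reachable from the executors.
spark-submit --master yarn-client --class com.example.Main app.jar
```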
09-12-2014
06:23 AM
I believe it was added in 1.1, yes. I don't have a streaming app driver handy, so maybe double-check -- you will see an obvious Streaming tab if it's there. Without guaranteeing anything, I think the next CDH will have 1.1, and at any time you can run your own Spark jobs with any version under YARN.
09-10-2014
08:14 AM
I think you imported just about everything except the one thing you need to get the implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

    import org.apache.spark.SparkContext._

In the shell this is imported by default.
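A minimal illustration of the effect in a standalone app, with throwaway RDDs; only the import line is the actual fix from the post:

```scala
import org.apache.spark.SparkContext
// The implicit conversions here turn RDD[(K, V)] into PairRDDFunctions,
// which is where join() is defined.
import org.apache.spark.SparkContext._

object PairFunctionsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "pair-functions")

    val a = sc.parallelize(Seq((1, "a"), (2, "b")))
    val b = sc.parallelize(Seq((1, "x")))

    // Without the SparkContext._ import this line would not compile in a
    // compiled app, because join() lives on PairRDDFunctions, not on RDD.
    val joined = a.join(b) // RDD[(Int, (String, String))]
    joined.collect().foreach(println)

    sc.stop()
  }
}
```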
08-31-2014
01:21 AM
The default is that you trigger model builds manually, but you can configure it to build after a certain amount of time has elapsed or after a certain number of data points have been written; see model.time-threshold and model.data-threshold. Yes, all data points cause in-memory model updates no matter how they arrive.
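A hedged sketch of how those keys might look in the config file; the values are placeholders and the time unit is my assumption, so check the Oryx configuration reference:

```
# Rebuild automatically after this much time has elapsed
# (placeholder value; unit assumed, see the Oryx docs)
model.time-threshold = 60

# ...or after this many new data points have been written (placeholder value)
model.data-threshold = 100000
```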
08-13-2014
11:05 PM
Thanks for the solution. Will try the available options and give feedback.