Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5089 | 03-09-2016 01:21 AM |
 | 4304 | 03-07-2016 01:52 AM |
 | 13548 | 02-29-2016 04:40 AM |
 | 4024 | 02-22-2016 03:08 PM |
 | 5022 | 01-19-2016 02:13 PM |
10-06-2014
02:10 PM
Solved this by defining the following property in workflow.xml:

    <configuration>
      <property>
        <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
        <value>true</value>
      </property>
      .....
    </configuration>
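For orientation, here is a hedged sketch of where that block typically sits, inside an action's configuration; the workflow name, action name, java action type, and main class are placeholders rather than details from the original thread:

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="example-wf">
  <start to="example-action"/>
  <action name="example-action">
    <java>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <!-- Make user-supplied jars win over the launcher's default classpath -->
        <property>
          <name>oozie.launcher.mapreduce.job.user.classpath.first</name>
          <value>true</value>
        </property>
      </configuration>
      <main-class>com.example.Main</main-class>
    </java>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>Action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```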
09-26-2014
02:01 AM
Have a look at the parent packaging pom to see some additional settings like this that affect CDH packaging. I don't know how well they're documented beyond that, as it's generally rare for anyone to try to rebuild the source. Still, it ought not to be hard.
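If it helps, a rough sketch of the relationship, with purely hypothetical coordinates (the real groupId/artifactId/version are whatever the parent pom in the CDH source tree declares):

```xml
<!-- Child module pom.xml: versions, plugin config, and packaging settings
     declared in the parent apply here through normal Maven inheritance. -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <!-- Hypothetical coordinates; use whatever the CDH parent pom declares -->
    <groupId>com.cloudera.cdh</groupId>
    <artifactId>cdh-parent</artifactId>
    <version>5.1.0</version>
  </parent>
  <artifactId>rebuilt-component</artifactId>
</project>
```

Running `mvn help:effective-pom` on a module is an easy way to see exactly which inherited settings end up applying.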
09-15-2014
05:19 AM
Your signature is just a little bit off. The result of a join is not a triple, but a tuple whose second element is a tuple. You have: (_, (_, _), (_, _, device)) but I think you need: (_, ((_, _), (_, _, device)))
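A minimal sketch of the shapes involved, with made-up key and value types standing in for the original RDDs:

```scala
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object JoinShape {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "join-shape")

    // Two pair RDDs keyed by the same id; the value types are placeholders.
    val left  = sc.parallelize(Seq(("id1", ("a", 1)), ("id2", ("b", 2))))
    val right = sc.parallelize(Seq(("id1", (10L, 0.5, "device-x"))))

    // join yields (key, (leftValue, rightValue)), i.e. a pair whose second
    // element is itself a tuple, not a flat triple.
    val joined = left.join(right)

    joined.foreach { case (id, ((s, n), (ts, score, device))) =>
      println(s"$id -> $s, $n, $ts, $score, $device")
    }

    sc.stop()
  }
}
```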
09-12-2014
06:49 AM
It will make a difference insofar as the driver program runs either out on the cluster (yarn-cluster) or locally on the submitting machine (yarn-client). The same issue remains -- the processes need to talk to each other on certain ports -- but it affects where the driver is, and that affects which machine's ports need to be open. For example, if your ports are all open within your cluster, I expect yarn-cluster to work directly.
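As a concrete illustration of the two modes (Spark 1.x syntax; the class and jar names are placeholders):

```
# yarn-cluster: the driver runs in an ApplicationMaster container on the
# cluster, so driver/executor traffic stays between cluster nodes.
spark-submit --master yarn-cluster --class com.example.Main app.jar

# yarn-client: the driver runs on the machine you submit from, so that
# machine's driver ports must be reachable from the executors.
spark-submit --master yarn-client --class com.example.Main app.jar
```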
09-12-2014
06:23 AM
I believe it was added in 1.1, yes. I don't have a streaming app driver handy, so maybe double-check -- you will see an obvious Streaming tab if it's there. Without guaranteeing anything, I think the next CDH will have 1.1, and at any time you can run your own Spark jobs with any version under YARN.
09-10-2014
08:14 AM
I think you imported just about everything except the one thing you need to get the implicit conversions that unlock the functions in PairRDDFunctions, which is where join() is defined. You need:

    import org.apache.spark.SparkContext._

In the shell this is imported by default.
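A minimal illustration of the effect in a standalone app, with throwaway RDDs; only the import line is the actual fix from the post:

```scala
import org.apache.spark.SparkContext
// The implicit conversions here turn RDD[(K, V)] into PairRDDFunctions,
// which is where join() is defined.
import org.apache.spark.SparkContext._

object PairFunctionsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "pair-functions")

    val a = sc.parallelize(Seq((1, "a"), (2, "b")))
    val b = sc.parallelize(Seq((1, "x")))

    // Without the SparkContext._ import this line would not compile in a
    // compiled app, because join() lives on PairRDDFunctions, not on RDD.
    val joined = a.join(b) // RDD[(Int, (String, String))]
    joined.collect().foreach(println)

    sc.stop()
  }
}
```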
08-31-2014
01:21 AM
The default is that you trigger model builds manually, but you can configure it to build after a certain amount of time has elapsed or after a certain number of data points have been written; see model.time-threshold and model.data-threshold. Yes, all data points cause in-memory model updates no matter how they arrive.
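A hedged sketch of how those keys might look in the config file; the values are placeholders and the time unit is my assumption, so check the Oryx configuration reference:

```
# Rebuild automatically after this much time has elapsed
# (placeholder value; unit assumed, see the Oryx docs)
model.time-threshold = 60

# ...or after this many new data points have been written (placeholder value)
model.data-threshold = 100000
```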
08-13-2014
11:05 PM
Thanks for the solution. Will try the available options and give feedback.