Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5250 | 03-09-2016 01:21 AM |
 | 4423 | 03-07-2016 01:52 AM |
 | 13815 | 02-29-2016 04:40 AM |
 | 4158 | 02-22-2016 03:08 PM |
 | 5149 | 01-19-2016 02:13 PM |
11-03-2014
10:32 AM
Yes, they will usually be in [0,1], but not always. They aren't probabilities. They are entries in X*Y', yes. I think it's safe to take values >= 1 as a very strong positive. What's a good cutoff? It really depends on your semantics and use case. They are comparable across models, so you can probably determine a value empirically with some testing.
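To make that concrete, here is an illustrative sketch (not the project's actual code): a score is just one entry of X*Y', i.e. the dot product of a user's factor vector and an item's factor vector. The vectors and the 1.0 cutoff below are made-up values for illustration only.

```scala
// Sketch: a recommendation "score" is the dot product of a user factor vector
// and an item factor vector (one entry of X * Y'). Values are hypothetical.
object ScoreSketch {
  def score(userFactors: Array[Double], itemFactors: Array[Double]): Double =
    userFactors.zip(itemFactors).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    val user = Array(0.9, 0.1, 0.4)
    val item = Array(0.8, 0.3, 0.5)
    val s = score(user, item)
    // Values >= 1.0 can be treated as a very strong positive; tune the cutoff empirically.
    println(f"score=$s%.3f strongPositive=${s >= 1.0}")
  }
}
```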
10-30-2014
04:38 AM
The Spark app will run as whatever user you submitted it as, or at least it should. I would just make the directory writable by that user if at all possible.
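For example, something like the following might do it; the path and user name are placeholders, and whether the directory is local or on HDFS is an assumption here (use 'hdfs dfs -chown' / 'hdfs dfs -chmod' for an HDFS path).

```
# Make the directory owned by and writable for the submitting user
# (path and user are placeholders)
sudo chown -R sparkuser /path/to/output/dir
chmod -R u+w /path/to/output/dir
```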
10-27-2014
08:08 AM
For testing, I finally configured the firewall on the remote machine to allow any connection from the Cloudera hosts. This works for me.
10-22-2014
05:09 AM
That's really what IDRescorer is for, yes. If you need it in distributed mode, you can reimplement the same idea by changing the code. I don't think it's really a clustering problem; you're just filtering based on clear attributes. You could also think of it as a search relevance problem, and combine the results of a recommender and a search engine in your app. No, ALS has no concept of attributes. It's a different, longer story, but you can always use 'fake' users and items corresponding to topics or labels to inject this information into the ALS model.
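As a rough illustration of the filtering idea only (this is not the actual IDRescorer API): drop candidate items whose attributes don't match before returning them. The attribute map and names below are hypothetical.

```scala
// Sketch of attribute-based filtering of recommendations; not the project's API.
object AttributeFilterSketch {
  // Hypothetical lookup of item ID -> category, maintained by the application
  val itemCategory: Map[Long, String] = Map(1L -> "book", 2L -> "movie", 3L -> "book")

  def filterRecs(recs: Seq[(Long, Double)], allowed: String): Seq[(Long, Double)] =
    recs.filter { case (itemId, _) => itemCategory.get(itemId).contains(allowed) }

  def main(args: Array[String]): Unit = {
    val recs = Seq((1L, 0.9), (2L, 0.8), (3L, 0.4))
    // Keep only recommended items whose attribute matches the requested value
    println(filterRecs(recs, "book"))
  }
}
```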
10-16-2014
02:45 PM
Thanks Srowen. After tuning the Java heap memory for all nodes through CM, and also increasing the driver and worker memory to 6GB and 3GB respectively, the "TaskSchedulerImpl: Initial job has not accepted any resources" issue got resolved. Regards, Shailesh
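For anyone landing here later, a rough sketch of where such values can go; the exact keys depend on deploy mode, and the numbers are simply the ones mentioned above, not values taken from the actual cluster config.

```
# spark-defaults.conf (sketch; keys depend on deploy mode)
spark.driver.memory    6g
spark.executor.memory  3g
# Standalone workers also read SPARK_WORKER_MEMORY from spark-env.sh
```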
10-15-2014
07:57 PM
Hi, I'm running CDH 5.1.3 libs on a CDH 5 cluster, but when I run a Spark program it gives me this exception:

2014-10-16 14:37:38,312 INFO org.apache.spark.deploy.worker.Worker: Asked to launch executor app-20141016143738-0008/1 for SparkROnCluster
2014-10-16 14:37:38,317 ERROR org.apache.spark.deploy.worker.ExecutorRunner: Error running executor
java.io.IOException: Cannot run program "/run/cloudera-scm-agent/process/256-spark-SPARK_WORKER/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759)
	at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72)
	at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
	at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109)
	at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124)
	at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
	... 6 more

When I run sample examples like wordcount and tallSVd, they run fine. What changes should I make so that my application can run this script file?
10-14-2014
05:02 PM
I got the same error trying to run Spark on YARN. I fixed it by copying /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar into HDFS, and then referencing that file in my /etc/spark/conf/spark-defaults.conf file via the 'spark.yarn.dist.files' directive:

spark.yarn.dist.files /my/path/on/hdfs/hadoop-mapreduce-client-core.jar
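Spelled out, the two steps look roughly like this; the HDFS destination path is just a placeholder, as in the line above.

```
# 1. Copy the jar into HDFS (destination path is a placeholder)
hdfs dfs -put /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar /my/path/on/hdfs/

# 2. Reference it in /etc/spark/conf/spark-defaults.conf
spark.yarn.dist.files /my/path/on/hdfs/hadoop-mapreduce-client-core.jar
```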
10-10-2014
09:06 AM
Yes, that's all correct. Set time-threshold to 24 hours (1440 minutes) to rebuild once a day, regardless of the amount of data that has been written. Yes, the amount of data used to build the model is always increasing (unless you are manually deleting data or decaying it). It does sum up all counts for each user-item pair, so it is somewhat compacted this way.
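If it helps, that threshold lives in the Oryx configuration; the key name below is my recollection and should be verified against the configuration reference.

```
# oryx.conf sketch; key name assumed, verify against the docs
model.time-threshold=1440   # minutes, i.e. rebuild at most once per 24 hours
```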