Member since: 07-29-2013
Posts: 366
Kudos Received: 69
Solutions: 71
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5250 | 03-09-2016 01:21 AM |
 | 4423 | 03-07-2016 01:52 AM |
 | 13815 | 02-29-2016 04:40 AM |
 | 4158 | 02-22-2016 03:08 PM |
 | 5149 | 01-19-2016 02:13 PM |
11-03-2014
10:32 AM
Yes, they will usually be in [0,1], but not always. They aren't probabilities. They are entries in X*Y', yes. I think it's safe to take values >= 1 as a very strong positive. What's a good cutoff? It really depends on your semantics and use case. They are comparable across models, so you can probably determine a value empirically with some testing.
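To make that concrete, here is an illustrative sketch (not the project's actual code): a score is just one entry of X*Y', i.e. the dot product of a user's factor vector and an item's factor vector. The vectors and the 1.0 cutoff below are made-up values for illustration only.

```scala
// Sketch: a recommendation "score" is the dot product of a user factor vector
// and an item factor vector (one entry of X * Y'). Values are hypothetical.
object ScoreSketch {
  def score(userFactors: Array[Double], itemFactors: Array[Double]): Double =
    userFactors.zip(itemFactors).map { case (x, y) => x * y }.sum

  def main(args: Array[String]): Unit = {
    val user = Array(0.9, 0.1, 0.4)
    val item = Array(0.8, 0.3, 0.5)
    val s = score(user, item)
    // Values >= 1.0 can be treated as a very strong positive; tune the cutoff empirically.
    println(f"score=$s%.3f strongPositive=${s >= 1.0}")
  }
}
```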
10-30-2014
04:38 AM
The Spark app will run as whatever user you submitted it as, or at least it should. I would just make the directory writable by that user if at all possible.
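For example, something like the following might do it; the path and user name are placeholders, and whether the directory is local or on HDFS is an assumption here (use 'hdfs dfs -chown' / 'hdfs dfs -chmod' for an HDFS path).

```
# Make the directory owned by and writable for the submitting user
# (path and user are placeholders)
sudo chown -R sparkuser /path/to/output/dir
chmod -R u+w /path/to/output/dir
```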
10-27-2014
08:08 AM
For testing, I finally configured the firewall on the remote machine to allow any connection from the Cloudera hosts. This works for me.
10-22-2014
05:09 AM
That's really what IDRescorer is for, yes. If you need it in distributed mode, you can reimplement the same idea by changing the code. I don't think it's really a clustering problem; you're just filtering based on clear attributes. You could also think of it as a search relevance problem, and combine the results of a recommender and a search engine in your app. No, ALS has no concept of attributes. It's a different, longer story, but you can always use 'fake' users and items corresponding to topics or labels to inject this information into the ALS model.
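As a rough illustration of the filtering idea only (this is not the actual IDRescorer API): drop candidate items whose attributes don't match before returning them. The attribute map and names below are hypothetical.

```scala
// Sketch of attribute-based filtering of recommendations; not the project's API.
object AttributeFilterSketch {
  // Hypothetical lookup of item ID -> category, maintained by the application
  val itemCategory: Map[Long, String] = Map(1L -> "book", 2L -> "movie", 3L -> "book")

  def filterRecs(recs: Seq[(Long, Double)], allowed: String): Seq[(Long, Double)] =
    recs.filter { case (itemId, _) => itemCategory.get(itemId).contains(allowed) }

  def main(args: Array[String]): Unit = {
    val recs = Seq((1L, 0.9), (2L, 0.8), (3L, 0.4))
    // Keep only recommended items whose attribute matches the requested value
    println(filterRecs(recs, "book"))
  }
}
```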
10-16-2014
02:45 PM
Thanks Srowen. After tuning the Java heap memory for all nodes through CM, and also increasing the driver and worker memory to 6GB and 3GB respectively, the "TaskSchedulerImpl: Initial job has not accepted any resources" issue got resolved. Regards, Shailesh
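For anyone landing here later, a rough sketch of where such values can go; the exact keys depend on deploy mode, and the numbers are simply the ones mentioned above, not values taken from the actual cluster config.

```
# spark-defaults.conf (sketch; keys depend on deploy mode)
spark.driver.memory    6g
spark.executor.memory  3g
# Standalone workers also read SPARK_WORKER_MEMORY from spark-env.sh
```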
10-15-2014
07:57 PM
Hi, I'm running CDH 5.1.3 libs on a CDH 5 cluster, but when I run a Spark program it gives me this exception:

2014-10-16 14:37:38,312 INFO org.apache.spark.deploy.worker.Worker: Asked to launch executor app-20141016143738-0008/1 for SparkROnCluster
2014-10-16 14:37:38,317 ERROR org.apache.spark.deploy.worker.ExecutorRunner: Error running executor
java.io.IOException: Cannot run program "/run/cloudera-scm-agent/process/256-spark-SPARK_WORKER/bin/compute-classpath.sh" (in directory "."): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1041)
	at org.apache.spark.util.Utils$.executeAndGetOutput(Utils.scala:759)
	at org.apache.spark.deploy.worker.CommandUtils$.buildJavaOpts(CommandUtils.scala:72)
	at org.apache.spark.deploy.worker.CommandUtils$.buildCommandSeq(CommandUtils.scala:37)
	at org.apache.spark.deploy.worker.ExecutorRunner.getCommandSeq(ExecutorRunner.scala:109)
	at org.apache.spark.deploy.worker.ExecutorRunner.fetchAndRunExecutor(ExecutorRunner.scala:124)
	at org.apache.spark.deploy.worker.ExecutorRunner$$anon$1.run(ExecutorRunner.scala:58)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:186)
	at java.lang.ProcessImpl.start(ProcessImpl.java:130)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1022)
	... 6 more

When I run sample examples like wordcount and tallSVd, they run fine. What changes should I make so that my application can run this script file?
10-14-2014
05:02 PM
I got the same error trying to run Spark on YARN. I fixed it by copying /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar into HDFS, and then referencing that file in my /etc/spark/conf/spark-defaults.conf file via the 'spark.yarn.dist.files' directive:

spark.yarn.dist.files /my/path/on/hdfs/hadoop-mapreduce-client-core.jar
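Spelled out, the two steps look roughly like this; the HDFS destination path is just a placeholder, as in the line above.

```
# 1. Copy the jar into HDFS (destination path is a placeholder)
hdfs dfs -put /usr/lib/hadoop/client/hadoop-mapreduce-client-core.jar /my/path/on/hdfs/

# 2. Reference it in /etc/spark/conf/spark-defaults.conf
spark.yarn.dist.files /my/path/on/hdfs/hadoop-mapreduce-client-core.jar
```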
10-10-2014
09:06 AM
Yes, that's all correct. Set time-threshold to 24 hours (1440 minutes) to rebuild once a day, regardless of the amount of data that has been written. Yes, the amount of data used to build the model is always increasing (unless you are manually deleting data or decaying it). It does sum up all counts for each user-item pair, so it is somewhat compacted this way.
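If it helps, that threshold lives in the Oryx configuration; the key name below is my recollection and should be verified against the configuration reference.

```
# oryx.conf sketch; key name assumed, verify against the docs
model.time-threshold=1440   # minutes, i.e. rebuild at most once per 24 hours
```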