Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3436 | 01-26-2018 04:02 AM |
| | 7074 | 12-22-2017 09:18 AM |
| | 3530 | 12-05-2017 06:13 AM |
| | 3843 | 10-16-2017 07:55 AM |
| | 11161 | 10-04-2017 08:08 PM |
02-17-2015
04:34 AM
Okay, I will try this, thanks so much. But I have another question, if you please: I added the external jars I need and all of them work normally, but "org.apache.hadoop.hive.conf.HiveConf", which exists in hive-common-0.13.1-cdh5.3.0.jar, gives a "class not found" error. Why does that happen? The command I run:

sudo spark-submit --class "WordCount" --master local[*] --jars /usr/local/WordCount/target/scala-2.10/spark-streaming-flume_2.11-1.2.0.jar,/usr/lib/avro/avro-ipc-1.7.6-cdh5.3.0.jar,/usr/lib/flume-ng/lib/flume-ng-sdk-1.5.0-cdh5.3.0.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/local/WordCount/target/scala-2.10/spark-hive_2.10-1.2.0-cdh5.3.0.jar /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999
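If the ClassNotFoundException is raised on the driver side, one common explanation in Spark 1.x is that jars passed via --jars are shipped to executors but are not necessarily placed on the driver's own classpath in local mode. A minimal sketch of a workaround, assuming that is the cause here (the --driver-class-path flag is the added part; the paths are taken from the command above):

```bash
# Sketch: additionally place hive-common on the driver classpath.
# Keep the same --jars list as in the original command; whether this
# resolves the error in this particular setup is an assumption.
sudo spark-submit --class "WordCount" --master local[*] \
  --driver-class-path /usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar \
  --jars /usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar \
  /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999
```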
12-17-2014
09:10 AM
joliveirinha, I still have not resolved the issues. However, if you download and install Spark 1.1.1, that seems to work.
12-11-2014
10:01 AM
Hi Sean, Just to let you know the outcome of this: all of my tests yesterday with Hadoop, with various parameters, on the one-month-of-searches dataset went fine. I will not continue testing this further on the whole big dataset, as for the moment it looks like Hadoop is out of the picture, since I managed to get hold of a machine with 512 GB of RAM which proved up to the challenge of running Oryx in memory. The dataset is 421 MB, with roughly 20 million records, and it took just a few minutes to go through 29 iterations, so well done! It seemed like a big portion of the time was spent writing the model (this is an SSD machine). (I will continue by looking at recommendation response times, how they are affected when I ingest users, etc.) Thank you for the help with the bugs and all the explanations along the way.
11-19-2014
05:53 AM
Thanks very much. I am new to Spark and drank the Kool-Aid on the commutative and associative mandate for cluster-based algorithms, so I very much appreciate you providing me with an accurate view of the implementations. I am very interested in the parallelization of SNA and ML algorithms on clusters and would appreciate any reading/references you can provide. Thanks again for your time and insight, and I appreciate any further insight you can offer. In short: thanks, mate! Chris
11-19-2014
01:53 AM
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
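As a hedged illustration of the comparison being suggested (the property names are standard YARN and Spark settings, but every value below, and the application class/jar, are placeholders for illustration only): check what YARN will grant per container and make sure the Spark request fits inside it.

```bash
# What YARN offers (yarn-site.xml on the cluster):
#   yarn.nodemanager.resource.memory-mb    -> total memory YARN manages per node
#   yarn.scheduler.maximum-allocation-mb   -> largest single container YARN will grant
#
# What Spark asks for (executor memory plus overhead) must fit within
# those limits, e.g.:
spark-submit --master yarn-client \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1 \
  --class "WordCount" wordcount_2.10-1.0.jar
```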
11-19-2014
12:10 AM
I'm not suggesting you log in as spark or a superuser; you shouldn't do that. Instead, change your app so that it does not access directories your own user lacks permission for.
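As a sketch of what that might look like in practice (the paths below are assumptions for illustration, not taken from the original thread):

```bash
# Check what your own user can see and write to in HDFS:
hdfs dfs -ls /user
# Create a working directory owned by your user...
hdfs dfs -mkdir -p /user/$USER/myapp
# ...and point the application at /user/$USER/myapp instead of a
# directory owned by the spark or hdfs superuser.
```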
10-29-2014
11:31 AM
The dataset has 100k records, but they correspond to 943 users and 1682 items. It creates vectors from the data and performs the similarity measure using them. Yes, it's minute considering how far distributed programming can scale, but I am interested to know how to alter the behaviour.
10-29-2014
09:16 AM
There is no user-based recommender based on Hadoop MapReduce. The closest thing is indeed the item-based implementation at https://github.com/apache/mahout/tree/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item . It is also a recommender, but no, it always uses item similarity.
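For reference, a hedged sketch of how that item-based job is typically invoked on Hadoop (the jar name, input/output paths, and option values are assumptions for illustration; check the Mahout documentation for the options your version supports):

```bash
# Item-based RecommenderJob over a CSV of userID,itemID,preference triples.
hadoop jar mahout-core-0.9-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  --input /user/$USER/ratings.csv \
  --output /user/$USER/recommendations \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --numRecommendations 10
```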
10-08-2014
11:07 PM
The GroupLens data set is not in the project, since it can't be redistributed. It would normally be found under src/it/resources/grouplens100k. You can get the data sets here: http://grouplens.org/datasets/movielens/
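A minimal sketch of fetching the 100k data set and placing it where the project expects it (the archive name, download URL, and which file the integration tests actually need are assumptions; only the target directory comes from the post above):

```bash
# Hypothetical download of the MovieLens 100k archive:
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip
mkdir -p src/it/resources/grouplens100k
# Assumption: copy the ratings file the tests expect, e.g. u.data
cp ml-100k/u.data src/it/resources/grouplens100k/
```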