Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3436 | 01-26-2018 04:02 AM |
| | 7074 | 12-22-2017 09:18 AM |
| | 3530 | 12-05-2017 06:13 AM |
| | 3843 | 10-16-2017 07:55 AM |
| | 11161 | 10-04-2017 08:08 PM |
02-17-2015
04:34 AM
Okay, I will try this, thanks so much. But I have another question, if you please: I added the external jars I need and all of them work normally, but "org.apache.hadoop.hive.conf.HiveConf", which exists in hive-common-0.13.1-cdh5.3.0.jar, gives a "class not found" error. Why does that happen? The command I run:

sudo spark-submit --class "WordCount" --master local[*] --jars /usr/local/WordCount/target/scala-2.10/spark-streaming-flume_2.11-1.2.0.jar,/usr/lib/avro/avro-ipc-1.7.6-cdh5.3.0.jar,/usr/lib/flume-ng/lib/flume-ng-sdk-1.5.0-cdh5.3.0.jar,/usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar,/usr/local/WordCount/target/scala-2.10/spark-hive_2.10-1.2.0-cdh5.3.0.jar /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999
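If the ClassNotFoundException is raised on the driver side, one common explanation in Spark 1.x is that jars passed via --jars are shipped to executors but are not necessarily placed on the driver's own classpath in local mode. A minimal sketch of a workaround, assuming that is the cause here (the --driver-class-path flag is the added part; the paths are taken from the command above):

```bash
# Sketch: additionally place hive-common on the driver classpath.
# Keep the same --jars list as in the original command; whether this
# resolves the error in this particular setup is an assumption.
sudo spark-submit --class "WordCount" --master local[*] \
  --driver-class-path /usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar \
  --jars /usr/lib/hive/lib/hive-common-0.13.1-cdh5.3.0.jar \
  /usr/local/WordCount/target/scala-2.10/wordcount_2.10-1.0.jar 127.0.0.1 9999
```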
12-17-2014
09:10 AM
joliveirinha, I still have not resolved the issues. However, if you download and install Spark 1.1.1, that seems to work.
12-11-2014
10:01 AM
Hi Sean, Just to let you know the outcome of this: all of my tests yesterday with Hadoop, with various parameters, on the one-month-of-searches dataset went fine. I will not continue testing this further on the whole big dataset, as for the moment it looks like Hadoop is out of the picture, since I managed to get hold of a machine with 512 GB of RAM which proved up to the challenge of running Oryx in memory. The dataset is 421 MB, with roughly 20 million records, and it took just a few minutes to go through 29 iterations, so well done! It seemed like a big portion of the time was spent writing the model (this is an SSD machine). (I will continue by looking at recommendation response times, how they are affected when I ingest users, etc.) Thank you for the help with the bugs and all the explanations along the way.
11-19-2014
05:53 AM
Thanks very much. I am new to Spark and drank the Kool-Aid on the commutative and associative mandate for cluster-based algorithms, so I very much appreciate you providing me with an accurate view of the implementations. I am very interested in the parallelization of SNA and ML algorithms on clusters and would appreciate any reading/references you can provide. Thanks again for your time and insight, and I appreciate any further insight you can offer. In short: thanks, mate! Chris
11-19-2014
01:53 AM
It looks like you asked for more resources than you configured YARN to offer, so check how much you can allocate in YARN and how much Spark asked for. I don't know about the ERROR; it may be a red herring. Please have a look at http://spark.apache.org/docs/latest/ for pretty good Spark docs.
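As a hedged illustration of the comparison being suggested (the property names are standard YARN and Spark settings, but every value below, and the application class/jar, are placeholders for illustration only): check what YARN will grant per container and make sure the Spark request fits inside it.

```bash
# What YARN offers (yarn-site.xml on the cluster):
#   yarn.nodemanager.resource.memory-mb    -> total memory YARN manages per node
#   yarn.scheduler.maximum-allocation-mb   -> largest single container YARN will grant
#
# What Spark asks for (executor memory plus overhead) must fit within
# those limits, e.g.:
spark-submit --master yarn-client \
  --num-executors 2 \
  --executor-memory 1g \
  --executor-cores 1 \
  --class "WordCount" wordcount_2.10-1.0.jar
```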
11-19-2014
12:10 AM
I'm not suggesting you log in as spark or a superuser; you shouldn't do that. Instead, change your app so that it does not access directories your own user lacks permission for.
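As a sketch of what that might look like in practice (the paths below are assumptions for illustration, not taken from the original thread):

```bash
# Check what your own user can see and write to in HDFS:
hdfs dfs -ls /user
# Create a working directory owned by your user...
hdfs dfs -mkdir -p /user/$USER/myapp
# ...and point the application at /user/$USER/myapp instead of a
# directory owned by the spark or hdfs superuser.
```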
10-29-2014
11:31 AM
The dataset has 100k records, but they correspond to 943 users and 1682 items. It creates vectors from the data and performs the similarity measure using them. Yes, it's minute considering how far distributed programming can scale, but I am interested to know how to alter the behaviour.
10-29-2014
09:16 AM
There is no user-based recommender based on Hadoop MapReduce. The closest thing is indeed the item-based implementation at https://github.com/apache/mahout/tree/master/mrlegacy/src/main/java/org/apache/mahout/cf/taste/hadoop/item . It is also a recommender, but no, it always uses item similarity.
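For reference, a hedged sketch of how that item-based job is typically invoked on Hadoop (the jar name, input/output paths, and option values are assumptions for illustration; check the Mahout documentation for the options your version supports):

```bash
# Item-based RecommenderJob over a CSV of userID,itemID,preference triples.
hadoop jar mahout-core-0.9-job.jar \
  org.apache.mahout.cf.taste.hadoop.item.RecommenderJob \
  --input /user/$USER/ratings.csv \
  --output /user/$USER/recommendations \
  --similarityClassname SIMILARITY_COOCCURRENCE \
  --numRecommendations 10
```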
10-08-2014
11:07 PM
The GroupLens data set is not in the project, since it can't be redistributed. It would normally be found under src/it/resources/grouplens100k. You can get the data sets here: http://grouplens.org/datasets/movielens/
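A minimal sketch of fetching the 100k data set and placing it where the project expects it (the archive name, download URL, and which file the integration tests actually need are assumptions; only the target directory comes from the post above):

```bash
# Hypothetical download of the MovieLens 100k archive:
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
unzip ml-100k.zip
mkdir -p src/it/resources/grouplens100k
# Assumption: copy the ratings file the tests expect, e.g. u.data
cp ml-100k/u.data src/it/resources/grouplens100k/
```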