About srowen

srowen · ‎08-05-2014

Why? In a Kerberized environment, you need to integrate with Kerberos to access resources. The Spark project hasn't implemented anything like that. YARN works with Kerberos, so Spark can work with Kerberos by leveraging YARN. Maybe part of the answer is: why is it necessary if it works through YARN?

buntu · ‎08-04-2014

Thanks, Sean. I'm currently computing unique visitors per page by running a count distinct using SparkSQL. We also run non-Spark jobs on the cluster, so if we allocate the 2GB, I'm assuming we can't run any other jobs simultaneously. I'm also looking into how to set the storage levels in CM.

srowen · ‎08-03-2014

The method is "textFile", not "textfile": https://spark.apache.org/docs/1.0.0/api/scala/index.html#org.apache.spark.SparkContext

srowen · ‎07-29-2014

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or calls one via HTTP. It's a bit more overhead, but it certainly works. The bulk recommend function is really a holdover from the older code base; there wasn't an equivalent for classification. Good news: since the output is a PMML model, and libraries like openscoring exist, you could fairly easily wire up a Mapper that loads a model and scores data.

Xuesong · ‎07-03-2014

Thanks maestro.

Thatcher · ‎06-30-2014

Perfect - Hadoop home was pointing to the wrong place, and that was being picked up. I am able to submit applications just fine now. Thanks.

Xuesong · ‎06-20-2014

Thanks.

mahoutmaster · ‎05-21-2014

Thank you for your effort. No, this file is not empty; you can check it here: part-r-00000. I would like to see all the vectors, with cluster information for each. It would also be nice to see the centers of the clusters. I changed
IntWritable key = new IntWritable();
WeightedPropertyVectorWritable value = new WeightedPropertyVectorWritable();
to this:
Text key = new Text();
ClusterWritable value = new ClusterWritable();
I have not got any exception, but the output is:
org.apache.mahout.clustering.iterator.ClusterWritable@572c4a12 belongs to cluster C-0
org.apache.mahout.clustering.iterator.ClusterWritable@572c4a12 belongs to cluster C-1
--- EDIT: I changed value.toString() to value.getValue() and now I get this output:
C-0: {0:0.07,1:0.9499999999999998} belongs to cluster C-0
C-1: {0:12.25,1:12.9} belongs to cluster C-1
Thank you very much!

Xuesong · ‎04-28-2014

Thanks, Sean. You're a saint.

Xuesong · ‎04-28-2014

You are so kind. Thanks for your help.

Member Since	‎07-29-2013 08:58 AM
Last Visited	‎02-06-2015 02:06 PM
Posts	366
Kudos received	62

Cloudera Community

Re: CDH 5.6

Re: How to use Oryx 1 to detect spam email

Re: Spark program in eclipse

Re: Graphx in latest CDH

Re: Maturity ORYX

Re: Using Spark on a Kerberos Cluster

Re: Spark app throwing java.lang.OutOfMemoryError:...

Re: Unable to use the Spark Conecxt Variable in th...

Re: Scoring data on hadoop with Oryx at large scal...

Re: How to calculate the similarity of movies in S...

Re: Executing application with spark-class

Re: How to set set up Spark environment in Scala A...

Re: How to print data after canopy clustering

Re: How to use oryx to build recomendation app

Re: Using mahout 0.8 build recommendation app in c...