Member since: 01-09-2017
Posts: 55
Kudos Received: 14
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3687 | 09-05-2016 10:38 AM
 | 1858 | 07-20-2016 08:22 AM
 | 3765 | 07-04-2016 08:13 AM
 | 1441 | 06-03-2016 08:01 AM
 | 2001 | 05-05-2016 12:37 PM
09-06-2016
07:53 AM
Could you please post the full stack trace of the exception? It looks like the indexer is not properly creating the label_idx column...
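For reference, a minimal sketch of how such a column is normally produced (Spark 1.6 ml API; df and the input column name are assumptions, since the original code isn't shown):

import org.apache.spark.ml.feature.StringIndexer;
import org.apache.spark.ml.feature.StringIndexerModel;
import org.apache.spark.sql.DataFrame;

StringIndexer indexer = new StringIndexer()
    .setInputCol("label")       // hypothetical input column
    .setOutputCol("label_idx"); // the indexed column the error refers to
StringIndexerModel model = indexer.fit(df);
DataFrame indexed = model.transform(df); // should add the numeric label_idx column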
09-05-2016
10:38 AM
No, that code is not using cross-validation. An example of how to use cross-validation can be found here. It requires the DataFrame API, so you should refer to this for the Random Forest implementation.
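As a rough sketch of what that combination looks like (Spark 1.6 ml API; trainingData, the column names, and the parameter values are all assumptions):

import org.apache.spark.ml.classification.RandomForestClassifier;
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
import org.apache.spark.ml.param.ParamMap;
import org.apache.spark.ml.tuning.CrossValidator;
import org.apache.spark.ml.tuning.CrossValidatorModel;
import org.apache.spark.ml.tuning.ParamGridBuilder;

// trainingData is an assumed DataFrame with "label" and "features" columns
RandomForestClassifier rf = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features");
// the grid of hyperparameters to try in each fold
ParamMap[] grid = new ParamGridBuilder()
    .addGrid(rf.maxDepth(), new int[] {5, 10})
    .addGrid(rf.numTrees(), new int[] {20, 50})
    .build();
CrossValidator cv = new CrossValidator()
    .setEstimator(rf)
    .setEvaluator(new MulticlassClassificationEvaluator())
    .setEstimatorParamMaps(grid)
    .setNumFolds(3); // 3-fold cross-validation
CrossValidatorModel model = cv.fit(trainingData);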
07-25-2016
02:57 PM
You could use https://spark.apache.org/docs/1.6.1/api/python/pyspark.html#pyspark.RDD.zipWithUniqueId.
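The same method exists in the Java API; a minimal sketch (rdd is an assumed JavaRDD<String>):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;

// attaches a unique Long id to each element; the ids are unique but not
// necessarily consecutive (use zipWithIndex if you need 0..n-1 indexes)
JavaPairRDD<String, Long> withIds = rdd.zipWithUniqueId();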
07-20-2016
01:44 PM
The easiest way is to use the saveAsObjectFile method and read the data back with the objectFile method... See the Spark documentation for further details about them.
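A minimal sketch with the Java API (sc is an assumed JavaSparkContext and the path is hypothetical):

import java.util.Arrays;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

JavaRDD<String> rdd = sc.parallelize(Arrays.asList("a", "b", "c"));
rdd.saveAsObjectFile("/tmp/my-rdd");                     // writes the elements as serialized objects
JavaRDD<String> restored = sc.objectFile("/tmp/my-rdd"); // reads them back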
07-20-2016
08:22 AM
You can convert an org.apache.mahout.math.Vector into an org.apache.spark.mllib.linalg.Vector by using the iterateNonZero() or iterateAll() methods of org.apache.mahout.math.Vector. If your Vector is sparse, the first option is the best. In this case you can build two arrays via iterateNonZero(): one containing all the non-zero indexes and the other with the corresponding values, i.e.

import java.util.ArrayList;
import java.util.Iterator;
import org.apache.mahout.math.Vector.Element;
import org.apache.spark.mllib.linalg.Vectors;

ArrayList<Double> values = new ArrayList<Double>();
ArrayList<Integer> indexes = new ArrayList<Integer>();
org.apache.mahout.math.Vector v = ...
Iterator<Element> it = v.iterateNonZero();
while (it.hasNext()) {
    Element e = it.next();
    values.add(e.get());
    indexes.add(e.index());
}
// Vectors.sparse expects primitive int[] and double[] arrays, so unbox first
int[] idx = indexes.stream().mapToInt(Integer::intValue).toArray();
double[] vals = values.stream().mapToDouble(Double::doubleValue).toArray();
Vectors.sparse(v.size(), idx, vals);

You can do the same thing if you have a dense Vector, using the iterateAll() method and Vectors.dense.
07-07-2016
09:56 AM
You're saying that you're changing "Enable authorization". I'm saying that you have to change "Choose authorization". They are different things. "Enable authorization" is in the Advanced tab, "Choose authorization" is in the Settings tab.
07-05-2016
07:03 AM
Because you've not done what I told you. You have to go to the Hive config tab in Ambari, open the Settings tab (not Advanced), and change "Choose authorization" to "None" (instead of Ranger) in the Security area.
07-04-2016
08:13 AM
1 Kudo
The issue is that Ranger is being used for authorization. You just need to go to the Hive config tab in Ambari, select None as the authorization in the Security section, and restart Hive.
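For context, a rough sketch of what that switch does under the hood (not an exact mapping; the exact properties Ambari rewrites can vary by HDP version): choosing None essentially disables the authorization hook in Hive's configuration, e.g.

hive.security.authorization.enabled=false

while the Ranger option points hive.security.authorization.manager at the Ranger authorizer.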
06-30-2016
07:08 AM
Very interesting, but I think it would have been a more even comparison if you had used the Databricks spark-csv reader to load the file for the DataFrame and SparkSQL tests; otherwise there is the overhead of converting the RDD to a DataFrame...
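For reference, reading a CSV with that package looks roughly like this (Spark 1.x Java API; sqlContext and the path are assumptions, and the com.databricks:spark-csv package must be on the classpath):

import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.SQLContext;

DataFrame df = sqlContext.read()
    .format("com.databricks.spark.csv")
    .option("header", "true")      // first line contains the column names
    .option("inferSchema", "true") // infer column types from the data
    .load("/tmp/data.csv");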
06-28-2016
07:11 AM
What is the problem with using a local file? Indeed, that is what you have to do... There is no reason to specify the path of the file on HDFS.
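Assuming the question is about sc.textFile paths (the original context isn't shown; the paths here are hypothetical):

JavaRDD<String> lines = sc.textFile("file:///tmp/data.txt"); // read from the local filesystem
// an HDFS URI such as hdfs:///user/me/data.txt is only needed when the file actually lives on HDFS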