Member since 01-09-2017
55 Posts
14 Kudos Received
7 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3815 | 09-05-2016 10:38 AM |
| | 1916 | 07-20-2016 08:22 AM |
| | 3904 | 07-04-2016 08:13 AM |
| | 1505 | 06-03-2016 08:01 AM |
| | 2073 | 05-05-2016 12:37 PM |
09-06-2016 07:53 AM
Could you please post the full stack trace of the exception? It looks like the indexer is not creating the label_idx column properly...
09-05-2016 10:38 AM
No, that code is not using cross-validation. An example of how to use cross-validation can be found here. It needs the DataFrame API, so you should refer to this for the Random Forest implementation.
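To make clear what cross-validation adds on top of a single train/test run, here is a minimal plain-Java sketch of the k-fold idea. It is not Spark's CrossValidator: the class and method names are made up for illustration, and the deterministic modulo split is an assumption (real implementations usually assign rows to folds randomly).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, plain Java only (no Spark dependency).
final class KFoldSketch {

    // Splits row indices 0..n-1 into k folds; fold f holds every index i with i % k == f.
    static List<List<Integer>> split(int n, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    // Training indices for one round: every fold except the held-out one.
    static List<Integer> trainingIndices(List<List<Integer>> folds, int heldOut) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != heldOut) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        List<List<Integer>> folds = split(7, 3);
        // Each round validates on one fold and trains on the remaining ones.
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("validate on " + folds.get(f)
                + ", train on " + trainingIndices(folds, f));
        }
    }
}
```

The model is trained k times and the k validation scores are averaged, which is what a code path with a single fixed split never does.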
07-25-2016 02:57 PM
You could use https://spark.apache.org/docs/1.6.1/api/python/pyspark.html#pyspark.RDD.zipWithUniqueId.
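For reference, the Spark docs describe the ids produced by zipWithUniqueId as k, n+k, 2n+k, ... for the items of the k-th partition, where n is the number of partitions. The sketch below only simulates that numbering in plain Java; it does not call Spark, and the class name and the sample partitions are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: plain-Java simulation of the id scheme, no Spark dependency.
final class UniqueIdSketch {

    // The item at position i of partition k (out of n partitions) gets id i * n + k.
    static List<List<Long>> uniqueIds(List<List<String>> partitions) {
        int n = partitions.size();
        List<List<Long>> ids = new ArrayList<>();
        for (int k = 0; k < n; k++) {
            List<Long> partitionIds = new ArrayList<>();
            for (int i = 0; i < partitions.get(k).size(); i++) {
                partitionIds.add((long) i * n + k);
            }
            ids.add(partitionIds);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
            Arrays.asList("a", "b", "c"),
            Arrays.asList("d"),
            Arrays.asList("e", "f"));
        System.out.println(uniqueIds(partitions)); // prints [[0, 3, 6], [1], [2, 5]]
    }
}
```

The ids are unique but not consecutive (4 is skipped here), so this fits when you only need distinct keys.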
07-20-2016 01:44 PM
The easiest way is to use the saveAsObjectFile method and read the data back through the objectFile method... Both are described in the Spark documentation.
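As far as I know, saveAsObjectFile stores the records with standard Java serialization, so the elements need to be Serializable. The sketch below is only a plain-Java analogue of that write-then-read round trip using ObjectOutputStream and ObjectInputStream; it does not use Spark or its file layout, and the class name, method names and temp file are made up for illustration.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;

// Hypothetical helper: Java-serialization round trip, no Spark dependency.
final class ObjectFileSketch {

    // Writes the whole list as one serialized object.
    static void save(Path path, ArrayList<String> records) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(path))) {
            out.writeObject(records);
        }
    }

    // Reads the list back; the element type has to be known by the caller.
    @SuppressWarnings("unchecked")
    static ArrayList<String> load(Path path) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(path))) {
            return (ArrayList<String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path path = Files.createTempFile("records", ".bin");
        save(path, new ArrayList<>(Arrays.asList("a", "b")));
        System.out.println(load(path)); // prints [a, b]
        Files.delete(path);
    }
}
```

As in this sketch, the reader has to supply the element type itself when loading the data back.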
07-20-2016 08:22 AM
You can convert an org.apache.mahout.math.Vector into an org.apache.spark.mllib.linalg.Vector by using the iterateNonZero() or iterateAll() methods of org.apache.mahout.math.Vector. If your Vector is sparse, the first option is the best. In this case you can build two arrays via iterateNonZero(): one containing all the non-zero indexes and the other with the corresponding values, i.e.

    // Imports assumed: java.util.ArrayList, java.util.Iterator,
    // org.apache.mahout.math.Vector.Element, org.apache.spark.mllib.linalg.Vectors
    ArrayList<Double> values = new ArrayList<Double>();
    ArrayList<Integer> indexes = new ArrayList<Integer>();
    org.apache.mahout.math.Vector v = ...
    Iterator<Element> it = v.iterateNonZero();
    while (it.hasNext()) {
        Element e = it.next();
        values.add(e.get());
        indexes.add(e.index());
    }
    // Vectors.sparse takes primitive int[] and double[] arrays, so unbox the lists (Java 8 streams).
    // Spark expects strictly increasing indices; sort the pairs first if your Mahout vector does not iterate in index order.
    int[] idx = indexes.stream().mapToInt(Integer::intValue).toArray();
    double[] vals = values.stream().mapToDouble(Double::doubleValue).toArray();
    Vectors.sparse(v.size(), idx, vals);

You can do the same thing if you have a dense Vector using the iterateAll() method and Vectors.dense.
07-07-2016 09:56 AM
You're saying that you're changing "Enable authorization". I'm saying that you have to change "Choose authorization". They are different things. "Enable authorization" is in the Advanced tab, "Choose authorization" is in the Settings tab.
07-05-2016 07:03 AM
Because you've not done what I told you. You have to go to the config tab in Ambari, open the Settings tab (not Advanced) and change "Choose authorization" to "None" (instead of Ranger) in the Security area.
07-04-2016 08:13 AM
1 Kudo
The issue is that Ranger is being used for authorization. You just need to go to the Hive config tab in Ambari, select None as the authorization in the Security section and restart Hive.
06-30-2016 07:08 AM
Very interesting, but I think it would have been a fairer comparison if you had used the SparkSQL CSV reader from Databricks to read the file for the DataFrame and SparkSQL tests; otherwise there is the overhead of converting the RDD to a DataFrame...
06-28-2016 07:11 AM
What is the problem with using a local file? Indeed, that is what you have to do... There is no reason to specify the path of the file on HDFS.