Member since: 08-01-2016 | Posts: 12 | Kudos Received: 3 | Solutions: 0
09-07-2016 07:40 AM
Hello Dan,
Thank you very much for the help, it worked! In addition, I would like to get the recall, precision and F1 score, and I would also like to inspect the individual trees of the random forest. Do you know how I can do that? I have two imbalanced classes, so I would like these metrics for each class.
Best regards,
Laia
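For reference, the per-class metrics asked about here can be sketched in plain Scala from (prediction, label) pairs. This is only a minimal illustration of the definitions, not the Spark API: the names `ClassMetrics`, `Scores` and `perClass` are made up for this sketch, and it assumes the predictions have already been collected into a local sequence.

```scala
// Per-class precision, recall and F1 from (prediction, label) pairs.
// Plain Scala with no Spark dependency; all names are illustrative.
object ClassMetrics {
  final case class Scores(precision: Double, recall: Double, f1: Double)

  // Safe division: returns 0.0 when the denominator is zero.
  private def ratio(num: Int, den: Int): Double =
    if (den == 0) 0.0 else num.toDouble / den

  def perClass(pairs: Seq[(Double, Double)]): Map[Double, Scores] = {
    // Every class that appears as a prediction or as a label.
    val classes = pairs.flatMap { case (p, l) => Seq(p, l) }.distinct
    classes.map { c =>
      val tp = pairs.count { case (p, l) => p == c && l == c } // true positives
      val fp = pairs.count { case (p, l) => p == c && l != c } // false positives
      val fn = pairs.count { case (p, l) => p != c && l == c } // false negatives
      val precision = ratio(tp, tp + fp)
      val recall = ratio(tp, tp + fn)
      val f1 =
        if (precision + recall == 0.0) 0.0
        else 2 * precision * recall / (precision + recall)
      c -> Scores(precision, recall, f1)
    }.toMap
  }
}
```

In Spark itself, as far as I know, `org.apache.spark.mllib.evaluation.MulticlassMetrics` exposes the same quantities per label through `precision(label)`, `recall(label)` and `fMeasure(label)` when given an RDD of (prediction, label) pairs, and a fitted `RandomForestClassificationModel` can print its trees with `toDebugString`.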
08-04-2017 02:47 PM
I would like to perform 10-fold cross-validation with a random forest on an RDD input, but I am having a problem converting the RDD to a DataFrame.
I am using this code, as you recommended:
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}

// Read the text file and turn every comma-separated line into a Row of strings
val inputPath = "..."
val text = sc.textFile(inputPath)
val rows = text.map(line => line.split(",").map(_.trim)).map(a => Row.fromSeq(a))

// Use the first row as the header; Row has no map method, so go through toSeq
val header = rows.first()
val schema = StructType(header.toSeq.map(fieldName => StructField(fieldName.asInstanceOf[String], StringType, true)))
val df = spark.createDataFrame(rows, schema)

val nFolds: Int = 10
val NumTrees: Int = 30
val metric: String = "accuracy"

val rf = new RandomForestClassifier()
  .setLabelCol("label")
  .setFeaturesCol("features")
  .setNumTrees(NumTrees)
val pipeline = new Pipeline().setStages(Array(rf))
val paramGrid = new ParamGridBuilder().build() // No parameter search
val evaluator = new MulticlassClassificationEvaluator()
  .setLabelCol("label")
  .setPredictionCol("prediction")
  .setMetricName(metric)
val cv = new CrossValidator()
  .setEstimator(pipeline)
  .setEvaluator(evaluator)
  .setEstimatorParamMaps(paramGrid)
  .setNumFolds(nFolds)
val model = cv.fit(df) // trainingData: DataFrame

Any help please?
Thank you.
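One thing that may be worth checking in the conversion above: the header line is still part of `rows`, every column is typed as a string, and there is no numeric `label` column or `features` vector column for the classifier to use. The parsing step could look roughly like the plain-Scala sketch below. It is only an illustration under assumptions that are not in the post: the first line is the header, the last column is a numeric label, all other columns are numeric features, and the names `CsvRows`, `Example`, `parseLine` and `parse` are hypothetical.

```scala
// Parse CSV text lines into (label, features) examples.
// Assumed layout (hypothetical): header on the first line, numeric label in
// the last column, numeric features in all remaining columns.
object CsvRows {
  final case class Example(label: Double, features: Array[Double])

  // Convert one data line into an Example; throws NumberFormatException
  // if a cell is not numeric.
  def parseLine(line: String): Example = {
    val cells = line.split(",").map(_.trim.toDouble)
    Example(cells.last, cells.init)
  }

  // Split the header from the data lines and parse every non-empty data line.
  def parse(lines: Seq[String]): (Array[String], Seq[Example]) = {
    val header = lines.head.split(",").map(_.trim)
    val examples = lines.tail.filter(_.trim.nonEmpty).map(parseLine)
    (header, examples)
  }
}
```

In spark-shell, the same `parseLine` could presumably be applied to the RDD after filtering out the header line, mapping each example to `(e.label, Vectors.dense(e.features))` with `org.apache.spark.ml.linalg.Vectors`, and then calling `.toDF("label", "features")` once `spark.implicits._` is imported.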