<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: error: java.lang.IllegalArgumentException: Field &quot;label_idx&quot; does not exist in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136944#M39777</link>
    <pubDate>Tue, 06 Sep 2016 22:41:10 GMT</pubDate>
    <dc:creator>dzaratsian</dc:creator>
    <dc:date>2016-09-06T22:41:10Z</dc:date>
    <item>
      <title>error: java.lang.IllegalArgumentException: Field "label_idx" does not exist</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136943#M39776</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;I get the following error: java.lang.IllegalArgumentException: Field "label_idx" does not exist.&lt;/P&gt;&lt;P&gt;It occurs after executing this code:&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.tree.RandomForest&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.tree.model.RandomForestModel&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.util.MLUtils&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.linalg.Vectors&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.regression.LabeledPoint&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.evaluation.MulticlassMetrics&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.Pipeline&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.classification.RandomForestClassifier&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator&lt;/P&gt;&lt;P&gt;import org.apache.spark.sql.types._&lt;/P&gt;&lt;P&gt;import sqlContext.implicits._&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.attribute.NominalAttribute&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.feature.StringIndexer&lt;/P&gt;
&lt;P&gt;val unparseddata = sc.textFile("hdfs:///tmp/epidemiological16.csv")&lt;/P&gt;&lt;P&gt;val data = unparseddata.map { line =&amp;gt;&lt;/P&gt;&lt;P&gt;    val parts = line.split(',').map(_.toDouble)&lt;/P&gt;&lt;P&gt;    LabeledPoint(parts.last % 2, Vectors.dense(parts.slice(0, parts.length - 1)))&lt;/P&gt;&lt;P&gt;}&lt;/P&gt;&lt;P&gt;val splits = data.randomSplit(Array(0.7, 0.3))&lt;/P&gt;&lt;P&gt;val (trainingData2, testData2) = (splits(0), splits(1))&lt;/P&gt;&lt;P&gt;val trainingData = trainingData2.toDF&lt;/P&gt;&lt;P&gt;val nFolds: Int = 10&lt;/P&gt;&lt;P&gt;val NumTrees: Int = 3&lt;/P&gt;&lt;P&gt;val rf = new RandomForestClassifier().setNumTrees(NumTrees).setFeaturesCol("features")&lt;/P&gt;&lt;P&gt;val indexer = new StringIndexer().setInputCol("label").setOutputCol("label_idx").fit(trainingData)&lt;/P&gt;&lt;P&gt;rf.setLabelCol("label_idx").fit(indexer.transform(trainingData))&lt;/P&gt;&lt;P&gt;val pipeline = new Pipeline().setStages(Array(rf))&lt;/P&gt;&lt;P&gt;val paramGrid = new ParamGridBuilder().build()&lt;/P&gt;&lt;P&gt;val evaluator = new MulticlassClassificationEvaluator().setLabelCol("label").setPredictionCol("prediction")&lt;/P&gt;&lt;P&gt;val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid).setNumFolds(nFolds)&lt;/P&gt;&lt;P&gt;val model = cv.fit(trainingData)&lt;/P&gt;&lt;P&gt;Do you know where the problem might be?&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Laia&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 14:45:25 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136943#M39776</guid>
      <dc:creator>laia_subirats</dc:creator>
      <dc:date>2016-09-06T14:45:25Z</dc:date>
    </item>
    <item>
      <title>Re: error: java.lang.IllegalArgumentException: Field "label_idx" does not exist</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136944#M39777</link>
      <description>&lt;P&gt;Hey Laia, you're close, but a couple of steps are out of order in how you configure the indexer and the initial RandomForestClassifier.&lt;/P&gt;&lt;P&gt;The pipeline you pass to the CrossValidator contains only the random forest stage, so when cv.fit(trainingData) runs, the StringIndexer is never applied to the data being fitted. As a result, label_idx is not a column of that DataFrame, which is what "does not exist" refers to. If you change the order, it should work.&lt;/P&gt;&lt;P&gt;I'd recommend decoupling the indexer and the rf object and running both as stages of the pipeline. Here's the code that I got to work. I also added a few lines at the bottom to show the predictions and accuracy (feel free to modify them to fit your requirements). Let me know if this helps.&lt;/P&gt;
&lt;P&gt;---&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.linalg.Vectors&lt;/P&gt;&lt;P&gt;import org.apache.spark.mllib.regression.LabeledPoint&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.Pipeline&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.tuning.{ParamGridBuilder, CrossValidator}&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.classification.RandomForestClassifier&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator&lt;/P&gt;&lt;P&gt;import org.apache.spark.ml.feature.StringIndexer&lt;/P&gt;&lt;P&gt;import sqlContext.implicits._&lt;/P&gt;&lt;P&gt;val unparseddata = sc.textFile("hdfs:///tmp/your_data.csv")&lt;/P&gt;&lt;P&gt;// Last column (mod 2) is the label; the remaining columns are the features&lt;/P&gt;&lt;P&gt;val data = unparseddata.map { line =&amp;gt;&lt;/P&gt;&lt;P&gt;    val parts = line.split(',').map(_.toDouble)&lt;/P&gt;&lt;P&gt;    LabeledPoint(parts.last % 2, Vectors.dense(parts.slice(0, parts.length - 1)))&lt;/P&gt;&lt;P&gt;}.toDF()&lt;/P&gt;&lt;P&gt;val Array(trainingData, testData) = data.randomSplit(Array(0.7, 0.3))&lt;/P&gt;&lt;P&gt;val nFolds: Int = 10&lt;/P&gt;&lt;P&gt;val NumTrees: Int = 3&lt;/P&gt;&lt;P&gt;// Both stages run inside the pipeline, so label_idx exists before the random forest is fitted&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;val indexer = new StringIndexer().setInputCol("label").setOutputCol("label_idx")&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;val rf = new RandomForestClassifier().setNumTrees(NumTrees).setFeaturesCol("features").setLabelCol("label_idx")&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;val pipeline = new Pipeline().setStages(Array(indexer, rf))&lt;/STRONG&gt;&lt;/P&gt;
&lt;P&gt;val paramGrid = new ParamGridBuilder().build()&lt;/P&gt;&lt;P&gt;// Predictions are in label_idx space (StringIndexer gives index 0.0 to the most frequent label), so evaluate against label_idx&lt;/P&gt;&lt;P&gt;// The default metric is weighted F1; in Spark 1.6 "precision" is overall accuracy (on Spark 2.x use "accuracy")&lt;/P&gt;&lt;P&gt;val evaluator = new MulticlassClassificationEvaluator().setLabelCol("label_idx").setPredictionCol("prediction").setMetricName("precision")&lt;/P&gt;&lt;P&gt;val cv = new CrossValidator().setEstimator(pipeline).setEvaluator(evaluator).setEstimatorParamMaps(paramGrid).setNumFolds(nFolds)&lt;/P&gt;&lt;P&gt;val model = cv.fit(trainingData)&lt;/P&gt;&lt;P&gt;val predictions = model.transform(testData)&lt;/P&gt;&lt;P&gt;// Show model predictions&lt;/P&gt;&lt;P&gt;predictions.show()&lt;/P&gt;&lt;P&gt;val accuracy = evaluator.evaluate(predictions)&lt;/P&gt;&lt;P&gt;println("Accuracy:   " + accuracy)&lt;/P&gt;&lt;P&gt;println("Error Rate: " + (1.0 - accuracy))&lt;/P&gt;</description>
      <pubDate>Tue, 06 Sep 2016 22:41:10 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136944#M39777</guid>
      <dc:creator>dzaratsian</dc:creator>
      <dc:date>2016-09-06T22:41:10Z</dc:date>
    </item>
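One detail worth checking in pipeline code like the answer above: StringIndexer does not keep the numeric label values. By default it assigns index 0.0 to the most frequent label, 1.0 to the next, and so on. With imbalanced 0/1 labels, label_idx can therefore be the flip of label, so predictions (which come out in label_idx space) are best compared against label_idx. The plain-Scala sketch below mimics that mapping without Spark; the object name IndexerSketch, the function fitIndex and the alphabetical tie-break are assumptions made for illustration, not part of Spark's API.

```scala
// Hypothetical helper mimicking StringIndexer's default ordering:
// labels are indexed by descending frequency, so the most frequent gets 0.0.
// Ties are broken alphabetically here (an assumption made for determinism).
object IndexerSketch {
  def fitIndex(labels: Seq[Double]): Map[String, Double] = {
    // StringIndexer works on string values, so numeric labels become "0.0", "1.0", ...
    val counts = labels.map(_.toString).groupBy(identity).map { case (k, v) => (k, v.size) }
    counts.toSeq
      .sortBy { case (label, count) => (-count, label) }
      .zipWithIndex
      .map { case ((label, _), idx) => (label, idx.toDouble) }
      .toMap
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical imbalanced data where class 1.0 is the majority
    val mapping = fitIndex(Seq(1.0, 1.0, 1.0, 0.0))
    println(s"label 1.0 -> label_idx ${mapping("1.0")}")
    println(s"label 0.0 -> label_idx ${mapping("0.0")}")
  }
}
```

With class 1.0 as the majority, the sketch maps label 1.0 to index 0.0 and label 0.0 to index 1.0, which is exactly the case where comparing predictions against the raw label column would invert the score.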
    <item>
      <title>Re: error: java.lang.IllegalArgumentException: Field "label_idx" does not exist</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136945#M39778</link>
      <description>&lt;P&gt;Hello Dan,&lt;/P&gt;&lt;P&gt;Thank you so much for the help, it worked!&lt;/P&gt;&lt;P&gt;In addition, I would like to get the recall, precision and F1, and I would also like to see the random forest trees. Do you know how I can do that? I have two imbalanced classes, so I would like these metrics for each class.&lt;/P&gt;&lt;P&gt;Best regards,&lt;/P&gt;&lt;P&gt;Laia&lt;/P&gt;</description>
      <pubDate>Wed, 07 Sep 2016 14:40:58 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/error-java-lang-IllegalArgumentException-Field-quot-label/m-p/136945#M39778</guid>
      <dc:creator>laia_subirats</dc:creator>
      <dc:date>2016-09-07T14:40:58Z</dc:date>
    </item>
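For the per-class figures asked about above, Spark's RDD-based MulticlassMetrics (already imported in the original question) offers precision(label), recall(label) and fMeasure(label) when built from an RDD of (prediction, label) pairs, and a fitted RandomForestClassificationModel can print its trees with toDebugString. As a Spark-free sketch of what those per-class numbers mean, the following computes them from (prediction, label) pairs in plain Scala; the names PerClassMetrics, Scores and scoresFor are hypothetical, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```scala
// Hypothetical helper: per-class precision, recall and F1 from (prediction, label) pairs,
// using the same pair order that MulticlassMetrics expects.
object PerClassMetrics {
  final case class Scores(precision: Double, recall: Double, f1: Double)

  def scoresFor(cls: Double, pairs: Seq[(Double, Double)]): Scores = {
    val hit = (cls, cls)
    val truePositives = pairs.count(_ == hit)                    // predicted cls and actually cls
    val predictedAsCls = pairs.count { case (p, _) => p == cls } // true positives + false positives
    val actuallyCls = pairs.count { case (_, l) => l == cls }    // true positives + false negatives
    // Zero denominators yield 0.0 here (an assumption of this sketch)
    val precision = if (predictedAsCls == 0) 0.0 else truePositives.toDouble / predictedAsCls
    val recall = if (actuallyCls == 0) 0.0 else truePositives.toDouble / actuallyCls
    val f1 = if (precision + recall == 0.0) 0.0 else 2 * precision * recall / (precision + recall)
    Scores(precision, recall, f1)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical (prediction, label) pairs for an imbalanced two-class problem
    val pairs = Seq((0.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0))
    Seq(0.0, 1.0).foreach { cls =>
      println(s"class $cls: ${scoresFor(cls, pairs)}")
    }
  }
}
```

Computing the scores separately for each class, rather than a single weighted average, is what makes the minority class visible when the classes are imbalanced.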
  </channel>
</rss>

