MulticlassClassificationEvaluator reports a missing field on test data

I want to predict the category of each record in the test data, based on the labeled example (training) data.

The test data has no category field; that is the whole point: to find the category from the description field.

The last line throws: IllegalArgumentException: Field "label_idx" does not exist.

What is the problem?

When I change the evaluator's .setLabelCol("label_idx") to use the segment field instead, it says that field doesn't exist either.
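An accuracy metric compares two values per row: the true label and the prediction. The plain-Scala sketch below illustrates that comparison only; it is not MulticlassClassificationEvaluator's code, and the Scored type and accuracy function are hypothetical names. Rows whose true label is unknown contribute nothing, so a dataset with no category column at all has nothing to score.

```scala
// Simplified illustration of what an accuracy metric compares.
// Not Spark's implementation; Scored and accuracy are hypothetical names.
object AccuracySketch {

  // One scored row: the true label may be unknown (None), the prediction is always present.
  final case class Scored(label: Option[Double], prediction: Double)

  // Fraction of labeled rows whose prediction equals the true label.
  // Returns None when no row carries a true label, since there is nothing to compare against.
  def accuracy(rows: Seq[Scored]): Option[Double] = {
    val labeled = rows.collect { case Scored(Some(l), p) => (l, p) }
    if (labeled.isEmpty) None
    else Some(labeled.count { case (l, p) => l == p }.toDouble / labeled.size)
  }
}
```

Under this model, one common approach is to hold out part of the labeled data (for example with Dataset.randomSplit) to measure accuracy, and to run model.transform on the unlabeled data only to obtain predictions.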

What does the label field represent? As I understand it, it is not a real field in the input data.
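For reference, in the pipeline below label_idx is produced by the fitted StringIndexer: each segment string becomes a numeric index, ordered by label frequency so the most frequent label gets 0.0, and that number is what NaiveBayes trains on. IndexToString maps predicted indices back to segment strings. The following is a simplified plain-Scala model of that mapping, not Spark's implementation; the object and method names and the example categories are hypothetical, and frequency ties are broken alphabetically in this sketch.

```scala
// Simplified model of StringIndexer / IndexToString. Not Spark's code;
// LabelIndexSketch and its methods are hypothetical names.
object LabelIndexSketch {

  // "Fit": order distinct categories by descending frequency
  // (ties broken alphabetically here) and use the position as the index.
  def fit(categories: Seq[String]): Array[String] =
    categories
      .groupBy(identity)
      .toSeq
      .sortBy { case (cat, occurrences) => (-occurrences.size, cat) }
      .map(_._1)
      .toArray

  // "Transform": numeric index for a category, like the label_idx column.
  // Fails on a category that was not seen during fit.
  def toIndex(labels: Array[String], category: String): Double = {
    val i = labels.indexOf(category)
    require(i >= 0, s"Unseen label: $category")
    i.toDouble
  }

  // Inverse mapping, like IndexToString turning "prediction" into "predictedLabel".
  def toLabel(labels: Array[String], index: Double): String =
    labels(index.toInt)
}
```

With the hypothetical input Seq("retail", "corporate", "retail", "smb", "retail", "corporate"), fit returns Array("retail", "corporate", "smb"), so "corporate" maps to 1.0. Because the index is derived from segment, it can only be computed for rows that actually have a segment value.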

import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, IndexToString, StringIndexer, Tokenizer}

// Map the string category ("segment") to a numeric label column ("label_idx")
val indexer = new StringIndexer().setInputCol("segment").setOutputCol("label_idx").fit(trainingDF)
// Break the description into individual terms
val tokenizer = new Tokenizer().setInputCol("clean_description").setOutputCol("tokens")
// Hash the terms into a 10,000-dimensional term-frequency feature vector
val hashingTF = new HashingTF().setInputCol("tokens").setOutputCol("features").setNumFeatures(10000)
val nb = new NaiveBayes().setModelType("multinomial").setLabelCol("label_idx")
/*val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.03).setFamily("multinomial").setElasticNetParam(0.8)
val ovr = new OneVsRest().setClassifier(lr)*/
// Map numeric predictions back to the original segment strings
val predictedConverter = new IndexToString()
  .setInputCol("prediction")
  .setOutputCol("predictedLabel")
  .setLabels(indexer.labels)

val pipeline = new Pipeline().setStages(Array(indexer, tokenizer, hashingTF, nb, predictedConverter))
val model = pipeline.fit(trainingDF)

val prediction = model.transform(classifygDF) // ....

val evaluator = new MulticlassClassificationEvaluator()
  //.setLabelCol("predictedLabel")
  .setLabelCol("label_idx")
  .setPredictionCol("prediction")

// Throws IllegalArgumentException: Field "label_idx" does not exist.
println(s"Accuracy: ${evaluator.setMetricName("accuracy").evaluate(prediction)}")