MulticlassClassificationEvaluator gives missing field on test data

I want to classify the category of test data based on example (training) data.

The test data is missing the category field. That is the objective: to find the category from the description field.

The last line gives: IllegalArgumentException: Field "label_idx" does not exist.

What is the problem?

When I change .setLabelCol("label_idx") on the evaluator to point at the segment field instead, it says that field doesn't exist either.

What does the label field represent? As I understand it, it is not a real field in the input data.

val indexer = new StringIndexer().setInputCol("segment").setOutputCol("label_idx").fit(trainingDF)
// break the description into individual terms
val tokenizer = new Tokenizer().setInputCol("clean_description").setOutputCol("tokens")
val hashingTF = new HashingTF().setInputCol("tokens").setOutputCol("features").setNumFeatures(10000)
val nb = new NaiveBayes().setModelType("multinomial").setLabelCol("label_idx")
/*val lr = new LogisticRegression().setMaxIter(100).setRegParam(0.03).setFamily("multinomial").setElasticNetParam(0.8)
val ovr = new OneVsRest().setClassifier(lr)*/
val predicteConverter = new IndexToString()

val pipeline = new Pipeline().setStages(Array(indexer, tokenizer, hashingTF, nb, predicteConverter))
val model =

val prediction = model.transform(classifygDF) ....

val evaluator = new MulticlassClassificationEvaluator()

println(s"Accuracy: ${evaluator.setMetricName("accuracy").evaluate(prediction)}")
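One possible reading of the error: the evaluator compares the prediction column against a column of true labels. Here the label column is "label_idx", the numeric index that StringIndexer derives from "segment". Since the test data has no "segment", no "label_idx" can be produced for it. As far as I know, a fitted StringIndexerModel simply skips itself when its input column is absent, so there is nothing to evaluate against. A minimal sketch of one way around this, assuming it is acceptable to measure accuracy on a held-out slice of the labeled data and to only predict on the unlabeled data. The names HeldOutEvaluation, run, labeledDF, unlabeledDF and predicted_segment, the 80/20 split and the seed are all hypothetical choices, not part of the original code:

```scala
import org.apache.spark.ml.{Pipeline, PipelineStage}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.{HashingTF, IndexToString, StringIndexer, Tokenizer}
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, for illustration only.
object HeldOutEvaluation {
  // labeledDF needs "segment" and "clean_description";
  // unlabeledDF needs only "clean_description".
  def run(labeledDF: DataFrame, unlabeledDF: DataFrame): (Double, DataFrame) = {
    // Fit the indexer on all labeled rows so every segment value gets an index.
    val indexer = new StringIndexer()
      .setInputCol("segment").setOutputCol("label_idx").fit(labeledDF)
    val tokenizer = new Tokenizer().setInputCol("clean_description").setOutputCol("tokens")
    val hashingTF = new HashingTF()
      .setInputCol("tokens").setOutputCol("features").setNumFeatures(10000)
    val nb = new NaiveBayes().setModelType("multinomial").setLabelCol("label_idx")
    // Map the numeric prediction back to the original segment string.
    // (On Spark 3.x, labelsArray(0) is the non-deprecated equivalent of labels.)
    val converter = new IndexToString()
      .setInputCol("prediction").setOutputCol("predicted_segment")
      .setLabels(indexer.labels)

    val pipeline = new Pipeline()
      .setStages(Array[PipelineStage](indexer, tokenizer, hashingTF, nb, converter))

    // Assumed 80/20 split: train on one part, keep the rest for evaluation.
    val Array(trainDF, validationDF) = labeledDF.randomSplit(Array(0.8, 0.2), seed = 42L)
    val model = pipeline.fit(trainDF)

    // The evaluator needs true labels, so it only runs on labeled rows.
    val evaluator = new MulticlassClassificationEvaluator()
      .setLabelCol("label_idx").setPredictionCol("prediction").setMetricName("accuracy")
    val accuracy = evaluator.evaluate(model.transform(validationDF))

    // Unlabeled rows are only transformed; no evaluator is involved.
    (accuracy, model.transform(unlabeledDF))
  }
}
```

With this arrangement, the accuracy comes from validationDF, which still has "segment", and the predictions for the unlabeled data are read from the "predicted_segment" column of the returned DataFrame.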
