Write MLlib model results from Spark to HDFS

New Contributor

I am following the MLlib documentation for Spark 1.1.1, specifically the examples under Logistic Regression. I have looked at the API docs and figured out how to adjust the settings and run many SVM and Logistic Regression models. I use the code from the example to evaluate my model:


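    import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics

    // Score each point of the test set (`test` is the held-out RDD from the docs example)
    val scoreAndLabels = test.map { point =>
      val score = model.predict(point.features)
      (score, point.label)
    }

    // Area under the ROC curve for the (score, label) pairs
    val metrics = new BinaryClassificationMetrics(scoreAndLabels)
    val auROC = metrics.areaUnderROC()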


This works great, and I can tweak my models to see which one works best. However, I haven't figured out how to export the value scoreAndLabels to HDFS so I can see what I am predicting for the test set. I know this is probably simple Scala, but as a newbie to Scala, it is frustrating me greatly.


For similar examples, I have pushed RDDs out with rdd.saveAsTextFile("/data/mynewhdfsfile"). I learned the hard way to make sure the "data" directory exists in HDFS before executing. However, this method does not work for "scoreAndLabels". Any help is greatly appreciated.


Master Collaborator

I think you want to export the test features and the predicted label, right? Off the top of my head, probably 95% right: test.map(point => point.features.toArray.mkString(",") + "," + model.predict(point.features)).saveAsTextFile("/data/mynewhdfsdir/")
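
If you also want the true label next to the prediction, the line formatting can be pulled out into a small helper. This is only a sketch: `PredictionExport` and `toCsvLine` are hypothetical names, not part of MLlib, and the column order (features, then label, then score) is just an assumption about what is convenient to read back.

```scala
object PredictionExport {
  // Hypothetical helper (not an MLlib API): builds one CSV line per test point,
  // laid out as feature1,...,featureN,label,score.
  def toCsvLine(features: Array[Double], label: Double, score: Double): String =
    (features :+ label :+ score).mkString(",")
}
```

Assuming the `test` RDD and `model` from the question, it would be used as test.map(p => PredictionExport.toCsvLine(p.features.toArray, p.label, model.predict(p.features))).saveAsTextFile("/data/mynewhdfsdir/"). Note that saveAsTextFile creates the output directory itself and fails if that directory already exists, so only the parent directory needs to be there beforehand.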


This writes to the HDFS dir a bunch of:



New Contributor
Yes, that is pretty much it. Thank you for the quick reply, I will test it out as soon as my current model finishes up.