Support Questions
Find answers, ask questions, and share your expertise
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here. Want to know more about what has changed? Check out the Community News blog.

Write MLLIB model results from Spark to HDFS

Write MLLIB model results from Spark to HDFS

New Contributor

I am following the MLLib Spark 1.1.1 Documentation for examples under Logistic Regression. I have looked at the API docs and have figured out how to manipulate the settings and run many SVMs and Logistic Regression models. I use the code from the example to evaluate my model:


    val scoreAndLabels = { point =>

      val score = model.predict(point.features)

     (score, point.label)



    val metrics = new BinaryClassificationMetrics(scoreAndLabels)

    val auROC = metrics.areaUnderROC()


This works great and I can tweak my models to see which one works the best. However I haven't figured out how to export the value scoreAndLabels to HDFS so I can see what I am predicting for the test set. I know this is probably simple scala, but as a newbie to scala, it is frustrating me greatly. 


For similar examples, I have pushed RDDs out with rdd.saveAsTextFile("/data/mynewhdfsfile"). I learned the hard way to ensure the "data" directory existing in HDFS prior to executing. However, this method does not work for "scoreAndLabels". Any help is greatly appreciated


Re: Write MLLIB model results from Spark to HDFS

Master Collaborator

I think you want to export the test features, and label, right? Off the top of my head, probably 95% right: => point.features.mkString(',') + "," + model.predict(point.features)).saveAsTextFile("/data/mynewhdfsdir/")


This writes to the HDFS dir a bunch of:



Re: Write MLLIB model results from Spark to HDFS

New Contributor
Yes, that is pretty much it. Thank you for the quick reply, I will test it out as soon as my current model finishes up.