
add header to correlation Matrix spark



I am computing a correlation on a CSV file with Apache Spark. When loading the data I have to skip the first row, because it holds the column names; otherwise the data cannot be parsed. The correlation is computed, but the resulting correlation matrix has no column names, and I can't find a way to add them as a header. Could you help me get the matrix with its header? Thanks. This is what I did:

import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.stat.Statistics
import org.apache.spark.mllib.linalg.Matrix
import org.apache.spark.rdd.RDD

// Drop the header line, which is the first line of partition 0
val data = sc.textFile(strfilePath).mapPartitionsWithIndex { case (index, iterator) =>
  if (index == 0) iterator.drop(1) else iterator
}
// Parse each CSV line into a dense vector of doubles
val inputMatrix: RDD[Vector] = data.map { line =>
  val values = line.split(",").map(_.toDouble)
  Vectors.dense(values)
}
// Pearson correlation between every pair of columns
val correlationMatrix: Matrix = Statistics.corr(inputMatrix, "pearson")
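One possible approach, sketched here under stated assumptions rather than as an official Spark feature: Statistics.corr returns an unlabeled org.apache.spark.mllib.linalg.Matrix, so the column names have to be read separately from the header line and paired with the matrix entries by index. The object and method names below (CorrLabels, splitHeader, toNamedMap, render) are hypothetical helpers, and the code is plain Scala so it runs without Spark. It assumes the entries arrive in column-major order, which is how mllib's Matrix.toArray lays them out.

```scala
// Plain-Scala sketch (runs without Spark). All names here are hypothetical.
// Assumes a square matrix whose entries are stored column-major, the layout
// returned by Spark mllib's Matrix.toArray.
object CorrLabels {

  // Split CSV lines into (column names, numeric rows): the first line is the
  // header and every remaining line is parsed as doubles.
  def splitHeader(lines: Seq[String]): (Array[String], Seq[Array[Double]]) = {
    val header = lines.head.split(",").map(_.trim)
    val rows = lines.tail.map(_.split(",").map(_.trim.toDouble))
    (header, rows)
  }

  // Build a lookup table so a correlation can be read as m("colA")("colB").
  def toNamedMap(header: Array[String], numRows: Int, numCols: Int,
                 values: Array[Double]): Map[String, Map[String, Double]] = {
    require(header.length == numRows && numRows == numCols,
      "expected one name per row/column of a square matrix")
    require(values.length == numRows * numCols, "values size mismatch")
    header.indices.map { i =>
      // Entry (i, j) sits at index i + j * numRows in column-major order
      header(i) -> header.indices.map(j => header(j) -> values(i + j * numRows)).toMap
    }.toMap
  }

  // Render a text table with the column names across the top and down the left.
  def render(header: Array[String], numRows: Int, numCols: Int,
             values: Array[Double]): String = {
    require(header.length == numRows && numRows == numCols,
      "expected one name per row/column of a square matrix")
    val width = (header.map(_.length) :+ 7).max
    def cell(s: String): String = ("%" + width + "s").format(s) // right-align
    val top = (cell("") +: header.map(h => cell(h))).mkString(" ")
    val body = (0 until numRows).map { i =>
      val nums = (0 until numCols).map { j =>
        cell("%.4f".formatLocal(java.util.Locale.ROOT, values(i + j * numRows)))
      }
      (cell(header(i)) +: nums).mkString(" ")
    }
    (top +: body).mkString("\n")
  }
}
```

Inside the Spark job, the names could come from the same file, for example val header = sc.textFile(strfilePath).first().split(","), and the labeled matrix could then be printed with println(CorrLabels.render(header, correlationMatrix.numRows, correlationMatrix.numCols, correlationMatrix.toArray)). This assumes every CSV column is numeric and appears in the same order as in the header, so the matrix has exactly one row and one column per name.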

Re: add header to correlation Matrix spark


@Maher Hattabi Did you find a solution?


Re: add header to correlation Matrix spark


Hi, I have the same problem. Is there any way to print the matrix with a header or title?
