How to convert a DataFrame to a Vector.dense in scala

New Contributor

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.{concat, lit}

val f = bh.select(
  $"GL20".alias("LABEL"),
  concat($"SYMBOL", lit(":"), $"DATE").alias("ID"),
  $"TIR", $"UO", $"ROC20")
f.show(3)

+---------+------------------+--------+--------+--------+
|    LABEL|                ID|     TIR|      UO|   ROC20|
+---------+------------------+--------+--------+--------+
|-5.452071|DJI.IDX:2010-04-20|73.26948|65.55433|  3.0704|
|-5.065461|DJT.IDX:2010-04-20|78.73316|68.14407|6.275064|
|-6.747381|NDX.IDX:2010-04-20|77.02333|68.68713|3.796183|
+---------+------------------+--------+--------+--------+

I want to build a new DataFrame in the format below from the DataFrame above (which is derived from bh).

+------------------+--------------------+
|                ID|            FEATURES|
+------------------+--------------------+
|DJI.IDX:2010-04-20|[73.26948,65.5543...|
|DJT.IDX:2010-04-20|[78.73316,68.1440...|
|NDX.IDX:2010-04-20|[77.02333,68.6871...|
+------------------+--------------------+

If I hard-code the values I can produce the result above, but I need to generate it programmatically from the bh DataFrame.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Creates a DataFrame with hard-coded feature vectors
val df = sqlContext.createDataFrame(Seq(
  ("DJI.IDX:2010-04-20", Vectors.dense(73.26948, 65.55433, 3.0704)),
  ("DJT.IDX:2010-04-20", Vectors.dense(78.73316, 68.14407, 6.275064)),
  ("NDX.IDX:2010-04-20", Vectors.dense(77.02333, 68.68713, 3.796183))
)).toDF("ID", "FEATURES")

df.show()

1 ACCEPTED SOLUTION

New Contributor

Adding an answer in case others need this. I used VectorAssembler.

http://spark.apache.org/docs/latest/ml-features.html#vectorassembler

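import org.apache.spark.ml.feature.VectorAssembler

// Combine the three numeric columns into a single vector column named FEATURES
val assembler = new VectorAssembler()
  .setInputCols(Array("TIR", "UO", "ROC20"))
  .setOutputCol("FEATURES")
val vd = assembler.transform(f)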
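
For anyone who wants to try this end to end, here is a minimal sketch of the same VectorAssembler approach as a self-contained program. It assumes Spark 2.0 or later (so it uses SparkSession instead of the sqlContext from the question), and the names FeatureAssembly, sampleFrame and toIdAndFeatures are made up for illustration. The sample rows are copied from the f.show(3) output in the question and only stand in for the real DataFrame derived from bh.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object FeatureAssembly {

  // Hypothetical stand-in for the DataFrame `f` in the question,
  // using the three rows shown by f.show(3).
  def sampleFrame(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      (-5.452071, "DJI.IDX:2010-04-20", 73.26948, 65.55433, 3.0704),
      (-5.065461, "DJT.IDX:2010-04-20", 78.73316, 68.14407, 6.275064),
      (-6.747381, "NDX.IDX:2010-04-20", 77.02333, 68.68713, 3.796183)
    ).toDF("LABEL", "ID", "TIR", "UO", "ROC20")
  }

  // Assemble TIR, UO and ROC20 into one vector column,
  // then keep only ID and FEATURES as in the desired output.
  def toIdAndFeatures(df: DataFrame): DataFrame = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("TIR", "UO", "ROC20"))
      .setOutputCol("FEATURES")
    assembler.transform(df).select("ID", "FEATURES")
  }

  def main(args: Array[String]): Unit = {
    // Local session for experimentation only (an assumption, not part of the original post)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("feature-assembly-sketch")
      .getOrCreate()

    val vd = toIdAndFeatures(sampleFrame(spark))
    vd.show(false) // print full vectors without truncation

    spark.stop()
  }
}
```

The select at the end is what drops LABEL and the raw numeric columns; without it, transform keeps every input column and just appends FEATURES. On Spark 1.x the assembler call should be the same, but as far as I know the resulting vectors are of the org.apache.spark.mllib.linalg type rather than org.apache.spark.ml.linalg.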

REPLIES

New Contributor

Perfect Solution.