Created on 05-14-2016 09:37 PM - edited 09-16-2022 03:19 AM
How to convert DataFrame columns to a Vectors.dense column in Scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.{concat, lit}
val f = bh.select(
  $"GL20".alias("LABEL"),
  concat($"SYMBOL", lit(":"), $"DATE").alias("ID"),
  $"TIR",
  $"UO",
  $"ROC20")
f.show(3)
+---------+------------------+--------+--------+--------+
|    LABEL|                ID|     TIR|      UO|   ROC20|
+---------+------------------+--------+--------+--------+
|-5.452071|DJI.IDX:2010-04-20|73.26948|65.55433|  3.0704|
|-5.065461|DJT.IDX:2010-04-20|78.73316|68.14407|6.275064|
|-6.747381|NDX.IDX:2010-04-20|77.02333|68.68713|3.796183|
+---------+------------------+--------+--------+--------+
I want a new DataFrame, built from the DataFrame above, in the following format, with TIR, UO and ROC20 combined into a single FEATURES vector column:
+------------------+--------------------+
|                ID|            FEATURES|
+------------------+--------------------+
|DJI.IDX:2010-04-20|[73.26948,65.5543...|
|DJT.IDX:2010-04-20|[78.73316,68.1440...|
|NDX.IDX:2010-04-20|[77.02333,68.6871...|
+------------------+--------------------+
If I hard-code the values I can produce the above result, but I need to build it programmatically from the bh DataFrame.
import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Creates a DataFrame with hard-coded feature vectors
val df = sqlContext.createDataFrame(Seq(
  ("DJI.IDX:2010-04-20", Vectors.dense(73.26948, 65.55433, 3.0704)),
  ("DJT.IDX:2010-04-20", Vectors.dense(78.73316, 68.14407, 6.275064)),
  ("NDX.IDX:2010-04-20", Vectors.dense(77.02333, 68.68713, 3.796183))
)).toDF("ID", "FEATURES")
df.show()
Created 05-17-2016 10:10 AM
Adding answer in case others need this. I used the VectorAssembler.
http://spark.apache.org/docs/latest/ml-features.html#vectorassembler
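import org.apache.spark.ml.feature.VectorAssembler

// Combine the three numeric columns into a single vector column named FEATURES
val assembler = new VectorAssembler()
  .setInputCols(Array("TIR", "UO", "ROC20"))
  .setOutputCol("FEATURES")

// f is the DataFrame from the question; vd keeps all of its columns and adds FEATURES
val vd = assembler.transform(f)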
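For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the three sample rows from the question and trims the result down to the ID and FEATURES columns. It assumes Spark 2.x or later with a local SparkSession (on 1.6, sqlContext.createDataFrame should work the same way); the app name and the stand-in `f` DataFrame are only for illustration and are not the original bh data.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x+ running locally. In spark-shell a session already
// exists and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("vector-assembler-sketch") // hypothetical name
  .getOrCreate()

// Stand-in for the `f` DataFrame in the question, using its three sample rows.
val f = spark.createDataFrame(Seq(
  (-5.452071, "DJI.IDX:2010-04-20", 73.26948, 65.55433, 3.0704),
  (-5.065461, "DJT.IDX:2010-04-20", 78.73316, 68.14407, 6.275064),
  (-6.747381, "NDX.IDX:2010-04-20", 77.02333, 68.68713, 3.796183)
)).toDF("LABEL", "ID", "TIR", "UO", "ROC20")

// Pack the three numeric columns into one vector column.
val assembler = new VectorAssembler()
  .setInputCols(Array("TIR", "UO", "ROC20"))
  .setOutputCol("FEATURES")

// transform() appends FEATURES; select() keeps only the two columns asked for.
val vd = assembler.transform(f).select("ID", "FEATURES")
vd.show(false)
```

Note that on Spark 2.x and later the FEATURES column holds org.apache.spark.ml.linalg vectors, not the org.apache.spark.mllib.linalg ones imported in the question.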
Created 11-19-2016 01:29 AM
Perfect Solution.