05-17-2016 10:10 AM
Adding an answer in case others need this. I used the VectorAssembler: http://spark.apache.org/docs/latest/ml-features.html#vectorassembler

import org.apache.spark.ml.feature.VectorAssembler

val assembler = new VectorAssembler().setInputCols(Array("TIR", "UO", "ROC20")).setOutputCol("FEATURES")
val vd = assembler.transform(f)
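
The `transform` call keeps every input column and appends FEATURES, so one more `select` is needed to get the two-column ID/FEATURES layout the question asks for. A minimal sketch, assuming a Spark 1.6 spark-shell session where `sqlContext` is predefined; the small `f` built here is only a stand-in for the real DataFrame, using the three rows shown in the question:

```scala
import org.apache.spark.ml.feature.VectorAssembler

// Assumption: running in spark-shell (Spark 1.6), so `sqlContext` already exists.
// Stand-in for `f`, built from the three sample rows shown in the question.
val f = sqlContext.createDataFrame(Seq(
  (-5.452071, "DJI.IDX:2010-04-20", 73.26948, 65.55433, 3.0704),
  (-5.065461, "DJT.IDX:2010-04-20", 78.73316, 68.14407, 6.275064),
  (-6.747381, "NDX.IDX:2010-04-20", 77.02333, 68.68713, 3.796183)
)).toDF("LABEL", "ID", "TIR", "UO", "ROC20")

// Combine the three numeric columns into a single vector column.
val assembler = new VectorAssembler().setInputCols(Array("TIR", "UO", "ROC20")).setOutputCol("FEATURES")

// transform appends FEATURES; select keeps only the two columns wanted.
val vd = assembler.transform(f).select("ID", "FEATURES")
vd.show()
```

Keeping LABEL in the `select` as well may be useful if the result is later fed to a supervised model rather than to clustering.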
05-14-2016 09:37 PM
How to convert a DataFrame to a Vector.dense in Scala

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.{concat, lit}

val f = bh.select($"GL20".alias("LABEL"), concat($"SYMBOL", lit(":"), $"DATE").alias("ID"), $"TIR", $"UO", $"ROC20")
f.show(3)

+---------+------------------+--------+--------+--------+
| LABEL| ID| TIR| UO| ROC20|
+---------+------------------+--------+--------+--------+
|-5.452071|DJI.IDX:2010-04-20|73.26948|65.55433| 3.0704|
|-5.065461|DJT.IDX:2010-04-20|78.73316|68.14407|6.275064|
|-6.747381|NDX.IDX:2010-04-20|77.02333|68.68713|3.796183|
+---------+------------------+--------+--------+--------+

I want a new DataFrame in the following format, built from the bh DataFrame above.

+------------------+--------------------+
| ID| FEATURES|
+------------------+--------------------+
|DJI.IDX:2010-04-20|[73.26948,65.5543...|
|DJT.IDX:2010-04-20|[78.73316,68.1440...|
|NDX.IDX:2010-04-20|[77.02333,68.6871...|
+------------------+--------------------+

If I hard-code the values I can produce the above results, but I need to get them programmatically from the bh DataFrame.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Creates a DataFrame
val df = sqlContext.createDataFrame(Seq(
("DJI.IDX:2010-04-20", Vectors.dense(73.26948, 65.55433, 3.0704)),
("DJT.IDX:2010-04-20", Vectors.dense(78.73316, 68.14407, 6.275064)),
("NDX.IDX:2010-04-20", Vectors.dense(77.02333, 68.68713, 3.796183))
)).toDF("ID", "FEATURES")

df.show()
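
For comparison with the hard-coded version, the same `Vectors.dense` call can be applied per row through a UDF instead of typing the values in. This is only a sketch under assumptions: a Spark 1.6 spark-shell session with `sqlContext` predefined, feature columns that are already of type Double (cast them first if they are not), and a hypothetical helper name `toDense`; the small `f` is a stand-in for the real DataFrame.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.{col, udf}

// Assumption: running in spark-shell (Spark 1.6), so `sqlContext` already exists.
// Stand-in for `f`, using the three sample rows from the question.
val f = sqlContext.createDataFrame(Seq(
  ("DJI.IDX:2010-04-20", 73.26948, 65.55433, 3.0704),
  ("DJT.IDX:2010-04-20", 78.73316, 68.14407, 6.275064),
  ("NDX.IDX:2010-04-20", 77.02333, 68.68713, 3.796183)
)).toDF("ID", "TIR", "UO", "ROC20")

// Hypothetical helper: packs three Double columns into one dense vector.
val toDense = udf((tir: Double, uo: Double, roc20: Double) => Vectors.dense(tir, uo, roc20))

// Build the ID/FEATURES layout directly from the column values.
val df2 = f.select(col("ID"), toDense(col("TIR"), col("UO"), col("ROC20")).alias("FEATURES"))
df2.show()
```

The UDF is tied to exactly three columns, so a list of column names passed to VectorAssembler is likely the easier route when the feature set changes.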