Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to convert a DataFrame to a Vector.dense in scala


import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.sql.functions.{concat, lit}

val f = bh.select(
  $"GL20".alias("LABEL"),
  concat($"SYMBOL", lit(":"), $"DATE").alias("ID"),
  $"TIR", $"UO", $"ROC20")
f.show(3)

+---------+------------------+--------+--------+--------+
|    LABEL|                ID|     TIR|      UO|   ROC20|
+---------+------------------+--------+--------+--------+
|-5.452071|DJI.IDX:2010-04-20|73.26948|65.55433|  3.0704|
|-5.065461|DJT.IDX:2010-04-20|78.73316|68.14407|6.275064|
|-6.747381|NDX.IDX:2010-04-20|77.02333|68.68713|3.796183|
+---------+------------------+--------+--------+--------+

I want to build a new DataFrame in the following format from the bh DataFrame above.

+------------------+--------------------+
|                ID|            FEATURES|
+------------------+--------------------+
|DJI.IDX:2010-04-20|[73.26948,65.5543...|
|DJT.IDX:2010-04-20|[78.73316,68.1440...|
|NDX.IDX:2010-04-20|[77.02333,68.6871...|
+------------------+--------------------+

If I hard-code the values I can produce the above result, but I need to get it programmatically from the bh DataFrame.

import org.apache.spark.ml.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Creates a DataFrame with hard-coded feature vectors
val df = sqlContext.createDataFrame(Seq(
  ("DJI.IDX:2010-04-20", Vectors.dense(73.26948, 65.55433, 3.0704)),
  ("DJT.IDX:2010-04-20", Vectors.dense(78.73316, 68.14407, 6.275064)),
  ("NDX.IDX:2010-04-20", Vectors.dense(77.02333, 68.68713, 3.796183))
)).toDF("ID", "FEATURES")

df.show()

1 ACCEPTED SOLUTION


Adding answer in case others need this. I used the VectorAssembler.

http://spark.apache.org/docs/latest/ml-features.html#vectorassembler

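import org.apache.spark.ml.feature.VectorAssembler

// Combine the three numeric columns into a single vector column named FEATURES
val assembler = new VectorAssembler()
  .setInputCols(Array("TIR", "UO", "ROC20"))
  .setOutputCol("FEATURES")
val vd = assembler.transform(f)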
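
For anyone who wants to try this approach end to end, here is a minimal self-contained sketch. It assumes Spark 2.x or later (where SparkSession takes the place of sqlContext) and that the feature columns are numeric. The object name VectorAssemblerSketch and the helpers sampleData and assembleFeatures are made up for illustration, and the sample rows are simply the three rows shown in the question.

```scala
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.{DataFrame, SparkSession}

object VectorAssemblerSketch {

  // Hypothetical stand-in for the DataFrame `f` in the question (same three rows).
  def sampleData(spark: SparkSession): DataFrame =
    spark.createDataFrame(Seq(
      (-5.452071, "DJI.IDX:2010-04-20", 73.26948, 65.55433, 3.0704),
      (-5.065461, "DJT.IDX:2010-04-20", 78.73316, 68.14407, 6.275064),
      (-6.747381, "NDX.IDX:2010-04-20", 77.02333, 68.68713, 3.796183)
    )).toDF("LABEL", "ID", "TIR", "UO", "ROC20")

  // Pack TIR, UO and ROC20 into one vector column, then keep only ID and FEATURES.
  def assembleFeatures(df: DataFrame): DataFrame = {
    val assembler = new VectorAssembler()
      .setInputCols(Array("TIR", "UO", "ROC20"))
      .setOutputCol("FEATURES")
    assembler.transform(df).select("ID", "FEATURES")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, created only so the sketch runs on its own.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("vector-assembler-sketch")
      .getOrCreate()

    assembleFeatures(sampleData(spark)).show(false)

    spark.stop()
  }
}
```

Because transform keeps every input column and appends FEATURES, the select("ID", "FEATURES") is what trims the result down to the two-column layout asked for in the question. If TIR, UO, or ROC20 were stored as strings, they would need a cast to a numeric type first, since VectorAssembler only accepts numeric, boolean, and vector input columns.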


2 REPLIES



Perfect Solution.