Created 12-11-2015 04:22 AM
import org.apache.spark.mllib.regression.LinearRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer}
import sqlContext.implicits._

val df = sqlContext.sql("select mnemonic, average, median, stddev from wellbook.curve_statistics")

val indexer = new StringIndexer()
  .setInputCol("mnemonic")
  .setOutputCol("mnemonicIndex")
  .fit(df)
val indexed = indexer.transform(df)

val encoder = new OneHotEncoder().setInputCol("mnemonicIndex").
  setOutputCol("mnemonicVec")
val encoded = encoder.transform(indexed)

val data = encoded.select("mnemonicVec", "average", "median", "stddev")
val parsedData = data.map(row => LabeledPoint(row.getDouble(0), row.getAs[Vector](1)))
Created 12-12-2015 10:04 PM
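In addition to Vectors, you need to import the Spark Vector class explicitly. Scala brings its built-in Vector type (scala.collection.immutable.Vector) into scope by default, so without the import the unqualified name Vector resolves to that type instead of Spark's. Try this: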
import org.apache.spark.mllib.linalg.{Vector, Vectors}
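To see the effect of the import in isolation, here is a minimal sketch that can be pasted into spark-shell. It does not use the wellbook table: the `row` value is a hypothetical stand-in for one record, assumed to hold a numeric label in column 0 and a feature vector in column 1, and the names `row` and `point` are only for illustration.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.sql.Row

// Hypothetical record (not from the original table):
// column 0 is a Double label, column 1 is a Spark MLlib vector.
val row = Row(1.5, Vectors.dense(0.0, 1.0, 0.0))

// With the explicit import above, Vector here means
// org.apache.spark.mllib.linalg.Vector, not scala.collection.immutable.Vector.
val point = LabeledPoint(row.getDouble(0), row.getAs[Vector](1))
```

Whichever columns you end up using, the column read with getDouble has to hold a Double and the column read with getAs[Vector] has to hold a Spark vector, so check the column order in your select against the indices you pass.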
Created 12-11-2015 05:18 AM
Which version of Spark and HDP are you using?
Created 12-11-2015 07:15 PM
Spark 1.4.1 and HDP 2.3.2
Created 12-11-2015 06:19 AM
Vedant, give this a shot:
val parsedData = data.map(row => LabeledPoint(row.getDouble(0), row.asInstanceOf[Vector](1)))
Created 12-11-2015 07:16 PM
@Joe Widen I tried that earlier and it gave me the same error.
Created 12-14-2015 02:20 PM
@Dhruv Kumar Thanks, it worked.