<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question SparkML error type mismatch in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231000#M64294</link>
    <description>&lt;P&gt;Hi, I have been following some online examples while trying to build a model. I am using a CSV data set. Below is a snippet of the headings and some of the data:&lt;/P&gt;&lt;P&gt;TrialID ObsNum IsAlert P1 P2 P3 P4 P5 P6 P7 P8 E1 E2 &lt;/P&gt;&lt;P&gt;0 0 138.4294 10.9435 1000 60 0.302277 508 118.11 0 0 0 &lt;/P&gt;&lt;P&gt;0 1 138.3609 15.3212 1000 600.302277 508 118.11 0 0 0 &lt;/P&gt;&lt;P&gt;The third column, IsAlert, is the ground truth.&lt;/P&gt;&lt;P&gt;This is the code I have been trying, among others.&lt;/P&gt;&lt;PRE&gt;val training = sc.textFile("hdfs:///ford/fordTrain.csv")

val header = training.first  

val inferSchema = true   

val lr = new LogisticRegression()  .setMaxIter(10)  .setRegParam(0.3)  .setElasticNetParam(0.8)

// Fit the model

val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression

println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification

val mlr = new LogisticRegression()  .setMaxIter(10)  .setRegParam(0.3)  .setElasticNetParam(0.8)  .setFamily("multinomial")

val mlrModel = mlr.fit(training)

// Print the coefficients and intercepts for logistic regression with multinomial family

println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")

println(s"Multinomial intercepts: ${mlrModel.interceptVector}")



This is the error I am receiving:
import org.apache.spark.sql.types.{StructType, StructField, StringType}
training: org.apache.spark.rdd.RDD[String] = hdfs:///ford/fordTrain.csv MapPartitionsRDD[7] at textFile at &amp;lt;console&amp;gt;:188
header: String = TrialID,ObsNum,IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2,E3,E4,E5,E6,E7,E8,E9,E10,E11,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11
inferSchema: Boolean = true
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_1049bed7e9a0
&amp;lt;console&amp;gt;:192: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[String]
 required: org.apache.spark.sql.DataFrame
         val lrModel = lr.fit(training)
                              ^
I would be grateful for any help. Thank you.

&lt;/PRE&gt;</description>
    <pubDate>Wed, 05 Jul 2017 21:27:22 GMT</pubDate>
    <dc:creator>r_young</dc:creator>
    <dc:date>2017-07-05T21:27:22Z</dc:date>
    <item>
      <title>SparkML error type mismatch</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231000#M64294</link>
      <description>&lt;P&gt;Hi, I have been following some online examples while trying to build a model. I am using a CSV data set. Below is a snippet of the headings and some of the data:&lt;/P&gt;&lt;P&gt;TrialID ObsNum IsAlert P1 P2 P3 P4 P5 P6 P7 P8 E1 E2 &lt;/P&gt;&lt;P&gt;0 0 138.4294 10.9435 1000 60 0.302277 508 118.11 0 0 0 &lt;/P&gt;&lt;P&gt;0 1 138.3609 15.3212 1000 600.302277 508 118.11 0 0 0 &lt;/P&gt;&lt;P&gt;The third column, IsAlert, is the ground truth.&lt;/P&gt;&lt;P&gt;This is the code I have been trying, among others.&lt;/P&gt;&lt;PRE&gt;val training = sc.textFile("hdfs:///ford/fordTrain.csv")

val header = training.first  

val inferSchema = true   

val lr = new LogisticRegression()  .setMaxIter(10)  .setRegParam(0.3)  .setElasticNetParam(0.8)

// Fit the model

val lrModel = lr.fit(training)

// Print the coefficients and intercept for logistic regression

println(s"Coefficients: ${lrModel.coefficients} Intercept: ${lrModel.intercept}")

// We can also use the multinomial family for binary classification

val mlr = new LogisticRegression()  .setMaxIter(10)  .setRegParam(0.3)  .setElasticNetParam(0.8)  .setFamily("multinomial")

val mlrModel = mlr.fit(training)

// Print the coefficients and intercepts for logistic regression with multinomial family

println(s"Multinomial coefficients: ${mlrModel.coefficientMatrix}")

println(s"Multinomial intercepts: ${mlrModel.interceptVector}")



This is the error I am receiving:
import org.apache.spark.sql.types.{StructType, StructField, StringType}
training: org.apache.spark.rdd.RDD[String] = hdfs:///ford/fordTrain.csv MapPartitionsRDD[7] at textFile at &amp;lt;console&amp;gt;:188
header: String = TrialID,ObsNum,IsAlert,P1,P2,P3,P4,P5,P6,P7,P8,E1,E2,E3,E4,E5,E6,E7,E8,E9,E10,E11,V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,V11
inferSchema: Boolean = true
lr: org.apache.spark.ml.classification.LogisticRegression = logreg_1049bed7e9a0
&amp;lt;console&amp;gt;:192: error: type mismatch;
 found   : org.apache.spark.rdd.RDD[String]
 required: org.apache.spark.sql.DataFrame
         val lrModel = lr.fit(training)
                              ^
I would be grateful for any help. Thank you.

&lt;/PRE&gt;</description>
      <pubDate>Wed, 05 Jul 2017 21:27:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231000#M64294</guid>
      <dc:creator>r_young</dc:creator>
      <dc:date>2017-07-05T21:27:22Z</dc:date>
    </item>
    <item>
      <title>Re: SparkML error type mismatch</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231001#M64295</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/14005/ryoung.html" nodeid="14005"&gt;@Roger Young&lt;/A&gt; The newer APIs expect a DataFrame rather than an RDD, so the easiest thing to do is to import the implicits from either sqlContext.implicits._ or spark.implicits._ and then either call .toDF on the initial load or create a DataFrame from your training RDD.&lt;/P&gt;&lt;P&gt;Alternatively, you could use LogisticRegressionWithSGD or LogisticRegressionWithLBFGS, which operate on RDDs, but then you'll have to convert your input to LabeledPoints.&lt;/P&gt;&lt;P&gt;FWIW, I'd convert the columns in your training data to their respective data types so that your continuous variables are treated as continuous and not as categorical.&lt;/P&gt;&lt;PRE&gt;import spark.implicits._

...

val training = sc.textFile("hdfs:///ford/fordTrain.csv")

val df = training.toDF

// toDF on an RDD[String] yields a single string column named "value";
// fix up your data to ensure your columns are the expected data type

...

val lrModel = lr.fit(df)
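
// A fuller sketch of the steps above, assuming Spark 2.x with a SparkSession named `spark`.
// Assumptions, not taken from the question: the CSV is comma-separated with a header row,
// IsAlert is a 0/1 label, and every column except TrialID, ObsNum and IsAlert is a feature.
// The names raw, featureCols, assembler, prepared, sketchLr and sketchModel are illustrative.
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Read the file directly into a DataFrame; inferSchema gives numeric columns numeric types.
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///ford/fordTrain.csv")

// By default LogisticRegression looks for a "label" column and a Vector "features" column.
val featureCols = raw.columns.filterNot(Set("TrialID", "ObsNum", "IsAlert").contains)
val assembler = new VectorAssembler()
  .setInputCols(featureCols)
  .setOutputCol("features")

// VectorAssembler fails on null values by default, so rows with missing values
// may need to be dropped or filled before this step.
val prepared = assembler
  .transform(raw.withColumn("label", col("IsAlert").cast(DoubleType)))
  .select("label", "features")

val sketchLr = new LogisticRegression().setMaxIter(10).setRegParam(0.3).setElasticNetParam(0.8)
val sketchModel = sketchLr.fit(prepared)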

...&lt;/PRE&gt;</description>
      <pubDate>Thu, 06 Jul 2017 05:12:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231001#M64295</guid>
      <dc:creator>jfrazee</dc:creator>
      <dc:date>2017-07-06T05:12:18Z</dc:date>
    </item>
    <item>
      <title>Re: SparkML error type mismatch</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231002#M64296</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/2956/jfrazee.html" nodeid="2956"&gt;@jfrazee&lt;/A&gt;. Thank you for the help. I will give it a go.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Jul 2017 19:18:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/SparkML-error-type-mismatch/m-p/231002#M64296</guid>
      <dc:creator>r_young</dc:creator>
      <dc:date>2017-07-06T19:18:20Z</dc:date>
    </item>
  </channel>
</rss>

