
Spark machine learning: type mismatch in tokenizer.transform

Expert Contributor

I am getting an error when calling tokenizer.transform. Please advise.

-----------------------------------------------------------------------------------------------------------------------------------------------------------

import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

val sentenceData = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (0, "I wish Java could use case classes"),
  (1, "Logistic regression models are neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")

val wordsData = tokenizer.transform(sentenceData)

-----------------------------------------------------------------------------------------------------------------------------------------------------------
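For reference, when tokenizer.transform succeeds (the poster later reports that the same code runs fine in spark-shell), each row gains a "words" column. Spark's Tokenizer is documented as lowercasing the input string and then splitting it on whitespace. The plain-Scala sketch below has no Spark dependency and is only an approximation of that behavior, not Spark's implementation; TokenizerSketch, tokenize and wordsData are hypothetical names. It shows the tokens to expect for the question's three rows, independent of the Zeppelin environment.

```scala
// Plain-Scala approximation (NOT Spark code) of ml.feature.Tokenizer:
// lowercase the sentence, then split it on whitespace.
object TokenizerSketch {
  // Hypothetical helper mirroring the documented Tokenizer behavior.
  def tokenize(sentence: String): List[String] =
    sentence.toLowerCase.split("\\s").toList

  // The same (label, sentence) rows the question passes to createDataFrame.
  val sentenceData: List[(Int, String)] = List(
    (0, "Hi I heard about Spark"),
    (0, "I wish Java could use case classes"),
    (1, "Logistic regression models are neat")
  )

  // Analogue of the "words" column added by tokenizer.transform(sentenceData).
  val wordsData: List[(Int, String, List[String])] =
    sentenceData.map { case (label, sentence) => (label, sentence, tokenize(sentence)) }

  def main(args: Array[String]): Unit =
    wordsData.foreach(println)
}
```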

Error message for reference:

import org.apache.spark.ml.feature

sentenceData: org.apache.spark.sql.DataFrame = [label: int, sentence: string]
tokenizer: org.apache.spark.ml.feature.Tokenizer = tok_6ac8a05b403d

<console>:61: error: type mismatch;
 found   : org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
 required: org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
       val wordsData = tokenizer.transform(sentenceData)

1 ACCEPTED SOLUTION

Master Guru
7 REPLIES

Master Guru

Try this. What version of Spark are you using?

http://spark.apache.org/docs/1.6.1/ml-features.html#tokenizer

import org.apache.spark.ml.feature.{RegexTokenizer, Tokenizer}

val sentenceDataFrame = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (1, "I wish Java could use case classes"),
  (2, "Logistic,regression,models,are,neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val regexTokenizer = new RegexTokenizer()
  .setInputCol("sentence")
  .setOutputCol("words")
  .setPattern("\\W") // alternatively .setPattern("\\w+").setGaps(false)

val tokenized = tokenizer.transform(sentenceDataFrame)
tokenized.select("words", "label").take(3).foreach(println)
val regexTokenized = regexTokenizer.transform(sentenceDataFrame)
regexTokenized.select("words", "label").take(3).foreach(println)
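The two tokenizers above differ on the third sentence, whose words are separated by commas rather than spaces. The sketch below is a plain-Scala approximation, not Spark's implementation; RegexTokenizerSketch and its methods are hypothetical names. It contrasts the RegexTokenizer modes from the comment in that code: with gaps set to true (the default) the pattern describes the separators, and with setGaps(false) it describes the tokens themselves. It assumes RegexTokenizer lowercases its input and drops empty tokens, as a minimum token length of 1 would; both are assumptions about the defaults.

```scala
// Plain-Scala approximation (NOT Spark code) of Tokenizer vs RegexTokenizer.
// Assumption: RegexTokenizer lowercases input and discards empty tokens.
object RegexTokenizerSketch {
  // Tokenizer: lowercase, then split on whitespace only.
  def whitespaceTokens(s: String): List[String] =
    s.toLowerCase.split("\\s").toList

  // RegexTokenizer with gaps = true: the pattern matches separators.
  def gapTokens(s: String, pattern: String): List[String] =
    pattern.r.split(s.toLowerCase).toList.filter(_.nonEmpty)

  // RegexTokenizer with gaps = false: the pattern matches the tokens.
  def matchTokens(s: String, pattern: String): List[String] =
    pattern.r.findAllIn(s.toLowerCase).toList

  def main(args: Array[String]): Unit = {
    val s = "Logistic,regression,models,are,neat"
    println(whitespaceTokens(s))    // a single token: commas are not whitespace
    println(gapTokens(s, "\\W"))    // like .setPattern("\\W")
    println(matchTokens(s, "\\w+")) // like .setPattern("\\w+").setGaps(false)
  }
}
```

Both regex modes yield the same five words here; they diverge when the separator and token patterns are not exact complements.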


Expert Contributor

@Timothy Spann I am using Spark 1.6:

sc.version
res377: String = 1.6.0
I still get the same error and I'm not sure why.

Master Guru

Expert Contributor
@Timothy Spann

I am working on the Hortonworks Sandbox 2.4 in an Azure environment and running the program in Zeppelin. Your code also threw the same error listed above. Please advise.

Expert Contributor

I suspect it has something to do with the Zeppelin version: I don't hit the issue when running the same code in spark-shell. Thanks for the support, as always.

Expert Contributor

@Timothy Spann What is missing in the Zeppelin version on Hortonworks Sandbox 2.4 (Azure) that causes this error?

Master Guru