Spark machine learning: type mismatch while calling transform
Labels: Apache Spark, Apache Zeppelin
Created ‎12-20-2016 05:58 PM
I am facing an error while calling transform (tokenizer.transform). Please advise.
-----------------------------------------------------------------------------------------------------------------------------------------------------------
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}
val sentenceData = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (0, "I wish Java could use case classes"),
  (1, "Logistic regression models are neat")
)).toDF("label", "sentence")
val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Error message for reference:
import org.apache.spark.ml.feature
sentenceData: org.apache.spark.sql.DataFrame = [label: int, sentence: string]
tokenizer: org.apache.spark.ml.feature.Tokenizer = tok_6ac8a05b403d
<console>:61: error: type mismatch;
 found   : org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
 required: org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.org.apache.spark.sql.DataFrame
       val wordsData = tokenizer.transform(sentenceData)
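For reference, the snippet is the opening step of the TF-IDF example in the Spark 1.6 ML documentation; once the transform succeeds, the HashingTF and IDF classes imported above would typically be used along these lines (a minimal sketch, assuming Spark 1.6 and the sqlContext that spark-shell/Zeppelin provides):
-----------------------------------------------------------------------------------------------------------------------------------------------------------
// Hash the tokenized words into fixed-length term-frequency vectors (the feature size is chosen arbitrarily here)
val hashingTF = new HashingTF().setInputCol("words").setOutputCol("rawFeatures").setNumFeatures(20)
val featurizedData = hashingTF.transform(wordsData)

// Fit an IDF model on the term frequencies and rescale them into TF-IDF features
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val idfModel = idf.fit(featurizedData)
val rescaledData = idfModel.transform(featurizedData)

rescaledData.select("features", "label").take(3).foreach(println)
-----------------------------------------------------------------------------------------------------------------------------------------------------------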
Created ‎12-20-2016 07:11 PM
Try this. What version of Spark are you using?
http://spark.apache.org/docs/1.6.1/ml-features.html#tokenizer

import org.apache.spark.ml.feature.{RegexTokenizer, Tokenizer}

val sentenceDataFrame = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (1, "I wish Java could use case classes"),
  (2, "Logistic,regression,models,are,neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val regexTokenizer = new RegexTokenizer()
  .setInputCol("sentence")
  .setOutputCol("words")
  .setPattern("\\W") // alternatively .setPattern("\\w+").setGaps(false)

val tokenized = tokenizer.transform(sentenceDataFrame)
tokenized.select("words", "label").take(3).foreach(println)

val regexTokenized = regexTokenizer.transform(sentenceDataFrame)
regexTokenized.select("words", "label").take(3).foreach(println)
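If the exact version is unclear, it can be checked directly from spark-shell or a Zeppelin paragraph (sc is the SparkContext both environments create for you); a quick sketch:

// Print the version of the Spark runtime actually in use
println(sc.version)
// The Scala version matters too when mixing jars built for different versions
println(scala.util.Properties.versionString)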
Created ‎12-20-2016 07:15 PM
@Timothy Spann - I am using Spark 1.6.
Created ‎12-20-2016 07:44 PM
Do you have Java and Spark installed, and are you running on one of those machines?
You can try restarting your server.
That is out-of-the-box Apache Spark example code.
Are you running it in the shell?
Did my code run?
1.6.0 is not the best; can you run on 1.6.1 or 1.6.2?
Created ‎12-20-2016 07:47 PM
I am working on the Hortonworks Sandbox 2.4 in an Azure environment, currently running the program in Zeppelin. Your code threw the same error listed above as well. Please advise.
Created ‎12-20-2016 10:31 PM
I guess it's something to do with the Zeppelin version. I did not face the issue while running it in spark-shell. Thanks for the support, as always when needed.
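For anyone reproducing this, one way to confirm the spark-shell behaviour is to save the snippet to a file and load it into the shell; a minimal sketch (the file name tokenizer_test.scala is only an example):

// tokenizer_test.scala -- load with: spark-shell -i tokenizer_test.scala
import org.apache.spark.ml.feature.Tokenizer

val sentenceData = sqlContext.createDataFrame(Seq(
  (0, "Hi I heard about Spark"),
  (0, "I wish Java could use case classes"),
  (1, "Logistic regression models are neat")
)).toDF("label", "sentence")

val tokenizer = new Tokenizer().setInputCol("sentence").setOutputCol("words")
val wordsData = tokenizer.transform(sentenceData)
wordsData.select("words", "label").take(3).foreach(println)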
Created ‎12-20-2016 11:00 PM
@Timothy Spann: What is missing in the Zeppelin version on the Hortonworks Sandbox 2.4 on Azure that causes this error?
Created ‎12-20-2016 11:33 PM
Zeppelin is a different story. Take a look at this post:
http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
And try this tutorial:
http://hortonworks.com/hadoop-tutorial/intro-machine-learning-apache-spark-apache-zeppelin/
