How can I replace or handle null values in a DataFrame?
Labels: Apache Spark
Created 12-01-2016 01:09 AM
val ebayds = sc.textFile("/user/spark/xbox.csv")
case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)
val ebay = ebayds.map(a => a.split(",")).map(p => Auction(p(0), p(1).toFloat, p(2).toFloat, p(3), p(4).toInt, p(5).toFloat, p(6).toFloat)).toDF()
ebay.select("auctionid").distinct.count
I am getting this error:
java.lang.NumberFormatException: For input string: ""
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
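(For reference, the exception comes from calling .toFloat on an empty string, which is what split(",") produces for a blank CSV field. A minimal illustration of the failure, using a made-up line for this example:

// Hypothetical line with a blank bid column, for illustration only
val line = "12345,,5.0,somebidder,3,0.99,1.03"
val p = line.split(",")

p(1)          // ""  -- the blank field survives the split as an empty string
p(1).toFloat  // throws java.lang.NumberFormatException: For input string: ""
)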
Created 12-04-2016 06:35 AM
The error seems to be a data type mismatch between the dataset and the case class. Check each column's data type first.
Use the CSV API to read the file and print the schema, e.g.:
val ebaydf = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)
ebaydf.printSchema()
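Once the file is loaded that way, blank fields come in as nulls (assuming the file has a header row so the columns get real names), and the original question of replacing or handling them can be addressed with the DataFrame na functions. A rough sketch, reusing ebaydf from above; the column names and default values here are only assumed from the Auction case class:

// Drop any row that contains a null in any column
val noNulls = ebaydf.na.drop()

// Or keep the rows and substitute defaults per column
// (column names assumed to match the case class fields)
val filled = ebaydf.na.fill(Map(
  "bid"    -> 0.0,
  "price"  -> 0.0,
  "bidder" -> "unknown"
))

filled.select("auctionid").distinct.count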
Created 12-01-2016 11:21 AM
@jayaprakash gadi, why don't you implement a factory method on the Auction companion object to handle null values?
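For example, a rough sketch of that idea; the fromLine helper and the default values are illustrative, not from the original post:

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)

object Auction {
  // Fall back to a default when a CSV field is blank
  private def toFloatOrElse(s: String, default: Float = 0f): Float =
    if (s == null || s.trim.isEmpty) default else s.toFloat

  private def toIntOrElse(s: String, default: Int = 0): Int =
    if (s == null || s.trim.isEmpty) default else s.toInt

  // Build an Auction from one CSV line, tolerating empty numeric columns
  def fromLine(line: String): Auction = {
    val p = line.split(",", -1)   // -1 keeps trailing empty fields
    Auction(p(0), toFloatOrElse(p(1)), toFloatOrElse(p(2)), p(3),
            toIntOrElse(p(4)), toFloatOrElse(p(5)), toFloatOrElse(p(6)))
  }
}

// Usage against the RDD from the question:
// val ebay = ebayds.map(Auction.fromLine).toDF()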
