Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How can I replace or handle null values in a DataFrame?


val ebayds = sc.textFile("/user/spark/xbox.csv")

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)

val ebay = ebayds.map(a=>a.split(",")).map(p=>Auction(p(0),p(1).toFloat,p(2).toFloat,p(3),p(4).toInt,p(5).toFloat,p(6).toFloat)).toDF()

ebay.select("auctionid").distinct.count

I am getting this error:

For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

1 ACCEPTED SOLUTION

New Member

The error seems to be a data type mismatch between the data set and the case class: the exception shows an empty string ("") being parsed as a number. Check each column's data type first.

Use the CSV API to read the CSV file and print the schema.

E.g.:

val ebaydf = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)

ebaydf.printSchema()
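
To see which fields actually break the case class conversion, a plain Scala check along these lines may help. It is only a sketch: the two sample rows are made up to stand in for lines of xbox.csv, and the helper names (parses, badFields) are hypothetical, not part of Spark or spark-csv.

```scala
import scala.util.Try

// Hypothetical sample rows standing in for lines of xbox.csv;
// the second row has an empty bidderrate field.
val sampleLines = Seq(
  "8213034705,95,2.927373,jake7870,0,95,117.5",
  "8213034705,115,2.943484,davidbresler2,,95,117.5"
)

// Zero-based positions of the numeric fields in the Auction case class.
val floatCols = Set(1, 2, 5, 6)
val intCols = Set(4)

// True if the field at the given column can be converted to its expected type.
def parses(col: Int, field: String): Boolean =
  if (floatCols(col)) Try(field.toFloat).isSuccess
  else if (intCols(col)) Try(field.toInt).isSuccess
  else true

// Returns (line number, column index, raw value) for every field that fails to parse.
// split(",", -1) keeps trailing empty fields, which split(",") would drop.
def badFields(lines: Seq[String]): Seq[(Int, Int, String)] =
  for {
    (line, lineNo) <- lines.zipWithIndex
    (field, col) <- line.split(",", -1).toSeq.zipWithIndex
    if !parses(col, field)
  } yield (lineNo + 1, col, field)

val problems = badFields(sampleLines)
problems.foreach { case (n, c, v) => println(s"line $n, column $c: '$v'") }
```

On the real file, the same check could be run over a sample of lines (for example ebayds.take(100)) before mapping to Auction.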


2 REPLIES

Super Guru

@jayaprakash gadi Why not implement a method in the Auction companion object to handle null values?
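
One way this suggestion might look is sketched below. The method name fromCsv, the helpers floatOr and intOr, and the choice of 0 as the default for missing numbers are all assumptions made for illustration; they are not from the original post.

```scala
import scala.util.Try

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String,
                   bidderrate: Int, openbid: Float, price: Float)

object Auction {
  // Hypothetical helpers: fall back to a default when a field is empty or malformed.
  private def floatOr(s: String, default: Float): Float =
    Try(s.trim.toFloat).getOrElse(default)
  private def intOr(s: String, default: Int): Int =
    Try(s.trim.toInt).getOrElse(default)

  // Builds an Auction from one CSV line.
  // Returns None when the line does not have exactly seven fields.
  def fromCsv(line: String): Option[Auction] = {
    val p = line.split(",", -1) // limit -1 keeps trailing empty fields
    if (p.length != 7) None
    else Some(Auction(p(0), floatOr(p(1), 0f), floatOr(p(2), 0f), p(3),
                      intOr(p(4), 0), floatOr(p(5), 0f), floatOr(p(6), 0f)))
  }
}
```

With this in place, the conversion in the question could become ebayds.flatMap(line => Auction.fromCsv(line)).toDF(), which skips malformed lines instead of throwing. In spark-shell the case class and its companion object need to be entered together (for example with :paste). Note that defaulting to 0 hides the fact that a value was missing; declaring the numeric fields as Option[Float] or Option[Int] would keep them as nulls in the DataFrame instead.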
