How can I replace or handle null values in a DataFrame?

val ebayds = sc.textFile("/user/spark/xbox.csv")

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)

val ebay = ebayds.map(a=>a.split(",")).map(p=>Auction(p(0),p(1).toFloat,p(2).toFloat,p(3),p(4).toInt,p(5).toFloat,p(6).toFloat)).toDF()

ebay.select("auctionid").distinct.count

I am getting this error:

For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

1 ACCEPTED SOLUTION

Contributor

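The error seems to be a data type mismatch between the data set and the case class: at least one field that the case class parses as a number is an empty string in the file. Check each column's data type first.

Use the CSV API to read the file and print the schema.

For example: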

val ebaydf = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)

ebaydf.printSchema()
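Once the schema looks right, the nulls themselves can be handled on the DataFrame. A minimal sketch, assuming the spark-shell session above with ebaydf already loaded, and assuming the CSV header uses the same column names as the Auction case class (the defaults 0 and 0.0 are only illustrative):

```scala
// Assumes ebaydf from the snippet above; column names are assumed to match
// the Auction case class and may differ in your file.

// Option 1: drop every row that contains a null in any column.
val noNulls = ebaydf.na.drop()

// Option 2: keep all rows and replace nulls column by column.
// The replacement values here are placeholders, pick ones that fit your data.
val filled = ebaydf.na.fill(Map("bidderrate" -> 0, "bid" -> 0.0, "price" -> 0.0))

filled.printSchema()
```

Dropping rows is simpler but loses data; filling keeps every auction row at the cost of introducing default values that later aggregations will treat as real.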


2 REPLIES

Super Guru

@jayaprakash gadi Why don't you implement a method in a companion object of the Auction class that handles null (empty) values while parsing each line?
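One way that suggestion might look, as a rough sketch only: the helper names toFloatOr, toIntOr and fromLine are hypothetical, and falling back to 0 for an empty field is an assumption about what you want, not something the data dictates.

```scala
import scala.util.Try

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String,
                   bidderrate: Int, openbid: Float, price: Float)

object Auction {
  // Hypothetical helpers: an empty or malformed field falls back to a default
  // instead of throwing NumberFormatException.
  private def toFloatOr(s: String, default: Float): Float =
    Try(s.trim.toFloat).getOrElse(default)

  private def toIntOr(s: String, default: Int): Int =
    Try(s.trim.toInt).getOrElse(default)

  // split(",", -1) keeps trailing empty fields, so a line ending in a comma
  // still yields all 7 columns. Lines with fewer than 7 columns give None.
  def fromLine(line: String): Option[Auction] = {
    val p = line.split(",", -1)
    if (p.length < 7) None
    else Some(Auction(p(0), toFloatOr(p(1), 0f), toFloatOr(p(2), 0f), p(3),
                      toIntOr(p(4), 0), toFloatOr(p(5), 0f), toFloatOr(p(6), 0f)))
  }
}
```

With this in place, the parsing step in the question could become something like ebayds.flatMap(line => Auction.fromLine(line)).toDF(), which skips short lines and no longer fails on empty numeric fields. In spark-shell, enter the case class and its companion object together in :paste mode so they are compiled as a pair.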
