How can I replace or handle null values in a DataFrame?

val ebayds = sc.textFile("/user/spark/xbox.csv")

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)

val ebay = ebayds.map(a=>a.split(",")).map(p=>Auction(p(0),p(1).toFloat,p(2).toFloat,p(3),p(4).toInt,p(5).toFloat,p(6).toFloat)).toDF()

ebay.select("auctionid").distinct.count

I am getting this error:

For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

1 ACCEPTED SOLUTION

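The error seems to be a mismatch between the data types in the data set and those declared in the case class: an empty field ("") cannot be converted with toInt or toFloat. Check each column's data type first.

Use the CSV API to read the CSV file and print the schema.

E.g.: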

val ebaydf = sqlContext.read.format("com.databricks.spark.csv").option("header", "true").option("inferSchema", "true").load(path)

ebaydf.printSchema()

2 REPLIES

@jayaprakash gadi Why don't you implement a method in the Auction companion object to handle null values?
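
A minimal sketch of that idea, in plain Scala with no Spark dependency. It assumes the numeric columns may be empty, so it changes them to Option types, which is a departure from the case class in the question. The helper names toFloatOpt, toIntOpt and fromLine are hypothetical, and the sample line is made up for illustration.

```scala
import scala.util.Try

// Same fields as the case class in the question, but the numeric columns
// are Option so an empty CSV field becomes None instead of throwing.
case class Auction(
  auctionid: String,
  bid: Option[Float],
  bidtime: Option[Float],
  bidder: String,
  bidderrate: Option[Int],
  openbid: Option[Float],
  price: Option[Float]
)

object Auction {
  // Hypothetical helpers: return None for empty or malformed input
  // rather than letting NumberFormatException escape.
  private def toFloatOpt(s: String): Option[Float] = Try(s.trim.toFloat).toOption
  private def toIntOpt(s: String): Option[Int] = Try(s.trim.toInt).toOption

  // Hypothetical parser for one CSV line; None if the line has fewer than 7 fields.
  def fromLine(line: String): Option[Auction] = {
    // A limit of -1 keeps trailing empty fields, which split(",") would drop.
    val p = line.split(",", -1)
    if (p.length < 7) None
    else Some(Auction(
      p(0), toFloatOpt(p(1)), toFloatOpt(p(2)), p(3),
      toIntOpt(p(4)), toFloatOpt(p(5)), toFloatOpt(p(6))
    ))
  }
}

object AuctionDemo {
  def main(args: Array[String]): Unit = {
    // Made-up line with an empty bidderrate (fifth column).
    println(Auction.fromLine("a1,95,2.9,bidder1,,95,117.5"))
  }
}
```

In the spark-shell session from the question, something like ebayds.flatMap(l => Auction.fromLine(l)).toDF() should then give nullable columns in place of the exception, though I have not run it against the original file.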
