Archives of Support Questions (Read Only)

prakash_hadoopd · ‎12-01-2016

val ebayds = sc.textFile("/user/spark/xbox.csv")

case class Auction(auctionid: String, bid: Float, bidtime: Float, bidder: String, bidderrate: Int, openbid: Float, price: Float)

val ebay = ebayds.map(a=>a.split(",")).map(p=>Auction(p(0),p(1).toFloat,p(2).toFloat,p(3),p(4).toInt,p(5).toFloat,p(6).toFloat)).toDF()

ebay.select("auctionid").distinct.count

am getting error

For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)

senthilkumarP · ‎12-04-2016

The error seem to be mismatch data type with data set and case class.Check the each columns data type first

Use csv api to read csv file and print schema

Eg:

val ebaydf = sqlcontect.read.format("com.databricks.spark.csv").option("header", "true").option("InferSchema", "true").load(path)

ebaydf.printschema()

View solution in original post

rajkumar_singh · ‎12-01-2016

@jayaprakash gadi why don't you implement a companion method in Auction class to handle null values.

senthilkumarP · ‎12-04-2016

The error seem to be mismatch data type with data set and case class.Check the each columns data type first

Use csv api to read csv file and print schema

Eg:

val ebaydf = sqlcontect.read.format("com.databricks.spark.csv").option("header", "true").option("InferSchema", "true").load(path)

ebaydf.printschema()

Cloudera Community

Archives of Support Questions (Read Only)

How can i replace or handle null values in dataframe .