Why does the Spark operation below on the DataFrame API show the following error? Please refer to the attached data and the commands executed. What am I missing here?
Labels: Apache Spark
Created 05-15-2018 06:38 AM
I have given the data in the CSV file and the commands used. All I wanted was to read the CSV file with my own schema using a DataFrame and query it with Spark SQL. I'm just learning Spark, so excuse me if the question looks silly. I appreciate everyone who is willing to help me.
1,Fruit of the Loom Girls Socks,7.97,0.60,8.57
2,Rawlings Little League Baseball,2.97,0.22,3.19
3,Secret Antiperspirant,1.29,0.10,1.39
4,Deadpool DVD,14.96,1.12,16.08
5,Maxwell House Coffee 28 oz,7.28,0.55,7.83
6,Banana Boat Sunscreen,6.68,0.50,7.18
7,Wrench Set,10.00, 0.75, 10.75
8,M and Mz,8.98,0.67,9.65
9,Bertoli Alfredo Sauce,2.12, 0.16, 2.28
10,Large Paperclips,6.19, 0.46, 6.65
case class Person(index: Long, item: String, cost: Long, Tax: Long, Total: Long)
val peopleDs = sc.textFile("hdpcd/Samplecsv")
  .map(_.split(",").map(_.trim))
  .map(attributes => Person(attributes(0).toLong, attributes(1).toString,
    attributes(2).toLong, attributes(3).toLong, attributes(4).toLong))
  .toDF()
peopleDs.createOrReplaceTempView("people")
val res = spark.sql("Select * from people")
res.collect()
18/05/13 10:03:28 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NumberFormatException: For input string: "6.68"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:276)
    at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
    at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:26)
    at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:26)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
18/05/13 10:03:28 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
Created 05-15-2018 02:24 PM
@Abdul Rahim The error is caused because you are parsing a string that contains a double into a long. Instead, you should parse it as a double.
The following code works fine for me:
case class Person(index: Long, item: String, cost: Double, Tax: Double, Total: Double)
val peopleDs = sc.textFile("hdpcd/Samplecsv")
  .map(_.split(",").map(_.trim))
  .map(attributes => Person(attributes(0).toLong, attributes(1).toString,
    attributes(2).toDouble, attributes(3).toDouble, attributes(4).toDouble))
  .toDF()
peopleDs.createOrReplaceTempView("people")
val res = spark.sql("Select * from people")
res.collect()
Results:
defined class Person
peopleDs: org.apache.spark.sql.DataFrame = [index: bigint, item: string ... 3 more fields]
res: org.apache.spark.sql.DataFrame = [index: bigint, item: string ... 3 more fields]
res24: Array[org.apache.spark.sql.Row] = Array([1,Fruit of the Loom Girls Socks,7.97,0.6,8.57], [2,Rawlings Little League Baseball,2.97,0.22,3.19], [3,Secret Antiperspirant,1.29,0.1,1.39], [4,Deadpool DVD,14.96,1.12,16.08], [5,Maxwell House Coffee 28 oz,7.28,0.55,7.83], [6,Banana Boat Sunscreen,6.68,0.5,7.18], [7,Wrench Set,10.0,0.75,10.75], [8,M and Mz,8.98,0.67,9.65], [9,Bertoli Alfredo Sauce,2.12,0.16,2.28], [10,Large Paperclips,6.19,0.46,6.65])
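To see the root cause in isolation: Scala's toLong delegates to java.lang.Long.parseLong, which rejects any string containing a decimal point, exactly as your stack trace shows. A minimal spark-shell sketch:

"6.68".toLong    // throws java.lang.NumberFormatException: For input string: "6.68"
"6.68".toDouble  // returns 6.68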
Note: If you comment on this post, make sure you tag my name. And if you found this answer helped address your question, please take a moment to log in and click the "accept" link on the answer.
Created 05-15-2018 05:42 PM
I get the statements below. Is that a sign of successful execution? @Felix Albani
18/05/15 17:35:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NumberFormatException: For input string: "1 Fruit of the Loom Girls Socks 7.97 0.6 8.57"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.parseLong(Long.java:631)
    at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:276)
    at scala.collection.immutable.StringOps.toLong(StringOps.scala:29)
    at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:26)
    at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<console>:26)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
18/05/15 17:35:27 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NumberFormatException: For input string: "4 Banana Boat Sunscree 1.12 1.12 16.08"
    ... (same stack trace as above)
18/05/15 17:35:27 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localhost, executor driver): java.lang.NumberFormatException: For input string: "4 Banana Boat Sunscree 1.12 1.12 16.08"
    ... (same stack trace as above)
18/05/15 17:35:27 ERROR TaskSetManager: Task 1 in stage 0.0 failed 1 times; aborting job
18/05/15 17:35:27 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): java.lang.NumberFormatException: For input string: "1 Fruit of the Loom Girls Socks 7.97 0.6 8.57"
    ... (same stack trace as above)
Created 05-15-2018 09:17 PM
@Abdul Rahim It seems the problem you have now is different. It looks like the input data is not being split on commas: notice that the input strings in the new exceptions contain whole rows with spaces instead of commas. Make sure the map _.split(",") is working, because it seems not to be working for you now. Also, please accept the other answer I provided, as it solved the original parsing issue you had.
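A quick way to verify is to print what the split actually produces for the first few lines. A minimal sketch, assuming the same "hdpcd/Samplecsv" path as in the original post:

// If each line comes back as a single field, the file does not actually
// contain commas (for example, the paste replaced them with spaces or tabs).
sc.textFile("hdpcd/Samplecsv")
  .map(_.split(",").map(_.trim))
  .take(3)
  .foreach(fields => println(fields.length + " fields -> " + fields.mkString(" | ")))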
Created 05-16-2018 06:37 AM
I suspect my approach to creating the file with vi Samplecsv. I created the file and simply pasted the data into it. Is that the right approach, or is there another way to do it? There seems to be nothing wrong with the code, since it worked fine for you. @Felix Albani
Created 05-17-2018 04:56 PM
@Abdul Rahim
Please use the following code:
case class Person(index:Long,item:String,cost:Float,Tax:Float,Total:Float)
val peopleDs = sc.textFile("C:/hcubeapi/test-case1123.txt").map(_.split(",").map(_.trim)).map(attributes=> Person(attributes(0).toLong,attributes(1).toString,attributes(2).toFloat,attributes(3).toFloat,attributes(4).toFloat)).toDF() peopleDs.createOrReplaceTempView("people")
val res = spark.sql("Select * from people")
res.show()
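As an aside, Spark's built-in CSV reader can apply an explicit schema directly, which avoids the manual split-and-convert step entirely. A sketch, assuming Spark 2.x (as used in this thread) and the asker's original "hdpcd/Samplecsv" path:

import org.apache.spark.sql.types._

// Explicit schema for the five columns in the sample data.
val schema = StructType(Seq(
  StructField("index", LongType),
  StructField("item", StringType),
  StructField("cost", DoubleType),
  StructField("Tax", DoubleType),
  StructField("Total", DoubleType)
))

val peopleDf = spark.read
  .schema(schema)
  .option("ignoreLeadingWhiteSpace", "true") // some sample rows have spaces after the commas
  .csv("hdpcd/Samplecsv")

peopleDf.createOrReplaceTempView("people")
spark.sql("select * from people").show()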
