About kamaludeena

kamaludeena · ‎05-16-2018

I doubt the approach on file creation using Vi Samplecsv.I have created the file and simply pasted the data in it.Is that a right approach or is there any other way to do that.?.Since there is no wrong with the code as it worked fine for you. @Felix Albani

kamaludeena · ‎05-16-2018

I doubt the approach on file creation using Vi Samplecsv.I have created the file and simply pasted the data in it.Is that a right approach or is there any other way to do that.?.Since there is no wrong with the code as it worked fine for you.@Felix Albani

kamaludeena · ‎05-15-2018

I get the below statements.Is that sign of successful execution ? @Felix Albani 18/05/15 17:35:27 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0) java.lang.NumberFormatException: For input string: "1 Fruit of the Loom Girls Socks 7.97 0.6 8.57" at java.lang.NumberFormatException.forInputString(NumberFormatException. java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:2 76) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte rator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo wIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon $1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala: 38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor .java:624) at java.lang.Thread.run(Thread.java:748) 18/05/15 17:35:27 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1) java.lang.NumberFormatException: For input string: "4 Banana Boat Sunscree 1 .12 1.12 16.08" at java.lang.NumberFormatException.forInputString(NumberFormatException. java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:2 76) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte rator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo wIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon $1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala: 38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor .java:624) at java.lang.Thread.run(Thread.java:748) 18/05/15 17:35:27 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, localh ost, executor driver): java.lang.NumberFormatException: For input string: "4 B anana Boat Sunscree 1.12 1.12 16.08" at java.lang.NumberFormatException.forInputString(NumberFormatException. java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:2 76) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte rator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo wIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon $1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:228) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$ap ply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala: 38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor .java:624) at java.lang.Thread.run(Thread.java:748) 18/05/15 17:35:27 ERROR TaskSetManager: Task 1 in stage 0.0 failed 1 times; abor ting job 18/05/15 17:35:27 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localh ost, executor driver): java.lang.NumberFormatException: For input string: "1 F ruit of the Loom Girls Socks 7.97 0.6 8.57" at java.lang.NumberFormatException.forInputString(NumberFormatException. java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:2 76) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at $line15.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$2.apply(<conso le>:26) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIte rator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRo wIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon $1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:234) at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.s cala:228)

kamaludeena · ‎05-15-2018

I have given the data in the csv file and the commands used.All I wanted was to read the csv file with my own schema using Dataframe and execute it with SparkSql.I'm just learning spark and excuse me if the question looks silly.I appreciate everyone who is willing to help me. 1,Fruit of the Loom Girls Socks,7.97,0.60,8.57 2,Rawlings Little League Baseball,2.97,0.22,3.19 3,Secret Antiperspirant,1.29,0.10,1.39 4,Deadpool DVD,14.96,1.12,16.08 5,Maxwell House Coffee 28 oz,7.28,0.55,7.83 6,Banana Boat Sunscreen,6.68,0.50,7.18 7,Wrench Set,10.00, 0.75, 10.75 8,M and Mz,8.98,0.67,9.65 9,Bertoli Alfredo Sauce,2.12, 0.16, 2.28 10,Large Paperclips,6.19, 0.46, 6.65 case class Person(index:Long,item:String,cost:Long,Tax:Long,Total:Long) val peopleDs = sc.textFile("hdpcd/Samplecsv").map(_.split(",").map(_.trim)).map(attributes=> Person(attributes(0).toLong,attributes(1).toString,attributes(2).toLong,attributes(3).toLong,attributes(4).toLong)).toDF() peopleDs.createOrReplaceTempView("people") val res = spark.sql("Select * from people") res.collect() (0 + 2) / 2]18/05/13 10:03:28 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1) java.lang.NumberFormatException: For input string: "6.68" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65) at java.lang.Long.parseLong(Long.java:589) at java.lang.Long.parseLong(Long.java:631) at scala.collection.immutable.StringLike$class.toLong(StringLike.scala:276) at scala.collection.immutable.StringOps.toLong(StringOps.scala:29) at $line15.$read$iw$iw$iw$iw$iw$iw$iw$iw$anonfun$2.apply(<console>:26) at $line15.$read$iw$iw$iw$iw$iw$iw$iw$iw$anonfun$2.apply(<console>:26) at scala.collection.Iterator$anon$11.next(Iterator.scala:409) at scala.collection.Iterator$anon$11.next(Iterator.scala:409) at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source) at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) at org.apache.spark.sql.execution.WholeStageCodegenExec$anonfun$8$anon$1.hasNext(WholeStageCodegenExec.scala:395) at org.apache.spark.sql.execution.SparkPlan$anonfun$2.apply(SparkPlan.scala:234) at org.apache.spark.sql.execution.SparkPlan$anonfun$2.apply(SparkPlan.scala:228) at org.apache.spark.rdd.RDD$anonfun$mapPartitionsInternal$1$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.RDD$anonfun$mapPartitionsInternal$1$anonfun$apply$25.apply(RDD.scala:827) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at org.apache.spark.scheduler.Task.run(Task.scala:108) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) 18/05/13 10:03:28 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)

Online	Offline
Last Visited	‎09-28-2018 03:11 AM

Member Since	‎05-15-2018 06:37 AM
Last Visited	‎09-28-2018 03:11 AM
Posts	8

Cloudera Community

Re: Why the below Spark operation on Dataframe API...

Re: Why the below Spark operation on Dataframe API...

Re: Why the below Spark operation on Dataframe API...

Why the below Spark operation on Dataframe API sho...