Member since
08-11-2014
481
Posts
92
Kudos Received
72
Solutions
My Accepted Solutions
Views | Posted
---|---
3043 | 01-26-2018 04:02 AM
6424 | 12-22-2017 09:18 AM
3099 | 12-05-2017 06:13 AM
3351 | 10-16-2017 07:55 AM
9583 | 10-04-2017 08:08 PM
12-12-2022
02:07 PM
For me, it worked when I started PySpark with --master local:
$ pyspark --master local
08-23-2021
06:07 PM
I am using Spark 2.4.0 on CDH 6.3.4 and hit the following exception:

java.lang.ClassCastException: cannot assign instance of org.apache.commons.lang3.time.FastDateFormat to field org.apache.spark.sql.catalyst.csv.CSVOptions.dateFormat of type org.apache.commons.lang3.time.FastDateFormat in instance of org.apache.spark.sql.catalyst.csv.CSVOptions
Caused by: java.lang.ClassCastException: cannot assign instance of org.apache.commons.lang3.time.FastDateFormat to field org.apache.spark.sql.catalyst.csv.CSVOptions.dateFormat of type org.apache.commons.lang3.time.FastDateFormat in instance of org.apache.spark.sql.catalyst.csv.CSVOptions
  at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
  at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2371)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
  at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
  at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
  at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
  at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:482)
  at java.io.ObjectInputStream.readObject(ObjectInputStream.java:440)
  at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
  at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
  at org.apache.spark.scheduler.Task.run(Task.scala:121)
  at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
  at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

I was finally able to resolve the issue. The odd-looking message (a FastDateFormat that cannot be assigned to a FastDateFormat field) is typical of the same class being loaded from two different jars.

I was using org.apache.spark:spark-core_2.11:jar:2.4.0-cdh6.3.4:provided. Even though it is declared as provided, some of its transitive dependencies still come in with compile scope, and org.apache.commons:commons-lang3:jar:3.7 is one of them. If you supply commons-lang3 yourself, it gets packaged inside your fat jar and causes the problem. I therefore explicitly forced the scope of the following jars to provided:

org.apache.commons:commons-lang3:3.7
org.apache.zookeeper:zookeeper:3.4.5-cdh6.3.4
io.dropwizard.metrics:metrics-core:3.1.5
com.fasterxml.jackson.core:jackson-databind:2.9.10.6
org.apache.commons:commons-crypto:1.0.0

This forces the application to use the commons-lang3 jar provided by the platform.

Pom snippet to solve the issue:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_${scala.binary.version}</artifactId>
  <version>${spark.core.version}</version>
  <scope>provided</scope>
</dependency>
<!-- Declaring the following dependencies explicitly as provided, as they are not declared as provided as part of spark-core -->
<!-- Start -->
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-lang3</artifactId>
  <version>3.7</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5-cdh6.3.4</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>io.dropwizard.metrics</groupId>
  <artifactId>metrics-core</artifactId>
  <version>3.1.5</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.9.10.6</version>
  <scope>provided</scope>
</dependency>
<dependency>
  <groupId>org.apache.commons</groupId>
  <artifactId>commons-crypto</artifactId>
  <version>1.0.0</version>
  <scope>provided</scope>
</dependency>
<!-- End -->
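One way to double-check that the fix took effect is to look inside the built fat jar and see whether any commons-lang3 classes are still packaged in it. Below is a minimal sketch in Python, using only the standard library. The function name `bundled_classes` and the jar path in the usage note are my own placeholders, not something from the build above.

```python
import zipfile


def bundled_classes(jar_path, package_prefix="org/apache/commons/lang3/"):
    """Return the .class entries inside jar_path that live under package_prefix.

    A jar is a zip archive, so the entry names can be listed directly.
    An empty result suggests the package is not packaged in the jar.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return sorted(
            name
            for name in jar.namelist()
            if name.startswith(package_prefix) and name.endswith(".class")
        )
```

For example, `bundled_classes("target/my-app-fat.jar")` (hypothetical path) should come back empty once commons-lang3 is scoped as provided. Passing another prefix such as `"org/apache/zookeeper/"` checks the other jars from the list in the same way.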
03-25-2021
01:01 AM
Import the implicits from your SparkSession, where sc is defined as follows:
val sc = SparkSession.builder().appName("demo").master("local").getOrCreate()
import sc.implicits._
03-02-2021
06:09 PM
No worries @PR_224 Glad it's fixed : )
07-26-2019
09:59 AM
Hi Pal, can you grep for the particular application ID in the folder /user/spark/applicationHistory to check whether the job has completed successfully or is still in the .inprogress state? Thanks, AKR
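If you would rather script the check than eyeball the grep output, here is a rough Python sketch. It assumes you already have the base file names from the history folder as a list (for instance, copied from a directory listing). The function name and the application IDs in the usage note are made up for illustration. It is also stricter than a plain grep, since it matches on the start of the file name.

```python
def event_log_status(file_names, app_id):
    """Classify an application from the event log file names in the history folder.

    file_names: base names of the files listed in the history directory.
    Returns "in progress", "completed", or "not found".
    """
    # Match the bare ID, or the ID followed by "_" (attempt suffix) or "."
    # (extension), so that one ID is not mistaken for a longer one.
    matches = [
        name
        for name in file_names
        if name == app_id or name.startswith((app_id + "_", app_id + "."))
    ]
    if not matches:
        return "not found"
    if any(name.endswith(".inprogress") for name in matches):
        return "in progress"
    return "completed"
```

With a listing such as `["application_1_0001", "application_1_0002.inprogress"]` (hypothetical names), `event_log_status(listing, "application_1_0002")` reports the second job as still in progress.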
02-07-2019
05:07 AM
I am facing the same problem. I want to explore Hadoop services such as Flume, Hive, etc. for learning purposes. I read this thread, but I couldn't come to any conclusion. Can anyone please tell me the direct solution?
12-19-2018
07:02 PM
Had trouble with this as well, but removing the ".mode(...)" actually worked, AND it appended.
spark.read.parquet("/path/to/parq1.parq","/path/to/parq2.parq").coalesce(1).write.format("parquet").saveAsTable("db.table")
11-19-2018
09:40 AM
Hi @srowen, I am using CDH 5.15.1 and running spark-submit to train a model and save the model's prediction dataframe to HDFS. I am seeing these errors when I try to save the dataframe to HDFS:
2018-11-19 11:17:33 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Executor heartbeat timed out after 149836 ms
2018-11-19 11:17:33 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Executor heartbeat timed out after 149836 ms
2018-11-19 11:18:07 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Container container_1542123439491_0080_01_000004 exited from explicit termination request.
2018-11-19 11:18:07 ERROR YarnClusterScheduler:70 - Lost executor 2 on gworker6.vcse.lab: Container container_1542123439491_0080_01_000004 exited from explicit termination request.
I have also tried setting spark.yarn.executor.memoryOverhead to 10% of the executor-memory given in my spark-submit command, and I am still seeing these errors. Do you have any suggestions for this issue?
Spark-Submit Command:
spark-submit-with-zoo.sh --master yarn --deploy-mode cluster --num-executors 8 --executor-cores 16 --driver-memory 300g --executor-memory 400g Main_Final_auc.py 256
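For reference, here is the arithmetic behind that request as a small Python sketch. It assumes the usual YARN sizing rule, where each executor container is executor-memory plus the overhead, and the overhead is 10% of executor-memory with a 384 MiB floor. The function name is mine. The node sizes are not given in the post, so this says nothing about whether the request actually fits on the cluster.

```python
MIB_PER_GIB = 1024


def yarn_container_mib(executor_memory_mib, overhead_percent=10, min_overhead_mib=384):
    """Memory requested from YARN for one executor: heap plus overhead.

    The overhead is overhead_percent of the executor memory, but never less
    than min_overhead_mib. Integer arithmetic avoids float rounding.
    """
    overhead_mib = max(executor_memory_mib * overhead_percent // 100, min_overhead_mib)
    return executor_memory_mib + overhead_mib


# Values taken from the spark-submit command: 8 executors at 400g each.
per_executor_mib = yarn_container_mib(400 * MIB_PER_GIB)
total_mib = 8 * per_executor_mib

print(per_executor_mib // MIB_PER_GIB, "GiB per executor container")  # 440
print(total_mib // MIB_PER_GIB, "GiB across 8 executors")             # 3520
```

So each executor asks YARN for about 440 GiB, and the eight executors together for about 3520 GiB, before counting the 300g driver.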
09-26-2018
09:26 AM
I posted an issue yesterday that relates to this: the spark-submit classpath seems to conflict with commons-compress from a supplied uber-jar. I've tried the --conf, --jar, and --packages flags with spark-submit, with no resolution. Spark 2.x + Tika: java.lang.NoSuchMethodError: org.apache.commons.compress.archivers.ArchiveStreamF Any help would be greatly appreciated!