08-23-2021 06:07 PM
I am using Spark 2.4.0 on CDH 6.3.4. I hit a java.lang.ClassCastException complaining that FastDateFormat cannot be assigned to a field of type FastDateFormat:

Caused by: java.lang.ClassCastException: cannot assign instance of org.apache.commons.lang3.time.FastDateFormat to field org.apache.spark.sql.catalyst.csv.CSVOptions.dateFormat of type org.apache.commons.lang3.time.FastDateFormat in instance of org.apache.spark.sql.catalyst.csv.CSVOptions
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2301)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1431)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2371)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2365)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2289)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2147)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1646)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:482)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:440)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$11.apply(Executor.scala:407)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1408)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:413)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Finally I was able to resolve the issue. I was using org.apache.spark:spark-core_2.11:jar:2.4.0-cdh6.3.4:provided. Even though it is declared as provided, some of its transitive dependencies come in with compile scope, and org.apache.commons:commons-lang3:jar:3.7 is one of them. If commons-lang3 is also provided by the platform at runtime, the copy packaged inside your fat jar conflicts with it: the same class loaded by two different classloaders is not assignment-compatible, which is why the exception says FastDateFormat cannot be assigned to a field of that very type.
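For anyone hitting the same thing: a quick way to confirm which transitive dependencies of spark-core end up at compile scope is Maven's dependency tree. This is a standard Maven command run from your application module; the -Dincludes filter here just narrows the output to commons-lang3:

    # show where commons-lang3 enters the dependency graph and with which scope
    mvn dependency:tree -Dincludes=org.apache.commons:commons-lang3

If the output shows commons-lang3:3.7 under spark-core_2.11 with compile scope, it will be packaged into the fat jar and can collide with the copy the cluster provides at runtime.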
Therefore I explicitly forced the scope of a few of these jars to provided, as listed below:

- org.apache.commons:commons-lang3:3.7
- org.apache.zookeeper:zookeeper:3.4.5-cdh6.3.4
- io.dropwizard.metrics:metrics-core:3.1.5
- com.fasterxml.jackson.core:jackson-databind:2.9.10.6
- org.apache.commons:commons-crypto:1.0.0

By doing this, the application is forced to use the commons-lang3 jar provided by the platform. POM snippet to solve the issue:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.binary.version}</artifactId>
    <version>${spark.core.version}</version>
    <scope>provided</scope>
</dependency>
<!-- Declaring the following dependencies explicitly as provided,
     because they are not declared as provided within spark-core -->
<!-- Start -->
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.7</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.zookeeper</groupId>
    <artifactId>zookeeper</artifactId>
    <version>3.4.5-cdh6.3.4</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>io.dropwizard.metrics</groupId>
    <artifactId>metrics-core</artifactId>
    <version>3.1.5</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>com.fasterxml.jackson.core</groupId>
    <artifactId>jackson-databind</artifactId>
    <version>2.9.10.6</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-crypto</artifactId>
    <version>1.0.0</version>
    <scope>provided</scope>
</dependency>
<!-- End -->
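An alternative, if you build the fat jar with the maven-shade-plugin, is to leave the dependency scopes untouched and instead exclude the conflicting artifacts from the shaded jar. This is only a sketch of that approach, not what I ended up doing; the artifact list simply mirrors the jars named above:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <configuration>
        <artifactSet>
            <!-- keep cluster-provided artifacts out of the fat jar -->
            <excludes>
                <exclude>org.apache.commons:commons-lang3</exclude>
                <exclude>org.apache.zookeeper:zookeeper</exclude>
                <exclude>io.dropwizard.metrics:metrics-core</exclude>
                <exclude>com.fasterxml.jackson.core:jackson-databind</exclude>
                <exclude>org.apache.commons:commons-crypto</exclude>
            </excludes>
        </artifactSet>
    </configuration>
</plugin>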
03-29-2017 01:19 AM
Unfortunately, the content of this file is under NDA, so I can't provide you the file. Some information that I can give is summarized here:

Output from "hdfs dfs -ls":
-rwxrwx--x+ 3 hive hive 1093251527 2016-09-30 21:15 /path/to/file/month=12/part-r-00000-be7725db-da77-4a34-a3c6-2e5a9276228c.snappy.parquet

- We have a _metadata and a _common_metadata file in the same directory (I tried removing them, but this did not resolve the issue)
- Compression: snappy
- It was created using: parquet-mr version 1.5.0-cdh5.7.1 (build ${buildNumber}) (output from parquet-tools, version 1.9.0)
- Software used for creation: bundled Spark 1.6.0 from CDH 5.7.1 (in the meantime we are using CDH 5.9.0)
- The file contains 713 row groups
- The file contains 867 columns (of types int64, double and binary)

One further thing that I tried is copying the problematic file to a separate directory (without the two metadata files), creating a new table from this file with Impala, and running the test there. Unfortunately this produces exactly the same behaviour: when the file is cached I get the error message; when it is not cached, everything works fine.

Let me know if this helps you in understanding the problem or if you need further information (except for the contents of the file). Thanks a lot already! Kind Regards
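For reference, the row-group and column counts above came from parquet-tools 1.9.0. A typical invocation looks like the following; the jar file name is an assumption based on how parquet-tools 1.9.0 is commonly packaged (on a CDH node a parquet-tools launcher script may be on the PATH instead), and the path is the file from the listing above:

    # print file-level metadata: creator, schema, row groups and column chunks
    hadoop jar parquet-tools-1.9.0.jar meta /path/to/file/month=12/part-r-00000-be7725db-da77-4a34-a3c6-2e5a9276228c.snappy.parquet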