New Contributor
Posts: 5
Registered: ‎05-11-2016
Accepted Solution

Spark 2.0 App not working on cluster

Hi all

 

We have Spark 2.0 (*) installed from the Cloudera parcel on our cluster (CDH 5.9.0).

When running a fairly simple app that just reads in some CSV files and does a groupBy, I always receive errors.

The App is submitted with:

spark2-submit --class my_class myapp-1.0-SNAPSHOT.jar

And I receive the following error message:

java.io.InvalidClassException: org.apache.commons.lang3.time.FastDateFormat; local class incompatible: stream classdesc serialVersionUID = 2, local class serialVersionUID = 1

I figured out that there are multiple versions of lang3 installed with the Cloudera release and modified the spark2-submit to:

spark2-submit --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true --jars /var/opt/teradata/cloudera/parcels/CDH/jars/commons-lang3-3.3.2.jar --class my_class myapp-1.0-SNAPSHOT.jar

This way I could get rid of the first error message, but now I get:

java.lang.ClassCastException: cannot assign instance of org.apache.commons.lang3.time.FastDateFormat to field org.apache.spark.sql.execution.datasources.csv.CSVOptions.dateFormat of type org.apache.commons.lang3.time.FastDateFormat in instance of org.apache.spark.sql.execution.datasources.csv.CSVOptions

The app was written in Scala and compiled using Maven. The source code (**) and the Maven pom file (***) are attached at the bottom of this post.

Does anybody have an idea on solving this issue?

Any help is highly appreciated!

 

Thanks a lot in advance!

Kind Regards

 

(*)

$ spark2-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0.cloudera1
      /_/

Branch HEAD
Compiled by user jenkins on 2016-12-06T18:34:13Z
Revision 2389f44e0185f33969d782ed09b41ae45fe30324

(**)

import org.apache.spark.sql.SparkSession

object my_class {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("myapp")
      .getOrCreate()

    val csv = spark.read.option("header", value = false).csv("/path/to/folder/with/some/csv/files/")

    val pivot = csv.groupBy("_c0").count()

    csv.take(10).foreach(println)
    pivot.take(10).foreach(println)
    spark.stop()
  }
}

(***)

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>de.lht.datalab.ingestion</groupId>
    <artifactId>myapp</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <scala.version.base>2.11</scala.version.base>
        <scala.version>${scala.version.base}.8</scala.version>
        <spark.version>2.0.0.cloudera1</spark.version>
    </properties>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version.base}</artifactId>
            <version>${spark.version}</version>
        </dependency>
    </dependencies>


    <build>
        <plugins>
            <plugin>
                <groupId>org.scala-tools</groupId>
                <artifactId>maven-scala-plugin</artifactId>
                <version>2.15.2</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>

</project>
Cloudera Employee
Posts: 423
Registered: ‎08-11-2014

Re: Spark 2.0 App not working on cluster

This is generally due to a mismatch between the version of commons-lang3 your application uses and the one Spark uses. See https://issues.apache.org/jira/browse/ZEPPELIN-1977 for an example.

I believe you'll find that it's resolved in the latest Spark 2 release for CDH.

http://community.cloudera.com/t5/Community-News-Release/ANNOUNCE-Spark-2-0-Release-2/m-p/51464#M161

New Contributor
Posts: 5
Registered: ‎05-11-2016

Re: Spark 2.0 App not working on cluster

Thanks a lot.

With the workaround given at the end of the Zeppelin issue, it works for me now.
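For anyone who can't upgrade right away: as I understand it, the key is to make a single consistent commons-lang3 version visible to both the driver and the executors via extraClassPath, instead of userClassPathFirst (which loads the jar in a separate classloader and causes the ClassCastException above, since the same class loaded by two classloaders is not assignment-compatible). A rough sketch; the jar path and version below are placeholders, take the exact ones from the Zeppelin ticket:

```shell
# Prepend one consistent commons-lang3 to both driver and executor
# classpaths (same classloader as Spark, unlike userClassPathFirst).
spark2-submit \
  --conf spark.driver.extraClassPath=/path/to/commons-lang3-3.5.jar \
  --conf spark.executor.extraClassPath=/path/to/commons-lang3-3.5.jar \
  --class my_class myapp-1.0-SNAPSHOT.jar
```

Note that with extraClassPath the jar must already exist at that path on every node; it is not shipped with the job the way --jars entries are.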

 

New Contributor
Posts: 3
Registered: ‎01-31-2017

Re: Spark 2.0 App not working on cluster

What is the solution? (I do not have an enterprise account, and we may not be able to upgrade the cluster soon enough.)
