Local Spark Development against a remote cluster

Rising Star

What is the best way to develop Spark applications on your local computer? I'm using IntelliJ and, just for debugging purposes, trying to set the master to my remote HDP cluster so I can test code against Hive and other resources on the cluster. I'm on HDP 2.5.3, and I've added the Spark libraries for Scala 2.10 and Spark 1.6.2 from the Maven repository. I've set scalaVersion to 2.10.5 in my build.sbt and added the library dependencies. As far as I can tell, my project has exactly the same versions that are running in HDP 2.5.3, but when I run the application with the SparkConf pointed at my remote Spark master, I get the following incompatible-class error:

java.io.InvalidClassException: org.apache.spark.rdd.RDD; local class incompatible: stream classdesc serialVersionUID = 5009924811397974881, local class serialVersionUID = 7185378471520864965
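
For reference, a minimal sketch of what the driver code is doing (the master URL is a placeholder, not my real host):

import org.apache.spark.{SparkConf, SparkContext}

object RemoteSmokeTest {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("local-dev-smoke-test")
      .setMaster("spark://remote-hdp-master:7077") // placeholder master URL
    val sc = new SparkContext(conf)
    // Trivial job just to exercise the connection to the cluster
    println(sc.parallelize(1 to 100).sum())
    sc.stop()
  }
}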

Is there something I'm missing, or is there a better way to develop and test against the remote cluster?

21 REPLIES


@Eric Hanson I have not used IntelliJ, so I can't advise on options there. However, try building the uber jar, scp'ing it to your edge node, and running it with spark-submit. This will verify that your jar is building correctly.
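
For example, with the sbt-assembly plugin (the plugin version below is just an example, not something from this thread) you can build an uber jar and mark the Spark dependencies as provided, so the cluster's own Spark jars are used at runtime:

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.5")

// build.sbt — Spark marked "provided" so it is not bundled into the uber jar
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "1.6.2" % "provided",
  "org.apache.spark" %% "spark-sql"  % "1.6.2" % "provided"
)

Then run sbt assembly, scp the resulting jar from target/scala-2.10/ to the edge node, and launch it with spark-submit.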

As for local testing, the spark-testing-base library mentioned in Tim's article looks like it will work for unit tests, but at some point you are going to need to run on a remote cluster.
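
For example, with spark-testing-base on the test classpath (the exact artifact version matching Spark 1.6.2 is an assumption, so check the project's README), a suite like this runs entirely on the local machine:

import com.holdenkarau.spark.testing.SharedSparkContext
import org.scalatest.FunSuite

// SharedSparkContext provides a local SparkContext (sc) for the whole suite
class WordCountSuite extends FunSuite with SharedSparkContext {
  test("word count runs locally, no cluster required") {
    val counts = sc.parallelize(Seq("a b", "b a"))
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collectAsMap()
    assert(counts("a") === 2)
    assert(counts("b") === 2)
  }
}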

New Contributor

Development environment

IntelliJ 2017.1.4 / JDK 1.8.40 / Scala 2.10.5 / Spark 1.6.2 / Hive 1.2.1

VMs: VirtualBox, 5 VMs (ambari, master, slave1, slave2, slave3)

The cause is a jar version mismatch: the Spark jars in the project must match the cluster's HDP build exactly. I use HDP 2.4.3.0-227.

I made the following changes to solve it.

================ modify bash_profile ===============

add export SPARK_HOME=/usr/hdp/current/spark-client

================= modify pom.xml =================

<repositories>
  <repository>
    <id>HDP</id>
    <name>HDP Releases</name>
    <url>http://repo.hortonworks.com/content/groups/public</url>
    <!--url>http://repo.hortonworks.com/content/repositories/releases/</url-->
    <layout>default</layout>
    <releases>
      <enabled>true</enabled>
      <updatePolicy>always</updatePolicy>
      <checksumPolicy>warn</checksumPolicy>
    </releases>
    <snapshots>
      <enabled>false</enabled>
      <updatePolicy>never</updatePolicy>
      <checksumPolicy>fail</checksumPolicy>
    </snapshots>
  </repository>
</repositories>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.10</artifactId>
    <version>1.6.2.2.4.3.0-227</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.10</artifactId>
    <version>1.6.2.2.4.3.0-227</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.10</artifactId>
    <version>1.6.2.2.4.3.0-227</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-hive_2.10</artifactId>
    <version>1.6.2.2.4.3.0-227</version>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-yarn_2.10</artifactId>
    <version>1.6.2.2.4.3.0-227</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.2.1</version>
  </dependency>
  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
  </dependency>
  <!-- Test -->
</dependencies>

====================================================
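
If you are on sbt rather than Maven, the equivalent of the pom.xml above would look roughly like this sketch. Replace the 2.4.3.0-227 suffix with your own cluster's build number, which you can find in the directory names under /usr/hdp/ on any cluster node:

scalaVersion := "2.10.5"

resolvers += "HDP Releases" at "http://repo.hortonworks.com/content/groups/public"

// Versions must match the cluster's HDP build exactly
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.6.2.2.4.3.0-227",
  "org.apache.spark" %% "spark-sql"       % "1.6.2.2.4.3.0-227",
  "org.apache.spark" %% "spark-streaming" % "1.6.2.2.4.3.0-227",
  "org.apache.spark" %% "spark-hive"      % "1.6.2.2.4.3.0-227",
  "org.apache.spark" %% "spark-yarn"      % "1.6.2.2.4.3.0-227",
  "org.apache.hive"  %  "hive-jdbc"       % "1.2.1"
)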