Created on 11-14-2017 01:39 AM
Objective
Using the correct HDP repositories is a requirement when building Spark production applications that run on HDP. I created this article to help those building new Spark applications in Eclipse with Maven who may not know how to reference the Hortonworks repositories instead of the default ones.
How-To
The following video goes step by step through creating a simple Spark application using the Hortonworks repositories. The contents of pom.xml and the Hello Scala class are shared below.
Prerequisites
1. From the Eclipse Marketplace, install Scala IDE for Eclipse. Site: http://scala-ide.org/docs/current-user-doc/gettingstarted/index.html
2. Install the Maven integration for Scala IDE plugin. Site: http://alchim31.free.fr/m2e-scala/update-site/
3. Finally, add the archetype Remote Catalog - Url: http://repo1.maven.org/maven2/archetype-catalog.xml
The pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>spark101</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderful scala app</description>
  <inceptionYear>2015</inceptionYear>
  <repositories>
    <repository>
      <id>hortonworks repo</id>
      <name>hortonworks repo</name>
      <url>http://repo.hortonworks.com/content/repositories/releases/</url>
    </repository>
    <repository>
      <id>hortonworks jetty</id>
      <name>hortonworks jetty repo</name>
      <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
    </repository>
  </repositories>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.5</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>2.2.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>2.1.1.2.6.1.0-129</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <arg>-make:transitive</arg>
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
Important Note: We use the HDP 2.6.1 Spark 2.1.1 dependencies to build the project. If you are running a different HDP version, you need to adjust the dependency versions to match the version being used. You should also check which Scala version is correct for your project; for Spark 2.1.1 the correct Scala version is 2.11.x.
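To make such a version change a one-line edit, the HDP build number can be factored into a Maven property (a sketch; the property name spark.version is illustrative, and the value must match your cluster's HDP build):

```xml
<properties>
  <!-- Spark version with HDP build number; adjust to your HDP release -->
  <spark.version>2.1.1.2.6.1.0-129</spark.version>
</properties>
...
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-core_2.11</artifactId>
  <version>${spark.version}</version>
</dependency>
```

Any additional Spark modules (spark-sql, spark-streaming, etc.) can then reference the same property so they stay in sync.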
The Hello scala class
package example

import org.apache.spark.{SparkConf, SparkContext}

object Hello extends Greeting with App {
  val conf = new SparkConf().setAppName(appName)
  val sc = new SparkContext(conf)
  println(greeting)
  println(sc.version)
}

trait Greeting {
  lazy val appName = "Hello World Spark App"
  lazy val greeting: String = "hello"
}
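When experimenting on a workstation without a cluster, a local master can be set on the SparkConf (a sketch; HelloLocal is a hypothetical variant of the class above, and the setMaster call should be removed before cluster deployment, where spark-submit supplies the master):

```scala
package example

import org.apache.spark.{SparkConf, SparkContext}

// Local-mode variant of Hello, for desktop testing only.
object HelloLocal extends App {
  val conf = new SparkConf()
    .setAppName("Hello World Spark App")
    .setMaster("local[*]") // run in-process using all local cores
  val sc = new SparkContext(conf)
  println(sc.version)
  sc.stop() // release local resources when done
}
```

Once the application runs locally, package it with mvn clean package and submit the resulting jar to the cluster as usual.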