Objective

Using the correct HDP repositories is a requirement when building Spark production applications that run on HDP. I created this article to help those creating new Spark applications with Eclipse and Maven who may not know how to reference the Hortonworks repositories instead of the default ones.

How-To

The following video goes step by step through creating a simple Spark application using the Hortonworks repositories. The contents of the pom.xml and the Hello Scala class are shared below.


Prerequisites

1. From the Eclipse Marketplace, install the Scala IDE for Eclipse. Site: http://scala-ide.org/docs/current-user-doc/gettingstarted/index.html

2. Install the Maven integration for Scala IDE plugin. Site: http://alchim31.free.fr/m2e-scala/update-site/

3. Finally, add the archetype Remote Catalog (a command-line alternative is sketched after this list). URL: http://repo1.maven.org/maven2/archetype-catalog.xml
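
If you prefer the command line to the Eclipse New Maven Project wizard, the same project skeleton can also be generated directly from the scala-archetype-simple archetype available through that catalog (a sketch; the groupId and artifactId below are just the ones used in this article, so adjust them to your own project):

mvn archetype:generate \
  -DarchetypeGroupId=net.alchim31.maven \
  -DarchetypeArtifactId=scala-archetype-simple \
  -DgroupId=example \
  -DartifactId=spark101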

The pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>example</groupId>
  <artifactId>spark101</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <name>${project.artifactId}</name>
  <description>My wonderful scala app</description>
  <inceptionYear>2015</inceptionYear>
  <repositories>
        <repository>
            <id>hortonworks-repo</id>
            <name>hortonworks repo</name>
            <url>http://repo.hortonworks.com/content/repositories/releases/</url>
        </repository>
        <repository>
            <id>hortonworks-jetty</id>
            <name>hortonworks jetty repo</name>
            <url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
        </repository>
    </repositories>
  <licenses>
    <license>
      <name>My License</name>
      <url>http://....</url>
      <distribution>repo</distribution>
    </license>
  </licenses>
  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.5</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
  </properties>
  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <!-- Test -->
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.11</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs2</groupId>
      <artifactId>specs2-core_${scala.compat.version}</artifactId>
      <version>2.4.16</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.scalatest</groupId>
      <artifactId>scalatest_${scala.compat.version}</artifactId>
      <version>2.2.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.11</artifactId>
        <version>2.1.1.2.6.1.0-129</version>
    </dependency>
  </dependencies>
  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <testSourceDirectory>src/test/scala</testSourceDirectory>
    <plugins>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
              <goal>testCompile</goal>
            </goals>
            <configuration>
              <args>
                <!-- note: the -make:transitive option is not supported by scalac 2.11 and breaks the build, so it is omitted -->
                <arg>-dependencyfile</arg>
                <arg>${project.build.directory}/.scala_dependencies</arg>
              </args>
            </configuration>
          </execution>
        </executions>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-surefire-plugin</artifactId>
        <version>2.18.1</version>
        <configuration>
          <useFile>false</useFile>
          <disableXmlReport>true</disableXmlReport>
          <!-- If you have classpath issue like NoDefClassError,... -->
          <!-- useManifestOnlyJar>false</useManifestOnlyJar -->
          <includes>
            <include>**/*Test.*</include>
            <include>**/*Suite.*</include>
          </includes>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Important Note: We will use the HDP 2.6.1 / Spark 2.1.1 dependencies to build the project. If you are running a different HDP version, you need to check and correct the dependency to match the version being used. You should also check which Scala version is correct for your project; for Spark 2.1.1 the correct Scala version is 2.11.x.
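
For example, one way to keep the HDP build string in a single place is to pull it into a property and reference it from the dependency (a sketch; the version shown is the HDP 2.6.1 / Spark 2.1.1 string used in this article, so replace it with the Spark build published in the Hortonworks repository for your HDP release):

  <properties>
    <spark.version>2.1.1.2.6.1.0-129</spark.version>
  </properties>

  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_${scala.compat.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>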


The Hello Scala class

  1. Create a new package called example.
  2. Create a new Scala class called Hello inside the example package with the following content:
package example

import org.apache.spark.{SparkConf, SparkContext}

object Hello extends Greeting with App {
  // Build the Spark configuration and context for this application
  val conf = new SparkConf().setAppName(appName)
  val sc = new SparkContext(conf)
  println(greeting)
  // Print the Spark version to confirm which build is on the classpath
  println(sc.version)
}

trait Greeting {
  lazy val appName = "Hello World Spark App"
  lazy val greeting: String = "hello"
}
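
As a quick sanity check that the HDP Spark dependencies are actually picked up at runtime, the context can also run a trivial job. The sketch below uses a hypothetical HelloCheck object and an illustrative RDD; when the jar is submitted with spark-submit, the master is supplied on the command line, so it is not set in the code:

package example

import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical companion to Hello: runs one small job to confirm the Spark build works
object HelloCheck extends App {
  val conf = new SparkConf().setAppName("Hello World Spark App - check")
  val sc = new SparkContext(conf)

  // Distribute 1..10 across the cluster and sum the values back on the driver
  val total = sc.parallelize(1 to 10).sum()
  println(s"Spark ${sc.version} computed sum = $total")

  sc.stop()
}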