Support Questions

Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Data Analytics (CDA) group hub.

Minimal executable jar based on Scala code packed with Maven

Contributor

Hi folks,

I would like to make a minimal example packed with Maven based on Scala code (like a hello world) and run it on a HDP2.5 sandbox. What do I need to specify in my pom.xml? So far I have this:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.test.spark</groupId>
  <artifactId>Test</artifactId>
  <version>0.0.1</version>
  <name>${project.artifactId}</name>
  <description>Simple test app</description>
  <inceptionYear>2017</inceptionYear>

  <!-- change from 1.6 to 1.7 depending on Java version -->
  <properties>
    <maven.compiler.source>1.6</maven.compiler.source>
    <maven.compiler.target>1.6</maven.compiler.target>
    <encoding>UTF-8</encoding>
    <scala.version>2.11.5</scala.version>
    <scala.compat.version>2.11</scala.compat.version>
    <spark.version>1.6.1</spark.version>
  </properties>

  <dependencies>
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>

    <!-- Spark dependency -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Spark sql dependency -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-sql_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
    <!-- Spark hive dependency -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-hive_${scala.compat.version}</artifactId>
        <version>${spark.version}</version>
    </dependency>
 </dependencies>

  <build>
    <sourceDirectory>src/main/scala</sourceDirectory>
    <!-- Create JAR with all dependencies -->
    <plugins>
       <plugin>
        <artifactId>maven-assembly-plugin</artifactId>
        <version>3.0.0</version>
        <executions>
          <execution>
            <phase>package</phase>
            <goals>
              <goal>single</goal>
            </goals>
          </execution>
        </executions>
        <configuration>
          <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
          </descriptorRefs>
        </configuration>
      </plugin>
      <plugin>
        <!-- see http://davidb.github.com/scala-maven-plugin -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <configuration>
            <scalaVersion>${scala.version}</scalaVersion>
            <scalaCompatVersion>${scala.compat.version}</scalaCompatVersion>
        </configuration>
        <executions>
          <execution>
            <phase>compile</phase>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

And my scala code is this:

package com.test.spark

import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.sql
import org.apache.commons.lang
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.rdd.RDD

object Test {
 def main(args: Array[String]) {
 val conf = new SparkConf().setAppName("Test") .setMaster("local[2]")
 val spark = new SparkContext(conf)
 println( "Hello World!" )
 }
}

I run the code with

spark-submit --class com.test.spark.Test --master yarn --deploy-mode cluster  hdfs://HDP25/test.jar

Unfortunately it does not run. 😞 See the image attached.

13843-01.png

Am I missing something? Can you please help me to get a minimal example running?

Thanks and kind regards

1 ACCEPTED SOLUTION

Contributor

Thanks @Ken Jiiii

Looking at your error, application master failed 2 times due to exit code 15, Did you check your /spark/conf if you have placed hive-site.xml and in your code can you try removing " .setMaster("local[2]") " as you are running on yarn.

try running it

spark-submit --class com.test.spark.Test --master yarn-cluster hdfs://HDP25/test.jar

View solution in original post

11 REPLIES 11

Contributor

First things first: it was not clear to me, if you can build your project and create a jar file by running a maven command? Once you can build a jar file out of your project, your pom.xml is fine.

To check your pom.xml, run this command:

mvn package

If it returns en error, please update your original post and include the mvn error

Contributor

Hi pbarna,

Oh I am sorry. Yes I could run mvn package without any errors (at the beginninh some dependencies were missing but I fixed that). The error in my original post is when I try to run my paavked jar on HDP. So I do not need any cluster or Hortonworks specific things in my pom, right?

Contributor
@Ken Jiiii

You can follow this link where you have example and pom.xml. and answering to questions " I do not need any cluster or Hortonworks specific things in my pom, right? " Yes you dont need. All those values should be in you code or client configs ( core-site.xml, yarn-site.xml )

Contributor

Hi @Vinod Bonthu,

Thanks gor the quick guide but I have a follow up question. So it seems my pom is fine since I can package it with maven. Although when I run it via "spark-submit" I get the error stated in my original post. But when I run one of the spark examples by executing below code works. So why is that? I have the feeling that still something is not right and as a forst step I just want to run an example code packaged by myself via spark-submit. Afterwards I will try an oozie workflow.

Thanks again!

spark-submit org.apache.spark.exaples.SOMEEXAMPlE --master yarn-cluster  hdfs://path-to-spark-examples.jar

Contributor

have you copied the jar file to the hdfs? if you run this command, what is the result?

hadoop fs -ls /path/to/your/spark.jar

Contributor

Yes I did. The reult is

-rwxr-xr-x  3 USER USER  12252 2017-03-17 10:49 /path-to-jar/my.jar

And the ouput when I run the spark-commit command is stated above.

Can you post the error logs which are given in the tracking url -

Contributor

Hi @Aditya Deshpande,

17/03/22 11:27:53 INFO HiveContext: Initializing execution hive, version 1.2.1
17/03/22 11:27:53 INFO ClientWrapper: Inspected Hadoop version: 2.7.3.2.5.0.0-1245
17/03/22 11:27:53 INFO ClientWrapper: Loaded org.apache.hadoop.hive.shims.Hadoop23Shims for Hadoop version 2.7.3.2.5.0.0-1245
17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called
17/03/22 11:27:53 WARN HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
        at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
        at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
...
...

That is the first error I find in the logs in stderr. Another one is

17/03/22 11:27:53 INFO HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
17/03/22 11:27:53 INFO ObjectStore: ObjectStore, initialize called
17/03/22 11:27:53 WARN Hive: Failed to access metastore. This class should not accessed in runtime.
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
        at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1236)
        at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
        at org.apache.hadoop.hive.ql.metadata.Hive.<clinit>(Hive.java:166)
...
...


Can you make use of this?

Contributor

Thanks @Ken Jiiii

Looking at your error, application master failed 2 times due to exit code 15, Did you check your /spark/conf if you have placed hive-site.xml and in your code can you try removing " .setMaster("local[2]") " as you are running on yarn.

try running it

spark-submit --class com.test.spark.Test --master yarn-cluster hdfs://HDP25/test.jar

Contributor

Hi @Vinod Bonthu,

your answer really helped but unfortunately it wasn't the complete solution. As statd in my error log I had to add some jars via the

spark-submit --jars

command and I also added my hive site with

spark-submit --files /usr/hdp/current/spark-client/conf/hive-site.xml

With those two changes and removing

.setMaster("local[2]")

it worked!

Thanks for your help!

Contributor

awesome @Ken Jiiii

hive-site.xml should be available across the cluster in /etc/spark/conf ( where /usr/hdp/current/spark-client/conf will be symlink to) and spark client need to be installed across the cluster worker nodes for your yarn-cluster mode to run as your spark driver can run on any worker node and should be having client installed with spark/conf. If you are using Ambari it will taking care of hive-site.xml available in /spark-client/conf/

Take a Tour of the Community
Don't have an account?
Your experience may be limited. Sign in to explore more.