
Spark With Yarn-client mode


I am trying to write a Java program that connects to Spark on CDH 5.5 in yarn-client mode.

My Java code is the following (in package com.semsoft.spark in my case):

public static void main(String[] args) {
    try (JavaSparkContext javaSparkContext = modeCDHSimple()) {
        LOGGER.info("UI on http://localhost:4040");
        List<String> data = Arrays.asList("a", "b", "c", "d");
        long collect = javaSparkContext.parallelize(data).count();
        LOGGER.info("Result {}.", collect);
    } catch (Throwable t) {
        LOGGER.error("Error ", t);
    }
    System.exit(0);
}

private static JavaSparkContext modeCDHSimple() {
    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("My Test")
             .setMaster("yarn-client")
             .set("spark.yarn.jar", "hdfs:///user/spark/spark-assembly.jar");

    SparkContext sparkContext = new SparkContext(sparkConf);
    return new JavaSparkContext(sparkContext);
}
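For what it's worth, the RDD pipeline above only counts four elements, so when the job does complete, the expected log output is `Result 4.`. A plain-Java sketch of the same computation (no Spark involved, class name `CountSketch` is just for illustration):

```java
import java.util.Arrays;
import java.util.List;

public class CountSketch {
    public static void main(String[] args) {
        // Same data as in the Spark job; counting the parallelized RDD
        // should yield the same value as counting the list locally.
        List<String> data = Arrays.asList("a", "b", "c", "d");
        long count = data.stream().count();
        System.out.println("Result " + count + "."); // prints "Result 4."
    }
}
```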

On the cluster I have done:

sudo -u hdfs hdfs dfs -mkdir -p /user/spark
sudo -u hdfs hdfs dfs -put /usr/lib/spark/lib/spark-assembly.jar /user/spark/spark-assembly.jar
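As a side note, the target path of the upload has to match the `spark.yarn.jar` value set in `modeCDHSimple()`. The same two settings, written as a `spark-defaults.conf` fragment (a sketch, assuming the paths above), would be:

```
spark.master    yarn-client
spark.yarn.jar  hdfs:///user/spark/spark-assembly.jar
```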

And my Maven POM is:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>boris</groupId>
    <artifactId>sparkYarn</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <properties>
        <spark.version>1.5.0-cdh5.5.1</spark.version>
    </properties>

    <dependencies>

        <!-- Spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
            <!--<scope>runtime</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_2.10</artifactId>
            <version>${spark.version}</version>
            <!--<scope>runtime</scope>-->
        </dependency>

        <!-- Logback -->
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.0.12</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.0.12</version>
        </dependency>

        <!-- SLF4J -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jul-to-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
    </dependencies>

</project>


My Spark task is sent to the CDH cluster, but its status is never final, as we can see in the following logs:

14:40:14.021 [main] INFO  org.apache.spark.deploy.yarn.Client - Application report for application_1452605804783_0001 (state: ACCEPTED)
14:40:14.021 [main] DEBUG org.apache.spark.deploy.yarn.Client - 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.spark
	 start time: 1452605995082
	 final status: UNDEFINED
	 tracking URL: http://cloudera-vm:8088/proxy/application_1452605804783_0001/
	 user: spark

What is wrong?

Thanks for your help.

Boris.