
Spark With Yarn-client mode

I am trying to write a Java program that connects to Spark on CDH 5.5 in yarn-client mode.

 

My Java code is as follows (package com.semsoft.spark in my case):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.SparkContext;
import org.apache.spark.api.java.JavaSparkContext;

public static void main(String[] args) {
    try (JavaSparkContext javaSparkContext = modeCDHSimple()) {
        LOGGER.info("UI on http://localhost:4040");
        List<String> data = Arrays.asList("a", "b", "c", "d");
        long count = javaSparkContext.parallelize(data).count();
        LOGGER.info("Result {}.", count);
    } catch (Throwable t) {
        LOGGER.error("Error ", t);
    }
    System.exit(0);
}

private static JavaSparkContext modeCDHSimple() {
    SparkConf sparkConf = new SparkConf();
    sparkConf.setAppName("My Test")
             .setMaster("yarn-client")
             .set("spark.yarn.jar", "hdfs:///user/spark/spark-assembly.jar");

    SparkContext sparkContext = new SparkContext(sparkConf);
    return new JavaSparkContext(sparkContext);
}

On the cluster I ran:

sudo -u hdfs hdfs dfs -mkdir -p /user/spark
sudo -u hdfs hdfs dfs -put /usr/lib/spark/lib/spark-assembly.jar /user/spark/spark-assembly.jar
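To double-check that step, commands like these (using the same path as above; the submitting user is assumed to be "spark", as in the logs below) can confirm the assembly jar actually landed in HDFS and is visible to the job:

```shell
# List the upload directory and check the jar's size and permissions
sudo -u hdfs hdfs dfs -ls /user/spark

# Confirm the exact path referenced by spark.yarn.jar exists,
# as seen by the user who submits the job
sudo -u spark hdfs dfs -test -e /user/spark/spark-assembly.jar && echo "jar present"
```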

And my Maven POM is:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>boris</groupId>
    <artifactId>sparkYarn</artifactId>
    <version>1.0-SNAPSHOT</version>

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

    <properties>
        <spark.version>1.5.0-cdh5.5.1</spark.version>
    </properties>

    <dependencies>

        <!-- Spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>${spark.version}</version>
            <!--<scope>runtime</scope>-->
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-yarn_2.10</artifactId>
            <version>${spark.version}</version>
            <!--<scope>runtime</scope>-->
        </dependency>

        <!-- Logback (note: logback-classic is itself an SLF4J binding) -->
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.0.12</version>
        </dependency>
        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-core</artifactId>
            <version>1.0.12</version>
        </dependency>

        <!-- SLF4J -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>1.7.10</version>
        </dependency>
        <!-- Caution: the SLF4J documentation warns that slf4j-log4j12 and
             log4j-over-slf4j must not be on the classpath at the same time,
             as the delegation loops back on itself -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-log4j12</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jcl-over-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>jul-to-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.10</version>
        </dependency>
    </dependencies>

</project>


My Spark job is submitted to the CDH cluster, but it never finishes; it stays in the ACCEPTED state, as the following logs show:

14:40:14.021 [main] INFO  org.apache.spark.deploy.yarn.Client - Application report for application_1452605804783_0001 (state: ACCEPTED)
14:40:14.021 [main] DEBUG org.apache.spark.deploy.yarn.Client - 
	 client token: N/A
	 diagnostics: N/A
	 ApplicationMaster host: N/A
	 ApplicationMaster RPC port: -1
	 queue: root.spark
	 start time: 1452605995082
	 final status: UNDEFINED
	 tracking URL: http://cloudera-vm:8088/proxy/application_1452605804783_0001/
	 user: spark
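From what I understand, an application stuck in ACCEPTED usually means YARN has not yet granted a container for the ApplicationMaster (for example, no NodeManager has free resources, or the root.spark queue is at capacity). Commands like these, run on a cluster gateway (the application ID is the one from the logs above), might help narrow that down:

```shell
# Applications currently waiting or running, to see if others hold the queue
yarn application -list -appStates ACCEPTED,RUNNING

# NodeManagers and their state, to check that workers are actually up
yarn node -list -all

# Detailed status of this specific application
yarn application -status application_1452605804783_0001
```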

What's wrong?

 

Thanks for your help.

Boris.