Community Articles

dice · 09-05-2017

Symptoms

A Spark job fails with INTERNAL_FAILURE. On the Workload Analytics (WA) page of the failed job, the following message is reported:

org.apache.spark.SparkException: Application application_1503474791091_0002 finished with failed status

Diagnosis

Because a known bug prevents the Telemetry Publisher from retrieving the application log, inspect the logs for the application (application_1503474791091_0002) directly. They are stored in the user's S3 bucket.

If the following exception is found, it indicates that the application failed to resolve a dependency on the Hadoop class path: a method the jar was compiled against is missing from the Hadoop classes available at run time.

17/08/24 13:13:33 INFO ApplicationMaster: Preparing Local resources
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.tracing.TraceUtils.wrapHadoopConf(Ljava/lang/String;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/htrace/core/HTraceConfiguration;
 at org.apache.hadoop.fs.FsTracer.get(FsTracer.java:42)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:687)
 at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:671)
 at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:155)
 at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)

This most likely occurred because the jar was built against another Hadoop distribution's repository, for example EMR (Amazon Elastic MapReduce).
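
One way to confirm such a mismatch is to check which jar supplies the class named in the stack trace and whether that class declares the missing method. The following Scala sketch is illustrative only: the object name ClassOrigin and its helper functions are hypothetical, and it assumes it is run with the same class path as the failing job.

```scala
import scala.util.Try

// Hypothetical helper, not part of Hadoop, Spark, or Altus.
object ClassOrigin {
  // Load the class without running its static initializers.
  private def load(className: String): Option[Class[_]] =
    Try(Class.forName(className, false, getClass.getClassLoader)).toOption

  /** Jar or directory the class was loaded from, if it can be determined.
    * Classes from the JVM's own runtime typically report no location. */
  def locationOf(className: String): Option[String] =
    for {
      cls      <- load(className)
      domain   <- Option(cls.getProtectionDomain)
      source   <- Option(domain.getCodeSource)
      location <- Option(source.getLocation)
    } yield location.toString

  /** True if the class declares at least one method with the given name. */
  def hasMethod(className: String, methodName: String): Boolean =
    load(className).exists(_.getDeclaredMethods.exists(_.getName == methodName))

  def main(args: Array[String]): Unit = {
    // Class and method names taken from the stack trace above.
    val cls = "org.apache.hadoop.tracing.TraceUtils"
    println(s"$cls loaded from: ${locationOf(cls).getOrElse("<not found>")}")
    println(s"wrapHadoopConf declared: ${hasMethod(cls, "wrapHadoopConf")}")
  }
}
```

If the reported location is not a CDH jar, or the method is not declared, the jar was probably compiled against a different Hadoop build than the one on the cluster. The check matches on method name only, so a method with the same name but a different signature would still report true.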

Solution

To resolve the issue, rebuild the application with Maven or sbt against the CDH repository, https://repository.cloudera.com/artifactory/cloudera-repos/. An example of using Maven is available at:

https://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_vd_cdh5_maven_repo.html
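
For sbt, a build definition along the following lines should point the build at the same repository. This is a minimal sketch, not a tested configuration: the project name, the Scala version, and the CDH, Spark, and Hadoop version strings are assumptions and must be replaced with the values matching the CDH release the cluster runs.

```scala
// build.sbt (sketch; every name and version below is a placeholder assumption)
name := "my-spark-app" // hypothetical project name

scalaVersion := "2.10.6" // assumed; use the Scala version of the cluster's Spark build

// Resolve Hadoop and Spark artifacts from the CDH repository named above
resolvers += "Cloudera" at "https://repository.cloudera.com/artifactory/cloudera-repos/"

val cdhVersion = "cdh5.12.0" // assumed CDH release

libraryDependencies ++= Seq(
  // "provided": the cluster supplies these jars at run time,
  // so they are not bundled into the application jar
  "org.apache.spark"  %% "spark-core"    % s"1.6.0-$cdhVersion" % "provided",
  "org.apache.hadoop" %  "hadoop-client" % s"2.6.0-$cdhVersion" % "provided"
)
```

Marking the Hadoop and Spark dependencies as provided keeps another distribution's classes out of the packaged jar, so the classes present on the cluster are the ones used at run time.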

Jars Built for EMR Failed to Run on Cloudera Altus

Apache Hadoop

Apache Spark

Cloudera Enterprise Data Hub

HDFS

MapReduce