Hadoop 2.7: MapReduce job fails if HADOOP_CLASSPATH is not set, but runs when it is set to a nonsense value

I found some strange behavior when running an example WordCount MapReduce job from the command line: if I set HADOOP_CLASSPATH to any value, the job runs correctly. If I don't set HADOOP_CLASSPATH, it fails.

The question is: why does the job fail if HADOOP_CLASSPATH is not set, yet succeed if any value is set? Is this a bug?

Steps to reproduce:

1. WordCount.java

package test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.Job;

public class WordCount {
  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.out.printf("Usage: WordCount <input dir> <output dir>\n");
      System.exit(-1);
    }
    Job job = Job.getInstance();  // the Job() constructor is deprecated in Hadoop 2.x
    job.setJarByClass(WordCount.class);
    job.setJobName("Word Count");

    FileInputFormat.setInputPaths(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    job.setMapperClass(WordMapper.class);
    job.setReducerClass(SumReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    boolean success = job.waitForCompletion(true);
    System.exit(success ? 0 : 1);
  }
}

2. WordMapper.java

package test;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  @Override
  public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
    String line = value.toString();
    for (String word : line.split("\\W+")) {
      if (word.length() > 0) {
        context.write(new Text(word), new IntWritable(1));
      }
    }
  }
}

3. SumReducer.java

package test;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
    int wordCount = 0;
    for (IntWritable value : values) {
      wordCount += value.get();
    }
    context.write(key, new IntWritable(wordCount));
  }
}

4. Change to the src directory of the WordCount Java project

cd ~/wordcount/src

5. Compile the .java files

javac -classpath `hadoop classpath` test/*.java

6. Build .jar from the .class files

jar cvf wordcount.jar test/*.class

7. Set HADOOP_CLASSPATH to any value (e.g. "abc") if the job should succeed; if it should fail, don't set the variable (skip this step)

export HADOOP_CLASSPATH=abc

8. Run the Job (runs if HADOOP_CLASSPATH is set, fails otherwise)

hadoop jar wordcount.jar test.WordCount input output

9. If the classpath is set, the job finishes successfully without the following warning and error. If the classpath is not set, the following is printed:

...

WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
...

Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class test.WordMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2308)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:187)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: Class test.WordMapper not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2214)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2306)
        ... 8 more

Can someone explain this behavior? Why does it also run if HADOOP_CLASSPATH contains any (nonsense) value? Is this a bug, or is there a good reason for it?

--- Update 2019-02-15 ---

It seems to have something to do with my default Hadoop classpath, which is printed when running the hadoop classpath command (without calling export HADOOP_CLASSPATH=... first!):

hadoop classpath

/usr/hdp/2.6.5.0-292/hadoop/conf:/usr/hdp/2.6.5.0-292/hadoop/lib/*:/usr/hdp/2.6.5.0-292/hadoop/.//*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/./:/usr/hdp/2.6.5.0-292/hadoop-hdfs/lib/*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/.//*:/usr/hdp/2.6.5.0-292/hadoop-yarn/lib/*:/usr/hdp/2.6.5.0-292/hadoop-yarn/.//*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/lib/*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/.//*::mysql-connector-java.jar:/usr/hdp/2.6.5.0-292/tez/*:/usr/hdp/2.6.5.0-292/tez/lib/*:/usr/hdp/2.6.5.0-292/tez/conf

My list contains an empty entry (visible as the :: in the list). The job fails with this (default) setting.
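One way to make the empty entry visible is to split the classpath string on the colons. The sketch below uses a shortened stand-in string for the hadoop classpath output above; as far as I know, the JVM treats an empty classpath entry as the current working directory, which would explain why the classes are picked up loosely from the src directory instead of from the jar:

```shell
# Stand-in for the relevant part of the `hadoop classpath` output above;
# on a real node you would pipe `hadoop classpath` itself.
cp='/usr/hdp/2.6.5.0-292/hadoop-mapreduce/.//*::mysql-connector-java.jar'

# Split on ':' and flag empty fields; an empty classpath entry is
# interpreted by the JVM as the current working directory.
echo "$cp" | tr ':' '\n' | sed 's/^$/(empty entry = current directory)/'
```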

If I set HADOOP_CLASSPATH to any value, the empty entry is filled by that value, so there's no :: anymore. After doing this, the job runs:

export HADOOP_CLASSPATH=anything

hadoop classpath

/usr/hdp/2.6.5.0-292/hadoop/conf:/usr/hdp/2.6.5.0-292/hadoop/lib/*:/usr/hdp/2.6.5.0-292/hadoop/.//*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/./:/usr/hdp/2.6.5.0-292/hadoop-hdfs/lib/*:/usr/hdp/2.6.5.0-292/hadoop-hdfs/.//*:/usr/hdp/2.6.5.0-292/hadoop-yarn/lib/*:/usr/hdp/2.6.5.0-292/hadoop-yarn/.//*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/lib/*:/usr/hdp/2.6.5.0-292/hadoop-mapreduce/.//*:anything:mysql-connector-java.jar:/usr/hdp/2.6.5.0-292/tez/*:/usr/hdp/2.6.5.0-292/tez/lib/*:/usr/hdp/2.6.5.0-292/tez/conf
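The two outputs above are consistent with the hadoop wrapper script splicing ${HADOOP_CLASSPATH} into a fixed slot of a classpath template. This is only a minimal sketch of that assumption (build_cp and the shortened entries are hypothetical, not the actual wrapper code):

```shell
# Hypothetical template: ${HADOOP_CLASSPATH} is spliced between the
# mapreduce jars and mysql-connector-java.jar, mirroring the output above.
build_cp() {
  echo "hadoop-mapreduce/.//*:${HADOOP_CLASSPATH}:mysql-connector-java.jar"
}

HADOOP_CLASSPATH=""        # unset/empty -> "::" appears (empty entry)
build_cp                   # hadoop-mapreduce/.//*::mysql-connector-java.jar

HADOOP_CLASSPATH=anything  # any value fills the slot -> no "::" left
build_cp                   # hadoop-mapreduce/.//*:anything:mysql-connector-java.jar
```

Under that assumption, any non-empty value removes the empty entry (and hence the current directory) from the effective classpath, which would match the observed "nonsense value makes it work" behavior.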