Failed to run MapReduce job when specifying more than one input file

New Contributor

I am facing the error below when I try to run a MapReduce job with more than one input file, although I am able to run the job with a single input file. I went through some posts, and almost everyone says it is either a firewall issue or hostnames not set up properly in /etc/hosts. But even if that were the case, my MapReduce job would fail whether the input is a single file or a directory (multiple files).

Below is the output from the console:

INFO input.FileInputFormat: Total input paths to process : 2
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
WARN snappy.LoadSnappy: Snappy native library not loaded
INFO mapred.JobClient: Running job: job_201505201700_0005
INFO mapred.JobClient:  map 0% reduce 0%
INFO mapred.JobClient:  map 50% reduce 0%
INFO mapred.JobClient:  map 100% reduce 0%
INFO mapred.JobClient:  map 100% reduce 16%

INFO mapred.JobClient:  map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_0, Status : FAILED
    Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
WARN mapred.JobClient: Error reading task outputAMR-DEV02.local
INFO mapred.JobClient:  map 100% reduce 16%
INFO mapred.JobClient:  map 100% reduce 0%
INFO mapred.JobClient: Task Id : attempt_201505201700_0005_r_000000_1, Status : FAILED
    Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out.
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
WARN mapred.JobClient: Error reading task outputEmbeddedQASrv.local
INFO mapred.JobClient:  map 100% reduce 16%


Note: EmbeddedQASrv.local (IP address 192.168.115.80) and AMR-DEV02.local (IP address 192.168.115.79) are my slave node hostnames.

My Hadoop cluster consists of 1 master and 2 slaves.

This is the command I am running from the console (emp_dept_data is a directory containing the empData and deptData files):

hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data output15

However, if I run this command with a single file as input, the MapReduce job succeeds:

hadoop jar testdata/joindevice.jar JoinDevice emp_dept_data/empData output16
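
The JoinDevice source is not posted in the thread, so the following is only a minimal driver sketch (the actual join mapper/reducer are omitted, and the class layout is an assumption) showing why a directory argument turns into two input paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JoinDevice {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "JoinDevice");
        job.setJarByClass(JoinDevice.class);

        // If args[0] is a directory (emp_dept_data), FileInputFormat lists
        // every file inside it -- hence "Total input paths to process : 2"
        // and two map tasks. Passing emp_dept_data/empData yields a single
        // input path and a single map task instead.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The real join mapper/reducer classes are not shown in the thread;
        // without them the job falls back to the identity classes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}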

Here is my /etc/hosts file as set up on the master node. The same entries were copied to my slave nodes as well.

127.0.0.1               amr-dev01.local amr-dev01 localhost
::1             localhost6.localdomain6 localhost6
#Hadoop Configurations
192.168.115.78    master
192.168.115.79    slave01
192.168.115.80     slave02

I am clueless about what is wrong and where to look for the exact root cause.


Re: Failed to run MapReduce job when specifying more than one input file

Cloudera Employee
The error comes from the shuffle: it looks like the copy from map to reduce is failing. Could you look into your NodeManager logs for more details on the shuffle error?

Single vs. multiple splits might be a red herring. When running with a single split, there may be only one map task, and the copy from mapper to reducer may be local and hence succeed.
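
For example, something like the following on each slave node could surface the underlying fetch failures (the log paths here are assumptions and vary by distribution; check the TaskTracker logs on MRv1 and the NodeManager logs on YARN):

# Search the shuffle-serving daemon's logs on each slave for errors
grep -i "shuffle" /var/log/hadoop/*tasktracker*.log        # MRv1
grep -i "shuffle" /var/log/hadoop-yarn/*nodemanager*.log   # YARN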
Karthik Kambatla
Software Engineer, Cloudera Inc.

Re: Failed to run MapReduce job when specifying more than one input file

New Contributor

Thanks Karthik. 

@kasha wrote:
The error comes from the shuffle: it looks like the copy from map to reduce is failing. Could you look into your NodeManager logs for more details on the shuffle error?

Single vs. multiple splits might be a red herring. When running with a single split, there may be only one map task, and the copy from mapper to reducer may be local and hence succeed.


Yes, the actual problem was with the /etc/hosts file. I commented out my localhost configuration:

#127.0.0.1 MR-DEV02 localhost

and instead of specifying alias names like master, slave01, and slave02, I used the actual hostnames on every node.
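
For reference, here is a sketch of the resulting /etc/hosts, reconstructed from the hostnames and IPs mentioned earlier in the thread (the master's FQDN amr-dev01.local is an assumption based on the original file); the same entries go on every node:

#127.0.0.1        amr-dev01.local amr-dev01 localhost
127.0.0.1         localhost
::1               localhost6.localdomain6 localhost6
#Hadoop Configurations
192.168.115.78    amr-dev01.local        amr-dev01
192.168.115.79    AMR-DEV02.local        AMR-DEV02
192.168.115.80    EmbeddedQASrv.local    EmbeddedQASrv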
