MR Job Error : Status : FAILED Error: INSTANCE

Hi All,


I am a newbie when it comes to Hadoop and MR. I like Cloudera because it provides a VM that I can start with to learn and go deep into the stack.


I was able to run the WordCount example, and also my own MR code that does exactly the same thing. The issue appeared when I added new MR code to do the following:


1. The URL to fetch data from is in the input file.

2. During the map call, I use the provided URL to fetch data (the returned data is JSON) and write it to the output collector. The URL points to a REST API.

3. No reduce step is needed, so I just set the number of reduce tasks to 0.


When I run the MR job with this code, I get the following error:


mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0,

Status : FAILED


I googled and researched and found that it might be a memory issue where the JVM is not getting enough memory. So I increased the memory for the MapReduce job in mapred-site.xml with the following property:



I restarted the Hadoop services and ran the job again, but I get the same error as above. I spent quite a bit of time trying to find a solution and couldn't.


I need help resolving the issue. I think issues like this have already been solved; maybe I am just not finding the right blog or post that would guide me to a resolution.


Please see the MR code below and let me know your feedback. 


package com.test.prototype;

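import java.io.IOException;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Random;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;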
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.MRJobConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.codehaus.jackson.JsonNode;

public class OnboardData {
	public static class UrlMap extends MapReduceBase implements Mapper<LongWritable, Text, IntWritable, Text> {
		public void map(LongWritable key, Text value, OutputCollector<IntWritable, Text> output, Reporter reporter)  //Object key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
				throws IOException {
			System.out.println("Key = " + key.toString());
			reporter.setStatus("Value = " + value.toString());
			reporter.setStatus("Key = "+ key.toString());
			IntWritable one = new IntWritable(new Random().nextInt(Calendar.getInstance().get(Calendar.SECOND)));
			IDataProvider provider = new HttpDataProvider();
			HashMap configMap = new HashMap();
			StringBuffer retBuf = new StringBuffer();
			configMap.put("url", value.toString());
			try {
				String retString = provider.getData(configMap);
				output.collect(one, new Text(retBuf.toString()));
			} catch (DataProviderException e) {
				// TODO Auto-generated catch block
				throw new IOException(e.getMessage());
			}
		}
	}

	public static void main(String[] args) throws Exception {
		JobConf conf = new JobConf(OnboardData.class);
//		conf.setCombinerClass(Reduce.class);
//		conf.setReducerClass(Reduce.class);
		FileInputFormat.setInputPaths(conf, new Path(args[0]));
		FileOutputFormat.setOutputPath(conf, new Path(args[1]));
	}
}

Here is the console output:


Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml
15/09/23 09:18:29 INFO client.RMProxy: Connecting to ResourceManager at /
15/09/23 09:18:30 INFO client.RMProxy: Connecting to ResourceManager at /
15/09/23 09:18:31 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 09:18:32 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/23 09:18:32 INFO mapreduce.JobSubmitter: number of splits:2
15/09/23 09:18:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443023429055_0003
15/09/23 09:18:34 INFO impl.YarnClientImpl: Submitted application application_1443023429055_0003
15/09/23 09:18:34 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1443023429055_0003/
15/09/23 09:18:34 INFO mapreduce.Job: Running job: job_1443023429055_0003
15/09/23 09:18:58 INFO mapreduce.Job: Job job_1443023429055_0003 running in uber mode : false
15/09/23 09:18:58 INFO mapreduce.Job:  map 0% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0, Status : FAILED
15/09/23 09:19:42 INFO mapreduce.Job:  map 50% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_1, Status : FAILED
15/09/23 09:20:08 INFO mapreduce.Job:  map 50% reduce 0%
15/09/23 09:20:17 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_2, Status : FAILED
15/09/23 09:20:40 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:20:41 INFO mapreduce.Job: Job job_1443023429055_0003 failed with state FAILED due to: Task failed task_1443023429055_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/09/23 09:20:42 INFO mapreduce.Job: Counters: 32
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=110664
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=171
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=5
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Failed map tasks=4
                Launched map tasks=6
                Other local map tasks=4
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=431952
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=143984
                Total vcore-seconds taken by all map tasks=143984
                Total megabyte-seconds taken by all map tasks=368599040
        Map-Reduce Framework
                Map input records=0
                Map output records=0
                Input split bytes=125
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=427
                CPU time spent (ms)=1100
                Physical memory (bytes) snapshot=134017024
                Virtual memory (bytes) snapshot=2595618816
                Total committed heap usage (bytes)=92602368
        File Input Format Counters
                Bytes Read=46
        File Output Format Counters
                Bytes Written=0
Exception in thread "main" Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(
        at com.test.protoypes.OnboardData.main(
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(
        at java.lang.reflect.Method.invoke(
        at org.apache.hadoop.util.RunJar.main(




Re: MR Job Error : Status : FAILED Error: INSTANCE

Do you have any task logs from the failures? As the console output points out, try visiting http://quickstart.cloudera:8088/proxy/application_1443023429055_0003/ in the VM browser and browse into the job's history to view the failed attempt logs (syslog/stderr/stdout). These logs will carry a more precise and direct message about what caused the failure, which can then be investigated.
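
If the browser route is inconvenient, the failed attempt IDs can also be picked out of the console output and mapped to the YARN application ID. The following is only a minimal sketch in plain Java: the class and method names (FailedAttemptFinder, failedAttempts, applicationIdFor) are hypothetical helpers, not part of Hadoop, and the pattern assumes the client prints failures in the same "Task Id : ..., Status : FAILED" form seen in the output above. The resulting application ID could then be passed to `yarn logs -applicationId`, assuming log aggregation is enabled on the cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Hadoop API: scans job client output for failed attempts.
final class FailedAttemptFinder {

    // Assumed line shape, taken from the console output in this thread:
    // "... INFO mapreduce.Job: Task Id : attempt_<ts>_<seq>_m_<task>_<try>, Status : FAILED"
    private static final Pattern FAILED_ATTEMPT =
            Pattern.compile("Task Id : (attempt_\\d+_\\d+_[mr]_\\d+_\\d+), Status : FAILED");

    /** Returns the IDs of all attempts reported as FAILED, in the order they appear. */
    static List<String> failedAttempts(String consoleOutput) {
        List<String> ids = new ArrayList<>();
        Matcher m = FAILED_ATTEMPT.matcher(consoleOutput);
        while (m.find()) {
            ids.add(m.group(1));
        }
        return ids;
    }

    /** Maps attempt_<ts>_<seq>_... to application_<ts>_<seq>. */
    static String applicationIdFor(String attemptId) {
        String[] parts = attemptId.split("_");
        if (parts.length < 3 || !parts[0].equals("attempt")) {
            throw new IllegalArgumentException("Not an attempt ID: " + attemptId);
        }
        return "application_" + parts[1] + "_" + parts[2];
    }

    public static void main(String[] args) {
        // Two lines copied from the console output posted above.
        String log =
                "15/09/23 09:19:41 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0, Status : FAILED\n"
              + "15/09/23 09:19:42 INFO mapreduce.Job:  map 50% reduce 0%\n";
        for (String id : failedAttempts(log)) {
            System.out.println(id + " -> yarn logs -applicationId " + applicationIdFor(id));
        }
    }
}
```

The stderr and syslog of the listed attempts are where the actual exception from the map task should appear.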