MR Job Error : Status : FAILED Error: INSTANCE

New Contributor

Hi All,

 

I am a newbie when it comes to Hadoop and MapReduce. I liked Cloudera because it gives me a VM that I can start up to learn and go deep into the stack.

 

I was able to run the WordCount example and also my own MR code that does exactly the same thing. The issue appeared when I added new MR code to do the following:

 

1. The URL to fetch data from is provided in the input file.

2. During the map call, I use the provided URL to fetch data (the returned data is JSON) and write it to the output collector. The URL points to a REST API.

3. There is no need for a reduce phase, so I set the number of reduce tasks to 0.

 

When I run the MR job with this code, I get the following error:

 

mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0,

Status : FAILED
Error: INSTANCE

 

I googled and researched and found that it might be a memory issue, where the JVM is not getting enough heap. So I increased the memory for the map tasks in mapred-site.xml with the following property:

 

  <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx2048M</value>
  </property>

restarted the Hadoop services, and ran the job again. I get the same error as above. I have spent quite a bit of time trying to find a solution and couldn't.
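(As a side note: raising -Xmx on its own is usually not enough, because the YARN container size normally has to grow along with the JVM heap, or the task gets killed for exceeding its container. A sketch of the pair of settings, assuming the common rule of thumb that -Xmx is roughly 80% of the container size; the values here are illustrative only:)

```xml
<!-- Sketch only: these two settings usually move together.
     mapreduce.map.memory.mb is the YARN container size for a map task;
     mapreduce.map.java.opts sets the JVM heap inside that container. -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>2560</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2048M</value>
</property>
```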

 

I need help resolving this issue. I suspect issues like this have already been solved; maybe I'm just not finding the right blog or post that guides me to a resolution.

 

Please see the MR code below and let me know your feedback. 

 

package com.test.prototype;

import java.io.IOException;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Random;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

public class OnboardData
{
	public static class UrlMap extends MapReduceBase
			implements Mapper<LongWritable, Text, IntWritable, Text>
	{
		@Override
		public void map(LongWritable key, Text value,
				OutputCollector<IntWritable, Text> output, Reporter reporter)
				throws IOException
		{
			reporter.setStatus("Value = " + value.toString());
			reporter.setStatus("Key = " + key.toString());

			// The output key is not meaningful here, so use a random int.
			// (Note: Random.nextInt(bound) requires bound > 0, so this line
			// throws IllegalArgumentException if the clock reads second 0.)
			IntWritable randomKey = new IntWritable(
					new Random().nextInt(Calendar.getInstance().get(Calendar.SECOND)));

			// Fetch the JSON payload from the REST URL given in the input line.
			IDataProvider provider = new HttpDataProvider();
			HashMap configMap = new HashMap();
			configMap.put("url", value.toString());
			try
			{
				String retString = provider.getData(configMap);
				reporter.setStatus(retString);
				output.collect(randomKey, new Text(retString));
			}
			catch (DataProviderException e)
			{
				reporter.setStatus(e.getMessage());
				throw new IOException(e.getMessage());
			}
		}
	}

	public static void main(String[] args) throws Exception
	{
		JobConf conf = new JobConf(OnboardData.class);
		conf.setJobName("HttpClient");
		conf.setOutputKeyClass(IntWritable.class);
		conf.setOutputValueClass(Text.class);
		conf.setMapperClass(UrlMap.class);
		conf.setNumReduceTasks(0); // map-only job, no reduce phase
		conf.setMapDebugScript("/bin/echo");
		conf.setInputFormat(TextInputFormat.class);
		conf.setOutputFormat(TextOutputFormat.class);
		FileInputFormat.setInputPaths(conf, new Path(args[0]));
		FileOutputFormat.setOutputPath(conf, new Path(args[1]));
		JobClient.runJob(conf);
	}
}

Here is the console output:

 

Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml
class org.apache.hadoop.io.Text
15/09/23 09:18:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/23 09:18:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/23 09:18:31 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 09:18:32 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/23 09:18:32 INFO mapreduce.JobSubmitter: number of splits:2
15/09/23 09:18:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443023429055_0003
15/09/23 09:18:34 INFO impl.YarnClientImpl: Submitted application application_1443023429055_0003
15/09/23 09:18:34 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1443023429055_0003/
15/09/23 09:18:34 INFO mapreduce.Job: Running job: job_1443023429055_0003
15/09/23 09:18:58 INFO mapreduce.Job: Job job_1443023429055_0003 running in uber mode : false
15/09/23 09:18:58 INFO mapreduce.Job:  map 0% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0, Status : FAILED
Error: INSTANCE
15/09/23 09:19:42 INFO mapreduce.Job:  map 50% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_1, Status : FAILED
Error: INSTANCE
15/09/23 09:20:08 INFO mapreduce.Job:  map 50% reduce 0%
15/09/23 09:20:17 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_2, Status : FAILED
Error: INSTANCE
15/09/23 09:20:40 INFO mapreduce.Job:  map 100% reduce 0%
15/09/23 09:20:41 INFO mapreduce.Job: Job job_1443023429055_0003 failed with state FAILED due to: Task failed task_1443023429055_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0

15/09/23 09:20:42 INFO mapreduce.Job: Counters: 32
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=110664
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=171
                HDFS: Number of bytes written=0
                HDFS: Number of read operations=5
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=2
        Job Counters
                Failed map tasks=4
                Launched map tasks=6
                Other local map tasks=4
                Data-local map tasks=2
                Total time spent by all maps in occupied slots (ms)=431952
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=143984
                Total vcore-seconds taken by all map tasks=143984
                Total megabyte-seconds taken by all map tasks=368599040
        Map-Reduce Framework
                Map input records=0
                Map output records=0
                Input split bytes=125
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=427
                CPU time spent (ms)=1100
                Physical memory (bytes) snapshot=134017024
                Virtual memory (bytes) snapshot=2595618816
                Total committed heap usage (bytes)=92602368
        File Input Format Counters
                Bytes Read=46
        File Output Format Counters
                Bytes Written=0
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
        at com.test.protoypes.OnboardData.main(OnboardData.java:216)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

 Thanks,

-Shiv


Re: MR Job Error : Status : FAILED Error: INSTANCE

Master Guru
Do you have any task logs from the failures? Try visiting the tracking URL the log points to, http://quickstart.cloudera:8088/proxy/application_1443023429055_0003/, in the VM's browser, and browse into the job's history to view the failed attempt logs (syslog/stderr/stdout). These logs will carry a more precise message about what caused the failure, which can then be investigated.
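For what it's worth, "Error: INSTANCE" is often just the last token of a longer java.lang.NoSuchFieldError: INSTANCE, which typically indicates a classpath conflict: the HttpClient version the job was compiled against differs from the older one already on the Hadoop task classpath. That is only a guess until the attempt logs confirm it. One way to check which jar a suspect class is actually loaded from is a small helper like the one below (WhichJar and its describe method are illustrative, not part of any library; on the cluster you would pass a class such as org.apache.http.message.BasicLineFormatter):

```java
import java.security.CodeSource;

public class WhichJar
{
    // Returns where the named class was loaded from, or "bootstrap classpath"
    // for core JDK classes (whose CodeSource is null).
    static String describe(String name) throws ClassNotFoundException
    {
        Class<?> c = Class.forName(name);
        CodeSource src = c.getProtectionDomain().getCodeSource();
        return name + " loaded from: "
                + (src == null ? "bootstrap classpath" : src.getLocation());
    }

    public static void main(String[] args) throws Exception
    {
        // On the cluster, pass e.g. org.apache.http.message.BasicLineFormatter
        // to see whether it resolves to your job jar or to Hadoop's bundled jar.
        System.out.println(describe(args.length > 0 ? args[0] : "java.lang.String"));
    }
}
```

If the class resolves to Hadoop's bundled (older) jar rather than the one shipped in the job jar, shading the dependency or setting mapreduce.job.user.classpath.first is the usual remedy.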