09-23-2015 11:13 AM
Hi All,

I am a newbie when it comes to Hadoop and MapReduce. I liked Cloudera because it gives me a VM I can start up to learn with and go deep into the stack. I was able to run the WordCount example, and also my own MR code that does exactly the same thing. The trouble started when I added new MR code that does the following:

1. The URLs to fetch data from are listed in the input file.
2. During each map call, I use the URL provided to fetch data (the returned data is JSON) and write it to the output collector. The URL provided is a REST API.
3. There is no need for a reduce step, so I just set the number of reduce tasks to 0.

When I run the MR job with this code, I get the following error:

mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0, Status : FAILED
Error: INSTANCE

I googled and researched and found that it might be a memory issue where the JVM is not getting enough heap. So I increased the memory for the map tasks in mapred-site.xml with the following property:

<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2048M</value>
</property>

I restarted the Hadoop services and ran the job again, and I get the same error as above. I have spent quite a bit of time trying to find a solution and couldn't. I need help resolving this issue; I suspect problems like this have already been solved and I am just not finding the right blog or post that actually guides me to a resolution.

Please see the MR code below and let me know your feedback. (There is also a note in main() below about setting the heap per job rather than in mapred-site.xml.)

package com.test.prototype;
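// For context: the input file contains one REST endpoint URL per line,
// along these lines (hypothetical URLs, not my real endpoints):
//   http://example.com/api/v1/orders
//   http://example.com/api/v1/customers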
import java.io.IOException;
import java.util.HashMap;
import java.util.Random;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
public class OnboardData
{
    public static class UrlMap extends MapReduceBase
        implements Mapper<LongWritable, Text, IntWritable, Text>
    {
        public void map(LongWritable key, Text value,
                        OutputCollector<IntWritable, Text> output, Reporter reporter)
            throws IOException
        {
            // Debug output: record the raw input line and its byte offset.
            System.out.println(value.toString());
            System.out.println("Key = " + key.toString());
            reporter.setStatus("Value = " + value.toString());
            reporter.setStatus("Key = " + key.toString());
            // Random key for each output record. Bounded by Integer.MAX_VALUE instead of
            // the current wall-clock second: Random.nextInt(bound) throws
            // IllegalArgumentException when bound is 0, i.e. whenever the second is 0.
            IntWritable one = new IntWritable(new Random().nextInt(Integer.MAX_VALUE));
            IDataProvider provider = new HttpDataProvider();
            HashMap<String, String> configMap = new HashMap<String, String>();
            StringBuffer retBuf = new StringBuffer();
            // The provider reads the target REST URL from this map.
            configMap.put("url", value.toString());
            try
            {
                // Fetch the JSON payload from the REST endpoint.
                String retString = provider.getData(configMap);
                reporter.setStatus(retString);
                retBuf.append(retString);
                output.collect(one, new Text(retBuf.toString()));
            }
            catch (DataProviderException e)
            {
                // Report the failure and fail the task attempt.
                reporter.setStatus(e.getMessage());
                throw new IOException(e.getMessage());
            }
        }
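        // Side note (untested idea): since map() runs once per input line, the
        // HttpDataProvider could be created once in configure(JobConf) and reused
        // across calls instead of being instantiated for every record.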
    }
    public static void main(String[] args) throws Exception
    {
        JobConf conf = new JobConf(OnboardData.class);
        conf.setJobName("HttpClient");
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(UrlMap.class);
        // Map-only job: the fetched JSON goes straight to the output files.
        conf.setNumReduceTasks(0);
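        // Aside (untested): instead of editing mapred-site.xml and restarting the
        // services, I believe the map heap can also be set per job right here, e.g.:
        //   conf.set("mapreduce.map.java.opts", "-Xmx2048m");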
        conf.setMapDebugScript("/bin/echo");
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // Debug output: verify the arguments and the resolved configuration.
        System.out.println(args[0]);
        System.out.println(args[1]);
        System.out.println(conf.toString());
        System.out.println(conf.getOutputValueClass().toString());
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
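    // How I invoke it (jar name and paths are hypothetical placeholders):
    //   hadoop jar onboard-proto.jar com.test.prototype.OnboardData in_urls.txt out_dir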
}

Here is the console output:

Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml
class org.apache.hadoop.io.Text
15/09/23 09:18:29 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/23 09:18:30 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/09/23 09:18:31 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/09/23 09:18:32 INFO mapred.FileInputFormat: Total input paths to process : 1
15/09/23 09:18:32 INFO mapreduce.JobSubmitter: number of splits:2
15/09/23 09:18:33 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1443023429055_0003
15/09/23 09:18:34 INFO impl.YarnClientImpl: Submitted application application_1443023429055_0003
15/09/23 09:18:34 INFO mapreduce.Job: The url to track the job: http://quickstart.cloudera:8088/proxy/application_1443023429055_0003/
15/09/23 09:18:34 INFO mapreduce.Job: Running job: job_1443023429055_0003
15/09/23 09:18:58 INFO mapreduce.Job: Job job_1443023429055_0003 running in uber mode : false
15/09/23 09:18:58 INFO mapreduce.Job: map 0% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job: map 100% reduce 0%
15/09/23 09:19:41 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_0, Status : FAILED
Error: INSTANCE
15/09/23 09:19:42 INFO mapreduce.Job: map 50% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job: map 100% reduce 0%
15/09/23 09:20:07 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_1, Status : FAILED
Error: INSTANCE
15/09/23 09:20:08 INFO mapreduce.Job: map 50% reduce 0%
15/09/23 09:20:17 INFO mapreduce.Job: Task Id : attempt_1443023429055_0003_m_000000_2, Status : FAILED
Error: INSTANCE
15/09/23 09:20:40 INFO mapreduce.Job: map 100% reduce 0%
15/09/23 09:20:41 INFO mapreduce.Job: Job job_1443023429055_0003 failed with state FAILED due to: Task failed task_1443023429055_0003_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0
15/09/23 09:20:42 INFO mapreduce.Job: Counters: 32
File System Counters
    FILE: Number of bytes read=0
    FILE: Number of bytes written=110664
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=171
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=5
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=2
Job Counters
    Failed map tasks=4
    Launched map tasks=6
    Other local map tasks=4
    Data-local map tasks=2
    Total time spent by all maps in occupied slots (ms)=431952
    Total time spent by all reduces in occupied slots (ms)=0
    Total time spent by all map tasks (ms)=143984
    Total vcore-seconds taken by all map tasks=143984
    Total megabyte-seconds taken by all map tasks=368599040
Map-Reduce Framework
    Map input records=0
    Map output records=0
    Input split bytes=125
    Spilled Records=0
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=427
    CPU time spent (ms)=1100
    Physical memory (bytes) snapshot=134017024
    Virtual memory (bytes) snapshot=2595618816
    Total committed heap usage (bytes)=92602368
File Input Format Counters
    Bytes Read=46
File Output Format Counters
    Bytes Written=0
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:838)
    at com.test.protoypes.OnboardData.main(OnboardData.java:216)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Thanks,
-Shiv
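P.S. Re-reading the console output, I noticed the warning "Implement the Tool interface and execute your application with ToolRunner to remedy this." Below is a rough, untested sketch of how I understand that change would look for my driver (same job setup as above, just moved into run(); the UrlMap mapper stays as-is). Corrections welcome:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class OnboardData extends Configured implements Tool
{
    @Override
    public int run(String[] args) throws Exception
    {
        // getConf() carries any generic options parsed by ToolRunner into the job.
        JobConf conf = new JobConf(getConf(), OnboardData.class);
        conf.setJobName("HttpClient");
        conf.setOutputKeyClass(IntWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setMapperClass(UrlMap.class);
        conf.setNumReduceTasks(0);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
        return 0;
    }

    public static void main(String[] args) throws Exception
    {
        // ToolRunner parses generic Hadoop options (-D, -files, ...) before calling run().
        System.exit(ToolRunner.run(new Configuration(), new OnboardData(), args));
    }
}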
09-23-2015 10:48 AM
Hi, my name is Shiv and I work as an architect. We have just started looking at the Big Data stack for our analytics business cases. I downloaded the Cloudera starter VM and am exploring it. I am excited to learn and understand how Cloudera works and how we can use it. Thanks!