Member since: 05-28-2015
Posts: 47
Kudos Received: 28
Solutions: 7

My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 5904 | 06-20-2016 04:00 PM |
|  | 10617 | 01-16-2016 03:15 PM |
|  | 11210 | 01-16-2016 05:06 AM |
|  | 5182 | 01-14-2016 06:45 PM |
|  | 2918 | 01-14-2016 01:56 AM |
01-02-2016
03:25 AM
1 Kudo
Try starting Pig with the -useHCatalog flag, as below:

```
pig -useHCatalog -f yourscript.pig
```

This works both for running interactively in the terminal and for calling a script. Alternatively, specify the location of the HCatalog jar by adding a REGISTER statement with the path of the jar to the top of your script, as below:

```
REGISTER /home/user/Installations/hive-0.11.0-bin/hcatalog/share/hcatalog/hcatalog-core-0.11.0.jar;
```

Please note that your path may be different.
12-30-2015
04:45 PM
2 Kudos
@Nilesh Shrimant There is an issue with renaming partitions in Hive 0.13 which, I believe, is fixed in Hive 0.14. A possible workaround is setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false. Refer to https://issues.apache.org/jira/browse/HIVE-7623 for more details.
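For illustration, a minimal Java sketch of where these properties can be set programmatically; the property names are the ones from the workaround above, while the class name and the rest of the snippet are hypothetical (in practice the properties would usually be set in core-site.xml or hive-site.xml rather than in code):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

// Hypothetical snippet showing where the HIVE-7623 workaround properties live.
public class DisableFsCacheWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Property names taken from the workaround above; normally these
        // would go in core-site.xml/hive-site.xml, not in code.
        conf.setBoolean("fs.hdfs.impl.disable.cache", false);
        conf.setBoolean("fs.file.impl.disable.cache", false);
        // Any FileSystem obtained from this Configuration now honours them.
        FileSystem fs = FileSystem.get(conf);
        System.out.println("Filesystem: " + fs.getUri());
    }
}
```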
12-29-2015
01:28 AM
Thanks Chris!
12-29-2015
01:25 AM
Thanks Chris!!
12-29-2015
01:18 AM
Thanks Pradeep!
12-28-2015
09:53 PM
Is it the NodeManager? Is it the ApplicationMaster? As per the Hadoop Definitive Guide:

1. The client, which submits the MapReduce job.
2. The YARN ResourceManager, which coordinates the allocation of compute resources on the cluster.
3. The YARN NodeManagers, which launch and monitor the compute containers on machines in the cluster.
4. The MapReduce ApplicationMaster, which coordinates the tasks running the MapReduce job.

The application master and the MapReduce tasks run in containers that are scheduled by the ResourceManager and managed by the NodeManagers. From what I understand it should be the NodeManager, but I am not sure. Can anyone clarify this, please? Thanks.
Labels:
- Apache Hadoop
- Apache YARN
12-28-2015
08:24 PM
We can use the org.apache.hadoop.fs.FileStatus class to get file metadata such as block size, file permissions and ownership, and replication factor. Can we get the same metadata using the org.apache.hadoop.fs.FileSystem class? If yes, what is the difference between the FileSystem and FileStatus classes? Thanks!
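For reference, a minimal sketch of how the two classes relate: FileSystem is the handle you ask questions of, and its getFileStatus() method returns a FileStatus carrying the per-file metadata listed above. The path used here is hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileMetadataExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // FileSystem hands back a FileStatus (the metadata record)
        // for a given path; "/user/ncdc/input" is hypothetical.
        FileStatus status = fs.getFileStatus(new Path("/user/ncdc/input"));
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Permission:  " + status.getPermission());
        System.out.println("Owner:       " + status.getOwner());
        System.out.println("Replication: " + status.getReplication());
    }
}
```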
Labels:
- Apache Hadoop
12-28-2015
07:58 PM
As per The Definitive Guide: a Mapper, as in the map task spawned by the TaskTracker in a separate JVM, processes an input split (all of it). For TextInputFormat, this would be a specific number of lines from your input file. The map method, Mapper.map(...), is called for every record (key-value pair) in the split; in the case of TextInputFormat, each map method invocation processes one line of the input split. With the above consideration, the TaskTracker spawns a new Mapper for each input split. But if you look at the Mapper class code:

```
public class MaxTemperatureMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
```

it reads as if the Mapper class/object takes one key/value pair each time; once this k/v pair has been processed, the object is done, it is finished, and the next k/v pair will be processed by another Mapper, a new object. For example, think of a 64 MB block containing 1000 records (key-value pairs): does the framework create 1000 Mappers here, or just a single Mapper? This is a little confusing. Can anyone shed more light on what exactly happens in this case? Thanks in advance.
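A minimal sketch that makes the lifecycle observable, assuming the standard new-API Mapper: setup() and cleanup() each run once per Mapper instance (that is, once per input split), while map() runs once per record, so a single object handles every record in the split. The LifecycleMapper class and its log lines are hypothetical, written only for illustration:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper used only to trace the Mapper lifecycle.
public class LifecycleMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private long records = 0;

    @Override
    protected void setup(Context context) {
        // Runs once per Mapper instance, i.e. once per input split.
        System.err.println("new Mapper instance for split: "
                + context.getInputSplit());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Runs once per record; the same instance handles every record
        // in the split, so 1000 records means 1000 calls to map() on
        // one object, not 1000 Mapper objects.
        records++;
    }

    @Override
    protected void cleanup(Context context) {
        // Runs once per Mapper instance, after the last record.
        System.err.println("processed " + records + " records");
    }
}
```

With a single 64 MB split of 1000 records, this would log one "new Mapper instance" line and one "processed 1000 records" line: one object, 1000 map() calls.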
Labels:
- Apache Hadoop
- Apache YARN
12-10-2015
04:34 AM
2 Kudos
Thanks Brandon. It worked.
12-09-2015
03:41 PM
1 Kudo
I am trying to run a MapReduce job (using the new API) on Hadoop 2.7.1 from the command line. I have followed the steps below:

```
javac -cp `hadoop classpath` MaxTemperatureWithCompression.java -d /Users/gangadharkadam/hadoopdata/build
jar -cvf MaxTemperatureWithCompression.jar /Users/gangadharkadam/hadoopdata/build
hadoop jar MaxTemperatureWithCompression.jar org.myorg.MaxTemperatureWithCompression user/ncdc/input /user/ncdc/output
```

There is no error in compiling and creating the jar file, but on execution I am getting the following error:

```
Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
```
Java code:

```java
package org.myorg;

//Standard Java classes
import java.io.IOException;
import java.util.regex.Pattern;

//extends the class Configured and implements the Tool utility class
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.GenericOptionsParser;

//send debugging messages from inside the mapper and reducer classes
import org.apache.log4j.Logger;

//Job class in order to create, configure, and run an instance of your MapReduce job
import org.apache.hadoop.mapreduce.Job;

//extend the Mapper class with your own Map class and add your own processing instructions
import org.apache.hadoop.mapreduce.Mapper;

//extend it to create and customize your own Reduce class
import org.apache.hadoop.mapreduce.Reducer;

//Path class to access files in HDFS
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;

//pass required paths using the FileInputFormat and FileOutputFormat classes
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

//Writable objects for writing, reading, and comparing values during map and reduce processing
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;

public class MaxTemperatureWithCompression extends Configured implements Tool {

    private static final Logger LOG = Logger.getLogger(MaxTemperatureWithCompression.class);

    //main method to invoke the ToolRunner to create an instance of MaxTemperatureWithCompression
    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new MaxTemperatureWithCompression(), args);
        System.exit(res);
    }

    //call the run method to configure the job
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: MaxTemperatureWithCompression <input path> <output path>");
            System.exit(-1);
        }

        Job job = Job.getInstance(getConf(), "MaxTemperatureWithCompression");

        //set the jar to use based on the class
        job.setJarByClass(MaxTemperatureWithCompression.class);

        //set the input and output paths
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        //set the output key and value classes
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        //set the compression format
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);

        //set the mapper, combiner, and reducer classes
        job.setMapperClass(Map.class);
        job.setCombinerClass(Reduce.class);
        job.setReducerClass(Reduce.class);

        return job.waitForCompletion(true) ? 0 : 1;
    }

    //mapper
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

        private static final int MISSING = 9999;

        @Override
        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String year = line.substring(15, 19);
            int airTemperature;
            if (line.charAt(87) == '+') {
                airTemperature = Integer.parseInt(line.substring(88, 92));
            } else {
                airTemperature = Integer.parseInt(line.substring(87, 92));
            }
            String quality = line.substring(92, 93);
            if (airTemperature != MISSING && quality.matches("[01459]")) {
                context.write(new Text(year), new IntWritable(airTemperature));
            }
        }
    }

    //reducer
    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int maxValue = Integer.MIN_VALUE;
            for (IntWritable value : values) {
                maxValue = Math.max(maxValue, value.get());
            }
            context.write(key, new IntWritable(maxValue));
        }
    }
}
```
I checked the jar file, and the folder structure org/myorg/MaxTemperatureWithCompression.class is present. What could be the reason for this error? Any help in resolving this is highly appreciated. Thanks.
Labels:
- Apache Hadoop
- Apache YARN