Member since: 05-28-2015
Posts: 47
Kudos Received: 28
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 5893 | 06-20-2016 04:00 PM |
| | 10570 | 01-16-2016 03:15 PM |
| | 11162 | 01-16-2016 05:06 AM |
| | 5162 | 01-14-2016 06:45 PM |
| | 2899 | 01-14-2016 01:56 AM |
01-02-2016
03:25 AM
1 Kudo
Try starting Pig with -useHCatalog, as below:

pig -useHCatalog -f yourscript.pig

This can be used for running in the terminal or for calling a script. Alternatively, specify the location of the HCatalog jar by adding a REGISTER statement with the path of the jar at the top of your script, as below:

REGISTER /home/user/Installations/hive-0.11.0-bin/hcatalog/share/hcatalog/hcatalog-core-0.11.0.jar;

Please note your path may be different.
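For context, a script started this way would typically read a Hive table through HCatLoader. A minimal sketch with a hypothetical table name (note the loader package is org.apache.hcatalog.pig in HCatalog 0.11 and moved to org.apache.hive.hcatalog.pig in later releases):

-- yourscript.pig (table name 'my_table' is hypothetical)
A = LOAD 'my_table' USING org.apache.hcatalog.pig.HCatLoader();
DUMP A;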
... View more
12-30-2015
04:45 PM
2 Kudos
@Nilesh Shrimant There is an issue with renaming partitions in Hive 0.13 which, I believe, is fixed in Hive 0.14. A possible workaround is setting fs.hdfs.impl.disable.cache=false and fs.file.impl.disable.cache=false. Refer to https://issues.apache.org/jira/browse/HIVE-7623 for more details.
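For reference, a minimal sketch of applying the workaround per session from the Hive CLI (an assumption; depending on your setup these properties may instead need to go into hive-site.xml):

SET fs.hdfs.impl.disable.cache=false;
SET fs.file.impl.disable.cache=false;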
... View more
12-29-2015
01:28 AM
Thanks, Chris.
... View more
12-29-2015
01:25 AM
Thanks Chris!!
... View more
12-29-2015
01:18 AM
Thanks Pradeep!
... View more
12-28-2015
09:53 PM
Is it the NodeManager? Is it the ApplicationMaster? As per the Hadoop Definitive Guide:

1. The client, which submits the MapReduce job.
2. The YARN ResourceManager, which coordinates the allocation of compute resources on the cluster.
3. The YARN NodeManagers, which launch and monitor the compute containers on machines in the cluster.
4. The MapReduce ApplicationMaster, which coordinates the tasks running the MapReduce job.

The ApplicationMaster and the MapReduce tasks run in containers that are scheduled by the ResourceManager and managed by the NodeManagers. From what I understand it should be the NodeManager, but I am not sure. Can anyone clarify this, please? Thanks.
... View more
Labels:
- Apache Hadoop
- Apache YARN
12-28-2015
08:24 PM
We can use the org.apache.hadoop.fs.FileStatus class to get file metadata such as block size, file permissions and ownership, and replication factor. Can we get the same metadata using the org.apache.hadoop.fs.FileSystem class? If yes, what's the difference between the FileSystem and FileStatus classes? Thanks!
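For illustration, a minimal sketch of how the two relate (the path is hypothetical): FileSystem is the handle to the filesystem itself, and its getFileStatus() method is what returns the FileStatus object holding the metadata:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileStatusExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);  // handle to the (default) filesystem
        // the FileStatus is obtained *from* the FileSystem; it carries the metadata
        FileStatus status = fs.getFileStatus(new Path("/user/example/file.txt"));
        System.out.println("Block size:  " + status.getBlockSize());
        System.out.println("Owner:       " + status.getOwner());
        System.out.println("Permissions: " + status.getPermission());
        System.out.println("Replication: " + status.getReplication());
    }
}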
... View more
Labels:
- Apache Hadoop
12-28-2015
07:58 PM
As per The Definitive Guide:

- Mapper: the map task spawned by the TaskTracker in a separate JVM to process an input split (all of it). For TextInputFormat, this would be a specific number of lines from your input file.
- Map method: called for every record (key-value pair) in the split, i.e. Mapper.map(...). In the case of TextInputFormat, each map() invocation processes one line of the input split.

With the above in mind, the TaskTracker spawns a new Mapper for each input split. But if you look at the Mapper class code:

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

it reads as if the Mapper object takes one key/value pair at a time: once that pair is processed, the object is done, and the next pair is processed by another Mapper, a new object. For example, suppose a 64 MB block contains 1,000 records (key-value pairs): does the framework create 1,000 mappers here, or just a single mapper? This is a little confusing. Can anyone shed more light on what exactly happens in this case? Thanks in advance.
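For reference, the framework's own Mapper.run() loop (paraphrased from the Hadoop source; a sketch, details vary slightly by version) shows a single Mapper instance calling map() once per record in its split:

// Paraphrased from org.apache.hadoop.mapreduce.Mapper (new API):
// one Mapper object serves the whole split; map() fires once per record.
public void run(Context context) throws IOException, InterruptedException {
    setup(context);                 // once per split
    try {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    } finally {
        cleanup(context);           // once per split
    }
}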
... View more
Labels:
- Apache Hadoop
- Apache YARN
12-10-2015
04:34 AM
2 Kudos
Thanks, Brandon. It worked.
... View more
12-09-2015
03:41 PM
1 Kudo
I am trying to run a MapReduce job (using the new API) on Hadoop 2.7.1 from the command line. I followed the steps below.

javac -cp `hadoop classpath` MaxTemperatureWithCompression.java -d /Users/gangadharkadam/hadoopdata/build
jar -cvf MaxTemperatureWithCompression.jar /Users/gangadharkadam/hadoopdata/build
hadoop jar MaxTemperatureWithCompression.jar org.myorg.MaxTemperatureWithCompression user/ncdc/input /user/ncdc/output

There was no error in compiling and creating the jar file, but on execution I am getting the following error:

Exception in thread "main" java.lang.ClassNotFoundException: org.myorg.MaxTemperatureWithCompression
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:274)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:214)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Java Code-
package org.myorg;
//Standard Java Classes
import java.io.IOException;
import java.util.regex.Pattern;
//extends the class Configured, and implements the Tool utility class
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.util.GenericOptionsParser;
//send debugging messages from inside the mapper and reducer classes
import org.apache.log4j.Logger;
//Job class in order to create, configure, and run an instance of your MapReduce
import org.apache.hadoop.mapreduce.Job;
//extend the Mapper class with your own Map class and add your own processing instructions
import org.apache.hadoop.mapreduce.Mapper;
//extend it to create and customize your own Reduce class
import org.apache.hadoop.mapreduce.Reducer;
//Path class to access files in HDFS
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.FileSystem;
//pass required paths using the FileInputFormat and FileOutputFormat classes
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
//Writable objects for writing, reading,and comparing values during map and reduce processing
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
public class MaxTemperatureWithCompression extends Configured implements Tool {
private static final Logger LOG = Logger.getLogger(MaxTemperatureWithCompression.class);
//main method to invoke the ToolRunner and create an instance of MaxTemperatureWithCompression
public static void main(String[] args) throws Exception {
int res = ToolRunner.run(new MaxTemperatureWithCompression(), args);
System.exit(res);
}
//call the run method to configure the job
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.println("Usage: MaxTemperatureWithCompression <input path> " + "<output path>");
System.exit(-1);
}
Job job = Job.getInstance(getConf(), "MaxTemperatureWithCompression");
//set the jar to use based on the class
job.setJarByClass(MaxTemperatureWithCompression.class);
//set the input and output path
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
//set the output key and value
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
//set the compression format
FileOutputFormat.setCompressOutput(job, true);
FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
//set the mapper and reducer class
job.setMapperClass(Map.class);
job.setCombinerClass(Reduce.class);
job.setReducerClass(Reduce.class);
return job.waitForCompletion(true) ? 0 : 1;
}
//mapper
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
private static final int MISSING = 9999;
@Override
public void map(LongWritable key, Text value, Context context)
throws IOException,InterruptedException {
String line = value.toString();
String year = line.substring(15,19);
int airTemperature;
if (line.charAt(87) == '+') {
airTemperature = Integer.parseInt(line.substring(88, 92));
}
else {
airTemperature = Integer.parseInt(line.substring(87, 92));
}
String quality = line.substring(92,93);
if (airTemperature != MISSING && quality.matches("[01459]")) {
context.write(new Text(year), new IntWritable(airTemperature));
}
}
}
//reducer
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context)
throws IOException, InterruptedException {
int maxValue = Integer.MIN_VALUE;
for (IntWritable value : values) {
maxValue = Math.max(maxValue, value.get());
}
context.write(key, new IntWritable(maxValue));
}
}
}
I checked the jar file, and the folder structure org/myorg/MaxTemperatureWithCompression.class is present. What could be the reason for this error? Any help in resolving it is highly appreciated. Thanks.
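One thing worth double-checking (an assumption on my part, since the jar listing isn't shown here): jar -cvf given an absolute directory stores that full path inside the archive, so the class can end up under Users/gangadharkadam/hadoopdata/build/org/myorg/ rather than org/myorg/. Packaging with -C and listing the entries rules this out:

jar -cvf MaxTemperatureWithCompression.jar -C /Users/gangadharkadam/hadoopdata/build .
jar -tf MaxTemperatureWithCompression.jar   # entries should start with org/myorg/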
... View more
Labels:
- Apache Hadoop
- Apache YARN