Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3654 | 05-03-2017 05:13 PM |
| | 3012 | 05-02-2017 08:38 AM |
| | 3270 | 05-02-2017 08:13 AM |
| | 3219 | 04-10-2017 10:51 PM |
| | 1684 | 03-28-2017 02:27 AM |
02-29-2016
02:55 PM
@Cassandra Spencer there are some checks here: https://community.hortonworks.com/questions/2133/facilitating-hdp-cluster-hostname-change.html There's also this handy util https://github.com/u39kun/ambari-util and this blog post is pretty elaborate: http://www.swiss-scalability.com/2015/01/rename-host-in-ambari-170.html
02-29-2016
02:47 PM
1 Kudo
@sachin gupta you need to include the Hadoop jars when you build it. Pass the -Phadoop-2 profile with your build command; the whole command would look like mvn clean install -DskipTests -Phadoop-2. Why are you building the 0.13 release? Please refer to the docs here: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
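For reference, a minimal sketch of the full build, run from the root of the Hive source checkout:
# -Phadoop-2 builds against the Hadoop 2.x line instead of Hadoop 1.x
mvn clean install -DskipTests -Phadoop-2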
02-29-2016
01:23 PM
@Saurabh Kumar only use the link I provided to double-check your values; for all values, refer to our docs as you did. Did you read this paragraph from the blog carefully? "The other alternative is to configure the client with both service ids and make it aware of the way to identify the active NameNode of both clusters. For this you would need to define a custom configuration you are only going to use for distcp. The hdfs client can be configured to point to that config like this" In other words: create a custom xml file and pass it to the hadoop distcp command every time you want to distcp. Don't use that config as your global config for HDFS. Revert the configuration in Ambari back to the previous one, create a custom hdfs-site.xml in your user directory, pass it to hadoop distcp, and report the results back.
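A minimal sketch of that workflow, assuming a distcp-only config directory (the directory path, nameservice ids, and paths below are placeholders, not values from your cluster):
# hdfs-site.xml inside the distcp-conf directory defines both nameservices (clusterA and clusterB)
# --config points the hadoop client at that directory for this invocation only
hadoop --config /home/saurabh/distcp-conf distcp hdfs://clusterA/source/path hdfs://clusterB/target/path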
02-29-2016
11:48 AM
Please see this blog and double-check your values: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/
02-28-2016
09:49 PM
Please refer to the following articles: https://hadoop.apache.org/docs/r2.7.1/hadoop-aws/tools/hadoop-aws/index.htm and https://cwiki.apache.org/confluence/display/Hive/HiveAws+HivingS3nRemotely
02-28-2016
07:59 PM
2 Kudos
There are four modes available: mapreduce, local, tez and tez_local. You pass the mode with the -x switch. With Pig 0.15, tez and tez_local are recommended for better performance; in the tez modes Pig uses the Tez execution engine, while the default is still mapreduce mode, in which Pig uses MapReduce as the execution engine. At some point Tez will become the default engine.
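For illustration, example invocations (the script name is a placeholder):
# Tez execution engine on the cluster
pig -x tez myscript.pig
# Tez in local mode, useful for testing on a single machine
pig -x tez_local myscript.pig
# default MapReduce execution engine
pig -x mapreduce myscript.pig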
02-28-2016
02:13 AM
1 Kudo
@Revathy Mourouguessane excellent question, it's been a while since I'd touched MR and I learned something new (KeyValueTextInputFormat). So firstly, assuming your data looks like this:
http://url12.com 36
http://url11.com 4
http://url20.com 36
http://url1.com 256
http://url1.com 267
The KeyValueTextInputFormat class states the following: "An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Each line is divided into key and value parts by a separator byte. If no such a byte exists, the key will be the entire line and value will be empty." You did not specify a separator in your job configuration. I made a few changes to your code:
package com.hortonworks.mapreduce;
/**
 *
 * @author aervits
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class URLCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new URLCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        // tell KeyValueTextInputFormat to split each line into key and value at the first space
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
        Job job = Job.getInstance(conf, "URLCount");
        job.setJarByClass(getClass());
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(URLCountM.class);
        job.setReducerClass(URLCountR.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // the reducer emits a Text key (the URL) and an IntWritable value (the sum)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : -1;
    }
}
Notice that in your tool runner you don't terminate the job with the expected result. Then, in your run method, I added setOutputKeyClass, setOutputValueClass, setMapOutputKeyClass and setMapOutputValueClass; I also set the separator config and changed LongWritable to IntWritable. What does that mean? Not setting the separator means KeyValueTextInputFormat treats the whole line as the key and the mapper receives an empty value, so you'd think you're not getting results from the reducer, but in reality you weren't passing anything from the mapper to the reducer in the first place. Moving on:
package com.hortonworks.mapreduce;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
/**
*
* @author aervits
*/
public class URLCountM extends Mapper<Text, Text, Text, IntWritable> {

    private static final Logger LOG = Logger.getLogger(URLCountM.class.getName());

    public final IntWritable iw = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) {
        try {
            LOG.log(Level.INFO, "MAP_KEY: ".concat(key.toString()).concat(" MAP_VALUE: ".concat(value.toString())));
            context.write(key, new IntWritable(Integer.valueOf(value.toString())));
        } catch (NumberFormatException | IOException | InterruptedException e) {
            LOG.log(Level.SEVERE, "ERROR: ".concat(e.toString()));
        }
    }
}
Notice I added a logger; this is a better way of printing the expected keys and values to the log. Next, the reducer:
package com.hortonworks.mapreduce;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/**
*
* @author aervits
*/
public class URLCountR extends Reducer<Text, IntWritable, Text, IntWritable> {

    private static final Logger LOG = Logger.getLogger(URLCountR.class.getName());

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        LOG.log(Level.INFO, "REDUCER_VALUE: ".concat(result.toString()));
        context.write(key, result);
    }
}
Notice I'm printing to the log again in the reducer, just to keep sanity; you can see it in the job logs. Finally, the result looks like this:
http://url1.com 523
http://url11.com 4
http://url12.com 36
http://url20.com 36
Finally, I published the code to my repo; grab it if you need a working example. I compiled it with HDP-specific versions of Hadoop (using our repositories rather than Apache is recommended), so take a look at my pom.xml: https://github.com/dbist/URLCount One final hint is to look at the statistics printed out after the job completes. I realized you were not sending data to the reducer when I ran your code and saw 0 output records from the mapper. This is what it should look like with only 5 lines of input data:
Map-Reduce Framework
Map input records=5
Map output records=5
Map output bytes=103
02-27-2016
05:03 PM
Only apply patches if necessary and when instructed by support. In case you don't have a support contract, here are Pivotal's instructions for patching Ambari; we don't provide steps ourselves for the reasons above: http://hawq.docs.pivotal.io/docs-hawq/topics/hdp-prerequisites.html Needless to say, it's at your own risk.
02-26-2016
10:02 PM
I would contact support, as this is not a trivial upgrade.
02-26-2016
07:54 PM
@Zack Riesland I tagged this to notify the docs team to add it. Thanks for opening the thread!