Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3654 | 05-03-2017 05:13 PM |
| | 3012 | 05-02-2017 08:38 AM |
| | 3270 | 05-02-2017 08:13 AM |
| | 3219 | 04-10-2017 10:51 PM |
| | 1684 | 03-28-2017 02:27 AM |
02-29-2016
02:55 PM
@Cassandra Spencer there are some checks here: https://community.hortonworks.com/questions/2133/facilitating-hdp-cluster-hostname-change.html There's also this handy util https://github.com/u39kun/ambari-util and this blog post is pretty elaborate: http://www.swiss-scalability.com/2015/01/rename-host-in-ambari-170.html
02-29-2016
02:47 PM
1 Kudo
@sachin gupta you need to include the Hadoop jars when you build it. Pass the -Phadoop-2 profile with your build command; the whole command would look like mvn clean install -DskipTests -Phadoop-2. Why are you building the 0.13 release? Please refer to the docs here: https://cwiki.apache.org/confluence/display/Hive/HiveDeveloperFAQ
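For reference, a minimal sketch of the full build, run from the root of the Hive source checkout:
# -Phadoop-2 builds against the Hadoop 2.x line instead of Hadoop 1.x
mvn clean install -DskipTests -Phadoop-2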
02-29-2016
01:23 PM
@Saurabh Kumar only use the link I provided to double-check your values; for all values, refer to our docs as you did. Did you read this paragraph from the blog carefully? "The other alternative is to configure the client with both service ids and make it aware of the way to identify the active NameNode of both clusters. For this you would need to define a custom configuration you are only going to use for distcp. The hdfs client can be configured to point to that config like this" In other words: create a custom xml file and pass it to the hadoop distcp command every time you want to distcp. Don't use that config as your global config for HDFS. Revert the configuration in Ambari back to the previous one, create a custom hdfs-site.xml in your user directory, pass it to hadoop distcp, and report the results back.
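A minimal sketch of that workflow, assuming a distcp-only config directory (the directory path, nameservice ids, and paths below are placeholders, not values from your cluster):
# hdfs-site.xml inside the distcp-conf directory defines both nameservices (clusterA and clusterB)
# --config points the hadoop client at that directory for this invocation only
hadoop --config /home/saurabh/distcp-conf distcp hdfs://clusterA/source/path hdfs://clusterB/target/path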
02-29-2016
11:48 AM
Please see this blog and double-check your values: http://henning.kropponline.de/2015/03/15/distcp-two-ha-cluster/
02-28-2016
09:49 PM
Please refer to the following articles: https://hadoop.apache.org/docs/r2.7.1/hadoop-aws/tools/hadoop-aws/index.htm and https://cwiki.apache.org/confluence/display/Hive/HiveAws+HivingS3nRemotely
02-28-2016
07:59 PM
2 Kudos
There are four modes available: mapreduce, local, tez and tez_local. You pass the mode with the -x switch. With Pig 0.15, tez and tez_local are recommended for better performance; in the tez modes Pig uses the Tez execution engine, while the default is still mapreduce mode, in which Pig uses MapReduce as the execution engine. At some point Tez will become the default engine.
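For illustration, example invocations (the script name is a placeholder):
# Tez execution engine on the cluster
pig -x tez myscript.pig
# Tez in local mode, useful for testing on a single machine
pig -x tez_local myscript.pig
# default MapReduce execution engine
pig -x mapreduce myscript.pig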
02-28-2016
02:13 AM
1 Kudo
@Revathy Mourouguessane excellent question, it's been a while since I'd touched MR and I learned something new (KeyValueTextInputFormat). So firstly, assuming your data looks like this:
http://url12.com 36
http://url11.com 4
http://url20.com 36
http://url1.com 256
http://url1.com 267
The KeyValueTextInputFormat class states the following: "An InputFormat for plain text files. Files are broken into lines. Either linefeed or carriage-return are used to signal end of line. Each line is divided into key and value parts by a separator byte. If no such a byte exists, the key will be the entire line and value will be empty." You did not specify a separator in your job configuration. I made a few changes to your code:
package com.hortonworks.mapreduce;
/**
 *
 * @author aervits
 */
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class URLCount extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new URLCount(), args);
        System.exit(res);
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        // tell KeyValueTextInputFormat to split each line into key and value at the first space
        conf.set("mapreduce.input.keyvaluelinerecordreader.key.value.separator", " ");
        Job job = Job.getInstance(conf, "URLCount");
        job.setJarByClass(getClass());
        job.setInputFormatClass(KeyValueTextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        job.setMapperClass(URLCountM.class);
        job.setReducerClass(URLCountR.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // the reducer emits a Text key (the URL) and an IntWritable value (the sum)
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : -1;
    }
}
Notice that in your tool runner you don't terminate the job with the expected result. Then, in your run method, I added setOutputKeyClass, setOutputValueClass, setMapOutputKeyClass and setMapOutputValueClass; I also set the separator config and changed LongWritable to IntWritable. What does that mean? Not setting the separator means KeyValueTextInputFormat treats the whole line as the key and the mapper receives an empty value, so you'd think you're not getting results from the reducer, but in reality you weren't passing anything from the mapper to the reducer in the first place. Moving on:
package com.hortonworks.mapreduce;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
/**
*
* @author aervits
*/
public class URLCountM extends Mapper<Text, Text, Text, IntWritable> {

    private static final Logger LOG = Logger.getLogger(URLCountM.class.getName());

    public final IntWritable iw = new IntWritable();

    @Override
    public void map(Text key, Text value, Context context) {
        try {
            LOG.log(Level.INFO, "MAP_KEY: ".concat(key.toString()).concat(" MAP_VALUE: ".concat(value.toString())));
            context.write(key, new IntWritable(Integer.valueOf(value.toString())));
        } catch (NumberFormatException | IOException | InterruptedException e) {
            LOG.log(Level.SEVERE, "ERROR: ".concat(e.toString()));
        }
    }
}
Notice I added a logger; this is a better way of printing the expected keys and values to the log. Next, the reducer:
package com.hortonworks.mapreduce;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
/**
*
* @author aervits
*/
public class URLCountR extends Reducer<Text, IntWritable, Text, IntWritable> {

    private static final Logger LOG = Logger.getLogger(URLCountR.class.getName());

    private IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        LOG.log(Level.INFO, "REDUCER_VALUE: ".concat(result.toString()));
        context.write(key, result);
    }
}
Notice I'm printing to the log again in the reducer, just to keep sanity; you can see it in the job logs. Finally, the result looks like this:
http://url1.com 523
http://url11.com 4
http://url12.com 36
http://url20.com 36
Finally, I published the code to my repo; grab it if you need a working example. I compiled it with HDP-specific versions of Hadoop (using our repositories rather than Apache is recommended), so take a look at my pom.xml: https://github.com/dbist/URLCount One final hint is to look at the statistics printed out after the job completes. I realized you were not sending data to the reducer when I ran your code and saw 0 output records from the mapper. This is what it should look like with only 5 lines of input data:
Map-Reduce Framework
Map input records=5
Map output records=5
Map output bytes=103
02-27-2016
05:03 PM
Only apply patches if necessary and when instructed by support. In case you don't have a support contract, here are Pivotal's instructions for patching Ambari; we don't provide steps ourselves for the reasons above: http://hawq.docs.pivotal.io/docs-hawq/topics/hdp-prerequisites.html Needless to say, it's at your own risk.
02-26-2016
10:02 PM
I would contact support, as this is not a trivial upgrade.
02-26-2016
07:54 PM
@Zack Riesland I tagged this to notify the docs team to add it. Thanks for opening the thread!