Member since: 11-11-2016
Posts: 43
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 3796 | 08-22-2017 09:21 AM
 | 3191 | 02-25-2017 03:18 AM
08-22-2017
09:21 AM
So I will answer my own question. Here is what was needed to make it work.
Because we use HBase to store our data and this reducer outputs its result to an HBase table, Hadoop is telling us that it doesn't know how to serialize our data. That is why we need to help it. Inside setUp, set the io.serializations property:
hbaseConf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
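The setStrings call above appends the HBase serializers to whatever io.serializations already contains. A minimal, dependency-free sketch of that merge is below. The class name SerializationsValue and the merge helper are hypothetical names made up for illustration; the two fully qualified names are the HBase classes mentioned in the post. It also guards against a missing existing value, which is an extra precaution and not part of the original fix.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative helper (hypothetical name): builds the value for io.serializations. */
final class SerializationsValue {

    // Fully qualified names of the HBase serializers named in the post.
    static final String MUTATION_SERIALIZATION =
            "org.apache.hadoop.hbase.mapreduce.MutationSerialization";
    static final String RESULT_SERIALIZATION =
            "org.apache.hadoop.hbase.mapreduce.ResultSerialization";

    /**
     * Appends extra serializer class names to the existing comma-separated
     * setting, skipping blanks and duplicates while preserving order.
     */
    static String merge(String existing, String... extras) {
        Set<String> names = new LinkedHashSet<>();
        if (existing != null) {
            for (String name : existing.split(",")) {
                String trimmed = name.trim();
                if (!trimmed.isEmpty()) {
                    names.add(trimmed);
                }
            }
        }
        for (String extra : extras) {
            names.add(extra);
        }
        return String.join(",", names);
    }

    public static void main(String[] args) {
        // Assumed starting value; the real one comes from the job's Configuration.
        String current = "org.apache.hadoop.io.serializer.WritableSerialization";
        System.out.println(merge(current, MUTATION_SERIALIZATION, RESULT_SERIALIZATION));
    }
}
```

In a driver, the merged string would presumably be passed to conf.set("io.serializations", ...) on the same Configuration object that the Job is created from, since a setting made on a different Configuration instance would not reach the job.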
08-21-2017
05:56 AM
Hi Jay, thanks for responding. I don't have such a property in core-site.xml. Here are the details:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<!-- Llama proxy user setting -->
<property>
<name>hadoop.proxyuser.llama.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.llama.groups</name>
<value>*</value>
</property>
<!-- Hue proxy user setting -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
08-21-2017
05:23 AM
I took a backup of an HBase table using the HBase Export utility.
All the data was transferred into HDFS correctly, in sequence file format.
Now I want to run a MapReduce job to read the key/value pairs from the output file, but I am getting the exception below:
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
Here is my driver code:
package SEQ;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SeqDriver extends Configured implements Tool
{
    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SeqDriver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s needs two arguments files\n",
                    getClass().getSimpleName());
            return -1;
        }
        String outputPath = args[1];
        FileSystem hfs = FileSystem.get(getConf());
        Job job = new Job();
        job.setJarByClass(SeqDriver.class);
        job.setJobName("SequenceFileReader");
        HDFSUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        job.setMapperClass(MySeqMapper.class);
        job.setNumReduceTasks(0);
        int returnValue = job.waitForCompletion(true) ? 0 : 1;
        if (job.isSuccessful()) {
            System.out.println("Job was successful");
        } else {
            System.out.println("Job was not successful");
        }
        return returnValue;
    }
}
Here is my mapper code:
package SEQ;

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MySeqMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
    }
}
Labels: Apache Hadoop, Apache HBase
06-03-2017
07:04 PM
Solved it after using the correct path.
Create the snapshot:
snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'
Export the snapshot to local HDFS:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16
Driver job configuration to run MapReduce on the HBase snapshot:
String snapshotName = "FundamentalAnalyticSnapshot";
Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, // snapshot name
    scan,                // Scan instance to control CF and attribute selection
    DefaultMapper.class, // mapper class
    NullWritable.class,  // mapper output key
    Text.class,          // mapper output value
    job,                 // job to configure
    true,                // add dependency jars to the job
    restoreDir);         // temporary directory the snapshot files are restored into
Also, running MapReduce on an HBase snapshot skips the scan on the live HBase table, so there is no impact on the region servers.
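The restore directory and the HBase root directory in the snippet above are two different locations, and that seems to matter: the HBase Javadoc for the restore directory parameter says it should not be a subdirectory of the root directory. The following is a small, dependency-free sketch of a pre-flight check along those lines. RestoreDirCheck and problem are hypothetical names for illustration, the same-filesystem rule is my understanding of HBase's behavior and should be treated as an assumption, and this is not HBase's own validation code.

```java
import java.net.URI;
import java.util.Objects;

/** Illustrative pre-flight check (hypothetical helper, not part of HBase). */
final class RestoreDirCheck {

    /**
     * Returns null when the pair looks usable, otherwise a short reason.
     * Both arguments are expected to be fully qualified URIs.
     */
    static String problem(String rootDir, String restoreDir) {
        URI root = URI.create(rootDir);
        URI restore = URI.create(restoreDir);

        // Assumption: both directories have to live on the same filesystem.
        if (!Objects.equals(root.getScheme(), restore.getScheme())
                || !Objects.equals(root.getAuthority(), restore.getAuthority())) {
            return "restore dir is on a different filesystem than the HBase root dir";
        }

        // Compare whole path segments so that /hbase-restore is not
        // mistaken for a child of /hbase.
        String rootPath = withTrailingSlash(root.getPath());
        String restorePath = withTrailingSlash(restore.getPath());
        if (restorePath.startsWith(rootPath)) {
            return "restore dir must not be the HBase root dir or a subdirectory of it";
        }
        return null;
    }

    private static String withTrailingSlash(String path) {
        return path.endsWith("/") ? path : path + "/";
    }

    public static void main(String[] args) {
        String root = "hdfs://quickstart.cloudera:8020/hbase";
        // The working combination from the post: no problem reported.
        System.out.println(problem(root, "hdfs://quickstart.cloudera:8020/tmp"));
        // Restore dir pointed at the root dir itself: reported as a problem.
        System.out.println(problem(root, "hdfs://quickstart.cloudera:8020/hbase"));
    }
}
```

Running a check like this in the driver before calling initTableSnapshotMapperJob would surface a misplaced restore directory as a readable message instead of a stack trace from inside the job setup.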
06-01-2017
03:54 PM
I did not get your point about the snapshot needing to exist in the HBase installation. Do I have to move the snapshot somewhere? When I take a snapshot, isn't it automatically available in the HBase directory? I also changed the restorePath to hdfs://quickstart.cloudera:8020/hbase, and now I get:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://quickstart.cloudera:8020bd03e5d6-bb0a-46ac-b900-65ae0fe0a439
06-01-2017
09:28 AM
In order to avoid a full table scan on the HBase table, I thought I would run MapReduce on a snapshot of the table. I created a snapshot of my table using the command below:
snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'
After that, to run MapReduce I have to transfer it to my local HDFS, so I ran the export command as follows and copied it to the tmp directory:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp/ -mappers 16
It was copied successfully. Then I ran a MapReduce job whose driver code looks like this:
String snapshotName = "FundamentalAnalyticSnapshot";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, DefaultMapper.class, NullWritable.class, Text.class, job, true, new Path("/tmp"));
But it throws this error:
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:294)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:818)
at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:355)
at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:204)
at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:335)
at com.thomsonretuers.hbase.HBaseToFileDriver.run(HBaseToFileDriver.java:128)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.thomsonretuers.hbase.HBaseToFileDriver.main(HBaseToFileDriver.java:75)
Caused by: java.io.FileNotFoundException: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist
I know I am making a mistake by not exporting the snapshot to the correct directory. Please help me.
Thanks, Sudarshan
Labels: Apache Hadoop, Apache HBase
03-31-2017
05:52 AM
The customer needs the data in the proper file, even if a file holds only 10 KB of data.
03-31-2017
05:51 AM
No, I cannot switch to Pig now; my full application is developed on MapReduce.
03-09-2017
03:19 AM
Hi, I am getting the same error too. I ran kinit, but the exception still persists.
02-25-2017
03:18 AM
I finally managed to resolve it. I just used multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName); inside the for loop, and that solved my problem. I have tested it with a very large data set (a 19 GB file) and it worked fine for me. This is my final solution. Initially I thought it might create too many objects, but it is working fine for me. The MapReduce job also completes very quickly.