Created 08-21-2017 05:23 AM
I have taken an HBase table backup using the HBase Export utility tool.
All the data was transferred into HDFS correctly, in sequence file format.
Now I want to run a MapReduce job to read the key/value pairs from the output file, but I am getting the exception below:
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
	at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
	at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
	at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
	at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
Here is my driver code
package SEQ;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class SeqDriver extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int exitCode = ToolRunner.run(new SeqDriver(), args);
        System.exit(exitCode);
    }

    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.printf("Usage: %s needs two arguments files\n", getClass().getSimpleName());
            return -1;
        }

        String outputPath = args[1];
        FileSystem hfs = FileSystem.get(getConf());

        Job job = new Job();
        job.setJarByClass(SeqDriver.class);
        job.setJobName("SequenceFileReader");

        HDFSUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Result.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);

        job.setMapperClass(MySeqMapper.class);
        job.setNumReduceTasks(0);

        int returnValue = job.waitForCompletion(true) ? 0 : 1;

        if (job.isSuccessful()) {
            System.out.println("Job was successful");
        } else {
            System.out.println("Job was not successful");
        }
        return returnValue;
    }
}
Here is my mapper code
package SEQ;

import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MySeqMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
    }
}
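For anyone looking to fill in the empty map body above, here is a rough sketch of how the cells of a Result could be walked and emitted as text. This is not from the original post; the output format (family:qualifier=value per cell) is just an illustrative choice, using the standard HBase Cell/CellUtil/Bytes helpers:

```java
package SEQ;

import java.io.IOException;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class CellDumpMapper extends Mapper<ImmutableBytesWritable, Result, Text, Text> {

    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        // Decode the row key using offset/length, since the backing
        // array of an ImmutableBytesWritable can be larger than the key.
        String rowKey = Bytes.toString(row.get(), row.getOffset(), row.getLength());

        // Emit one record per cell: family:qualifier=value.
        for (Cell cell : value.rawCells()) {
            String col = Bytes.toString(CellUtil.cloneFamily(cell)) + ":"
                    + Bytes.toString(CellUtil.cloneQualifier(cell));
            String val = Bytes.toString(CellUtil.cloneValue(cell));
            context.write(new Text(rowKey), new Text(col + "=" + val));
        }
    }
}
```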
Created 08-22-2017 09:21 AM
So I will answer my own question. Here is what was needed to make it work: because we use HBase to store our data and this job works with HBase Result objects, Hadoop is telling us that it doesn't know how to serialize our data. That is why we need to help it. Inside setup, set the io.serializations variable:
hbaseConf.setStrings("io.serializations", new String[]{
        hbaseConf.get("io.serializations"),
        MutationSerialization.class.getName(),
        ResultSerialization.class.getName()});
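For context, a minimal sketch of how this fix might sit in a driver like the one in the question. The serializer classes are the standard ones from org.apache.hadoop.hbase.mapreduce; the helper method and job name here are just illustrative. The key point is that the Job must be created from the configuration that carries the extra serializers, otherwise the setting never reaches the tasks:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.mapreduce.Job;

public class SerializationSetup {

    // Illustrative helper: build a Job whose config can (de)serialize
    // HBase Result and Mutation values in sequence files.
    public static Job createJob(Configuration base) throws Exception {
        // Start from an HBase-aware configuration.
        Configuration hbaseConf = HBaseConfiguration.create(base);

        // Append the HBase serializers to the existing io.serializations
        // value rather than replacing it, so Writable support is kept.
        hbaseConf.setStrings("io.serializations", new String[]{
                hbaseConf.get("io.serializations"),
                MutationSerialization.class.getName(),
                ResultSerialization.class.getName()});

        return Job.getInstance(hbaseConf, "SequenceFileReader");
    }
}
```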
Created 08-21-2017 05:36 AM
Can you please share the whole "io.serializations" property configuration from "core-site.xml"? It looks like it is not set properly.
- I remember one such issue where the "io.serializations" property definition had a <final>true</final> in it, which was causing the problem. Please check whether you have a similar issue, and try removing the <final>true</final> line if you find it inside your "io.serializations" property definition.
Or the value of this property might simply not be set properly.
Created 08-21-2017 05:56 AM
Hi Jay,
Thanks for responding
I don't have such a property in core-site.xml.
Here are the details:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://quickstart.cloudera:8020</value>
</property>

<!-- OOZIE proxy user setting -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>*</value>
</property>

<!-- HTTPFS proxy user setting -->
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>

<!-- Llama proxy user setting -->
<property>
  <name>hadoop.proxyuser.llama.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.llama.groups</name>
  <value>*</value>
</property>

<!-- Hue proxy user setting -->
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
</configuration>
Created 03-13-2019 09:48 PM
Can you please share sample Java code for reading a Hadoop sequence file that has hbase.io.ImmutableBytesWritable as the key class and hbase.client.Result as the value class?
I need to read via an input stream from HDFS and would like to write to an output stream. My input stream shows the file can be read from HDFS, but I cannot parse it, so I need to build a parser for it.
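Not an official sample, but a minimal sketch of such a reader using SequenceFile.Reader directly (no MapReduce job). It relies on the same io.serializations fix discussed earlier in this thread; the input path comes from the command line and everything else (class name, output format) is just illustrative:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.SequenceFile;

public class ExportFileReader {

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Without this, SequenceFile.Reader cannot deserialize the
        // org.apache.hadoop.hbase.client.Result values and fails with
        // the "Could not find a deserializer" error from this thread.
        conf.setStrings("io.serializations",
                conf.get("io.serializations"),
                ResultSerialization.class.getName());

        // args[0]: one part file of the Export output, e.g. an hdfs:// URI.
        Path path = new Path(args[0]);

        try (SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path))) {
            ImmutableBytesWritable key = new ImmutableBytesWritable();
            Result value = null;
            // next(key) advances to the next record; the Result value is
            // then pulled through the serialization framework.
            while (reader.next(key)) {
                value = (Result) reader.getCurrentValue(value);
                String rowKey = Bytes.toString(key.get(), key.getOffset(), key.getLength());
                System.out.println(rowKey + " -> " + value.size() + " cells");
            }
        }
    }
}
```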