Member since: 11-11-2016
Posts: 43
Kudos Received: 4
Solutions: 2
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 721 | 08-22-2017 09:21 AM |
 | 690 | 02-25-2017 03:18 AM |
10-30-2018
10:27 AM
Here is the command that I use to import data from RDS into S3:

sqoop import \
-D mapreduce.map.memory.mb=6144 -D mapreduce.map.java.opts=-Xmx1024m \
--connect jdbc:mysql://a205067-pcfp-rds-abcd.dfgfdg.us-east-1.rds.amazonaws.com/tprdb \
--username tpruser \
--password Welcome12345 \
--query 'SELECT d.* from DnB_WB_UniverseMaster d join DnB_WB_UniverseMaster_Incr c on d.DunsNumber = c.DunsNumber where $CONDITIONS' \
--boundary-query "SELECT * FROM
(
SELECT
MIN( DunsNumber ) min_
from
DnB_WB_UniverseMaster
) v1,
(
SELECT
MAX( DunsNumber ) max_
from
DnB_WB_UniverseMaster
) v2" \
--split-by d.DunsNumber \
--num-mappers 100 \
--fields-terminated-by '|' \
--lines-terminated-by '\n' \
--target-dir s3://12345-pcfp-latest-new/output/processing/APCFP/IMPORT

When I run it, the last mapper gets stuck at 99% and does not move at all; it always hangs at 99%. When I do the same import for a single table, it works fine. Can someone suggest what to change?
10-06-2017
05:38 AM
This is how I load my CSV file into a Spark DataFrame:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import org.apache.spark.{ SparkConf, SparkContext }
import java.sql.{Date, Timestamp}
import org.apache.spark.sql.Row
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions.udf
val df = sqlContext.read.format("csv").option("header", "true").option("delimiter", "|").option("inferSchema","true").load("s3://MAIN")
val df1With_ = df.toDF(df.columns.map(_.replace(".", "_")): _*)
val column_to_keep = df1With_.columns.filter(v => (!v.contains("^") && !v.contains("!") && !v.contains("_c"))).toSeq
val df1result = df1With_.select(column_to_keep.head, column_to_keep.tail: _*)
val df1Final = df1result.withColumn("DataPartition", lit(null: String))

This is an example of one of my input file names:
Fundamental.FinancialLineItem.FinancialLineItem.SelfSourcedPrivate.CUS.1.2017-09-07-1056.Full

Now I want to read this file name, split it on the "." separator, and add CUS as a new column in place of DataPartition.
08-22-2017
09:21 AM
So I will answer my own question.
Here is what was needed to make it work.
Because we use HBase to store our data and this reducer outputs its result to an HBase table, Hadoop is telling us that it does not know how to serialize our data, so we need to help it. Inside setUp, set the io.serializations variable:

hbaseConf.setStrings("io.serializations", new String[]{hbaseConf.get("io.serializations"), MutationSerialization.class.getName(), ResultSerialization.class.getName()});
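For completeness, here is that call with the imports it needs, wrapped in a small standalone helper so it compiles on its own. This is only a sketch: the SerializationConfig class name is hypothetical, and only the setStrings call itself is the actual fix.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.MutationSerialization;
import org.apache.hadoop.hbase.mapreduce.ResultSerialization;

public class SerializationConfig {
    // Returns a Configuration that keeps the serializations already registered
    // and appends the HBase ones, so Result/Mutation values can be (de)serialized.
    public static Configuration withHBaseSerializations() {
        Configuration hbaseConf = HBaseConfiguration.create();
        hbaseConf.setStrings("io.serializations",
                hbaseConf.get("io.serializations"),
                MutationSerialization.class.getName(),
                ResultSerialization.class.getName());
        return hbaseConf;
    }
}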
08-21-2017
05:56 AM
Hi Jay, thanks for responding. I don't have such a property in core-site.xml. Here are the details:

<property>
<name>fs.defaultFS</name>
<value>hdfs://quickstart.cloudera:8020</value>
</property>
<!-- OOZIE proxy user setting -->
<property>
<name>hadoop.proxyuser.oozie.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.oozie.groups</name>
<value>*</value>
</property>
<!-- HTTPFS proxy user setting -->
<property>
<name>hadoop.proxyuser.httpfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.httpfs.groups</name>
<value>*</value>
</property>
<!-- Llama proxy user setting -->
<property>
<name>hadoop.proxyuser.llama.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.llama.groups</name>
<value>*</value>
</property>
<!-- Hue proxy user setting -->
<property>
<name>hadoop.proxyuser.hue.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hue.groups</name>
<value>*</value>
</property>
</configuration>
08-21-2017
05:23 AM
I have taken an HBase table backup using the HBase Export utility tool.
All the data was transferred into HDFS correctly, in sequence file format.
Now I want to run MapReduce to read the key/value pairs from those output files, but I get the exception below:
java.lang.Exception: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:406)
Caused by: java.io.IOException: Could not find a deserializer for the Value class: 'org.apache.hadoop.hbase.client.Result'. Please ensure that the configuration 'io.serializations' is properly configured, if you're using custom serialization.
at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1964)
at org.apache.hadoop.io.SequenceFile$Reader.initialize(SequenceFile.java:1811)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1760)
at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1774)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.initialize(SequenceFileRecordReader.java:50)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:478)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:671)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
Here is my driver code:

package SEQ;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
public class SeqDriver extends Configured implements Tool
{
public static void main(String[] args) throws Exception{
int exitCode = ToolRunner.run(new SeqDriver(), args);
System.exit(exitCode);
}
public int run(String[] args) throws Exception {
if (args.length != 2) {
System.err.printf("Usage: %s needs two arguments files\n",
getClass().getSimpleName());
return -1;
}
String outputPath = args[1];
FileSystem hfs = FileSystem.get(getConf());
Job job = new Job();
job.setJarByClass(SeqDriver.class);
job.setJobName("SequenceFileReader");
HDFSUtil.removeHdfsSubDirIfExists(hfs, new Path(outputPath), true);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(Result.class);
job.setInputFormatClass(SequenceFileInputFormat.class);
job.setMapperClass(MySeqMapper.class);
job.setNumReduceTasks(0);
int returnValue = job.waitForCompletion(true) ? 0:1;
if(job.isSuccessful()) {
System.out.println("Job was successful");
} else if(!job.isSuccessful()) {
System.out.println("Job was not successful");
}
return returnValue;
}
}

Here is my mapper code:

package SEQ;
import java.io.IOException;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
public class MySeqMapper extends Mapper <ImmutableBytesWritable, Result, Text, Text>{
@Override
public void map(ImmutableBytesWritable row, Result value,Context context)
throws IOException, InterruptedException {
}
}
06-03-2017
07:04 PM
Solved it after using the correct path.

Create the snapshot:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

Export the snapshot to local HDFS:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16

Driver job configuration to run MapReduce on the HBase snapshot:
String snapshotName="FundamentalAnalyticSnapshot";
Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, // snapshot name
scan, // Scan instance to control CF and attribute selection
DefaultMapper.class, // mapper class
NullWritable.class, // mapper output key
Text.class, // mapper output value
job,
true,
restoreDir);

Also, running MapReduce against an HBase snapshot skips the scan on the HBase table itself, so there is no impact on the region servers.
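Putting those pieces together, a minimal driver sketch could look like the code below. This is only an illustration, not my exact driver: the SnapshotScanDriver class name, job name, Scan settings, and output-path argument are assumptions, while the snapshot name, restore directory, and the initTableSnapshotMapperJob call are the ones shown above (DefaultMapper is the TableMapper referenced there).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SnapshotScanDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "FundamentalAnalyticSnapshotScan"); // placeholder job name
        job.setJarByClass(SnapshotScanDriver.class);

        Scan scan = new Scan();
        scan.setCacheBlocks(false); // the block cache is of no use for a one-off scan over snapshot files

        String snapshotName = "FundamentalAnalyticSnapshot";
        Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");

        TableMapReduceUtil.initTableSnapshotMapperJob(
                snapshotName,        // snapshot name
                scan,                // Scan instance to control CF and attribute selection
                DefaultMapper.class, // mapper class
                NullWritable.class,  // mapper output key
                Text.class,          // mapper output value
                job,
                true,                // add HBase dependency jars to the job
                restoreDir);         // directory where the snapshot is restored for reading

        job.setNumReduceTasks(0);
        FileOutputFormat.setOutputPath(job, new Path(args[0])); // output path passed on the command line
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}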
06-01-2017
03:54 PM
I did not get your point about the snapshot needing to exist in the HBase installation. Do I have to move the snapshot somewhere? When I take a snapshot, isn't it automatically available in the HBase directory? Also, I changed the restorePath to hdfs://quickstart.cloudera:8020/hbase and got:

java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://quickstart.cloudera:8020bd03e5d6-bb0a-46ac-b900-65ae0fe0a439
06-01-2017
09:28 AM
In order to avoid a full table scan on an HBase table, I thought of running MapReduce on an HBase table snapshot. I created a snapshot of my table using the command below:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

After that, to run MapReduce I have to transfer it to my local HDFS, so I ran the export command as follows to copy it to the /tmp dir:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp/ -mappers 16

It was copied successfully. Then I ran a MapReduce job whose driver code looks like this:

String snapshotName="FundamentalAnalyticSnapshot";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, DefaultMapper.class, NullWritable.class, Text.class, job, true, new Path("/tmp"));

But it throws this error:

org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:294)
at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:818)
at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:355)
at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:204)
at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:335)
at com.thomsonretuers.hbase.HBaseToFileDriver.run(HBaseToFileDriver.java:128)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.thomsonretuers.hbase.HBaseToFileDriver.main(HBaseToFileDriver.java:75)
Caused by: java.io.FileNotFoundException: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist

I know I am making some mistake by not exporting the snapshot to the correct dir. Please help me. Thanks, Sudarshan
05-24-2017
04:30 AM
1 Kudo
Hi, I have developed an application where I initially have to store about 5 TB of data, and then apply roughly 20 GB of monthly incremental insert/update/delete operations, delivered as XML, on top of that 5 TB.
Finally, on request, I have to generate a full snapshot of all the data and create 5K text files based on the logic, so that the respective data ends up in the respective files.
I have done this project using HBase. I created 35 tables in HBase, each with between 10 and 500 regions.
I have my data in HDFS, and using MapReduce I bulk load it into the respective HBase tables.
After that I have a SAX parser application written in Java to parse all incoming incremental XML files and update the HBase tables. The frequency of the XML files is approximately 10 files per minute, with a total of about 2000 updates.
The incremental messages are strictly in order.
Finally, on request, I run my last MapReduce application to scan all the HBase tables, create the 5K text files, and deliver them to the client. All three steps work fine, but when I went to deploy the application on the production server, which is a shared cluster, the infrastructure team would not allow us to run it because I do a full table scan on HBase.
I use a 94-node cluster, and the biggest HBase table I have holds approximately 2 billion records. All the other tables hold less than a million records each.
The total time for the MapReduce job to scan the tables and create the text files is 2 hours.
Now I am looking for some other solution to implement this.
I cannot use plain Hive, because I have record-level inserts/updates/deletes, and those have to be applied very precisely.
I have also integrated the HBase and Hive tables, so that the HBase table is used for incremental data and Hive is used for the full table scan. But because Hive uses the HBase storage handler, I cannot create partitions on the Hive table, and that is why the Hive full table scan becomes very slow, even 10 times slower than the HBase full table scan.
I cannot think of any solution right now and am kind of stuck. Please help me with some other solution where HBase is not involved.
Can I use Avro or Parquet files in this use case? I am not sure how Avro would support record-level updates.
04-19-2017
11:52 AM
Hi, I have an application that reads records from HBase and writes them into text files. The HBase table has 200 regions.
I am using MultipleOutputs in the mapper class to write into multiple files, and I build the file names from the incoming records. I produce 40 unique file names. I get the records properly, but my problem is that when the MapReduce job finishes, it creates the 40 files plus about 2K extra files with the proper names but suffixed with m-000 and so on. This is because I have 200 regions and MultipleOutputs creates a file per mapper, so with 200 mappers and 40 unique files per mapper it ends up creating 40*200 files.
I don't know how to avoid this situation without a custom partitioner.
Is there any way to force records to be written only into the files they belong to, rather than being split across multiple files? I have used a custom partitioner class and it works fine, but I don't want to use it, since I am only reading from HBase and not doing any reducer work. Also, if I ever have to add an extra file name, I would have to change my code as well.
Here is my mapper code:

public class DefaultMapper extends TableMapper<NullWritable, Text> {
private Text text = new Text();
MultipleOutputs<NullWritable, Text> multipleOutputs;
String strName = "";
@Override()
public void setup(Context context) throws java.io.IOException, java.lang.InterruptedException {
multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
}
    @Override
    public void map(ImmutableBytesWritable row, Result value, Context context)
            throws IOException, InterruptedException {
        String FILE_NAME = new String(value.getValue(Bytes.toBytes(HbaseBulkLoadMapperConstants.COLUMN_FAMILY),
                Bytes.toBytes(HbaseBulkLoadMapperConstants.FILE_NAME)));
        multipleOutputs.write(NullWritable.get(), new Text(text.toString()), FILE_NAME);
    }
}

There is no reducer class. This is how my output looks; ideally only one Japan.BUS.gz file should be created. The other files are also very small:
Japan.BUS-m-00193.gz
Japan.BUS-m-00194.gz
Japan.BUS-m-00195.gz
Japan.BUS-m-00196.gz
04-13-2017
10:36 AM
I have replaced multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
with
context.write()
and I got the correct output.
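Concretely, the change inside the combiner's reduce method is roughly the following; this is only a sketch, with the per-record formatting from my combiner simplified to a plain append.

@Override
public void reduce(NullWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        StringBuilder sb = new StringBuilder();
        sb.append(value.toString());   // the real code builds sb from the split record fields
        // previously: multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
        // writing to the context sends the data on to the reducers instead of creating
        // a separate MultipleOutputs file per map task
        context.write(NullWritable.get(), new Text(sb.toString()));
    }
}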
04-13-2017
04:57 AM
Hi, I am running an application which reads records from HBase and writes them into text files. I use a combiner in my application and a custom partitioner as well. I use 41 reducers because I need to create 40 reducer output files that satisfy the conditions in my custom partitioner class. Everything works fine, but when I use the combiner it creates a map output file per region. For example, if I have 40 regions it creates 40*41 map output files. The data in the files is correct, but the number of files increases. Any idea how I can get only 40 files?

// Reducer Class
job.setCombinerClass(CommonReducer.class);
job.setReducerClass(CommonReducer.class); // reducer class
Below are my job details:

Submitted: Mon Apr 10 09:42:55 CDT 2017
Started: Mon Apr 10 09:43:03 CDT 2017
Finished: Mon Apr 10 10:11:20 CDT 2017
Elapsed: 28mins, 17sec
Diagnostics:
Average Map Time 6mins, 13sec
Average Shuffle Time 17mins, 56sec
Average Merge Time 0sec
Average Reduce Time 0sec

Here is the Reducer/Combiner code:

import java.io.IOException;
import org.apache.log4j.Logger;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
public class CommonCombiner extends Reducer<NullWritable, Text, NullWritable, Text> {
private Logger logger = Logger.getLogger(CommonCombiner.class);
private MultipleOutputs<NullWritable, Text> multipleOutputs;
String strName = "";
private static final String DATA_SEPERATOR = "\\|\\!\\|";
public void setup(Context context) {
logger.info("Inside Combiner.");
multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
}
@Override
public void reduce(NullWritable Key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text value : values) {
final String valueStr = value.toString();
StringBuilder sb = new StringBuilder();
if ("".equals(strName) && strName.length() == 0) {
String[] strArrFileName = valueStr.split(DATA_SEPERATOR);
String strFullFileName[] = strArrFileName[1].split("\\|\\^\\|");
strName = strFullFileName[strFullFileName.length - 1];
String strArrvalueStr[] = valueStr.split(DATA_SEPERATOR);
if (!strArrvalueStr[0].contains(HbaseBulkLoadMapperConstants.FF_ACTION)) {
sb.append(strArrvalueStr[0] + "|!|");
}
multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
context.getCounter(Counters.FILE_DATA_COUNTER).increment(1);
continue;
}
String strArrvalueStr[] = valueStr.split(DATA_SEPERATOR);
}
}
public void cleanup(Context context) throws IOException, InterruptedException {
multipleOutputs.close();
}
}
03-31-2017
05:52 AM
The customer needs the data in the proper file, even if one file only has 10 KB of data.
03-31-2017
05:51 AM
No, I cannot go for Pig now; my full application is developed on MapReduce.
03-09-2017
03:19 AM
Hi, I am getting the same error too. I did kinit, but the exception still persists.
03-04-2017
03:48 AM
My MapReduce job has to read records from HBase and write them into zip files. The client has specifically asked that the reducer output files should be .zip files only.
For this I have written a ZipFileOutputFormat wrapper to compress the records and write them into zip files.
All seems OK, but there is one problem:
1. A zip file is created for each key.
Inside my output directory I can see many output files, one separate file per row key.
I don't know how to combine them inside a single zip file.
Here is my implementation in ZipFileOutputFormat.java:
@Override
public void write(K key, V value) throws IOException {
    String fname = null;
    if (key instanceof BytesWritable) {
        BytesWritable bk = (BytesWritable) key;
        fname = new String(bk.getBytes(), 0, bk.getLength());
    } else {
        fname = key.toString();
    }
    ZipEntry ze = new ZipEntry(fname);
    zipOut.closeEntry();
    zipOut.putNextEntry(ze);
    if (value instanceof BytesWritable) {
        zipOut.write(((BytesWritable) value).getBytes(), 0, ((BytesWritable) value).getLength());
    } else {
        zipOut.write(value.toString().getBytes());
    }
}
03-02-2017
06:49 AM
I was able to do it explicitly after my job finishes, and that's OK for me; there is no delay in the job.
if (b) {
    DateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd-HHmm");
    Calendar cal = Calendar.getInstance();
    String strDate = dateFormat.format(cal.getTime());
    FileSystem hdfs = FileSystem.get(getConf());
    FileStatus fs[] = hdfs.listStatus(new Path(args[1]));
    if (fs != null) {
        for (FileStatus aFile : fs) {
            if (!aFile.isDir()) {
                hdfs.rename(aFile.getPath(), new Path(aFile.getPath().toString() + ".txt"));
            }
        }
    }
}
03-02-2017
05:15 AM
Yes, but even this way the r-00000 part is not removed.
03-02-2017
05:04 AM
I am able to rename my reducer output file correctly, but the r-00000 suffix still persists.
I have used MultipleOutputs in my reducer class.
Here are the details. I am not sure what I am missing or what extra I have to do.
public class MyReducer extends Reducer<NullWritable, Text, NullWritable, Text> {

    private Logger logger = Logger.getLogger(MyReducer.class);
    private MultipleOutputs<NullWritable, Text> multipleOutputs;
    String strName = "";

    public void setup(Context context) {
        logger.info("Inside Reducer.");
        multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
    }

    @Override
    public void reduce(NullWritable Key, Iterable<Text> values, Context context)
            throws IOException, InterruptedException {
        for (Text value : values) {
            final String valueStr = value.toString();
            StringBuilder sb = new StringBuilder();
            sb.append(strArrvalueStr[0] + "|!|");
            multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);
        }
    }

    public void cleanup(Context context) throws IOException, InterruptedException {
        multipleOutputs.close();
    }
}
02-28-2017
05:09 AM
I have an application where I read from HBase and write records into files. The final output should be in .zip compressed format, not a Hadoop-supported compression format. For this I have used a custom ZipFileOutputFormat to get the records into .zip files. Here is how I wire it up:

ZipFileOutputFormat.setOutputPath(job, new Path(args[1]));

These are the details of the ZipFileOutputFormat class:

public class ZipFileOutputFormat extends FileOutputFormat<NullWritable, Text> {
@Override
public RecordWriter<NullWritable, Text> getRecordWriter(
TaskAttemptContext job) throws IOException, InterruptedException {
Path file = getDefaultWorkFile(job, ".zip");
FileSystem fs = file.getFileSystem(job.getConfiguration());
return new ZipRecordWriter(fs.create(file, false));
}
public static class ZipRecordWriter extends
RecordWriter<NullWritable, Text> {
protected ZipOutputStream zos;
public ZipRecordWriter(FSDataOutputStream os) {
zos = new ZipOutputStream(os);
}
@Override
public void write(NullWritable key, Text value) throws IOException,
InterruptedException {
// TODO: create new ZipEntry & add to the ZipOutputStream (zos)
}
@Override
public void close(TaskAttemptContext context) throws IOException,
InterruptedException {
zos.close();
}
}
}
I am not getting any error, but my output is still in r-000001 format. Am I missing any configuration here?
02-25-2017
03:18 AM
Finally I managed to resolve it. I just used

multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), strName);

inside the for loop, and that solved my problem. I have tested it with a very huge data set (a 19 GB file) and it worked fine for me. This is my final solution. Initially I thought it might create many objects, but it works fine for me, and the MapReduce job also completes very fast.
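For anyone hitting the same heap issue, the shape of the fixed reduce method is roughly this; it is only a sketch, with the value-trimming logic from my reducer elided and the output file name hard-coded as in my example.

@Override
public void reduce(NullWritable key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    for (Text value : values) {
        // build and write one record per incoming value instead of accumulating the
        // whole group in a single StringBuilder, which is what blew up the heap
        StringBuilder sb = new StringBuilder();
        sb.append(value.toString());   // the real code trims the value based on its type here
        multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), "MyFileName");
    }
}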
02-13-2017
05:24 AM
@Artem Ervits I have edited my code and posted it. Can you please have a look?
02-12-2017
06:01 PM
How can I change that? If I define the StringBuilder in the setup() method, will it be accessible in the reduce() method? Do I have to pass it as a parameter?
02-12-2017
12:38 PM
I am getting a Java heap space error in my reducer phase. I use 41 reducers in my application and also a custom partitioner class. Below is my reducer code, which throws the error shown further down.

public class MyReducer extends Reducer<NullWritable, Text, NullWritable, Text> {
private Logger logger = Logger.getLogger(MyReducer.class);
StringBuilder sb = new StringBuilder();
private MultipleOutputs<NullWritable, Text> multipleOutputs;
public void setup(Context context) {
logger.info("Inside Reducer.");
multipleOutputs = new MultipleOutputs<NullWritable, Text>(context);
}
@Override
public void reduce(NullWritable Key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
for (Text value : values) {
final String valueStr = value.toString();
if (valueStr.contains("Japan")) {
sb.append(valueStr.substring(0, valueStr.length() - 20));
} else if (valueStr.contains("SelfSourcedPrivate")) {
sb.append(valueStr.substring(0, valueStr.length() - 29));
} else if (valueStr.contains("SelfSourcedPublic")) {
sb.append(value.toString().substring(0, valueStr.length() - 29));
} else if (valueStr.contains("ThirdPartyPrivate")) {
sb.append(valueStr.substring(0, valueStr.length() - 25));
}
}
multipleOutputs.write(NullWritable.get(), new Text(sb.toString()), "MyFileName");
}
public void cleanup(Context context) throws IOException, InterruptedException {
multipleOutputs.close();
}
}
17/02/12 05:26:45 INFO mapreduce.Job: map 98% reduce 0%
17/02/12 05:28:02 INFO mapreduce.Job: map 100% reduce 0%
17/02/12 05:28:09 INFO mapreduce.Job: map 100% reduce 17%
17/02/12 05:28:10 INFO mapreduce.Job: map 100% reduce 39%
17/02/12 05:28:11 INFO mapreduce.Job: map 100% reduce 46%
17/02/12 05:28:12 INFO mapreduce.Job: map 100% reduce 51%
17/02/12 05:28:13 INFO mapreduce.Job: map 100% reduce 54%
17/02/12 05:28:14 INFO mapreduce.Job: map 100% reduce 56%
17/02/12 05:28:15 INFO mapreduce.Job: map 100% reduce 88%
17/02/12 06:21:45 INFO mapreduce.Job: map 78% reduce 0%
17/02/12 06:21:46 INFO mapreduce.Job: map 82% reduce 0%
17/02/12 06:21:47 INFO mapreduce.Job: map 85% reduce 0%
17/02/12 06:21:48 INFO mapreduce.Job: map 87% reduce 0%
17/02/12 06:21:49 INFO mapreduce.Job: map 88% reduce 0%
17/02/12 06:21:50 INFO mapreduce.Job: map 93% reduce 0%
17/02/12 06:21:51 INFO mapreduce.Job: map 94% reduce 0%
17/02/12 06:21:53 INFO mapreduce.Job: map 95% reduce 0%
17/02/12 06:21:58 INFO mapreduce.Job: map 96% reduce 0%
17/02/12 06:21:59 INFO mapreduce.Job: map 97% reduce 0%
17/02/12 06:22:02 INFO mapreduce.Job: map 98% reduce 0%
17/02/12 06:23:46 INFO mapreduce.Job: map 99% reduce 0%
17/02/12 06:23:50 INFO mapreduce.Job: map 100% reduce 0%
17/02/12 06:23:54 INFO mapreduce.Job: map 100% reduce 12%
17/02/12 06:23:55 INFO mapreduce.Job: map 100% reduce 32%
17/02/12 06:23:56 INFO mapreduce.Job: map 100% reduce 46%
17/02/12 06:23:57 INFO mapreduce.Job: map 100% reduce 51%
17/02/12 06:23:58 INFO mapreduce.Job: map 100% reduce 54%
17/02/12 06:23:59 INFO mapreduce.Job: map 100% reduce 59%
17/02/12 06:24:00 INFO mapreduce.Job: map 100% reduce 88%
17/02/12 06:24:01 INFO mapreduce.Job: map 100% reduce 90%
17/02/12 06:24:03 INFO mapreduce.Job: map 100% reduce 93%
17/02/12 06:24:06 INFO mapreduce.Job: map 100% reduce 95%
17/02/12 06:24:06 INFO mapreduce.Job: Task Id : attempt_1486663266028_2715_r_000020_0, Status : FAILED
Error: Java heap space
17/02/12 06:24:07 INFO mapreduce.Job: map 100% reduce 93%
17/02/12 06:24:10 INFO mapreduce.Job: Task Id : attempt_1486663266028_2715_r_000021_0, Status : FAILED
Error: Java heap space
17/02/12 06:24:11 INFO mapreduce.Job: map 100% reduce 91%
17/02/12 06:24:11 INFO mapreduce.Job: Task Id : attempt_1486663266028_2715_r_000027_0, Status : FAILED
Error: Java heap space
17/02/12 06:24:12 INFO mapreduce.Job: map 100% reduce 90%
02-12-2017
07:12 AM
I have changed the code. Now I am passing both the key and the value as Text, but I am still getting the same error.
02-12-2017
06:08 AM
I have a requirement where I only need the value from HBase, not the row key, written to the output file.
For that I have used NullWritable.class as my map-output key type.
Now I have to partition my output data based on the column values. But as we know, a custom partitioner works on the key, and because of that I am getting the exception below. This is where the exception is raised:

if (partition < 0 || partition >= partitions) {
throw new IOException("Illegal partition for " + key + " (" +
partition + ")");
}

Caused by: java.io.IOException: Illegal partition for (null) (40)
Here is the driver code that I am using:

TableMapReduceUtil.initTableMapperJob(args[0], // input table
scan, // Scan instance to control CF and attribute selection
DefaultMapper.class, // mapper class
NullWritable.class, // mapper output key
Text.class, // mapper output value
job);
This is my partitioner code:

public class FinancialLineItemPartioner extends Partitioner<NullWritable, Text> {
public int getPartition(NullWritable key, Text value, int setNumRedTask) {
String str = key.toString();
if (str.contains("Japan|^|BUS")) {
return 0;
} else if (str.contains("Japan|^|CAS")) {
return 1;
} else if (str.contains("Japan|^|CUS")) {
return 2;
}else {
return 3;
}
}
}

Please suggest. Note: if I interchange the map-output key/value parameters, my reducer does not work.
12-21-2016
10:50 AM
2 Kudos
Hi, I have implemented a custom partitioner based on my logic, and I am able to get the files properly. But because of the conditions, some of the reducers receive very large amounts of data, and that delays the reduce phase. So is there any way to create many small files inside one reducer output file? Here is my custom partitioner:

public class MyPartioner extends Partitioner<Text, IntWritable> {
public int getPartition(Text key, IntWritable value, int setNumRedTask) {
String str = key.toString();
if (str.contains("Japan|2014")) {
return 0;
} else if (str.contains("Japan|2013")) {
return 1;
} else if (str.contains("Japan|2012")) {
return 2;
} else if (str.contains("Japan|2011")) {
return 3;
} else
return 4;
}
}

The first condition has a very large amount of data, around 20 GB, but the last one has only 12 MB.
12-01-2016
05:08 AM
Phoenix is not installed on my cluster, as it is not bundled in the CDH 5 distribution. Any other idea?
11-30-2016
07:43 AM
Hi, I have a Hive table on HBase that has 200 GB of records.
I am running a simple Hive query to fetch 20 GB of records.
But this takes around 4 hours.
I cannot create partitions on the Hive table because it is integrated with HBase.
Please suggest any ideas to improve performance.
This is my Hive query:
INSERT OVERWRITE LOCAL DIRECTORY '/hadoop/user/m6034690/FSDI/FundamentalAnalytic/FundamentalAnalytic_2014.txt'
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
select * from hbase_table_FundamentalAnalytic where FilePartition='ThirdPartyPrivate' and FilePartitionDate='2014';
11-29-2016
09:17 AM
I have 95 live nodes in my cluster, and I am passing REGIONS_COUNT as 90.