
Running MapReduce on HBase table snapshot not working


To avoid a full table scan on my HBase table, I thought I would run MapReduce on a snapshot of the table.

I created a snapshot of my table using the command below:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

After that, to run MapReduce I had to transfer it to my local HDFS, so I ran the export command as follows and copied it to the tmp dir:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp/  -mappers 16

It was copied successfully. Then I ran a MapReduce job whose driver code looks like this:

String snapshotName = "FundamentalAnalyticSnapshot";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, DefaultMapper.class, NullWritable.class, Text.class, job, true, new Path("/tmp"));

But it throws this error:

org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:294)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:818)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:355)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:204)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:335)
    at com.thomsonretuers.hbase.HBaseToFileDriver.run(HBaseToFileDriver.java:128)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.thomsonretuers.hbase.HBaseToFileDriver.main(HBaseToFileDriver.java:75)
Caused by: java.io.FileNotFoundException: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist

I know I am making a mistake by not exporting the snapshot to the correct directory.

Please help me.

Thanks,

Sudarshan

3 REPLIES

Re: Running MapReduce on HBase table snapshot not working

MapReduce over HBase Snapshots expects the snapshot to exist in the HBase installation, not the exported version of it.

You are also providing a path with file:// where you probably want hdfs://.
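
The path in the stack trace hints at how the lookup works: the reader appears to append `.hbase-snapshot/<snapshot name>/.snapshotinfo` to whatever HBase root directory the job was configured with, which here resolved to a local `file:` location. Below is a minimal sketch in plain Java, using only `java.net.URI`, of that path layout plus a scheme check. The class and method names (`SnapshotPathCheck`, `snapshotInfoLocation`, `isOnHdfs`) are hypothetical and not part of HBase, and the layout is inferred from the error message rather than from the HBase source.

```java
import java.net.URI;

public class SnapshotPathCheck {

    // Builds the location of the .snapshotinfo file under a given root directory.
    // Assumed layout, taken from the stack trace: <root>/.hbase-snapshot/<name>/.snapshotinfo
    public static String snapshotInfoLocation(String rootDir, String snapshotName) {
        String base = rootDir.endsWith("/")
                ? rootDir.substring(0, rootDir.length() - 1)
                : rootDir;
        return base + "/.hbase-snapshot/" + snapshotName + "/.snapshotinfo";
    }

    // Returns true only when the root directory URI uses the hdfs scheme.
    public static boolean isOnHdfs(String rootDir) {
        return "hdfs".equals(URI.create(rootDir).getScheme());
    }

    public static void main(String[] args) {
        // Root directory seen in the stack trace (local filesystem).
        String localRoot = "file:/tmp/hbase-cloudera/hbase";
        System.out.println(snapshotInfoLocation(localRoot, "FundamentalAnalyticSnapshot"));
        System.out.println("on HDFS? " + isOnHdfs(localRoot));
    }
}
```

If a check like this reports a `file:` root, the job is probably not picking up the cluster's HBase configuration, so exporting the snapshot to a different directory would not help on its own.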

Re: Running MapReduce on HBase table snapshot not working

I did not get your point about the snapshot needing to exist in the HBase installation. Do I have to move the snapshot somewhere?

When I take a snapshot, is it not automatically available in the HBase directory?

I also changed the restorePath to hdfs://quickstart.cloudera:8020/hbase, and now I get this error:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://quickstart.cloudera:8020bd03e5d6-bb0a-46ac-b900-65ae0fe0a439
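
The `URISyntaxException` above can be reproduced with `java.net.URI` alone, which may help explain it. The message shows the authority `quickstart.cloudera:8020` followed directly by a generated directory name with no `/` in between, which is what one would expect if the restore path reached the job without a path component. A small sketch follows; `RestoreDirCheck`, `childUri` and `describe` are hypothetical names, and the idea that Hadoop's `Path` builds its URI from separate parts in this way is an assumption, not something confirmed in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RestoreDirCheck {

    // Joins a parent path and a child name, then builds an absolute URI from the parts.
    // An empty parent path leaves the child without a leading slash.
    public static URI childUri(String scheme, String authority, String parentPath, String child)
            throws URISyntaxException {
        String path;
        if (parentPath.isEmpty()) {
            path = child;
        } else if (parentPath.endsWith("/")) {
            path = parentPath + child;
        } else {
            path = parentPath + "/" + child;
        }
        return new URI(scheme, authority, path, null, null);
    }

    // Returns the resulting URI as text, or the exception message if it cannot be built.
    public static String describe(String parentPath) {
        try {
            return childUri("hdfs", "quickstart.cloudera:8020", parentPath,
                    "bd03e5d6-bb0a-46ac-b900-65ae0fe0a439").toString();
        } catch (URISyntaxException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(""));      // no path component on the restore dir
        System.out.println(describe("/tmp"));  // restore dir with an absolute path
    }
}
```

Under that assumption, a restore directory that includes an absolute path (for example one ending in `/tmp`) avoids the exception, because the generated child directory is then joined with a `/`.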

Re: Running MapReduce on HBase table snapshot not working

Solved it by using the correct path.

Create the snapshot:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot' 

Export the snapshot to local HDFS:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16 

Driver job configuration to run MapReduce on the HBase snapshot:

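    String snapshotName = "FundamentalAnalyticSnapshot";
    // Temporary directory on HDFS into which the snapshot is restored for the job
    Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
    // HBase root directory on HDFS (its use is not shown in this snippet)
    String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";
    TableMapReduceUtil.initTableSnapshotMapperJob(
        snapshotName,       // snapshot name
        scan,               // Scan instance to control CF and attribute selection
        DefaultMapper.class, // mapper class
        NullWritable.class, // mapper output key
        Text.class,         // mapper output value
        job,                // the MapReduce job being configured
        true,               // add dependency jars to the job
        restoreDir);        // temporary restore directory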

Also, running MapReduce on an HBase snapshot skips the scan on the live HBase table, so the job has no impact on the region servers.