Running MapReduce on HBase table snapshot not working

To avoid a full table scan on my HBase table, I thought I would run MapReduce on a snapshot of the table instead.

I created a snapshot of my table using the command below:

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'

After that, to run MapReduce I had to transfer the snapshot to my local HDFS, so I ran the export command as follows and copied it to the /tmp directory:

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp/  -mappers 16

It was copied successfully. I then ran a MapReduce job whose driver code looks like this:

String snapshotName="FundamentalAnalyticSnapshot";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, DefaultMapper.class, NullWritable.class, Text.class, job, true, new Path("/tmp"));

But it throws this error:

org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(
    at
    at
    at com.thomsonretuers.hbase.HBaseToFileDriver.main(
Caused by: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist

I know I am making a mistake by not exporting the snapshot to the correct directory.

Please help me.




MapReduce over HBase Snapshots expects the snapshot to exist in the HBase installation, not the exported version of it.

You are also providing a path with file:// where you probably want hdfs://.
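
To illustrate the scheme point: a bare path such as /tmp carries no filesystem scheme, so whichever default the job's configuration supplies decides where it points. In Hadoop that default comes from fs.defaultFS, which is file:/// unless the cluster configuration overrides it. The sketch below uses only java.net.URI, not Hadoop's own Path class, so it is a simplified illustration; the class and method names are made up for this example.

```java
import java.net.URI;

final class SchemeCheck {
    // Returns the scheme of a path string, or "(none)" if it has no scheme.
    static String schemeOf(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme == null ? "(none)" : scheme;
    }

    // Qualifies an absolute, scheme-less path against a default filesystem URI.
    // This only mimics the idea of a default filesystem; it is not Hadoop code.
    static String qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path).toString();
    }

    public static void main(String[] args) {
        System.out.println(schemeOf("/tmp"));
        System.out.println(schemeOf("hdfs://quickstart.cloudera:8020/tmp"));
        System.out.println(qualify("hdfs://quickstart.cloudera:8020", "/tmp"));
    }
}
```

Spelling out the hdfs:// URI in the driver removes the dependence on whatever default the client happens to load.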

I did not understand your point about the snapshot needing to exist in the HBase installation. Do I have to move the snapshot somewhere?

When I take a snapshot, is it not automatically available in the HBase directory?

I also changed the restore path to hdfs://quickstart.cloudera:8020/hbase, and now I get:
java.lang.IllegalArgumentException: Relative path in absolute URI: hdfs://quickstart.cloudera:8020bd03e5d6-bb0a-46ac-b900-65ae0fe0a439
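
Going by the message text, the random subdirectory name that HBase appends to the restore directory ended up glued straight onto host:port, which is what happens when the restore path that reaches HBase has no path component after the port. That is a reading of the error, not something confirmed in this thread. The same message can be reproduced with plain java.net.URI; the class and method names below are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class RestorePathCheck {
    // Builds a URI from its parts. Returns the URI string on success,
    // or the reason reported by URISyntaxException on failure.
    static String tryBuild(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null).toString();
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        String authority = "quickstart.cloudera:8020";
        String child = "bd03e5d6-bb0a-46ac-b900-65ae0fe0a439";

        // Path without a leading slash: rejected as a relative path.
        System.out.println(tryBuild("hdfs", authority, child));

        // Path under a real directory such as /tmp: accepted.
        System.out.println(tryBuild("hdfs", authority, "/tmp/" + child));
    }
}
```

A restore directory with an explicit path, for example hdfs://quickstart.cloudera:8020/tmp, avoids this.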

Solved it after using the correct paths.

Create the snapshot

snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot' 

Export the snapshot to local HDFS

hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16 

Driver job configuration to run MapReduce on the HBase snapshot

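    String snapshotName = "FundamentalAnalyticSnapshot";
    // Temporary directory on HDFS where the snapshot is restored for the job
    Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
    // HBase root directory on HDFS
    String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";
    TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, // snapshot name
        scan, // Scan instance to control CF and attribute selection
        DefaultMapper.class, // mapper class
        NullWritable.class, // mapper output key
        Text.class, // mapper output value
        job, // the MapReduce job being configured
        true, // add dependency jars, as in the original call
        restoreDir); // restore directory defined above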

Also, running MapReduce on an HBase snapshot reads the snapshot's files directly instead of scanning the live table, so the job puts no load on the region servers.