Created 06-01-2017 09:28 AM
To avoid a full table scan on an HBase table, I thought I would run MapReduce on a snapshot of the table instead.
I created a snapshot of my table with the command below:
snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'
To run MapReduce on it, I then have to transfer it to my local HDFS, so I ran the export command below to copy it to the tmp directory:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp/ -mappers 16
It copied successfully. I then ran a MapReduce job whose driver code looks like this:
String snapshotName="FundamentalAnalyticSnapshot";
TableMapReduceUtil.initTableSnapshotMapperJob(snapshotName, scan, DefaultMapper.class, NullWritable.class, Text.class, job, true, new Path("/tmp"));
But it throws this error:
org.apache.hadoop.hbase.snapshot.CorruptedSnapshotException: Couldn't read snapshot info from:file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo
    at org.apache.hadoop.hbase.snapshot.SnapshotDescriptionUtils.readSnapshotInfo(SnapshotDescriptionUtils.java:294)
    at org.apache.hadoop.hbase.snapshot.RestoreSnapshotHelper.copySnapshotForScanner(RestoreSnapshotHelper.java:818)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormatImpl.setInput(TableSnapshotInputFormatImpl.java:355)
    at org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat.setInput(TableSnapshotInputFormat.java:204)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableSnapshotMapperJob(TableMapReduceUtil.java:335)
    at com.thomsonretuers.hbase.HBaseToFileDriver.run(HBaseToFileDriver.java:128)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at com.thomsonretuers.hbase.HBaseToFileDriver.main(HBaseToFileDriver.java:75)
Caused by: java.io.FileNotFoundException: File file:/tmp/hbase-cloudera/hbase/.hbase-snapshot/FundamentalAnalyticSnapshot/.snapshotinfo does not exist
I know I am making a mistake by not exporting the snapshot to the correct directory.
Please help me.
Thanks,
Sudarshan
Created 06-01-2017 02:26 PM
MapReduce over HBase Snapshots expects the snapshot to exist in the HBase installation, not the exported version of it.
You are also providing a path with file:// where you probably want hdfs://.
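To illustrate the file:// versus hdfs:// point: a path given without a scheme, such as /tmp, is qualified against whatever default filesystem the job's configuration carries, and Hadoop's fs.defaultFS falls back to file:/// when no cluster configuration is loaded. The sketch below is only a rough model of that behaviour using java.net.URI; it does not call Hadoop's Path or FileSystem classes, and the class and method names are hypothetical.

```java
import java.net.URI;

class DefaultFsDemo {

    // Resolve a scheme-less path against a default filesystem URI.
    // This only models the idea; it is not Hadoop's own qualification code.
    public static URI qualify(String defaultFs, String path) {
        return URI.create(defaultFs).resolve(path);
    }

    public static void main(String[] args) {
        // No cluster configuration loaded: the default filesystem is the local one.
        System.out.println(qualify("file:///", "/tmp"));

        // Cluster configuration loaded: the same path lands on HDFS.
        // The host and port are the ones quoted in this thread.
        System.out.println(qualify("hdfs://quickstart.cloudera:8020/", "/tmp"));
    }
}
```

If the first form is what the job ends up with, either load the cluster configuration into the job or spell out the hdfs:// scheme and authority in the path.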
Created 06-01-2017 03:54 PM
I did not get your point about the snapshot needing to exist in the HBase installation. Do I have to move the snapshot somewhere?
When I take a snapshot, is it not automatically available in the HBase directory?
I also changed the restorePath to hdfs://quickstart.cloudera:8020/hbase, and now I get this error:
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://quickstart.cloudera:8020bd03e5d6-bb0a-46ac-b900-65ae0fe0a439
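In the message above, the host and port are followed directly by a generated directory name with no slash in between, which suggests that the restore path reaching the job had no path component. The sketch below reproduces the same message using only java.net.URI. That Hadoop's Path builds its URI in a comparable way is an assumption here, and the class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RestorePathCheck {

    // Build a URI from scheme, authority and path.
    // Returns the URI string, or "ERROR: <reason>" if the parts do not form a valid URI.
    public static String tryBuild(String scheme, String authority, String path) {
        try {
            return new URI(scheme, authority, path, null, null).toString();
        } catch (URISyntaxException e) {
            return "ERROR: " + e.getReason();
        }
    }

    public static void main(String[] args) {
        String authority = "quickstart.cloudera:8020";
        String child = "bd03e5d6-bb0a-46ac-b900-65ae0fe0a439";

        // Path without a leading slash: rejected because a scheme is present.
        System.out.println(tryBuild("hdfs", authority, child));

        // Path rooted under a real directory: accepted.
        System.out.println(tryBuild("hdfs", authority, "/tmp/" + child));
    }
}
```

A restore directory that includes an absolute path, for example hdfs://quickstart.cloudera:8020/tmp, avoids the problem.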
Created 06-03-2017 07:04 PM
Solved it after using the correct path.
Create snapshot
snapshot 'FundamentalAnalytic','FundamentalAnalyticSnapshot'
Export the snapshot to local HDFS
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot FundamentalAnalyticSnapshot -copy-to /tmp -mappers 16
Driver job configuration to run MapReduce on the HBase snapshot
String snapshotName = "FundamentalAnalyticSnapshot";
Path restoreDir = new Path("hdfs://quickstart.cloudera:8020/tmp");
String hbaseRootDir = "hdfs://quickstart.cloudera:8020/hbase";
TableMapReduceUtil.initTableSnapshotMapperJob(
    snapshotName,        // snapshot name
    scan,                // Scan instance to control CF and attribute selection
    DefaultMapper.class, // mapper class
    NullWritable.class,  // mapper output key
    Text.class,          // mapper output value
    job,
    true,
    restoreDir);
Running MapReduce on an HBase snapshot also skips the scan on the live HBase table, so there is no impact on the region servers.