New Contributor
Posts: 2
Registered: ‎04-24-2017

HBase import job does not complete


Issue -
The HBase import job fails at the end of the import. If there were, say, 4000 splits and 4000 mappers created for the import, the last set of tasks gets stuck in the NEW state and never completes. The number of stuck tasks varied between 1 and 8 across the 4-5 times we tried. There are no errors or killed tasks among the completed ones, and nothing in the container or task logs points to an obvious problem. The region server logs also show no errors at the time this happens. Once this happens the import job never completes, staying stuck in the same state, and the only option is to kill the job using the yarn application -kill command.
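For reference, this is roughly how we inspect and then kill the stuck job (the application ID below is an illustrative placeholder, not the real one):

```shell
# List running YARN applications to find the stuck import job's application ID
yarn application -list -appStates RUNNING

# Check the application's state and progress (placeholder application ID)
yarn application -status application_1493000000000_0001

# As a last resort, kill the stuck job
yarn application -kill application_1493000000000_0001
```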

 

We initially thought there was a problem with the exported files that was causing the import to fail, but when we imported the files into the table individually (one by one using the Import command), the import went through.
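A rough sketch of how that one-by-one import was scripted (the table name and backup path mirror the commands shown below, but the loop itself is illustrative):

```shell
# Import each exported sequence file as its own job, one at a time.
# TABLE and the backup path are placeholders matching the commands below.
TABLE=TABLE_NAME
BACKUP_DIR=hdfs://master2:8020/hbasebackup/$TABLE

for f in $(hadoop fs -ls "$BACKUP_DIR" | awk '{print $NF}' | grep -v '/_'); do
  hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import "$TABLE" "$f"
done
```

The grep skips MapReduce bookkeeping files such as _SUCCESS.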

 

Setup -

Source Cloudera Cluster -
Cloudera 4.7
HDFS hadoop-2.0.0+1612
Hbase hbase-0.94.15+119
Number of nodes 5

 

Target Cloudera Cluster -
Cloudera 5.5.4
Hadoop hadoop-2.6.0+cdh5.5.4+1072
Hbase hbase-1.0.0+cdh5.5.4+318
Number of nodes 7

 

Operation Being carried out -
Export from the source HBase table and import into the target HBase table

 

Tools used for Export/Import
org.apache.hadoop.hbase.mapreduce.Export
org.apache.hadoop.hbase.mapreduce.Import

 

Commands used for Export/Import
Source Cluster -
hbase org.apache.hadoop.hbase.mapreduce.Export TABLE_NAME hdfs://master:8020/hbasebackup/TABLE_NAME

Target Cluster -
hbase -Dhbase.import.version=0.94 org.apache.hadoop.hbase.mapreduce.Import TABLE_NAME hdfs://master2:8020/hbasebackup/TABLE_NAME


Yarn Configuration -
Node managers - 5
VCores - 80
Memory - 70.5GB
Scheduling Policy - DRF (The default configuration, nothing has been changed here)

 

Export input details
1. Folder size of the exported data - 554 GB
2. Total number of sequence files in the exported folder - 741
3. Size of these files - varies between 290 MB and 1.1 GB

 

How the exported files were transferred from the source to the target cluster
1. Once the table data was exported on the source cluster, it was moved to the local file system on the source cluster using the hadoop copyToLocal command.
2. The files were transferred to the target cluster using rsync.
3. The files were moved to HDFS on the target cluster using the hadoop copyFromLocal command.
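The three transfer steps, sketched as commands (the local staging path and the target hostname are illustrative placeholders; the HDFS paths mirror the export/import commands above):

```shell
# Placeholder names: STAGING is a hypothetical local directory,
# target-master is a hypothetical target-cluster host.
TABLE=TABLE_NAME
STAGING=/data/hbasebackup

# 1. HDFS -> local disk on the source cluster
hadoop fs -copyToLocal hdfs://master:8020/hbasebackup/$TABLE $STAGING/$TABLE

# 2. local disk -> target cluster over rsync
rsync -av $STAGING/$TABLE/ target-master:$STAGING/$TABLE/

# 3. local disk -> HDFS on the target cluster (run on the target cluster)
hadoop fs -copyFromLocal $STAGING/$TABLE hdfs://master2:8020/hbasebackup/$TABLE
```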

 

 

 
