Created 10-25-2018 06:54 AM
Hi,
We have set up an HBase database on AWS EC2 instances using Ambari. We have two EC2 t2.large instances: one contains ZooKeeper and the master, and the other contains the Name and Data Nodes.
There is a daily cron job that runs on the slave node, takes an incremental backup, and saves it to HDFS. It works perfectly, but after roughly 15-20 days the backup MapReduce job starts failing with an out-of-heap-memory error. If we check the instance's memory state, we have about 4.5 GB in cache and 2.7 GB available. Once it fails, it keeps failing with the out-of-memory error forever.
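For context, the cron job is essentially a wrapper around the HBase backup CLI; a rough sketch of that kind of setup is below (the backup root path, schedule, and table name are illustrative placeholders, not our exact values):

```bash
#!/bin/bash
# hbase_incremental_backup.sh -- hypothetical wrapper invoked nightly from cron, e.g.:
#   30 1 * * * /usr/local/bin/hbase_incremental_backup.sh >> /var/log/hbase-backup.log 2>&1
set -euo pipefail

BACKUP_ROOT="hdfs:///backups/hbase"   # placeholder HDFS directory that holds the backup images
TABLES="my_table"                     # placeholder table list

# Launches a MapReduce job that copies the WAL data written since the previous backup
# into BACKUP_ROOT; at least one full backup must already exist under this root.
hbase backup create incremental "${BACKUP_ROOT}" -t "${TABLES}"
```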
The resolution we followed on the last two occasions was to take a fresh full backup, and everything came back to normal. We have not been able to figure out the exact issue, and it is almost time to go into production; we cannot go into production without a proper backup setup. We need the community's help to find the best way to back up our database.
Kindly suggest where we might be going wrong with our implementation.
PS: Previously we had tried to take the backup and save it directly to S3, with no success. Here is the issue we were facing with the old method.
Created 10-25-2018 03:48 PM
The more backups you have, the more information the MapReduce job needs to read to execute correctly.
Similarly, the more incremental backups you have without a full backup, the more files the MR job needs to read and hold.
The solution is to increase the Java heap that you provide to the job. You should also do some learning to understand the difference between Java heap and physical memory as your analysis implies that you don't understand the difference between them.
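To make that concrete, the relevant settings are the heap of the client JVM that launches the backup and the heap of the MapReduce tasks that do the work, not the free/cached physical memory shown by top. A sketch (the values below are examples only, not recommendations):

```bash
# Example values only -- size these to your instances; they are not recommendations.

# Heap of the client JVM that launches the backup command:
export HBASE_HEAPSIZE=4096            # in MB

# Heap of the MapReduce map/reduce tasks that actually read the backup images and WALs.
# These are standard Hadoop properties, normally set in mapred-site.xml (e.g. via Ambari):
#   mapreduce.map.memory.mb      3072
#   mapreduce.map.java.opts      -Xmx2458m
#   mapreduce.reduce.memory.mb   3072
#   mapreduce.reduce.java.opts   -Xmx2458m

hbase backup create incremental hdfs:///backups/hbase -t my_table   # placeholder path/table
```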
Created 10-25-2018 04:03 PM
@Josh Elser, if I understand correctly, increasing the Java heap size can resolve the issue, but is this a proper fix? Instead of failing after 20 days, it may simply fail after 40 days. We have thought about taking a full backup every 15 days; is this the right solution?
Created 11-12-2018 07:45 AM
Hi,
Today, again after 18 days, my backup job has failed. I have tried different techniques to fix it, but all in vain. The only solution I have found so far is to take a fresh full backup.
I made some observations on memory usage this time. I am attaching my HBase backup log hbase-backup-logtxt.zip and a Linux top command screenshot screen-shot-2018-11-12-at-122551-pm.png. If someone can kindly look into them and provide some feedback, it would be very helpful.
Created 11-12-2018 04:34 PM
It sounds like you are using the software incorrectly. The expectation is that you run a full backup more than once, not only once. You run full backups so that the number of WALs to be tracked for incremental backups is limited.
Run a new full backup every couple of weeks.
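For example, the schedule could look something like this (paths, table name, and timings are placeholders):

```bash
# Illustrative crontab (placeholder paths/tables/times):
# nightly incremental backups...
30 1 * * * hbase backup create incremental hdfs:///backups/hbase -t my_table
# ...plus a fresh full backup on the 1st and 15th of each month, so the chain of
# incrementals (and WALs) the backup MapReduce job has to track never grows unbounded.
0 3 1,15 * * hbase backup create full hdfs:///backups/hbase -t my_table
```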
Created 11-12-2018 11:55 AM
Created 05-13-2019 09:14 AM
We have solved this by using merge backups. Now we have two backup images: one is the full backup, and the other is the merged incremental backup.
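For anyone hitting the same problem, a rough sketch of what the merge looks like with the backup CLI (the backup IDs below are made-up placeholders of the kind `hbase backup history` prints):

```bash
# List the existing backup images and their IDs (full vs. incremental).
hbase backup history

# Merge a chain of incremental backup images into a single image, so only the
# full backup plus one merged incremental remain. The IDs are placeholders.
hbase backup merge backup_1542000000001,backup_1542000000002,backup_1542000000003
```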