
Is there a limit to the number of snapshots I can have for a table in HBase?


The title really says it all. Is there a point at which HBase will fall over if I take too many snapshots for an individual table or a point at which I will notice issues restoring/exporting snapshots if I take too many? So, for example, if I take daily snapshots then do I need to start cleaning them up after 30 days, 6 months, a year?


2 REPLIES

Super Guru

@Brandon Wilson

Theoretically, there isn't a limit on the number of snapshots, but, as with everything, there is a price to pay. A snapshot, as you know, only captures metadata about a table at a point in time.

Now imagine you create one snapshot every minute (an extreme example, to illustrate what happens). Two hours later, you have 120 snapshots.

Hfiles are immutable. The moment a snapshot is taken, it contains references to the hfiles that exist at that point in time; an HBase snapshot doesn't make any copies of data. Copies are only made when you restore (or export) the snapshot. But what happens when a compaction or deletion is triggered? If a snapshot references those immutable hfiles, they are moved to an archive folder rather than actually deleted, because you might later decide to restore from that snapshot.
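
For reference, taking a snapshot is a single metadata call through the Admin API. Here is a minimal sketch with the HBase Java client (the table and snapshot names are made up for illustration; admin.snapshot() is the standard call):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TakeSnapshot {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            // Records references to the table's current hfiles; no data is copied.
            admin.snapshot("my_table-20240101", TableName.valueOf("my_table"));
        }
    }
}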

If you have lots of compactions and updates, then each snapshot may be pointing to a different set of hfiles. This means your snapshots will grow your storage footprint.

So, there is no theoretical limit on the number of snapshots. But snapshots, if used aggressively, are not entirely free of cost: you might end up using a significant amount of storage.

So, keeping daily snapshots around for 30 days, 6 months, or a year will require significant storage overhead.
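
If you do decide to prune daily snapshots after some retention window, a small cleanup job is enough. Here is a minimal sketch against the HBase 2.x Java client (the 30-day cutoff is just an example; listSnapshots(), getCreationTime(), and deleteSnapshot() are standard Admin methods):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.SnapshotDescription;

public class CleanOldSnapshots {
    public static void main(String[] args) throws Exception {
        // Example retention window: 30 days.
        long cutoff = System.currentTimeMillis() - 30L * 24 * 60 * 60 * 1000;
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
            for (SnapshotDescription sd : admin.listSnapshots()) {
                if (sd.getCreationTime() < cutoff) {
                    admin.deleteSnapshot(sd.getName());
                }
            }
        }
    }
}

Note that deleting a snapshot doesn't free space by itself; the master's HFileCleaner chore removes archived hfiles once no remaining snapshot or clone references them.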

Rising Star
@Brandon Wilson

mqureshi's explanation is correct: technically, you can have an unlimited number of snapshots in HBase, but it puts a lot of pressure on HDFS.

It would not only occupy some disk space, it would also create a huge number of hfile references that can slow down the NameNode. Let's assume you have an HTable with 10 column families and 50k regions, and each CF averages 5 hfiles per region; that means 2.5 million hfiles in total for this single table. The first time you create a snapshot, all 2.5M hfiles are referenced. When you take another snapshot the next day (after some routine compactions, of course), another 2 million or more new hfiles will probably be referenced. Remember: old hfiles are not removed until the snapshot referencing them is removed. At that rate, you end up with more than 15 million referenced hfiles after a week, which is really bad news for the NameNode.
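
To sanity-check those numbers (all figures here are the hypothetical table from the example above, not measurements from a real cluster):

public class SnapshotFileMath {
    public static void main(String[] args) {
        long regions = 50_000;
        long columnFamilies = 10;
        long hfilesPerCf = 5; // average hfiles per CF per region
        long filesPerSnapshot = regions * columnFamilies * hfilesPerCf;
        System.out.println(filesPerSnapshot); // 2,500,000 hfiles referenced on day 1

        // If daily compactions rewrite most files, each new daily snapshot pins
        // roughly 2M+ additional hfiles in the archive:
        long newFilesPerDay = 2_000_000;
        long afterWeek = filesPerSnapshot + 6 * newFilesPerDay;
        System.out.println(afterWeek); // ~14.5M+, i.e. over 15M at "2 or more" per day
    }
}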