Created 12-02-2015 07:19 PM
When I look at the HDFS audit logs, I see the hbase user on the HBaseMaster node accessing HDFS files, and the entries in the audit log carry 'cmd=listStatus'. We regularly see about 3 million of them per hour, and we have seen spikes to 6 million per hour, which may have crashed the NN. Any idea what HBaseMaster is doing here, or whether we can reduce any of this load on the NN?
Created 12-02-2015 07:37 PM
The HBase master lists the files of table regions in a few cases:
(1) CatalogJanitor process. This runs every hbase.catalogjanitor.interval (5 minutes by default). It garbage collects regions that have been split or merged. The catalog janitor checks whether the daughter regions (after a split) still have references to the parent region. Once those references have been compacted away, the parent can be deleted. Note that this process should only access recently split or merged regions.
(2) HFile/WAL cleaner. This runs every hbase.master.cleaner.interval (1 minute by default). It garbage collects data files (hfiles) and WAL files. Data files in HBase can be referenced by more than one region or table and can be shared across snapshots and live tables, and there is also a minimum time (TTL) for which an hfile/WAL is kept around. That is why the master is responsible for reference counting and garbage collecting the data files. This is possibly the most expensive NN operation in this list.
(3) Region Balancer. The balancer takes locality into account when making balancing decisions, so it lists files to find the block locality of the files in each region. The locality of files is kept in a local cache for 240 minutes (unfortunately hard-coded).
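To narrow down which of these is generating the calls, one option is to group the listStatus entries by the directory they touch. The following is only a rough sketch in Python. It assumes the usual key=value audit line layout (ugi=, cmd=, src=), and it assumes that the HBase root directory is /hbase and that cleaner traffic shows up under subdirectories such as archive and oldWALs while region listings show up under data. Both the root path and those directory names vary by version and distribution, so adjust them to your cluster. The function names are hypothetical.

```python
import re
from collections import Counter

# Assumed audit line layout (whitespace-separated key=value fields), e.g.:
# 2015-12-02 19:19:01,123 INFO FSNamesystem.audit: allowed=true ugi=hbase (auth:SIMPLE)
#   ip=/10.0.0.1 cmd=listStatus src=/hbase/archive/data/default/t1 dst=null perm=null
LINE_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):.*?"
    r"\bugi=(?P<user>\S+).*?"
    r"\bcmd=(?P<cmd>\S+)\s+"
    r"src=(?P<src>\S+)"
)


def bucket(src, root="/hbase"):
    """Return the first path component under the (assumed) HBase root dir."""
    if not src.startswith(root + "/"):
        return "other"
    rest = src[len(root) + 1:]
    return rest.split("/", 1)[0] or "other"


def count_list_status(lines, user="hbase", root="/hbase"):
    """Count listStatus calls per (hour, top-level directory) for one user."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m or m.group("cmd") != "listStatus":
            continue
        # Strip a Kerberos host/realm suffix such as hbase/host@REALM.
        short_user = re.split(r"[/@]", m.group("user"))[0]
        if short_user == user:
            counts[(m.group("hour"), bucket(m.group("src"), root))] += 1
    return counts
```

To use it, pass an open audit log file, for example `count_list_status(open("hdfs-audit.log"))`, and print `counts.most_common(20)`. If most of the calls fall under the archive or WAL directories, the cleaner in (2) is the likely source; a spread across data suggests (1) or (3).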