We are using the HBase-indexer (KeyValue Store Indexer service) to index new/modified HBase data into Solr in near real time (NRT).
Information: the KeyValue Store Indexer service is configured with 10 GB of heap on 6 hosts - the 6 data nodes.
Recently, on one of our clusters, we observed that every instance of the "KeyValue Store Indexer" service was down.
Some hours and many tests later, we reached the conclusion that the service had run out of memory (OOM).
Whenever we restart the service, it hits an OOM again within seconds.
I took a heap dump of one of the instances, and it shows an accumulation of objects. My interpretation is:
- The service boots
- It tries to load into memory every available sepEvent/WAL entry
- The JVM blows up when it reaches the max heap size
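For reference, the heap dump was captured with the standard JDK tooling, along these lines (the PID and output path are placeholders to adapt to your environment):

```shell
# List running JVMs to find the indexer's PID.
jps -l

# Dump only live (reachable) objects to keep the file smaller;
# replace <PID> with the indexer process id.
jmap -dump:live,format=b,file=/tmp/indexer-heap.hprof <PID>
```

The resulting `.hprof` file can then be opened in Eclipse MAT or VisualVM to see which classes dominate the heap.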
My question is:
- Is there a way to throttle the number of sepEvents/WAL entries consumed by the KeyValue Store Indexer, in order to bound its memory usage?
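For illustration only: conceptually, the behaviour I am after is backpressure through a bounded buffer, so that the WAL reader blocks instead of accumulating events without limit. This is a generic JDK sketch, not the indexer's actual API; `SepEvent`, the class name, and the capacity are all hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class BoundedEventBuffer {
    // Hypothetical stand-in for a SEP/WAL event.
    static final class SepEvent {
        final byte[] payload;
        SepEvent(byte[] payload) { this.payload = payload; }
    }

    // Bounded queue: the producer blocks once 'capacity' events are in
    // flight, so memory use is capped instead of growing with the backlog.
    private final BlockingQueue<SepEvent> queue;

    BoundedEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by the WAL reader; blocks when the buffer is full.
    void publish(SepEvent e) throws InterruptedException {
        queue.put(e);
    }

    // Called by the indexing/Solr side; blocks when the buffer is empty.
    SepEvent take() throws InterruptedException {
        return queue.take();
    }

    int size() { return queue.size(); }

    public static void main(String[] args) throws Exception {
        BoundedEventBuffer buf = new BoundedEventBuffer(100);
        // Consumer thread drains events, as if indexing into Solr.
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < 1000; i++) buf.take();
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        // The producer never holds more than 100 events in memory at once,
        // even though 1000 events flow through in total.
        for (int i = 0; i < 1000; i++) buf.publish(new SepEvent(new byte[1024]));
        consumer.join();
        System.out.println("in-flight after drain: " + buf.size());
    }
}
```

If the indexer exposed an equivalent knob (maximum in-flight events or batch size), the OOM-on-restart loop described above would turn into slow-but-steady catch-up instead.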
There is one particular indexer that seems to have an issue.
| Host | Queue size (incl. current) | Size all HLogs (excl. current) | Current HLog progress | Age last shipped op | TS last shipped op | Peer count |
|------|----------------------------|--------------------------------|-----------------------|---------------------|--------------------|------------|
| xxxxxxxx.xxx.xxxxx.xxx.xxx.xxxx,60020,14853716 | 208 | 12024.3 MB | 35 % | (enable jmx) | unknown | unknown |
It looks like the indexer does not manage to work through its queue. The queue size corresponds to 208 oldWALs that are still referenced by the indexer and therefore not purged by HBase.
These WALs contain only a small amount of data for the indexed HBase table.
My guess is that one of these WALs contains some corrupt data that the indexer cannot process, so it gets stuck; and until that entry is processed, the WALs will keep piling up.
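To dig further into that guess, two standard HBase tools can help; the WAL file path below is a placeholder, and the exact printer class depends on the HBase version:

```shell
# From the hbase shell: show per-peer replication queues and lag,
# which is where the 208-oldWALs backlog shows up.
echo "status 'replication'" | hbase shell

# Dump the entries of a suspect (old)WAL file to look for the entry the
# indexer chokes on. On older HBase versions the class is
# org.apache.hadoop.hbase.regionserver.wal.HLogPrettyPrinter.
hbase org.apache.hadoop.hbase.wal.WALPrettyPrinter \
    hdfs:///hbase/oldWALs/<wal-file-name>
```

If the printer itself fails on one specific file, that file is a strong candidate for the corrupt entry blocking the queue.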
I'll raise a support ticket about this, because the initial issue leading to this state is clearly not easy to spot, nor is it obvious how it can be fixed without deleting and reinstalling the indexer.
For the moment we have not managed to narrow down the issue with support, and I was forced to clean the platform.
So the investigation is suspended for now.