We have a CDH 5.3.0 cluster with 4 data nodes.
YARN is configured to allocate up to:
- 60 GB of memory per NodeManager
- 32 vCPUs per NodeManager
- 6 GB of heap max for a map task (the opts parameter equals 80%)
- 6 GB of heap max for a reduce task (the opts parameter equals 80%)
We use Oozie workflows to regenerate the content of HBase tables and then re-index those tables into Solr (using the HBase Lily indexer in batch mode).
The Oozie workflow is composed of several shell actions (and only shell actions).
These shell actions are essentially small jobs that launch the "real" map/reduce jobs (Hive queries, HBase table management, Solr collection creation, HBase Lily indexer bootstrap).
The issue we are encountering is the following:
During the workflow execution, the shell action that manages the indexing of the data fails with the error:
- "Container running beyond physical limit"
I understand the error, and I know I could work around it by giving more memory to the single map task that Oozie launches to execute the shell.
But that single map task is already configured to use up to 6 GB of memory (YARN configuration).
And there is no way the small shell execution should need that much memory under normal conditions. I would expect the shell action itself to fit in 10 MB of memory (or even less).
This shell script launches 5 steps (none of them should be memory-consuming):
- 1: it deactivates HBase replication on a column family/table (using the hbase shell CLI)
- 2: it creates a Solr collection (using the solrctl CLI)
- 3: it launches the indexing job using the HBase Lily indexer (the actual job runs in another YARN application!)
- 4: it deploys an HBase Lily NRT indexer (if needed) using the hbase-indexer CLI
- 5: it reactivates HBase replication on a column family/table (using the hbase shell CLI), if needed
Why do these 5 steps sometimes consume more than 6 GB of memory?
Has anyone already encountered this? Is there a parameter to fix it?
Contrary to our expectations, our investigation now points to step 2.
In this step we use the "solrctl" utility, and this utility seems to consume A LOT of memory for a short period.
Listing the instance directories (instancedir --list) or the collections (collection --list) can use several GB of memory, which is unexpected.
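For anyone who wants to check this kind of short-lived memory spike on their own cluster, here is a minimal sketch of how the peak memory of a command could be measured. It is only an illustration: the function name peak_child_rss_mb is made up for this example, and it relies on Python's standard resource module, which reports the peak resident set size of the largest child process that has been waited for (not the sum over the whole process tree, which is what the container limit is checked against).

```python
import resource
import subprocess
import sys


def peak_child_rss_mb(command):
    """Run `command` to completion and return the peak RSS, in MB, of the
    largest child process this Python process has waited for so far.

    `command` is a list such as ["solrctl", "collection", "--list"].
    """
    # Discard the command's output; only its memory footprint matters here.
    subprocess.run(command, check=False,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # Assumption: Linux, where ru_maxrss is expressed in kilobytes
    # (macOS reports bytes, so the division would need adjusting there).
    return usage.ru_maxrss / 1024.0


if __name__ == "__main__":
    # Harmless demo command; replace it with the CLI call to investigate.
    print("peak child RSS: %.1f MB" % peak_child_rss_mb([sys.executable, "-c", "pass"]))
```

Calling peak_child_rss_mb(["solrctl", "collection", "--list"]) on a gateway node should give a rough idea of the spike, provided the JVM started by the script is waited for by its parent before the script exits.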
Thank you for your reply.
That could indeed be another solution.
We have identified what was using this gigantic amount of memory, and we have also found a workaround.
The problem is related to the "solrctl" utility. This utility (in CDH 5.3 at least) uses A LOT of memory when invoked.
In fact, the culprit is the zkcli.sh script that solrctl launches to read the information stored in ZooKeeper.
There seems to be a serious optimization problem here, because the utility can use several GB of memory for a short period of time (several seconds). For example, it can allocate up to 8 GB of memory to list 280 collections.
This is the root cause of our issue: the memory needed can exceed the 6 GB allocated to the map task.
We have seen that a newer version of CDH (CDH 5.5) added a parameter to the utility for passing JVM arguments to the zkcli, which makes it possible to set an -Xmx on the zkcli. Fun fact: with an -Xmx of 128 MB it still works, and it is faster!
We have backported this CDH 5.5 change to our CDH 5.3 installation, and it is working like a charm now.
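For readers who cannot patch solrctl, a different and more generic technique may be worth trying: the JVM reads the JAVA_TOOL_OPTIONS environment variable at startup, so a heap cap can be injected from outside the script. The sketch below is not the CDH 5.5 parameter mentioned above. The helper names heap_capped_env and run_with_heap_cap are hypothetical, and whether the cap takes effect depends on the script not passing its own -Xmx on the java command line, which may take precedence.

```python
import os
import subprocess


def heap_capped_env(max_heap="128m", base_env=None):
    """Return a copy of the environment in which JAVA_TOOL_OPTIONS asks
    every JVM started from it to use -Xmx<max_heap>."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("JAVA_TOOL_OPTIONS", "")
    # Append to any existing options instead of overwriting them.
    env["JAVA_TOOL_OPTIONS"] = (existing + " -Xmx" + max_heap).strip()
    return env


def run_with_heap_cap(command, max_heap="128m"):
    """Run `command` (a list of arguments) with the heap cap applied to
    any JVM it launches, and return the CompletedProcess."""
    return subprocess.run(command, env=heap_capped_env(max_heap), check=False)
```

For example, run_with_heap_cap(["solrctl", "collection", "--list"]) would run the listing with a 128 MB cap requested. Note that the JVM prints a "Picked up JAVA_TOOL_OPTIONS" notice on stderr, which matters if the shell action parses the command's output.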
Thank you for the reply.
I know about the configuration that can be done at the workflow level.
But, as stated, our problem was with "solrctl". This utility is not optimized.
We have provided a fix for "solrctl" to Cloudera support (with this fix, solrctl runs with a very low memory profile). It was accepted by the engineering team, but I am not sure when (or even if) it will be available in CDH.
I have not tested the latest CDH version, so I don't know whether it has been integrated yet.
By the way, I am "mathieu.d" if you were wondering (I changed my account).