Posts: 140
Topics: 7
Kudos: 15
Solutions: 14
Registered: ‎07-16-2015

Oozie - Shell action - Container running beyond physical limit


Hi,

 

We have a CDH 5.3.0 cluster with 4 data nodes.

YARN is configured to allocate up to:

- 60 GB of memory per NodeManager

- 32 vCPUs per NodeManager

- 6 GB of memory for a map task (with the java.opts heap set to 80% of that)

- 6 GB of memory for a reduce task (same 80% heap ratio; the corresponding mapreduce.* properties are sketched below)
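As a point of reference, here is how the 6 GB / 80% pair translates into the usual mapreduce.* properties (a sketch only; the 6144 MB figure is assumed from the numbers above, not copied from our actual configuration):

# Container size vs. JVM heap: the container gets 6144 MB, the heap 80% of it.
echo $(( 6144 * 80 / 100 ))   # -> 4915
# i.e. mapreduce.map.memory.mb=6144 and mapreduce.map.java.opts=-Xmx4915m
# (the same pair exists for mapreduce.reduce.*)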

 

We are using Oozie workflows to regenerate the content of HBase tables before re-indexing that content into Solr (using the HBase Lily indexer in batch mode).

 

The Oozie workflow is composed of several shell actions (and only shell actions).

These shell actions are essentially small jobs that launch the "real" map/reduce jobs (Hive queries, HBase table management, Solr collection creation, HBase Lily indexer bootstrap).

 

The issue we are encountering is the following:

During the workflow execution, the shell action that manages the indexing of the data fails with:

- "Container is running beyond physical memory limits"

 

I understand the error, and I know I could work around the issue by giving more memory to the single map task that Oozie launches to execute the shell.

But that single map task is already configured to use up to 6 GB of memory (YARN configuration).

And there is no way the small shell execution should take that amount of memory under normal conditions; I would expect the processing of the shell action to fit in 10 MB of memory (or even less).

 

This shell script performs 5 actions (none of them memory-consuming; a sketch of the script follows the list):

- 1: it deactivates HBase replication on a column family/table (using the HBase shell CLI)

- 2: it creates a Solr collection (using the solrctl CLI)

- 3: it launches the indexing job using the HBase Lily indexer (the actual job runs in another YARN application!)

- 4: it deploys an HBase Lily indexer NRT (if needed) using the hbase-indexer CLI

- 5: it re-activates HBase replication on the column family/table (using the HBase shell CLI), if needed.
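For illustration only, here is the shape of such a script. Every table, collection, and file name below is made up, and the exact jar path and flags should be checked against your CDH version:

#!/bin/bash
set -e

# 1. Disable replication on the column family (REPLICATION_SCOPE => 0).
echo "alter 'my_table', {NAME => 'cf', REPLICATION_SCOPE => '0'}" | hbase shell

# 2. Create the Solr collection (4 shards here, as an example).
solrctl collection --create my_collection -s 4

# 3. Launch the batch indexing job. The heavy lifting runs in its own
#    YARN application, not inside this shell action's container.
hadoop jar /usr/lib/hbase-solr/tools/hbase-indexer-mr-*-job.jar \
  --hbase-indexer-file my-indexer.xml \
  --zk-host zk01:2181/solr --collection my_collection --go-live

# 4. Register the NRT (near-real-time) indexer, if needed.
hbase-indexer add-indexer -n my_indexer -c my-indexer.xml \
  -cp solr.zk=zk01:2181/solr -cp solr.collection=my_collection

# 5. Re-enable replication.
echo "alter 'my_table', {NAME => 'cf', REPLICATION_SCOPE => '1'}" | hbase shell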

 

Why do these 5 actions sometimes consume more than 6 GB of memory?

Has someone already encountered this? Is there some parameter for "fixing" this?

 

regards,

mathieu

Posts: 140
Topics: 7
Kudos: 15
Solutions: 14
Registered: ‎07-16-2015

Re: Oozie - Shell action - Container running beyond physical limit

Against all expectations, we are now investigating step 2.

In this step we use the solrctl utility, and this utility seems to "suck up" A LOT of memory for a short period.

 

Listing the instance directories (solrctl instancedir --list) or the collections (solrctl collection --list) can use several GB of memory, which is unexpected.
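One way to see the spike (a sketch; requires GNU time): run the listing under /usr/bin/time -v and look at the "Maximum resident set size" line, which also covers the java child process that solrctl forks:

/usr/bin/time -v solrctl collection --list
/usr/bin/time -v solrctl instancedir --list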

 

Still searching.

 

regards.

 

 

Posts: 1,508
Kudos: 260
Solutions: 230
Registered: ‎07-31-2013

Re: Oozie - Shell action - Container running beyond physical limit

Oozie shell launcher containers don't usually follow the cluster's default task memory configuration. Have you tried specifically increasing the configuration of the shell launcher's single map task by adding the below to your shell action's configuration (as an example)?

<property>
  <name>oozie.launcher.mapreduce.map.memory.mb</name>
  <value>4096</value>
</property>
<property>
  <name>oozie.launcher.mapreduce.map.java.opts</name>
  <value>-Xmx3200m</value>
</property>

Any property passed to an action, if prefixed with "oozie.launcher.", will apply to the launcher map task instead of to the sub-jobs it may run afterwards. In the shell action's case, the launcher map task forks out the command, so its memory is consumed by the forked processes.
Backline Customer Operations Engineer
Posts: 140
Topics: 7
Kudos: 15
Solutions: 14
Registered: ‎07-16-2015

Re: Oozie - Shell action - Container running beyond physical limit


Thank you for your reply.

This could indeed be another solution.

 

We have identified "who" was using this gigantic amount of memory, and we have also found a workaround.

 

The problem is related to the solrctl utility. This utility (in CDH 5.3 at least) uses A LOT of memory when invoked.

In fact, the culprit is the zkcli.sh script launched to read the information stored in ZooKeeper.

 

It seems there is a huge optimization problem here, because the utility can allocate several GB of memory for a short period of time (a few seconds): for example, up to 8 GB just to list 280 collections.

This is the root cause of our issue, because the memory needed can exceed the 6 GB allocated to the map task.

 

We have seen that a newer version of CDH (CDH 5.5) adds a parameter to the utility for passing JVM arguments to the zkcli, which makes it possible to set an -Xmx for it. Fun fact: with an -Xmx of 128 MB it still works, and runs faster!

 

We have "backported" this evolution from CDH 5.5 into our CDH 5.3, and it is now working like a charm.
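For illustration, this is the shape of the workaround. The actual CDH 5.5 parameter has its own name; ZKCLI_JVM_FLAGS below is the hook that recent upstream Solr zkcli.sh scripts honor, so whether your build reads it is an assumption to verify:

# Cap the heap of the JVM spawned for the ZooKeeper reads.
# ZKCLI_JVM_FLAGS is an assumption here: check that your zkcli.sh honors it.
export ZKCLI_JVM_FLAGS="-Xmx128m"
solrctl collection --list   # still works with a 128 MB heap, and faster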

 

New Contributor
Posts: 7
Registered: ‎06-14-2017

Re: Oozie - Shell action - Container running beyond physical limit

Thanks for your pointer.
Just edit the workflow.xml file and add:

<workflow-app name="simple-ONE-wf" xmlns="uri:oozie:workflow:0.1">
  <start to='ONE'/>
  <action name="ONE">
    <spark xmlns="uri:oozie:spark-action:0.1">
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>oozie.launcher.mapreduce.map.memory.mb</name>
          <value>4096</value>
        </property>
        <!-- A property name may only appear once: a second property with
             the same name overrides the first, so all JVM flags must go
             into a single oozie.launcher.mapreduce.map.java.opts value. -->
        <property>
          <name>oozie.launcher.mapreduce.map.java.opts</name>
          <value>-Xmx3200m -XX:MaxPermSize=1g</value>
        </property>
        ...
      </configuration>
      ...
  </action>

  <kill name='kill'>
    <message>Something went wrong: ${wf:errorCode('ONE')}</message>
  </kill>
  <end name='end'/>
</workflow-app>
Posts: 11
Topics: 0
Kudos: 1
Solutions: 0
Registered: ‎06-14-2017

Re: Oozie - Shell action - Container running beyond physical limit


Thank you for the reply.

 

I know about the configuration that can be made at the workflow level.

But our problem was with solrctl, as stated: this utility is not optimized.

 

We have provided a FIX for solrctl to Cloudera support (with this fix, solrctl runs with a very low memory profile). It was accepted by the engineering team, but I am not sure when (or even if) it will be available in CDH.

I have not tested the latest CDH version, so I don't know whether it has been integrated yet.

 

http://community.cloudera.com/t5/Cloudera-Search-Apache-SolrCloud/solrctl-issue-implentation-issue/m...

 

By the way, I am "mathieu.d" if you were wondering (I changed my account).

 
