Support Questions

joe_harvyy · ‎10-01-2017

Hello,

There are several tools in HDP that don't use YARN (Storm, HBase, etc.). If the OS, HBase, Storm and other tools are all taking resources on my cluster, how does YARN know how many resources it has available for its applications?

What are the best practices for multi-tenancy and isolation in this case?

How can I isolate I/O with YARN? Is this something coming in future versions?

Thanks

bkosaraju · ‎10-02-2017

Hi @Joe Harvy,

YARN and the other tenant applications are not aware of each other's resource usage. This becomes a much bigger problem when swap is defined, because the OS terminates (technically, "sacrifices") one of the processes, chosen based on its age and on the amount of resources the sacrifice frees up.

So it becomes critical to organize the applications carefully in a multi-tenant environment.

There are multiple things to consider while managing this kind of environment, such as memory, CPU and disk bottlenecks.

Memory usage:

In terms of memory usage, subtract each component's maximum heap allocation (-Xmx) from the total memory, along with additional reservations such as 2 GB for the OS, 2 GB for the DataNode, 2 GB for Ambari Metrics, and so on. For HBase, also subtract the BucketCache (off-heap) plus the RegionServer heap size, and do the same for Accumulo, Storm, etc.

Whatever remains after all of this is subtracted from the total memory can be allocated to YARN. An example of this is well documented in the HBase cache configuration documentation.
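The subtraction described above can be sketched in a few lines of Python. This is only an illustration: the 128 GB node size and the HBase and Storm figures are made-up assumptions, and only the 2 GB reservations for the OS, DataNode and Ambari Metrics come from this answer. The result is the kind of value you would then set as yarn.nodemanager.resource.memory-mb.

```python
def yarn_memory_gb(total_gb, reserved_gb):
    """Return the memory (in GB) left for YARN after all reservations."""
    remaining = total_gb - sum(reserved_gb.values())
    if remaining <= 0:
        raise ValueError("reservations exceed the node's total memory")
    return remaining


# Hypothetical node: every figure except the three 2 GB entries is an assumption.
node_total_gb = 128
reservations_gb = {
    "os": 2,
    "datanode": 2,
    "ambari_metrics": 2,
    "hbase_regionserver_heap": 16,  # -Xmx of the RegionServer (assumed)
    "hbase_bucketcache_offheap": 24,  # off-heap BucketCache (assumed)
    "storm_workers": 8,  # sum of worker heaps (assumed)
}

yarn_gb = yarn_memory_gb(node_total_gb, reservations_gb)
print(f"yarn.nodemanager.resource.memory-mb = {yarn_gb * 1024}")
```

Keeping the reservations in a single dictionary makes it easy to re-run the calculation whenever a component's heap size changes.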

CPU usage:

This is a bit tricky, as configuring this value up front may not be straightforward. You need to go through the SAR / Ambari Metrics data on CPU usage and allocate the remaining CPU to YARN.
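As a rough sketch of that calculation, assuming you have already derived from SAR or Ambari Metrics the share of CPU that the non-YARN services use at their busiest, the remaining cores could be estimated as below. The function name and the sample numbers are hypothetical; the result is a candidate for yarn.nodemanager.resource.cpu-vcores, not a definitive setting.

```python
import math


def yarn_vcores(total_cores, non_yarn_cpu_percent):
    """Estimate the cores left for YARN given the CPU share used by other services."""
    if not 0 <= non_yarn_cpu_percent <= 100:
        raise ValueError("non_yarn_cpu_percent must be between 0 and 100")
    used_by_others = total_cores * non_yarn_cpu_percent / 100.0
    # Round down so YARN is never promised a fraction of a core it may not get.
    return max(0, math.floor(total_cores - used_by_others))


# Hypothetical node: 32 cores, other services observed at 25% CPU.
print(yarn_vcores(32, 25))
```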

At the same time, verify that the load average on the host does not get too high. If it does, control it by limiting the amount of parallel work coming from the apps/YARN according to priority. This is where the YARN scheduler comes in handy.
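One simple way to check the load average is to normalise it by the number of cores. The sketch below uses Python's standard os.getloadavg() and os.cpu_count(); the threshold of 1.0 per core is an assumption for illustration, not a recommendation from this answer.

```python
import os


def is_overloaded(load_avg, cores, threshold=1.0):
    """Return True when the load average per core exceeds the threshold."""
    return (load_avg / cores) > threshold


if __name__ == "__main__":
    # os.getloadavg() returns the 1, 5 and 15 minute load averages (Unix only).
    one_min, five_min, fifteen_min = os.getloadavg()
    cores = os.cpu_count() or 1
    print(f"5-minute load per core: {five_min / cores:.2f}")
    print("overloaded" if is_overloaded(five_min, cores) else "ok")
```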

Disk usage:

Keep a keen eye on CPU I/O wait. An increase in that value typically points to slow disks (high disk latency). The better option is not to share disks across multiple purposes (for example, DataNode disks with other application activity), since sharing results in requests queuing up on the disk.
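On Linux, the I/O wait share can be derived from two samples of the aggregate "cpu" line in /proc/stat, where the fifth counter is iowait. The following is a minimal sketch under that assumption; the helper names and the 5-second sampling interval are hypothetical, and tools such as sar or iostat report the same figure.

```python
import time


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    return [int(value) for value in parts[1:]]


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # Fields: user nice system idle iowait irq softirq steal (guest time is
    # already counted inside user, so only the first eight are summed).
    total = sum(after[:8]) - sum(before[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (after[4] - before[4]) / total


def read_cpu_counters(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate cpu line."""
    with open(path) as stat_file:
        return parse_cpu_line(stat_file.readline())


if __name__ == "__main__":
    first = read_cpu_counters()
    time.sleep(5)  # sampling interval chosen arbitrarily for illustration
    second = read_cpu_counters()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```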

Hope this helps!!
