Explorer
Posts: 6
Registered: ‎11-27-2017

Impala Hadoop swap issues


Hi,

We have a CDH 5.8.3 cluster; each node runs HDFS, YARN and impalad processes.
We are seeing red alerts on the Impala service about swapping.
We calculated the memory allocation per server for the OS, YARN and HDFS processes so that memory would not be over-allocated.
After that, we found that Impala also uses JVM heap memory in addition to mem_limit, so we set the max heap to 8G to keep it under control (and took 8G away from the other components on the node).

But we still see swap issues on Impala and other Hadoop services, and our consumers complain about slowness and out-of-memory errors on Impala queries.
Can you help us figure out why we have red alerts on the Impala service in Cloudera Manager? What else could be causing this?

Thanks

Expert Contributor
Posts: 152
Registered: ‎07-01-2015

Re: Impala Hadoop swap issues

You can, and should, disable swap entirely. The older documentation suggested disabling it (which is a very bulletproof way to avoid swapping), but there is a risk: when memory runs out, something bad can happen. So the latest docs recommend setting swappiness to 1, so that the system swaps out only when it is really necessary.
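
To confirm what a node is actually running with, the value can be read from `/proc/sys/vm/swappiness`, the standard Linux location for this setting. The following is only a minimal sketch: the helper names and the interpretation thresholds are assumptions for illustration, not something taken from the Cloudera docs.

```python
from pathlib import Path

# Standard Linux location of the setting (not specific to CDH or Impala).
SWAPPINESS_PATH = Path("/proc/sys/vm/swappiness")


def parse_swappiness(text: str) -> int:
    """Parse the contents of the swappiness file into an integer."""
    value = int(text.strip())
    if value < 0:
        raise ValueError(f"unexpected swappiness value: {value}")
    return value


def describe_swappiness(value: int) -> str:
    """Rough interpretation; the thresholds are illustrative assumptions."""
    if value == 0:
        return "swap avoided as long as possible, higher OOM risk"
    if value <= 10:
        return "swap used mainly under real memory pressure"
    return "kernel may swap out idle pages more readily"


if __name__ == "__main__" and SWAPPINESS_PATH.exists():
    current = parse_swappiness(SWAPPINESS_PATH.read_text())
    print(f"vm.swappiness = {current}: {describe_swappiness(current)}")
```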

But back to your calculations: did you also account for the OS cache? The typical mistake is to just add together the memory of all running processes. If you really want to prevent swapping, buy more RAM and remove swap entirely.
Even if you do not raise the memory for YARN/Impala, the whole cluster will benefit from a larger cache for data read from and written to the hard drives.
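
To illustrate the point about the OS cache, here is a rough per-node budget sketch. The 8 GB for the OS and 8 GB for the Impala JVM heap echo figures mentioned in this thread; every other number (total RAM, YARN containers, mem_limit, cache reserve) is a made-up assumption, so substitute your own values.

```python
def memory_headroom_gb(total_ram_gb, allocations_gb, page_cache_reserve_gb):
    """Return RAM left after process allocations and a page cache reserve.

    A negative result means the node is over-committed and may swap.
    """
    committed = sum(allocations_gb.values())
    return total_ram_gb - committed - page_cache_reserve_gb


# Hypothetical node with 128 GB of RAM; all values are in GB.
example = {
    "os": 8,
    "hdfs_datanode": 1,
    "yarn_nodemanager": 1,
    "cm_agent": 1,
    "yarn_containers": 40,   # assumed
    "impala_mem_limit": 60,  # assumed
    "impala_jvm_heap": 8,
}

# Adding up the processes alone looks safe...
print(memory_headroom_gb(128, example, page_cache_reserve_gb=0))
# ...but reserving room for the page cache shows the node is over-committed.
print(memory_headroom_gb(128, example, page_cache_reserve_gb=16))
```

With these assumed numbers the first call leaves 9 GB free, while the second comes out 7 GB short once a cache reserve is included.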
Explorer
Posts: 6
Registered: ‎11-27-2017

Re: Impala Hadoop swap issues

Hi Tomas79,

 

Right now we're on swappiness=1.

We don't want to set it to 0 and risk getting OOM, so we set it as low as possible.

 

We reserve 8GB of RAM on each server for the OS, and 1GB for the HDFS, YARN and Cloudera agent processes.

I guess that's enough, right?

 

Another point besides this,

We have NUMA with 2 groups on each of our nodes. What is the best practice for Hadoop (Hive, Impala etc.) regarding NUMA? Should it be disabled or enabled?

 

Thanks

Explorer
Posts: 6
Registered: ‎11-27-2017

Re: Impala Hadoop swap issues

Hi Guys,

Can anyone shed some light on NUMA and Hadoop: should we disable it or keep it enabled?

Why is the Impala process going to swap even though we defined mem_limit and a max heap for its JVM?

 

Thanks for your help!

Posts: 455
Topics: 1
Kudos: 106
Solutions: 59
Registered: ‎04-22-2014

Re: Impala Hadoop swap issues

@hores,

 

I am not sure of Impala's stance on NUMA, as it is outside my area of expertise. I can comment on some of your other points, however. Impala itself does not swap; the OS swaps memory out as demand requires, even with swappiness set to 1. In Cloudera Manager, you can view a chart of the host's memory usage. Looking more closely at the memory pressure and I/O use on the OS (caused by all processes, not just Impala) is a better way to gauge what is happening here. Swapping is a function of the kernel, which is not aware of "Impala". If you have swappiness set to "1" and you are still seeing swapping, that indicates the OS had demands put on it that required it to use swap to prevent OOM conditions.

 

So, in short, you are still seeing swapping because the OS deemed it necessary. Looking more closely at the overall resource usage on the host or hosts in question is a good start.
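
One way to see which processes the kernel has actually swapped out is the `VmSwap` field in `/proc/<pid>/status`, which Linux kernels that expose it report in kB. The sketch below is only an illustration under that assumption; the function names are hypothetical and it is not a Cloudera Manager feature.

```python
from pathlib import Path
from typing import Optional


def parse_status(text: str) -> dict:
    """Turn the 'Key:  value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def swap_kb(status_text: str) -> Optional[int]:
    """Return the VmSwap value in kB, or None if the field is absent."""
    value = parse_status(status_text).get("VmSwap")
    return int(value.split()[0]) if value else None


def swapped_processes(proc_root: Path = Path("/proc")) -> list:
    """List (swap_kb, pid, name) for processes with swapped-out memory."""
    results = []
    for entry in proc_root.iterdir():
        if not entry.name.isdigit():
            continue
        try:
            text = (entry / "status").read_text()
        except OSError:
            continue  # process exited or is not readable
        used = swap_kb(text)
        if used:
            name = parse_status(text).get("Name", "?")
            results.append((used, int(entry.name), name))
    return sorted(results, reverse=True)


if __name__ == "__main__" and Path("/proc").is_dir():
    for used, pid, name in swapped_processes()[:10]:
        print(f"{used:>10} kB  pid={pid}  {name}")
```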

Explorer
Posts: 6
Registered: ‎11-27-2017

Re: Impala Hadoop swap issues

Hi @bgooley and all,

 

We've set up a script that reports, every 2-3 seconds, the top memory-consuming process on the server along with the OS memory and swap status.
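
For reference, the OS memory and swap part of such a report could look roughly like the sketch below. It is not our exact script: it only reads the standard `/proc/meminfo` fields, the function names are made up for illustration, and it leaves out the top-process part.

```python
import time
from pathlib import Path

MEMINFO_PATH = Path("/proc/meminfo")  # standard Linux location


def parse_meminfo(text: str) -> dict:
    """Map each /proc/meminfo key to its integer value (mostly kB)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            info[key.strip()] = int(parts[0])
    return info


def swap_used_kb(info: dict) -> int:
    """Swap in use is the difference between total and free swap."""
    return info["SwapTotal"] - info["SwapFree"]


def report(samples: int, interval_s: float) -> None:
    """Print free, cached and swap-used memory, sleeping between samples."""
    for i in range(samples):
        info = parse_meminfo(MEMINFO_PATH.read_text())
        print(
            f"free={info['MemFree']} kB "
            f"cached={info['Cached']} kB "
            f"swap_used={swap_used_kb(info)} kB"
        )
        if i < samples - 1:
            time.sleep(interval_s)


if __name__ == "__main__" and MEMINFO_PATH.exists():
    report(samples=1, interval_s=0)
```

Calling `report(samples=100, interval_s=3)` would sample roughly every 3 seconds.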

 

From what I've seen, there is still free and cached memory in the OS when the OS is using swap;

it mostly happens when Impala's memory is growing.

Maybe it's because Impala needs a large contiguous segment of memory and can't get one, or maybe Impala wants more memory than mem_limit + its JVM max heap size?

Is there something else in Hadoop (or more specifically Impala) that needs memory and that I've missed?

Are there parameters I should tweak that may help, and why would they help?

 

Our users complain about slowness and out-of-memory errors on the Impala queries they are running. Can this be related? Cloudera Manager shows the swap alerts in red, so it seems important.

 

Thanks for your help,
