Created 10-07-2016 02:43 PM
I have a cluster with over 300 nodes. We run HA NameNodes (NN) and three JournalNodes (JN). I've always kept the NN and JN on different hosts, but I recently got a suggestion to co-locate them, or at least to co-locate two JNs with the NNs and put the third JN on an unrelated host. I have co-located them in small clusters in the past, and they are co-located in our dev environment. What's the best practice?
Created 10-07-2016 03:04 PM
The JournalNodes can be quite I/O intensive, while the NameNode is generally more memory and CPU intensive, so one could justify co-locating them. But when it comes to checkpointing, they could conflict. More importantly, delays in JournalNode writes will impact the NameNode and result in higher RPC queue times.
With a cluster that size, I would always want to run the NameNode by itself. It's far too important to compromise it by co-locating it with another highly active service.
And regarding the JournalNode, don't store the journal directories on an LVM volume that's shared with the OS. Again, the JournalNode is quite I/O intensive, and I've seen it push slowness back to the NameNode (as higher RPC queue times) when it competes with the OS for the same physical disks.
Created 10-07-2016 06:42 PM
> or at least co-locate two JN with NN, and the other JN on an unrelated host.
That is definitely bad advice. We have 3 JNs so that we get high availability by writing to a majority. However, if you keep 2 JNs on the same machine, you will lose both at the same time, and with them the majority. That is exactly what we want to avoid in the first place.
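To make the majority argument concrete, here is a minimal sketch in Python. It is not part of HDFS; the function names, JN names, and host names are hypothetical and only model the arithmetic: with 3 JNs the write quorum is 2, so a layout keeps working after a single host failure only if that failure leaves at least 2 JNs alive.

```python
def majority(n_journal_nodes):
    """Smallest number of JournalNodes that forms a majority."""
    return n_journal_nodes // 2 + 1


def survives_host_loss(placement, failed_host):
    """placement maps a (hypothetical) JN name to the host it runs on.

    Returns True if a majority of JNs is still alive after failed_host dies.
    """
    alive = sum(1 for host in placement.values() if host != failed_host)
    return alive >= majority(len(placement))


def survives_any_single_host_loss(placement):
    """True if no single host failure can take away the JN majority."""
    hosts = set(placement.values())
    return all(survives_host_loss(placement, h) for h in hosts)


# One JN per host: any single host can fail and 2 of 3 JNs remain.
separate = {"jn1": "hostA", "jn2": "hostB", "jn3": "hostC"}

# Two JNs stacked on one host: losing hostA leaves only 1 of 3 JNs.
stacked = {"jn1": "hostA", "jn2": "hostA", "jn3": "hostC"}

print(survives_any_single_host_loss(separate))
print(survives_any_single_host_loss(stacked))
```

The same check applies to any odd JN count. It only models host loss, not the disk contention concern raised earlier in the thread.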