
Tasks take much longer on only one node



I have a cluster with 8 worker nodes, each running a DataNode (DN), NodeManager (NM), and RegionServer (RS). The dev team is running a MapReduce program from an Oozie workflow. One step in the workflow is an MR job that loads data into HBase tables. While this step runs, one of two things happens:


1. Heavy load on had01 causes its RegionServer to shut down. The other RegionServers work fine; the problem only occurs on this one. The log shows many JVM pauses (reported as non-GC pauses), and the RegionServer loses its connection to ZooKeeper before shutting down.


2. When the RS does not shut down, the node is still heavily loaded (load average 121.2, 97.3, 87.3), and the map tasks that run on it take much longer than those on the other nodes:


Other nodes: less than 2 minutes

had01: 7+ minutes


Other Observations:


1. Heavy I/O (700 MB/s - 2 GB/s)

2. This node holds twice as many blocks as the other 7 nodes do.

3. The HBase Web UI shows a Write Request Count of 0 for this node.
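One way to quantify the skew described above (block count, map task duration) is to compare each node with the median of the other nodes. This is a minimal sketch, assuming the per-node numbers have already been collected by hand (for example from the NameNode or ResourceManager web UIs); the function name `find_outlier_nodes`, the 1.5 threshold, and the hostnames `had02` to `had08` are hypothetical.

```python
from statistics import median


def find_outlier_nodes(metrics, ratio=1.5):
    """Return {node: value / median_of_other_nodes} for nodes at or above `ratio`.

    `metrics` maps a node name to one measurement (block count,
    map task minutes, ...). The 1.5 default threshold is arbitrary.
    """
    outliers = {}
    for node, value in metrics.items():
        # Baseline is the median of every *other* node, so the suspect
        # node cannot inflate its own baseline.
        others = [v for n, v in metrics.items() if n != node]
        if not others:
            continue
        baseline = median(others)
        if baseline > 0 and value / baseline >= ratio:
            outliers[node] = round(value / baseline, 2)
    return outliers


if __name__ == "__main__":
    # Hypothetical map task durations in minutes, using the rough figures
    # from the post (other nodes under 2, had01 7+).
    # had02..had08 are placeholder hostnames.
    task_minutes = {"had01": 7, **{f"had0{i}": 2 for i in range(2, 9)}}
    print(find_outlier_nodes(task_minutes))  # {'had01': 3.5}
```

The same function can be fed per-node block counts; using the median of the other nodes, rather than the mean of all nodes, keeps a single overloaded node from raising its own baseline.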


Can someone point me to where I can troubleshoot further? The problem only seems to occur while this step of the workflow is running. Afterwards, the node recovers and stays stable.
