Member since 03-14-2014 | 4 Posts | 0 Kudos Received | 0 Solutions
12-24-2014 05:15 AM
Hi Guatam,

Yes, we run the balancer on a regular basis, but it seems we are hitting this bug. We plan to upgrade the CM stack, but is the current issue related to the balancer bugs? Is there any relation between the skewed data distribution across DataNodes and the web metric alerts?

Thanks,
Sergey
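One way to check whether the regular balancer runs are actually keeping the cluster even is to apply the balancer's own criterion to the per-node figures posted in this thread. The HDFS balancer treats a DataNode as balanced when its utilization is within a threshold (10 percentage points by default) of the cluster-wide average. The sketch below is a rough approximation under stated assumptions (utilization taken as Used / Capacity, node labels hypothetical), not the balancer's exact computation.

```python
"""Sketch: which DataNodes would the HDFS balancer still treat as skewed?

Assumptions: utilization is approximated as used / capacity per node, the
cluster average is total used / total capacity, and the threshold is the
balancer's default of 10 percentage points. Node names are hypothetical.
"""

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Node:
    name: str           # hypothetical label; the thread gives no hostnames
    capacity_tb: float  # "Capacity" column
    used_tb: float      # "Used" column


# Capacity and Used figures copied from the comparison posted in this thread.
NODES = [
    Node("healthy-1", 14.21, 1.64),
    Node("healthy-2", 14.21, 6.14),
    Node("healthy-3", 14.21, 4.99),
    Node("healthy-4", 14.21, 7.06),
    Node("healthy-5", 14.21, 4.74),
    Node("healthy-6", 14.21, 7.95),
    Node("healthy-7", 14.21, 6.13),
    Node("alerting-1", 10.65, 8.96),
    Node("alerting-2", 10.65, 8.57),
    Node("alerting-3", 14.21, 8.94),
    Node("alerting-4", 10.65, 8.65),
    Node("alerting-5", 14.21, 8.98),
    Node("alerting-6", 10.65, 8.62),
    Node("alerting-7", 10.65, 8.94),
]


def utilization(node: Node) -> float:
    """Per-node utilization in percent."""
    return 100.0 * node.used_tb / node.capacity_tb


def cluster_utilization(nodes: List[Node]) -> float:
    """Cluster-wide utilization in percent: total used / total capacity."""
    total_used = sum(n.used_tb for n in nodes)
    total_cap = sum(n.capacity_tb for n in nodes)
    return 100.0 * total_used / total_cap


def out_of_balance(nodes: List[Node],
                   threshold_pct: float = 10.0) -> List[Tuple[str, float]]:
    """Nodes whose utilization differs from the average by more than threshold."""
    avg = cluster_utilization(nodes)
    return [(n.name, round(utilization(n), 2)) for n in nodes
            if abs(utilization(n) - avg) > threshold_pct]


if __name__ == "__main__":
    print(f"cluster average: {cluster_utilization(NODES):.2f}%")
    for name, util in out_of_balance(NODES):
        print(f"{name}: {util}%")
```

With the posted numbers the cluster average is about 55%, five of the seven alerting nodes sit roughly 25 to 29 points above it, and several healthy nodes sit well below it, so the data still looks skewed despite the regular balancer runs.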
12-24-2014 04:20 AM
Yes, one of my ideas is skewed data usage across the DataNodes. I explored the data usage of the nodes and noticed that the workers which trigger alerts have higher block usage. Below is a comparison of the healthy nodes with the alerting ones:

Healthy group
Capacity | Used    | Non DFS Used | Remaining | Blocks  | Block pool used
14.21 TB | 1.64 TB | 664.86 GB    | 11.92 TB  | 127220  | 1.64 TB (11.55%)
14.21 TB | 6.14 TB | 666.38 GB    | 7.42 TB   | 639918  | 6.14 TB (43.23%)
14.21 TB | 4.99 TB | 665.79 GB    | 8.57 TB   | 465164  | 4.99 TB (35.11%)
14.21 TB | 7.06 TB | 666.4 GB     | 6.49 TB   | 795556  | 7.06 TB (49.71%)
14.21 TB | 4.74 TB | 665.74 GB    | 8.82 TB   | 445655  | 4.74 TB (33.35%)
14.21 TB | 7.95 TB | 666.13 GB    | 5.61 TB   | 907730  | 7.95 TB (55.96%)
14.21 TB | 6.13 TB | 666.08 GB    | 7.43 TB   | 640631  | 6.13 TB (43.12%)

Group with issues
Capacity | Used    | Non DFS Used | Remaining | Blocks  | Block pool used
10.65 TB | 8.96 TB | 500.07 GB    | 1.2 TB    | 1175053 | 8.96 TB (84.13%)
10.65 TB | 8.57 TB | 499.76 GB    | 1.59 TB   | 1136687 | 8.57 TB (80.51%)
14.21 TB | 8.94 TB | 666.97 GB    | 4.62 TB   | 1209608 | 8.94 TB (62.89%)
10.65 TB | 8.65 TB | 500.16 GB    | 1.5 TB    | 1133144 | 8.65 TB (81.28%)
14.21 TB | 8.98 TB | 665.07 GB    | 4.58 TB   | 1225707 | 8.98 TB (63.19%)
10.65 TB | 8.62 TB | 499.82 GB    | 1.54 TB   | 1168257 | 8.62 TB (80.98%)
10.65 TB | 8.94 TB | 499.75 GB    | 1.22 TB   | 1172198 | 8.94 TB (83.98%)

Notably, the unhealthy nodes have more blocks in the pool: every one of them holds over 1.1 million blocks, while the healthy nodes hold between about 127,000 and 908,000. Heap size for the DataNode Default Group is 1 GB.
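Because a DataNode keeps per-replica metadata in its JVM heap, block count is a plausible link between the skewed nodes, the 1 GB heap, and the GC pauses in the original post. The sketch below compares each group's block counts against a heap budget. Two assumptions are not from the thread: that every node uses the Default Group heap, and the value of `BLOCKS_PER_GB_HEAP`, which is a hypothetical placeholder rather than a documented limit.

```python
"""Sketch: DataNode block counts versus configured heap.

Assumptions (not from the thread): every node runs with the 1 GB heap of
the DataNode Default Group, and BLOCKS_PER_GB_HEAP is a hypothetical
sizing budget; substitute the figure from your own sizing guidance.
"""

from statistics import mean
from typing import List

HEAP_GB = 1.0                   # "Heap size for DataNode Default Group - 1gb"
BLOCKS_PER_GB_HEAP = 1_000_000  # hypothetical budget, not a documented limit

# "Blocks" column from the comparison posted in this thread.
HEALTHY_BLOCKS = [127220, 639918, 465164, 795556, 445655, 907730, 640631]
ALERTING_BLOCKS = [1175053, 1136687, 1209608, 1133144, 1225707, 1168257, 1172198]


def heap_load(blocks: int, heap_gb: float = HEAP_GB,
              per_gb: int = BLOCKS_PER_GB_HEAP) -> float:
    """Block count as a fraction of the assumed heap budget (>1.0 = over)."""
    return blocks / (heap_gb * per_gb)


def over_budget(block_counts: List[int], **kwargs) -> List[int]:
    """Block counts that exceed the assumed budget for the given heap."""
    return [b for b in block_counts if heap_load(b, **kwargs) > 1.0]


if __name__ == "__main__":
    print(f"mean blocks, healthy:  {mean(HEALTHY_BLOCKS):,.0f}")
    print(f"mean blocks, alerting: {mean(ALERTING_BLOCKS):,.0f}")
    print("healthy over budget: ", over_budget(HEALTHY_BLOCKS))
    print("alerting over budget:", over_budget(ALERTING_BLOCKS))
```

With these numbers every alerting node exceeds the placeholder budget and no healthy node does; passing `heap_gb=2.0` to `over_budget` shows how a larger heap would change that picture under the same assumption.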
12-24-2014 01:12 AM
Dear all,

Version: Cloudera Express 5.0.2
3 master nodes
15 workers

Problem: "The health test result for DATA_NODE_WEB_METRIC_COLLECTION has become bad: The Cloudera Manager Agent is not able to communicate with this role's web server."

When the above alert pops up, records like this appear in the DataNode logs:
"INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 3121ms"

The alerts come only from a specific group of DataNodes, not from all of them. What could be the problem here?

Thanks in advance,
Sergey
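To check whether the pauses really coincide with the alerts, the pause lengths can be pulled out of the DataNode logs and lined up against the alert times. This minimal sketch matches the JvmPauseMonitor message quoted above; the 1000 ms cut-off and the second sample line are illustrative assumptions.

```python
"""Sketch: pull JvmPauseMonitor pause lengths out of DataNode log lines.

The message text is copied from the log excerpt in this thread; the
threshold and the second sample line are illustrative assumptions.
"""

import re
from typing import Iterable, List

# Matches the JvmPauseMonitor message quoted in the thread.
PAUSE_RE = re.compile(
    r"JvmPauseMonitor: Detected pause in JVM or host machine "
    r"\(eg GC\): pause of approximately (\d+)ms"
)


def pause_durations(lines: Iterable[str]) -> List[int]:
    """Return every reported pause length in milliseconds, in log order."""
    durations = []
    for line in lines:
        match = PAUSE_RE.search(line)
        if match:
            durations.append(int(match.group(1)))
    return durations


def long_pauses(lines: Iterable[str], threshold_ms: int = 1000) -> List[int]:
    """Pauses at or above threshold_ms (a hypothetical cut-off)."""
    return [d for d in pause_durations(lines) if d >= threshold_ms]


if __name__ == "__main__":
    sample = [
        # Line quoted in the thread:
        "INFO org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM "
        "or host machine (eg GC): pause of approximately 3121ms",
        # Hypothetical unrelated line:
        "INFO example.Hypothetical: unrelated message",
    ]
    print(long_pauses(sample))  # [3121]
```

To scan a real log, pass an open file object, for example `with open(path) as log: print(long_pauses(log))`, where `path` is your DataNode log file.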
Labels: Apache Hadoop, Cloudera Manager