Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 3746 | 07-12-2018 01:58 PM
 | 7842 | 03-08-2018 10:44 AM
 | 3754 | 06-24-2017 11:18 AM
 | 23279 | 02-10-2017 04:54 PM
 | 2287 | 01-19-2017 01:41 PM
03-14-2016 07:02 PM · 2 Kudos
Makes sense. Thanks!
03-14-2016 04:30 PM · 1 Kudo
Thanks @Jitendra Yadav. All interesting suggestions, though I haven't been able to chase any of them to a root cause or a solution so far.
03-14-2016 04:28 PM · 3 Kudos
I'm trying to clean up some of the older nodes on our cluster. Most of our new data nodes have only 4 components: DataNode, RegionServer, MetricsMonitor, and NodeManager. I *think* I understand the purpose of each of these. However, many of our older data nodes have as many as 15 components, including DataNode, RegionServer, MetricsMonitor, and NodeManager PLUS: HCat Client, HDFS Client, Hive Client, MapReduce2 Client, Oozie Client, Tez Client, Pig, Sqoop, YARN Client, and ZooKeeper Client. Can someone please point me to some documentation on the purpose of each of these clients? Will a data node still be leveraged for a Tez query if it doesn't have a Tez Client? How is the YARN Client different from a NodeManager? What is a ZooKeeper client? And most importantly, how can I safely remove the unnecessary components?
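For the removal part, one approach is Ambari's REST API, which can delete a client component from a single host. Client components are just libraries and config on disk (no daemon), so there is nothing to stop first. A minimal sketch, assuming Ambari 2.x defaults; the server host, cluster name, node name, and admin:admin credentials below are placeholders:

```bash
# Delete an unused client component (Sqoop client shown) from one host.
# "ambari-server", "MyCluster", and "worker-node-01" are placeholders.
# Ambari requires the X-Requested-By header on non-GET requests.
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  "http://ambari-server:8080/api/v1/clusters/MyCluster/hosts/worker-node-01/host_components/SQOOP"
```

As I understand it, whether a node runs Tez tasks depends on its NodeManager, not on the Tez Client being installed; the clients only matter on hosts where jobs are submitted.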
03-10-2016 02:24 PM · 4 Kudos
Our cluster suddenly got very slow, for no evident reason. A couple of our nodes seem to be the bottleneck, but it isn't clear why. They are not swapping, and THP is set up correctly. We also just rebalanced everything, so skew should not be an issue. When I look at the logs for slow queries, I see a lot of entries like "Slow ReadProcessor read fields took 165073ms", and I'm not totally sure how to interpret them. Does this mean that HDFS took almost 3 minutes to read a block of data? If so, does anyone know why this might be happening?

2016-03-09 12:30:22,269 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000303
2016-03-09 12:30:40,362 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000368
2016-03-09 12:30:40,362 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000367
2016-03-09 12:30:50,415 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000394
2016-03-09 12:30:50,415 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000393
2016-03-09 12:30:54,437 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000405
2016-03-09 12:31:00,467 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000418
2016-03-09 12:46:13,052 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000881
2016-03-09 12:49:32,997 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 76652ms (threshold=30000ms); ack: seqno: 1587 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 551441, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:50:56,503 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 73728ms (threshold=30000ms); ack: seqno: 1593 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 650743, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:53:07,327 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 76607ms (threshold=30000ms); ack: seqno: 1601 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 518370, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:54:59,399 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 112072ms (threshold=30000ms); ack: seqno: 1603 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 1093642, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:57:44,473 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 165073ms (threshold=30000ms); ack: seqno: 1605 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 682741, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:01:42,644 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 64163ms (threshold=30000ms); ack: seqno: 2005 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 455684, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:17:21,104 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 57172ms (threshold=30000ms); ack: seqno: 3545 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 500979, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:18:29,962 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 68857ms (threshold=30000ms); ack: seqno: 3547 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 469307, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:21:00,774 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 43896ms (threshold=30000ms); ack: seqno: 3569 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 430916, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:22:55,910 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 88569ms (threshold=30000ms); ack: seqno: 3581 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 537758, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:27:24,885 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 108859ms (threshold=30000ms); ack: seqno: 3617 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 640831, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 14:35:57,241 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 512355ms (threshold=30000ms); ack: seqno: 3619 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 537965, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 14:57:26,177 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 568933ms (threshold=30000ms); ack: seqno: 3621 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 637309, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
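Every slow-read warning above names the same three pipeline targets (the AA.BB.CC.DD/XX/ZZ addresses in the ack lines), which points at those hosts' disks or NICs rather than HDFS itself. A hedged checklist one might run on each of them — the interface name is an assumption, and iostat/sar come from the sysstat package:

```bash
# Run on each DataNode named in the "targets" lists above.
iostat -x 5 3                    # a disk pegged near 100 %util or with high await is suspect
sar -n DEV 5 3                   # per-interface throughput and error counters
dmesg | grep -iE 'error|fail'    # disk controller or link errors in the kernel log
sudo ethtool eth0 | grep -i speed  # eth0 is a placeholder; catch a NIC renegotiated to 100Mb/s
```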
Labels: Apache Hadoop
03-04-2016 09:03 PM · 1 Kudo
Finally have the balancer running fairly well. In the end, we were not able to get good results using the UI link from Ambari. Running it via the CLI with some inline parameters is working well for us:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log
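For anyone tuning the same knobs, a quick way to watch progress while the balancer runs — the log path comes from the redirection above, and `hdfs dfsadmin -report` is the standard per-DataNode utilization report:

```bash
# Follow the balancer's own progress log
tail -f /tmp/balancer-out.log
# Watch per-DataNode utilization converge toward the cluster average
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'
```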
02-26-2016 07:51 PM · 1 Kudo
No orange circle. No indication to restart services. FYI.
02-26-2016 07:47 PM · 1 Kudo
Thanks @rgangappa, that seems to be what I needed. @Artem Ervits, the original instructions document should probably include this. Thanks!
02-26-2016 07:31 PM · 1 Kudo
We are on Ambari 2.1.1. We run HA, without Kerberos or any other security. I restarted the Ambari server and then restarted all of the monitors, and it didn't help. I don't know whether firewalls are enabled on our servers. Where would I look for this? Thanks!
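On the firewall question, a quick check one could run on each host — assuming a RHEL/CentOS-family OS, which is typical for this Ambari version; adjust for your distro:

```bash
# RHEL/CentOS 6 family: iptables service
sudo service iptables status
# RHEL/CentOS 7 family: firewalld
sudo systemctl status firewalld
# Either way, list active rules; chains with no rules under them mean no filtering
sudo iptables -L -n
```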
02-26-2016 07:03 PM · 2 Kudos
I moved our Ambari Metrics Collector from a data node to an edge node, following all of the directions outlined here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/ch_moving_the_ambari_metrics_collector.html Now Ambari Metrics kind of 'works', but not completely. The main Ambari dashboard looks good - all of the metrics are populated. But if I drill into HDFS, YARN, HBase, etc., only some of the metrics are there; many say "No data available". I tried restarting the Ambari server, but no luck. I tried Ctrl+Shift+R to force a clean refresh of the GUI, and that didn't help either. I know that all the nodes are pointing to the right place in /etc/ambari-metrics-monitor/conf/metric_monitor.ini. Where can I look to figure this out?
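One way to narrow this down is to query the Metrics Collector's API directly and see whether the "No data available" metrics were ever written. A sketch assuming the AMS default port 6188; the collector host, target hostname, and metric name below are placeholders:

```bash
# Data points back = collection works and the problem is on the Ambari UI side;
# an empty result = the monitors aren't shipping this metric to the new collector.
curl "http://ams-host.example.com:6188/ws/v1/timeline/metrics?metricNames=dfs.datanode.BytesRead&appId=datanode&hostname=worker-node-01.example.com"
```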
Labels: Apache Ambari
02-26-2016 03:46 PM · 1 Kudo
Thanks. FYI, there's an extra space character in "- i" in both of the curl commands.
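To make the fix concrete (the URL below is a placeholder):

```bash
curl - i http://ambari-server:8080/api/v1/clusters   # wrong: "- i" is two tokens, not the -i flag
curl -i http://ambari-server:8080/api/v1/clusters    # right: -i includes the response headers
```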