Member since: 02-04-2016
Posts: 189
Kudos Received: 70
Solutions: 9
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 3746 | 07-12-2018 01:58 PM
 | 7842 | 03-08-2018 10:44 AM
 | 3754 | 06-24-2017 11:18 AM
 | 23279 | 02-10-2017 04:54 PM
 | 2287 | 01-19-2017 01:41 PM
03-14-2016 07:02 PM · 2 Kudos
Makes sense. Thanks!
03-14-2016 04:30 PM · 1 Kudo
Thanks @Jitendra Yadav. All interesting suggestions, though I haven't been able to chase any of them to a root cause or a solution so far.
03-14-2016 04:28 PM · 3 Kudos
I'm trying to clean up some of the older nodes on our cluster. Most of our new data nodes have only 4 components: DataNode, RegionServer, MetricsMonitor, and NodeManager. I *think* I understand the purpose of each of these. However, many of our older data nodes have as many as 15 components, including DataNode, RegionServer, MetricsMonitor, and NodeManager PLUS: HCat Client, HDFS Client, Hive Client, MapReduce2 Client, Oozie Client, Tez Client, Pig, Sqoop, YARN Client, and ZooKeeper Client. Can someone please point me to some documentation on the purpose of each of these clients? Will a data node still be leveraged for a Tez query if it doesn't have a Tez Client? How is the YARN Client different from a NodeManager? What is a ZooKeeper client? And most importantly, how can I safely remove the unnecessary components?
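For the removal part, one approach is Ambari's REST API, which can delete a client component from a single host. Client components are just libraries and config on disk (no daemon), so there is nothing to stop first. A minimal sketch, assuming Ambari 2.x defaults; the server host, cluster name, node name, and admin:admin credentials below are placeholders:

```bash
# Delete an unused client component (Sqoop client shown) from one host.
# "ambari-server", "MyCluster", and "worker-node-01" are placeholders.
# Ambari requires the X-Requested-By header on non-GET requests.
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE \
  "http://ambari-server:8080/api/v1/clusters/MyCluster/hosts/worker-node-01/host_components/SQOOP"
```

As I understand it, whether a node runs Tez tasks depends on its NodeManager, not on the Tez Client being installed; the clients only matter on hosts where jobs are submitted.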
03-10-2016 02:24 PM · 4 Kudos
Our cluster suddenly got very slow, for no evident reason. A couple of our nodes seem to be the bottleneck, but it isn't clear why. They are not swapping, and THP is set up correctly. We also just rebalanced everything, so skew should not be an issue. When I look at the logs for slow queries, I see a lot of entries like "Slow ReadProcessor read fields took 165073ms", and I'm not totally sure how to interpret them. Does this mean that HDFS took almost 3 minutes to read a block of data? If so, does anyone know why this might be happening?

2016-03-09 12:30:22,269 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000303
2016-03-09 12:30:40,362 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000368
2016-03-09 12:30:40,362 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000367
2016-03-09 12:30:50,415 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000394
2016-03-09 12:30:50,415 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000393
2016-03-09 12:30:54,437 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000405
2016-03-09 12:31:00,467 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000418
2016-03-09 12:46:13,052 ERROR [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Container complete event for unknown container id container_e30_1457112266933_0219_01_000881
2016-03-09 12:49:32,997 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 76652ms (threshold=30000ms); ack: seqno: 1587 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 551441, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:50:56,503 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 73728ms (threshold=30000ms); ack: seqno: 1593 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 650743, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:53:07,327 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 76607ms (threshold=30000ms); ack: seqno: 1601 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 518370, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:54:59,399 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 112072ms (threshold=30000ms); ack: seqno: 1603 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 1093642, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 12:57:44,473 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 165073ms (threshold=30000ms); ack: seqno: 1605 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 682741, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:01:42,644 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 64163ms (threshold=30000ms); ack: seqno: 2005 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 455684, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:17:21,104 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 57172ms (threshold=30000ms); ack: seqno: 3545 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 500979, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:18:29,962 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 68857ms (threshold=30000ms); ack: seqno: 3547 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 469307, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:21:00,774 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 43896ms (threshold=30000ms); ack: seqno: 3569 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 430916, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:22:55,910 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 88569ms (threshold=30000ms); ack: seqno: 3581 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 537758, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 13:27:24,885 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 108859ms (threshold=30000ms); ack: seqno: 3617 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 640831, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 14:35:57,241 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 512355ms (threshold=30000ms); ack: seqno: 3619 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 537965, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
2016-03-09 14:57:26,177 WARN [ResponseProcessor for block BP-1450007529-AA.BB.CC.QQ-1415122377411:blk_1097162958_23469355] org.apache.hadoop.hdfs.DFSClient: Slow ReadProcessor read fields took 568933ms (threshold=30000ms); ack: seqno: 3621 status: SUCCESS status: SUCCESS status: SUCCESS downstreamAckTimeNanos: 637309, targets: [AA.BB.CC.DD:50010, AA.BB.CC.XX:50010, AA.BB.CC.ZZ:50010]
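Every slow-read warning above names the same three pipeline targets (the AA.BB.CC.DD/XX/ZZ addresses in the ack lines), which points at those hosts' disks or NICs rather than HDFS itself. A hedged checklist one might run on each of them — the interface name is an assumption, and iostat/sar come from the sysstat package:

```bash
# Run on each DataNode named in the "targets" lists above.
iostat -x 5 3                    # a disk pegged near 100 %util or with high await is suspect
sar -n DEV 5 3                   # per-interface throughput and error counters
dmesg | grep -iE 'error|fail'    # disk controller or link errors in the kernel log
sudo ethtool eth0 | grep -i speed  # eth0 is a placeholder; catch a NIC renegotiated to 100Mb/s
```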
Labels: Apache Hadoop
03-04-2016 09:03 PM · 1 Kudo
Finally have the balancer running fairly well. In the end, we were not able to get good results using the UI link from Ambari. Running it via the CLI with some inline parameters is working well for us:

hdfs balancer -Ddfs.balancer.movedWinWidth=5400000 -Ddfs.balancer.moverThreads=1000 -Ddfs.balancer.dispatcherThreads=200 -Ddfs.datanode.balance.bandwidthPerSec=100000000 -Ddfs.balancer.max-size-to-move=10737418240 -threshold 20 1>/tmp/balancer-out.log 2>/tmp/balancer-debug.log
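For anyone tuning the same knobs, a quick way to watch progress while the balancer runs — the log path comes from the redirection above, and `hdfs dfsadmin -report` is the standard per-DataNode utilization report:

```bash
# Follow the balancer's own progress log
tail -f /tmp/balancer-out.log
# Watch per-DataNode utilization converge toward the cluster average
hdfs dfsadmin -report | grep -E 'Name:|DFS Used%'
```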
02-26-2016 07:51 PM · 1 Kudo
No orange circle. No indication to restart services. FYI.
02-26-2016 07:47 PM · 1 Kudo
Thanks @rgangappa, that seems to be what I needed. @Artem Ervits, the original instructions document should probably include this. Thanks!
02-26-2016 07:31 PM · 1 Kudo
We are on Ambari 2.1.1. We run HA, without Kerberos or any other security. I restarted the Ambari server and then restarted all of the monitors, and it didn't help. I don't know whether firewalls are enabled on our servers. Where would I look for this? Thanks!
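On the firewall question, a quick check one could run on each host — assuming a RHEL/CentOS-family OS, which is typical for this Ambari version; adjust for your distro:

```bash
# RHEL/CentOS 6 family: iptables service
sudo service iptables status
# RHEL/CentOS 7 family: firewalld
sudo systemctl status firewalld
# Either way, list active rules; chains with no rules under them mean no filtering
sudo iptables -L -n
```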
02-26-2016 07:03 PM · 2 Kudos
I moved our Ambari Metrics Collector from a data node to an edge node, following all of the directions outlined here: http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.0.0/bk_ambari_reference_guide/content/ch_moving_the_ambari_metrics_collector.html Now Ambari Metrics kind of 'works', but not completely. The main Ambari dashboard looks good - all of the metrics are populated. But if I drill into HDFS, YARN, HBase, etc., only some of the metrics are there; many say "No data available". I tried restarting the Ambari server, but no luck. I tried Ctrl+Shift+R to force a clean refresh of the GUI, and that didn't help either. I know that all the nodes are pointing to the right place in /etc/ambari-metrics-monitor/conf/metric_monitor.ini. Where can I look to figure this out?
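One way to narrow this down is to query the Metrics Collector's API directly and see whether the "No data available" metrics were ever written. A sketch assuming the AMS default port 6188; the collector host, target hostname, and metric name below are placeholders:

```bash
# Data points back = collection works and the problem is on the Ambari UI side;
# an empty result = the monitors aren't shipping this metric to the new collector.
curl "http://ams-host.example.com:6188/ws/v1/timeline/metrics?metricNames=dfs.datanode.BytesRead&appId=datanode&hostname=worker-node-01.example.com"
```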
Labels: Apache Ambari
02-26-2016 03:46 PM · 1 Kudo
Thanks. FYI, there's an extra space character in "- i" in both of the curl commands.
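To make the fix concrete (the URL below is a placeholder):

```bash
curl - i http://ambari-server:8080/api/v1/clusters   # wrong: "- i" is two tokens, not the -i flag
curl -i http://ambari-server:8080/api/v1/clusters    # right: -i includes the response headers
```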